[SR-7725] [String] New validity model #50265

Closed
milseman mannequin opened this issue May 18, 2018 · 3 comments
Assignees
Labels
standard library Area: Standard library umbrella

Comments


milseman mannequin commented May 18, 2018

Previous ID SR-7725
Radar rdar://problem/40372743
Original Reporter @milseman
Type Sub-task
Status Closed
Resolution Done
Additional Detail from JIRA
Votes 1
Component/s Standard Library
Labels Sub-task
Assignee @milseman
Priority Medium

md5: 7994f642ab71fb356af656d9218b289c

Parent-Task:

  • SR-7602 UTF8 should be (one of) the fastest String encoding(s)

Issue Description:

String should adopt a validity model more akin to [Rust's](https://doc.rust-lang.org/std/string/struct.String.html), where validity is checked upon init and assumed during processing.

This is meant to track the effort.


From [my comment on SR-7602](https://bugs.swift.org/browse/SR-7602?focusedCommentId=35396&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-35396):

What is the validation story? If the stream of bytes contains invalid UTF-8, do we want:

  1. The initializer to fail resulting in nil
  2. The initializer to fail producing an error
  3. The invalid bytes to be replaced with U+FFFD
  4. The bytes to be kept verbatim, accepting the emergent behavior / unspecified results / security hazard that follow from them.

For reference, I think [Rust's model](https://doc.rust-lang.org/std/string/struct.String.html) is pretty good:

  • `from_utf8` produces an error explaining why the code units were invalid
  • `from_utf8_lossy` replaces encoding errors with U+FFFD
  • `from_utf8_unchecked` takes the bytes as-is, but if there's an encoding error, memory safety has been violated
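
For concreteness, here is a minimal Rust sketch of those three entry points. The wrapper names (`checked`, `lossy`, `trusted_literal`) are made up for illustration; only the `String::from_utf8*` calls are real standard-library API. This shows the Rust side only and says nothing about what the Swift API should look like.

```rust
use std::string::FromUtf8Error;

/// Checked: validation happens here, and the error says where it failed.
pub fn checked(bytes: Vec<u8>) -> Result<String, FromUtf8Error> {
    String::from_utf8(bytes)
}

/// Lossy: invalid sequences are replaced with U+FFFD.
pub fn lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

/// Unchecked: no validation at all; the caller vouches for the bytes.
pub fn trusted_literal() -> String {
    let bytes = b"hello".to_vec();
    // SAFETY: an ASCII literal is always valid UTF-8.
    unsafe { String::from_utf8_unchecked(bytes) }
}
```

Given the bytes `[0x66, 0x6f, 0xff, 0x6f]`, `checked` returns an error whose `utf8_error().valid_up_to()` is 2, while `lossy` returns `"fo\u{FFFD}o"`.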

I'm not entirely sure if accepting invalid bytes requires voiding memory safety (assuming bounds checking always happens), but it is totally a security hazard if used improperly. We may want to be very cautious about if/how we expose it.

I think that trying to do read-time validation is dubious for UTF-16, and totally bananas for UTF-8.
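
To make the init-time alternative concrete, here is a rough Rust sketch of "check on init, assume during processing". `ValidatedUtf8` and `scalar_count` are hypothetical names, not Swift or Rust standard-library API. The point is only that the processing method does no validation of its own.

```rust
/// Hypothetical wrapper: bytes are validated once, in the constructor.
pub struct ValidatedUtf8 {
    bytes: Vec<u8>,
}

impl ValidatedUtf8 {
    /// Init-time check: the only place validation happens.
    pub fn new(bytes: Vec<u8>) -> Option<Self> {
        if std::str::from_utf8(&bytes).is_ok() {
            Some(ValidatedUtf8 { bytes })
        } else {
            None
        }
    }

    /// Counts Unicode scalars by counting bytes that are not UTF-8
    /// continuation bytes (10xxxxxx). This shortcut is only correct
    /// because `new` already established validity.
    pub fn scalar_count(&self) -> usize {
        self.bytes.iter().filter(|&&b| (b & 0xC0) != 0x80).count()
    }
}
```

With read-time validation, every method like `scalar_count` would have to handle malformed sequences on each pass. Here that cost is paid once in `new`.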


milseman mannequin commented May 18, 2018

@swift-ci


milseman mannequin commented May 18, 2018

@swift-ci create


milseman mannequin commented Apr 10, 2019

Swift 5 switched to init-time validation.

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
This issue was closed.