[SR-7725] [String] New validity model #50265

milseman · 2018-05-18T18:25:46Z


| | |
|---|---|
| Previous ID | SR-7725 |
| Radar | rdar://problem/40372743 |
| Original Reporter | @milseman |
| Type | Sub-task |
| Status | Closed |
| Resolution | Done |

Additional Detail from JIRA


| | |
|---|---|
| Votes | 1 |
| Component/s | Standard Library |
| Labels | Sub-task |
| Assignee | @milseman |
| Priority | Medium |

md5: 7994f642ab71fb356af656d9218b289c

Parent-Task:

SR-7602 UTF8 should be (one of) the fastest String encoding(s)

Issue Description:

String should adopt a validity model more akin to [Rust's](https://doc.rust-lang.org/std/string/struct.String.html), where validity is checked upon init and assumed during processing.

This is meant to track the effort.
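As a rough illustration of "checked upon init and assumed during processing", here is a minimal sketch in Rust, the language whose model the issue points to. The `ValidatedText` type and its methods are hypothetical names made up for this example; this is not how Swift's `String` is implemented.

```rust
/// Hypothetical wrapper (not a real library type): the bytes are
/// validated exactly once, when the value is created.
struct ValidatedText {
    bytes: Vec<u8>,
}

impl ValidatedText {
    /// Init-time validation: invalid UTF-8 is rejected up front.
    fn new(bytes: Vec<u8>) -> Option<Self> {
        if std::str::from_utf8(&bytes).is_ok() {
            Some(ValidatedText { bytes })
        } else {
            None
        }
    }

    /// Processing assumes validity: scalar values are counted by
    /// counting non-continuation bytes, with no error handling on read.
    fn scalar_count(&self) -> usize {
        self.bytes.iter().filter(|&&b| (b & 0xC0) != 0x80).count()
    }
}

fn main() {
    let text = ValidatedText::new("héllo".as_bytes().to_vec()).expect("valid UTF-8");
    println!("{}", text.scalar_count());

    // 0xFF never appears in well-formed UTF-8, so construction fails.
    assert!(ValidatedText::new(vec![0x68, 0xFF]).is_none());
}
```

The point of the design is that `scalar_count` can stay branch-light because every instance that exists has already passed validation.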

From [my comment on SR-7602](https://bugs.swift.org/browse/SR-7602?focusedCommentId=35396&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-35396):

What is the validation story? If the stream of bytes contains invalid UTF-8, do we want:

1. The initializer to fail, resulting in nil
2. The initializer to fail, producing an error
3. The invalid bytes to be replaced with U+FFFD
4. The bytes to be taken verbatim, accepting the emergent behavior / unspecified results / security hazard that comes from those bytes

For reference, I think [Rust's model](https://doc.rust-lang.org/std/string/struct.String.html) is pretty good:

- `from_utf8` produces an error explaining why the code units were invalid
- `from_utf8_lossy` replaces encoding errors with U+FFFD
- `from_utf8_unchecked` takes the bytes as-is, but if there's an encoding error, then memory safety has been violated
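The three Rust entry points listed above can be tried side by side. The sketch below uses only the standard library; the helper names `checked` and `lossy` are made up for this example, and the unchecked call is shown only with bytes that are known to be valid.

```rust
/// Checked: returns the string, or the byte offset up to which the
/// input was valid when validation failed.
fn checked(bytes: Vec<u8>) -> Result<String, usize> {
    String::from_utf8(bytes).map_err(|e| e.utf8_error().valid_up_to())
}

/// Lossy: invalid sequences are replaced with U+FFFD.
fn lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

fn main() {
    // 0xFF never appears in well-formed UTF-8.
    let bad = vec![b'h', b'i', 0xFF];

    println!("{:?}", checked(bad.clone()));
    println!("{}", lossy(&bad));

    // Unchecked: the caller promises validity. Passing invalid bytes
    // here would be undefined behavior, so only valid input is used.
    let good = vec![b'h', b'i'];
    // SAFETY: `good` is pure ASCII, which is valid UTF-8.
    let s = unsafe { String::from_utf8_unchecked(good) };
    println!("{}", s);
}
```

Note that the unchecked variant requires an `unsafe` block, which is one way of being "very cautious" about how such an API is exposed.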

I'm not entirely sure if accepting invalid bytes requires voiding memory safety (assuming bounds checking always happens), but it is totally a security hazard if used improperly. We may want to be very cautious about if/how we expose it.

I think that trying to do read-time validation is dubious for UTF-16, and totally bananas for UTF-8.

milseman · 2018-05-18T18:25:57Z

@swift-ci

milseman · 2018-05-18T19:47:17Z

@swift-ci create

milseman · 2019-04-10T23:22:44Z

Swift 5 switched to init-time validation.

swift-ci transferred this issue from apple/swift-issues Apr 25, 2022

This issue was closed.
