Issue Description:
String should adopt a validity model more akin to [Rust's](https://doc.rust-lang.org/std/string/struct.String.html), where validity is checked upon init and assumed during processing.
This is meant to track the effort.
From [my comment on SR-7602](https://bugs.swift.org/browse/SR-7602?focusedCommentId=35396&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-35396):
What is the validation story? If the stream of bytes contains invalid UTF-8, do we want:
For reference, I think [Rust's model](https://doc.rust-lang.org/std/string/struct.String.html) is pretty good:
- `from_utf8` produces an error explaining why the code units were invalid
- `from_utf8_lossy` replaces encoding errors with U+FFFD
- `from_utf8_unchecked` takes the bytes without validating them; if there's an encoding error, then memory safety has been violated
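For anyone less familiar with the Rust side, here is a minimal sketch of those three entry points. The `decode_*` wrapper names are made up for illustration; only the `String::from_utf8*` calls are the standard library's.

```rust
use std::borrow::Cow;
use std::string::FromUtf8Error;

/// Strict: invalid input yields an error describing where decoding failed.
/// (`decode_strict` is a hypothetical wrapper name.)
fn decode_strict(bytes: Vec<u8>) -> Result<String, FromUtf8Error> {
    String::from_utf8(bytes)
}

/// Lossy: every invalid sequence is replaced with U+FFFD.
/// Borrows the input when it is already valid, allocates otherwise.
fn decode_lossy(bytes: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(bytes)
}

/// Unchecked: no validation at all.
///
/// # Safety
/// The caller must guarantee `bytes` is valid UTF-8. In Rust, passing
/// invalid bytes here is undefined behavior, not just a wrong result.
unsafe fn decode_unchecked(bytes: Vec<u8>) -> String {
    unsafe { String::from_utf8_unchecked(bytes) }
}
```

With the input `[b'a', 0xFF, b'b']`, the strict variant returns an error whose `utf8_error().valid_up_to()` is 1, and the lossy variant returns `"a\u{FFFD}b"`.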
I'm not entirely sure if accepting invalid bytes requires voiding memory safety (assuming bounds checking always happens), but it is totally a security hazard if used improperly. We may want to be very cautious about if/how we expose it.
I think that trying to do read-time validation is dubious for UTF-16, and totally bananas for UTF-8.
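To make "checked upon init and assumed during processing" concrete, here is a rough sketch in Rust. `ValidatedText` is a hypothetical type invented for this example, not a proposal for how Swift's `String` would be laid out; it only shows the shape of the invariant.

```rust
use std::str::Utf8Error;

/// Hypothetical wrapper: bytes are validated exactly once, at construction.
pub struct ValidatedText {
    bytes: Vec<u8>, // invariant: always valid UTF-8
}

impl ValidatedText {
    /// Init-time validation: invalid input is rejected up front.
    pub fn new(bytes: Vec<u8>) -> Result<Self, Utf8Error> {
        std::str::from_utf8(&bytes)?;
        Ok(ValidatedText { bytes })
    }

    /// Reads rely on the invariant instead of re-validating.
    pub fn as_str(&self) -> &str {
        // SAFETY: `new` is the only constructor and it rejected invalid UTF-8;
        // the bytes are never mutated afterwards.
        unsafe { std::str::from_utf8_unchecked(&self.bytes) }
    }

    /// Example of processing that never has to handle encoding errors.
    pub fn scalar_count(&self) -> usize {
        self.as_str().chars().count()
    }
}
```

The trade-off is that every reader can skip error handling, but any path that constructs the storage without validation breaks the invariant for all of them, which is why an unchecked initializer would need careful exposure.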