Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Medium
    • Resolution: Done
    • Component/s: Standard Library
    • Labels:
      None

      Description

      String should adopt a validity model more akin to [Rust's](https://doc.rust-lang.org/std/string/struct.String.html), where validity is checked upon init and assumed during processing.

      This issue tracks that effort.
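As a rough illustration of "checked upon init and assumed during processing", here is a minimal Rust sketch. `ValidatedText` and its methods are hypothetical names invented for this example, not part of Rust's or Swift's standard library; only `std::str::from_utf8` and `std::str::from_utf8_unchecked` are real APIs.

```rust
use std::str;

/// Hypothetical wrapper type (an assumption for illustration):
/// the bytes are validated exactly once, at construction.
pub struct ValidatedText {
    bytes: Vec<u8>,
}

impl ValidatedText {
    /// Checks UTF-8 validity up front; invalid input never yields a value.
    pub fn new(bytes: Vec<u8>) -> Result<Self, str::Utf8Error> {
        str::from_utf8(&bytes)?;
        Ok(ValidatedText { bytes })
    }

    /// Relies on the invariant established in `new`; nothing is re-validated.
    pub fn as_str(&self) -> &str {
        // SAFETY: `new` is the only constructor and it rejects invalid UTF-8.
        unsafe { str::from_utf8_unchecked(&self.bytes) }
    }

    /// Example of processing that assumes validity: count Unicode scalars.
    pub fn scalar_count(&self) -> usize {
        self.as_str().chars().count()
    }
}
```

The point of the design is that every read path can skip validation because the only way to obtain a value is through the checking initializer.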

      From [my comment on SR-7602](https://bugs.swift.org/browse/SR-7602?focusedCommentId=35396&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-35396):

      What is the validation story? If the stream of bytes contains invalid UTF-8, do we want:
      1) The initializer to fail, resulting in nil
      2) The initializer to fail, producing an error
      3) The invalid bytes to be replaced with U+FFFD
      4) The bytes to be taken verbatim, and to experience the emergent behavior / unspecified results / security hazard from those bytes.

      For reference, I think [Rust's model](https://doc.rust-lang.org/std/string/struct.String.html) is pretty good:

      `from_utf8` produces an error explaining why the code units were invalid
      `from_utf8_lossy` replaces encoding errors with U+FFFD
      `from_utf8_unchecked` takes the bytes without checking them; if they contain an encoding error, then memory safety has been violated
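For concreteness, the first three validation options can be sketched with Rust's existing `String` API. The helper function names below are hypothetical, chosen only for this example; `String::from_utf8`, `String::from_utf8_lossy`, and `FromUtf8Error::utf8_error` are the real standard-library calls. The unchecked variant is deliberately left out, since calling it on invalid bytes is undefined behavior.

```rust
/// Option 1: fail with "nil" (`None` in Rust) on invalid input.
pub fn decode_optional(bytes: Vec<u8>) -> Option<String> {
    String::from_utf8(bytes).ok()
}

/// Option 2: fail with an error that says where validation stopped.
pub fn decode_strict(bytes: Vec<u8>) -> Result<String, String> {
    String::from_utf8(bytes).map_err(|e| {
        format!(
            "invalid UTF-8 after {} valid bytes",
            e.utf8_error().valid_up_to()
        )
    })
}

/// Option 3: replace each invalid sequence with U+FFFD.
pub fn decode_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}
```

All three validate at creation time, so code that later processes the resulting `String` never has to handle ill-formed input.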

      I'm not entirely sure whether accepting invalid bytes requires voiding memory safety (assuming bounds checking always happens), but it is totally a security hazard if used improperly. We may want to be very cautious about whether and how we expose it.

      I think that trying to do read-time validation is dubious for UTF-16, and totally bananas for UTF-8.

            People

            Assignee:
            milseman Michael Ilseman
            Reporter:
            milseman Michael Ilseman