[SR-15381] Incorrect URL parsing #3192

Open
karwa opened this issue Oct 23, 2021 · 0 comments

karwa commented Oct 23, 2021

Previous ID SR-15381
Radar None
Original Reporter @karwa
Type Bug
Environment

Xcode 13 GM, macOS 11.6

Additional Detail from JIRA
Votes 0
Component/s Foundation
Labels Bug
Assignee None
Priority Medium

md5: 8014ceb578ffcabb4bed4101e4ee1a11

Issue Description:

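```swift
import Foundation
let nsurl = URL(string: "h:#ash")!
// Observed: "#" is percent-encoded into the path instead of starting a fragment.
assert(nsurl.absoluteString == "h:%23ash")
```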

Here, the scheme is "h" and the path is "%23ash". Most other parsers would say that the scheme is "h", the path is empty, and the fragment is "ash".

Compare:

```swift
let nsurl = URL(string: "h:?ash")!
// The query delimiter "?" is kept as-is.
assert(nsurl.absoluteString == "h:?ash")
```

The documentation says URL implements RFC 1808, but this appears to be a point of non-compliance. From the RFC:

```
URL         = ( absoluteURL | relativeURL ) [ "#" fragment ]

absoluteURL = generic-RL | ( scheme ":" *( uchar | reserved ) )

generic-RL  = scheme ":" relativeURL

relativeURL = net_path | abs_path | rel_path

net_path    = "//" net_loc [ abs_path ]
abs_path    = "/"  rel_path
rel_path    = [ path ] [ ";" params ] [ "?" query ]
```

So, looking at the string "h:#ash":

  • The URL rule says it must start with an absoluteURL or relativeURL

  • The absoluteURL rule says it may be a generic-RL

  • The generic-RL rule says that we split the string at the first ":", so the scheme is "h" and what follows the colon is "#ash"

  • relativeURL may be a net_path (starts with "//"), an abs_path (starts with "/"), or a rel_path

  • rel_path says everything is optional; it could be an empty string.

  • Therefore, with rel_path matching the empty string, the URL rule's trailing [ "#" fragment ] splits at the first "#", so the fragment should be "ash"
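
The derivation above can be sketched as a small parser for the quoted subset of the grammar. This is a hypothetical illustration under stated assumptions: the names `RFC1808Parts` and `parseRFC1808` are invented here, it is not how Foundation parses URLs, and it does not validate escapes or net_loc contents. It applies the rules in order: fragment, scheme, net_path, then rel_path.

```swift
// Hypothetical sketch of the RFC 1808 decomposition (not Foundation's code).
// Escapes and net_loc contents are not validated.

/// Components named after the RFC 1808 grammar rules.
struct RFC1808Parts {
    var scheme: String? = nil
    var netLoc: String? = nil
    var path: String = ""
    var params: String? = nil
    var query: String? = nil
    var fragment: String? = nil
}

func parseRFC1808(_ input: String) -> RFC1808Parts {
    var parts = RFC1808Parts()
    var rest = Substring(input)

    // URL = ( absoluteURL | relativeURL ) [ "#" fragment ]
    // The fragment is everything after the first "#".
    if let hash = rest.firstIndex(of: "#") {
        parts.fragment = String(rest[rest.index(after: hash)...])
        rest = rest[..<hash]
    }

    // generic-RL = scheme ":" relativeURL
    // Text before the first ":" is a scheme only if it is a non-empty run
    // of ASCII letters, digits, "+", "-" or ".".
    if let colon = rest.firstIndex(of: ":") {
        let candidate = rest[..<colon]
        let isScheme = !candidate.isEmpty && candidate.allSatisfy {
            $0.isASCII && ($0.isLetter || $0.isNumber || "+-.".contains($0))
        }
        if isScheme {
            parts.scheme = String(candidate)
            rest = rest[rest.index(after: colon)...]
        }
    }

    // net_path = "//" net_loc [ abs_path ]; net_loc runs up to the next "/".
    if rest.hasPrefix("//") {
        let afterSlashes = rest.dropFirst(2)
        let slash = afterSlashes.firstIndex(of: "/") ?? afterSlashes.endIndex
        parts.netLoc = String(afterSlashes[..<slash])
        rest = afterSlashes[slash...]
    }

    // rel_path = [ path ] [ ";" params ] [ "?" query ]
    // (abs_path is "/" rel_path, so its leading "/" stays in `path`).
    if let question = rest.firstIndex(of: "?") {
        parts.query = String(rest[rest.index(after: question)...])
        rest = rest[..<question]
    }
    if let semicolon = rest.firstIndex(of: ";") {
        parts.params = String(rest[rest.index(after: semicolon)...])
        rest = rest[..<semicolon]
    }
    parts.path = String(rest)
    return parts
}
```

Under this sketch, "h:#ash" yields scheme "h", an empty path and fragment "ash", while "h:?ash" yields scheme "h", an empty path and query "ash". The two inputs differ only in which delimiter is consumed, and neither delimiter ends up in the path.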

In fact, the RFC recommends splitting off the fragment before doing any other parsing:

> If the parse string contains a crosshatch "#" character, then the substring after the first (left-most) crosshatch "#" and up to the end of the parse string is the <fragment> identifier. If the crosshatch is the last character, or no crosshatch is present, then the fragment identifier is empty. The matched substring, including the crosshatch character, is removed from the parse string before continuing.

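
The quoted step is small enough to show directly. A minimal sketch, assuming plain `String` input; `splitFragment` is a hypothetical helper name, not a Foundation API:

```swift
// Hypothetical helper for the quoted RFC 1808 step: split off the fragment
// at the first (left-most) "#" before any other parsing.
// Returns the remaining parse string and the fragment ("" if there is none).
func splitFragment(_ parseString: String) -> (rest: String, fragment: String) {
    guard let hash = parseString.firstIndex(of: "#") else {
        // No crosshatch: the fragment identifier is empty.
        return (parseString, "")
    }
    // Everything after the first "#" is the fragment; if "#" is the last
    // character, this is the empty string.
    let fragment = String(parseString[parseString.index(after: hash)...])
    // The "#" and the fragment are removed before parsing continues.
    let rest = String(parseString[..<hash])
    return (rest, fragment)
}
```

Applied to "h:#ash", this leaves "h:" as the parse string and "ash" as the fragment, so no "#" remains for the later path-parsing steps. "h:?ash" has no crosshatch and passes through unchanged with an empty fragment.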

swift-ci transferred this issue from apple/swift-issues on Apr 25, 2022.

shahmishal transferred this issue from apple/swift on May 5, 2022.