Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SR-10065] Substring.count is ambiguous inside a [Substring].filter block #52467

Open
swift-ci opened this issue Mar 8, 2019 · 17 comments
Open
Labels
bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. standard library Area: Standard library umbrella type checker Area → compiler: Semantic analysis

Comments

@swift-ci
Copy link
Collaborator

swift-ci commented Mar 8, 2019

Previous ID SR-10065
Radar rdar://problem/48722421
Original Reporter username0x0a (JIRA User)
Type Bug

Attachment: Download

Environment

Swift 5 (swiftlang-1001.0.69)
Xcode 10.2 Beta 4 (10P107d)

Additional Detail from JIRA
Votes 0
Component/s Standard Library
Labels Bug, TypeChecker
Assignee None
Priority Medium

md5: 11953ae492961923b0aff652c736f8f1

Issue Description:

I've created a simple Playground code demonstrating the issue.

Having a string, for example:

"abc def ghi jklmno"

When attempting to call a split–filter–join chain, like:

"abc def ghi jklmno"
    .split(separator: " ")
    .filter({ (substr) -> Bool in
        return substr.count == 3 })
    .joined(separator: ", ")

Swift 5 fails to compile the code, stating use of .count as ambiguous.

Weird behaviours:

• When omitting the .joined operation, the compilation succeeds with no ambiguity about substr.count call, but .filter block incorrectly outputs a result of [[Character]] type ([["a","b","c"], ["d"…]) instead of correct [Substring] type (["abc","def",…]).

• When using an intermediate variable after .split (thus breaking into two parts — let a = "…".split(…); let b = a.filter(…).joined(…)), compilation succeeds with a correct output.

• When touching the substr block variable before calling substr.count (even by putting it on a separate line to only be printed in the variable pane in the Playgrounds), the compilation succeeds with a correct output — thus acts as a work-around.

Steps to Reproduce:
See the attached Playground code.

Version/Build:
Xcode 10.2 Beta 4 (10P107d)
Swift version 5.0 (swiftlang-1001.0.69 clang-1001.0.45.2)

rdar://problem/48722421
https://openradar.appspot.com/48722421

@swift-ci
Copy link
Collaborator Author

swift-ci commented Mar 9, 2019

Comment by Michael Verges (JIRA)

I think the compiler is taking too long implicitly evaluating the types within a long expression.

Explicitly stating the type of substr within the closure is a workaround:

"abc def ghi jklmno"
    .split(separator: " ")
    .filter({ (substr: Substring) -> Bool in
        return substr.count == 3 })
    .joined(separator: ", ")

@swift-ci
Copy link
Collaborator Author

swift-ci commented Mar 9, 2019

Comment by Michal Zelinka (JIRA)

Yeah, that's possibly works around a bit more cleanly. 🙂

Still seems to be a serious issue — by omitting the .joined call, the compilation succeeds with a completely wrong and unexpected result:

"abc def ghi jklmno"
    .split(separator: " ")
    .filter({ (substr) -> Bool in
        return substr.count == 3 })
// Outputs [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]] instead of ["abc","def","ghi"]

Thus I might have no idea something wrong happened. That's neither predictable nor easily detectable unless investigating really carefully. Turning the .filter result instantly into a Codable serialised data could lead to troubles even potentially detected on a remote machine or something (f.e. my backend guy telling me “oi, mate, you're sending me a completely wrong-formatted stuff”). No matter what, pretty huge mistake in a flow one might think should never fail. As Swift is a really-strongly-typed language, I expect issues and ambiguity like this will NEVER happen, no matter how long it will take the compiler to understand the code and data types included.

The other thing is — should we consider this as a “long” expression, if thinking about it at all? Ruby can handle stuff like this in runtime flawlessly.

@swift-ci
Copy link
Collaborator Author

swift-ci commented Mar 9, 2019

Comment by Michal Zelinka (JIRA)

I guess there's even more troubling stuff:

![](Screenshot 2019-03-09 at 22.13.14.png)

  • How am I supposed to split the string by another string, f.e. " - " as I would need in this case? Is this impossible to achieve in Swift 5 easily?

  • How can I be suggested with a single method giving two different return types?

This is madness. There's something terribly wrong with string processing in Swift being a victim of horrible protocol and generics abstractions of {{Collection}}s getting this mess as a result.

@swift-ci
Copy link
Collaborator Author

swift-ci commented Mar 9, 2019

Comment by Michael Verges (JIRA)

Looking through the docs, I found conflicting definitions of split, but you beat me to it in the attached screenshot. This explains why the type of substr in filter is ambiguous. Swift can not implicitly determine the return type of split, therefore the type of substr is ambiguous.

I can not, however, understand why it chooses the [ArraySlice<Character>] in this case:

"abc def ghi jklmno"
    .split(separator: " ")
    .filter({ (substr) -> Bool in
        return substr.count == 3 })
//[["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]

And [Substring] in this case:

var a = "abc def ghi jklmno"
    .split(separator: " ")
a.filter({ (substr) -> Bool in
    return substr.count == 3 })
//["abc", "def", "ghi"]

@swift-ci
Copy link
Collaborator Author

swift-ci commented Mar 9, 2019

Comment by Michal Zelinka (JIRA)

I guess the language design shouldn't allow this kind of ambiguity to even happen. One method definition cannot put out two different results not even in theory, the worse it what happens – it somehow-mostly works, but unpredictibly depending on the actual code.

This bare code:

"abc def ghi jklmno"
    .split(separator: " ")

Should be currently ambigous in any cases and revoke calling .split on String at all so it could be implemented differently in the stdlib. Instead, it compiles and somehow chooses (luckily) the return type I'm looking for. For now.

This code, adding .filter:

"abc def ghi jklmno"
    .split(separator: " ")
    .filter({ (substr) -> Bool in
        return substr.count == 3 })

Works as well. Once again, first mistake – no ambiguity resulting in compilation error, second mistake – filtering an array of {{Substring}}s turns into an array of {{ArraySlice<Character>}}s – switching to the second return type.

Finally, this code, adding .joined:

"abc def ghi jklmno"
    .split(separator: " ")
    .filter({ (substr) -> Bool in
        return substr.count == 3 })
.joined(separator: ", ")

At least fails to compile, even though with a useless error, but I can understand it as it's broken waaay way more beyond this scope.

@swift-ci
Copy link
Collaborator Author

swift-ci commented Mar 9, 2019

Comment by Michael Verges (JIRA)

I agree, this ambiguity should not be allowed. The behavior should be to diagnose an error: "Ambiguous use of 'split'"

This excerpt in playgrounds complains over ambiguity:

func foo(_ a: Character, _ b: Int = 0) -> Int { return b }
func foo(_ a: Character, _ b: Double = 1.0) -> Double { return b }

var c: Character = " "

var i: Int = foo(c)     // Explicit return type allowed
var d: Double = foo(c)  // Explicit return type allowed
var a = foo(c)          // Ambiguous return type
// ERROR: Ambiguous use of 'foo'

For some reason, 'split' is not following this behavior. I think that is the core BUG here.

@swift-ci
Copy link
Collaborator Author

swift-ci commented Mar 9, 2019

Comment by Michal Zelinka (JIRA)

Ambiguous (~duplicate) method definitions (= illegal redeclarations in any meaningful language) shouldn't be allowed at all, no matter if I declare the desired type or not. It can only lead to A) weird and unpredictable behaviour and B) a need for explicit type declaration to work around. Both of something I thought Swift should've eliminated.

@swift-ci
Copy link
Collaborator Author

swift-ci commented Mar 9, 2019

Comment by Michael Verges (JIRA)

This is not a redeclaration, as the return type is different. The method signature can conflict, so it is ambiguous. Disallowing duplicate signatures altogether is beyond the scope of this bug.

On another note, here is a breakdown of what is happening:

"abc def ghi jklmno".split(separator: " ")
    // can return [Substring] or [ArraySlice<Character>]
    .filter({ (substr) -> Bool in
        /*
         'substr' is Substring or ArraySlice<Character>.
         Therefore, type can not be determined.
         
         Explicitly noting 'substr: Substring' or
         'substr: ArraySlice<Character>'will allow the compiler
         to implicitly determine the return type of 'split'.
         */
        // substr is Substring or ArraySlice<Character>
        return substr.count == 3
        /*
         Accesing a method or property that is exclusive to
         Substring or ArraySlice<Character> will allow the compiler
         to implicitly determine the return type of 'split'.
         
         However, both Substring and ArraySlice<Character> have
         property 'count'. Therefore, type can not be determined.
         
         This is the last oportuniy for Swift to determine the
         return type of 'split', so the error is shown that
         'count' is ambiguous.
         */
    }).joined(separator: ", ")

The error makes sense and is expected, but it can be more explicit that 'split' is the culprit, not 'count'. Also, inserting a fixit for explicitly declaring the type.

E.g. something like:
Diagnose: "Ambiguous use of 'split'"
Fixit: "Explicitly declare 'substr: Substring'"
Fixit: "Explicitly declare 'substr: ArraySlice<Character>'"

@swift-ci
Copy link
Collaborator Author

Comment by Michal Zelinka (JIRA)

Yeah, I get it. 🙂 But in case the discussion over some design flaw leads to higher level topic, it could possible pull some strings to reconsider. Allowing conflicting signatures only leads to situations nobody really wants to deal with/expects/can rely on.

In extreme cases, f.e. 3rd-party extensions may duplicate pretty much any possible method from the stdlib, forcing the user to re-type code that's otherwise clear and straight-forward, or alternatively, chaging its behaviour without user's knowledge – that might even come up as a security flaw aspect ('cause nobody really walks through all the 3rd-party code).

Other way, like in this case, a method to explode the string into [ArraySlice<>] should possibly have a completely different name. This bug might describe a similar case with Swift 5 – previously working code requires to be re-typed while previously being completely obvious. I'd really reconsider whether allowing duplicate signatures has some positive aspects with higher value over the lack of clarity, user-unfriendliness and possible security threats.

@swift-ci
Copy link
Collaborator Author

Comment by Michael Verges (JIRA)

The split method that returns [ArraySlice<Element>] is actually a generic function that String inherits from Collection protocol. Keep in mind that renaming this method affects all arrays, not only Strings. And in a lot of cases, the name ‘split’ makes a lot of sense for arrays. A more interesting approach would be to make ArraySlice<Element> implicitly convertible to Substring, and get rid of Strings split method.

@swift-ci
Copy link
Collaborator Author

Comment by Michael Verges (JIRA)

Do you think this discussion should be moved to the forums?

@swift-ci
Copy link
Collaborator Author

Comment by Michal Zelinka (JIRA)

Might be, but this topic would be a bit too late to perform some deeper updates, having the ABI stability in mind, nobody's really into doing some drastic changes. Which is pretty sad, many things done wrong at this moment may stick with us forever, like many nasty things stick with us in C/C++ standards.

I get the inheritance points, but here we hit that aspiration to solve all issues generically. String might be (in theory and even in backing-storage-reality) an array of characters, but processing strings is a special area (no discussion over this in programming) that cannot be easily covered by generic array/collection protocols in full complexity, and while we try to stick {{String}}s on top of {{Collection}}s, we might get some unwanted behaviour and a need to alter it, even an ambiguous way like in this case.

The other thing is – a method returning [ArraySlice] instances should probably be called .slice 🙂 bad naming in this case is troubling as well; .split could be reserved for methods returning an array of same or inherited type instances. That way or another, what's good for arrays might not be useful for strings; this might be a very good example.

Stating Swift as an easy-to-learn language becomes difficult when we start to be limited by too much liberty, related ambiguity, need to specify something otherwise redundant if the rules were more strict and obvious.

@swift-ci
Copy link
Collaborator Author

Comment by Michal Zelinka (JIRA)

Would these proposals be sensible?

  • moving all Collection split functionality to newly introduced slice method returning a [CollectionSlice<>]

  • reserving split for results collecting the same or inherited type arrayed (String[Substring] in this case)

  • allowing String and its descendants use split with separator of String type, not only Character

I think this is pretty mandatory and needed for easy strings manipulation & processing in terms of actual splitting. I guess all possible String-processing scenarios should be considered and analysed carefully as well.

@swift-ci
Copy link
Collaborator Author

Comment by Michael Verges (JIRA)

[1] What advantage would CollectionSlice have over ArraySlice? Seems extraneous to reinvent the wheel
[2] How would you 'reserve' it? I would imagine the ideal solution is to override Collection's split method, but changing the return type won't directly work – unless Substring somehow conforms to ArraySlice<Character>.
[3] I'm sure split(string: String) could be added anyhow without relevance to the underlying bug.

Also, if this really does uproot a lot of code, it could be difficult to approve a PR without a discussion.

@swift-ci
Copy link
Collaborator Author

Comment by Michal Zelinka (JIRA)

Ad 1+2 – no idea about slice types, I tried to describe it theoretically. Collection sliced as possible CollectionSlice on its level of hierarchy, Array as an ArraySlice etc., but that's not the point, let's use WhateverSlice. 🙂

As of now, .split on top of T return both [T] or an WhateverSlice<…>. I'd suggest .slice methods to return WhateverSlice<…> (hence the name 🙂) and .split to return [T] types. Pretty straight-forward, naming-friendly etc. Stdlib could follow these “reserved” cases; 3rd-party code can add extensions and override everything with no rules, but nobody guarantees anything. Having some basic disambiguation in stdlib acts as a nice recommendation.

Ad 3 – how's it possible right now? 🙂 I can't really find an easy way to split the string using another string. Very odd. 😃

@belkadan
Copy link
Contributor

cc @milseman, @lorentey

@swift-ci
Copy link
Collaborator Author

Comment by Michal Zelinka (JIRA)

Further discussion on forums:
https://forums.swift.org/t/string-vs-collection-ambiguity-slice-vs-split/21439/7

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. standard library Area: Standard library umbrella type checker Area → compiler: Semantic analysis
Projects
None yet
Development

No branches or pull requests

2 participants