Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SR-7511] Relatively simple loop shows unexpected O(n^2) behavior #50053

Open
swift-ci opened this issue Apr 24, 2018 · 10 comments
Open

[SR-7511] Relatively simple loop shows unexpected O(n^2) behavior #50053

swift-ci opened this issue Apr 24, 2018 · 10 comments
Labels
bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. standard library Area: Standard library umbrella

Comments

@swift-ci
Copy link
Collaborator

Previous ID SR-7511
Radar None
Original Reporter iosdevzone (JIRA User)
Type Bug

Attachment: Download

Environment

macOS 10.13.4 Beta (17E170c)

Apple Swift version 4.1 (swiftlang-902.0.48 clang-902.0.37.1)

Target: x86_64-apple-darwin17.5.0

Additional Detail from JIRA
Votes 0
Component/s Standard Library
Labels Bug
Assignee None
Priority Medium

md5: 7c118b707446ed86d1f5837100aaecb1

Issue Description:

On macOS (I don't have a linux box handy right now) this rather simple looking loop demonstrates the problem:

public func countNewlines3(string s: String) -> Int {
{{ let newline = Character("\n")}}
{{ var n = 0}}
{{ for i in 0..<s.count where s[s.index(s.startIndex, offsetBy: i)] == newline {}}
{{ n = n + 1}}
{{ }}}
{{ return n}}
{{}}}

For any large string this soon starts to take seconds to run. The attached file contains a runnable demonstration and also a alternate implementation the demonstrates the expected O( n ) performance.

@swift-ci
Copy link
Collaborator Author

Comment by iOSDevZone (JIRA)

OK, I think I've figured out what is going on, so maybe this is intended behavior. It is because

s.index(s.startIndex, offsetBy: i)

is O( i ). This does make random access into large strings quite slow without taking mitigating action.

Perhaps the standard library should cache some indices?

@belkadan
Copy link
Contributor

belkadan commented May 1, 2018

This is why you should loop over s.indices instead of starting at the beginning each time. Or just loop over the Characters in the String instead.

Anything to add, @milseman?

@milseman
Copy link
Mannequin

milseman mannequin commented May 1, 2018

The code is O(n^2) as written. This is why we don't provide an Int-based subscript directly, though we could provide clearer reasoning behind why something like index(_:offsetBy:)
takes O(n) time.

Jordan's suggestion:

for idx in s.indices {
  ... s[idx] ...
}

is the better pattern and is much more ergonomic anyways.

@swift-ci
Copy link
Collaborator Author

swift-ci commented May 1, 2018

Comment by iOSDevZone (JIRA)

As I was writing my first comment on this, it did occur to me that that is probably why direct integer indexing is not provided! Somehow, I also managed to overlook `s.indices`; that is a much neater solution and is what I trying to get at in my comment about caching indices.

So it seems there are adequate, existing ways to avoid the O(n^2) behavior, and thank you both for taking time to respond educate me on better approaches.

As far as I am concerned, this closes this issue for me. (Problem was located between chair and keyboard!)

Perhaps worth mentioning, the performance, even using `s.indices`, compared to (1) NSString in Swift (2) unsafe pointers in Swift (3) NSString in Objective-C (4) std::string in C++, is got great, but that's a subject for another report.

@milseman
Copy link
Mannequin

milseman mannequin commented May 1, 2018

Are you comparing grapheme iterators to code units iterators? If so, they're algorithmically different entities.

(FWIW, String's UTF16View also has about 25-50% overhead due to un-paired surrogate detection, so that's not great either)

@swift-ci
Copy link
Collaborator Author

swift-ci commented May 2, 2018

Comment by iOSDevZone (JIRA)

@milseman no, I am not comparing dissimilar iterators. The performance issues I mentioned occur when working in the UTF-16 codepoint world (apologies, I could have been clearer). At some levels I can ignore un-paired surrogate pairs (e.g. sentence or line segmentation) and leave higher level processing to detect them, in others I am detected them, so it may not be an entirely level playing field, but the differences are not adequately explained by this. Happy to provide more information and code examples, but did not want to muddy the comments section with unrelated issues.

@milseman
Copy link
Mannequin

milseman mannequin commented May 2, 2018

Please do, we're also always on the lookout for more benchmarks.

The slowdown I was alluding to is UTF16View.subscript, which checks the neighbors for an un-paired surrogate so it can vend 0xFFFD. Personally, I think this is the wrong model and we need to move to explicit control upon initialization and emergent behavior upon violation (I believe Rust eventually moved to this model too).

@swift-ci
Copy link
Collaborator Author

swift-ci commented May 2, 2018

Comment by iOSDevZone (JIRA)

What's the appropriate way to submit such benchmarks/code? Just open an issue here?

@milseman
Copy link
Mannequin

milseman mannequin commented May 2, 2018

The best option is to open a PR against the in-tree benchmark suite: https://github.com/apple/swift/tree/master/benchmark. Ideally it would build and be properly integrated. But, if you need help with that you can just open a PR with the benchmark code in it (even if it doesn't build) and I can adapt it for the suite.

@swift-ci
Copy link
Collaborator Author

swift-ci commented May 2, 2018

Comment by iOSDevZone (JIRA)

OK, I'll look into it. I may run into time constraint issues, in which case I may just hand off an Xcode test suite but I'll try to give you something that works "out of the box". Thanks for your assistance and advice.

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. standard library Area: Standard library umbrella
Projects
None yet
Development

No branches or pull requests

2 participants