Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SR-12289] Missed optimization opportunity #54717

Closed
jepers opened this issue Feb 27, 2020 · 4 comments
Closed

[SR-12289] Missed optimization opportunity #54717

jepers opened this issue Feb 27, 2020 · 4 comments

Comments

@jepers
Copy link

jepers commented Feb 27, 2020

Previous ID SR-12289
Radar rdar://problem/59874359
Original Reporter @jepers
Type Improvement
Status Resolved
Resolution Done
Additional Detail from JIRA
Votes 1
Component/s
Labels Improvement
Assignee None
Priority Medium

md5: 44143157d45ffcef42f4d9555e46d4ce

Issue Description:

This function:

@inline(__always)
func globalCellIndexB(forGridIndex i: UInt8) -> UInt32 {
    let results = (UInt32(0), UInt32(65536), UInt32(81920), UInt32(86016),
                   UInt32(87040), UInt32(87296), UInt32(87360))
    return withUnsafeBytes(of: results) {
        $0.bindMemory(to: UInt32.self)[Int(i)]
    }
}

is about 5 times slower than this corresponding C function:

uint32_t globalCellIndexForGridIndexD(uint8_t i) {
    static const uint32_t results[] = {
        0, 65536, 81920, 86016, 87040, 87296, 87360
    };
    return results[i];
}

See detailed discussion in thread of this post:
https://forums.swift.org/t/two-questions-about-pointers-and-performance/34133/10

@atrick
Copy link
Member

atrick commented Feb 27, 2020

This should work around the problem:

private var results = (UInt32(0), UInt32(65536), UInt32(81920), UInt32(86016),
 UInt32(87040), UInt32(87296), UInt32(87360))

public func globalCellIndexB(forGridIndex i: UInt8) -> UInt32 {
 return withUnsafeBytes(of: &results) {
  $0.load(fromByteOffset: Int(i) &* 4, as: UInt32.self)
 }
}

@jckarter, @eeckstein I think the bug here is that the compiler does not optimize away copies of 'let' declarations (either local or global) that are initialized to a tuple of literals. This is basically because 'let's are rvalues and aren't naturally addressable. But I suspect an optimizer pass could identify the copy-from-constant-to-immutable-memory pattern and materialize the constant in read-only static memory instead.

It would be nice if there were a reliable way avoid the tuple copy even at -Onone.

@hborla
Copy link
Member

hborla commented Feb 28, 2020

@swift-ci create

@jckarter
Copy link
Member

@atrick I agree, this seems like a peephole we could add to LoadableByAddress, or its opaque value address lowering successor.

@eeckstein
Copy link
Member

A better approach is to make a regular array lookup fast.
I added a optimization in #36439 which enables to compile

func globalCellIndexB(forGridIndex i: UInt8) -> UInt32 {
    return [UInt32(0), UInt32(65536), UInt32(81920), UInt32(86016),
                   UInt32(87040), UInt32(87296), UInt32(87360)][Int(i)]
}

to a zero-overhead table lookup.
Except the range check is done. But this makes sense - we want to have safety checks.

I don't think we will invest much effort to optimize the original tuple-pattern. I'm resolving this as done because the array-lookup code is the preferred way to go.

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants