Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SR-11499] Bounds checks inhibiting vectorisation of array operations. #53900

Open
Lukasa opened this issue Sep 21, 2019 · 0 comments
Open

[SR-11499] Bounds checks inhibiting vectorisation of array operations. #53900

Lukasa opened this issue Sep 21, 2019 · 0 comments
Labels
bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. compiler The Swift compiler in itself performance

Comments

@Lukasa
Copy link
Contributor

Lukasa commented Sep 21, 2019

Previous ID SR-11499
Radar None
Original Reporter @Lukasa
Type Bug
Environment

Apple Swift version 5.1.1 (swiftlang-1100.0.273 clang-1100.0.33.9)
Target: x86_64-apple-darwin19.0.0

Additional Detail from JIRA
Votes 0
Component/s Compiler
Labels Bug, Performance
Assignee None
Priority Medium

md5: 845b8aa38251f79df6cb70314a1ed141

Issue Description:

The following "benchmark" program shows a simple constant-time (ish) comparison function implemented on top of [UInt8]:

import Foundation

@inline(never)
func constantTimeCompare(_ lhs: [UInt8], _ rhs: [UInt8]) -> Bool {
    guard lhs.count == rhs.count else {
        return false
    }

    return 0 == zip(lhs, rhs).reduce(into: 0) { $0 |= $1.0 ^ $1.1 }
}

let first: [UInt8] = [1, 2, 3, 4, 5]
let second: [UInt8] = [1, 2, 3, 4, 5]

if constantTimeCompare(first, second) {
    exit(0)
} else {
    exit(1)
}

When compiled with -O we obtain the following assembly for constantTimeCompare:

    .private_extern test.constantTimeCompare([Swift.UInt8], [Swift.UInt8]) -> Swift.Bool
    .globl  test.constantTimeCompare([Swift.UInt8], [Swift.UInt8]) -> Swift.Bool
    .p2align    4, 0x90
test.constantTimeCompare([Swift.UInt8], [Swift.UInt8]) -> Swift.Bool:
    pushq   %rbp
    movq    %rsp, %rbp
    movq    16(%rdi), %r8
    cmpq    16(%rsi), %r8
    jne LBB1_1
    testq   %r8, %r8
    je  LBB1_3
    xorl    %ecx, %ecx
    xorl    %edx, %edx
    .p2align    4, 0x90
LBB1_9:
    cmpq    %r8, %rdx
    jae LBB1_12
    cmpq    %rdx, %r8
    je  LBB1_11
    cmpq    %r8, %rdx
    jae LBB1_13
    movzbl  32(%rdi,%rdx), %eax
    xorb    32(%rsi,%rdx), %al
    incq    %rdx
    orb %cl, %al
    movl    %eax, %ecx
    cmpq    %rdx, %r8
    jne LBB1_9
    jmp LBB1_6
LBB1_1:
    xorl    %eax, %eax
    popq    %rbp
    retq
LBB1_3:
    xorl    %eax, %eax
    jmp LBB1_6
LBB1_11:
    movl    %ecx, %eax
LBB1_6:
    testb   %al, %al
    sete    %al
    popq    %rbp
    retq
LBB1_12:
    ## InlineAsm Start
    ## InlineAsm End
    ud2
LBB1_13:
    ## InlineAsm Start
    ## InlineAsm End
    ud2

The central part of the loop uses xorb and orb, performing the comparison one byte at a time. An equivalent C function would be vectorised and use pxor for the body of the loop.

It seems likely that the bounds checks are what is inhibiting the vectorisation optimisation here. I came to this conclusion by rewriting the loop to avoid using zip or reduce, and while that new loop has drastically smaller code size it still fails to vectorise.

I am not sure whether this even can be fixed in the case of a general Collection, as it's not trivially clear to me that the bounds check can be hoisted, but in the case of certain kinds of Collection object (those with linear integral indices) it would seem to be possible to perform a transformation that allows vectorisation of this code.

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. compiler The Swift compiler in itself performance
Projects
None yet
Development

No branches or pull requests

1 participant