[SR-8996] Inlining crash in devirtualizeWitnessMethod during deabstraction #51499
Comments
Could you email the code (or the link to the code)? Thx.
Comment by Mingsheng Hong (JIRA):
Tweaked the code a bit and got an AD crash. Will talk to Richard and James about it.
Comment by Mingsheng Hong (JIRA):
Based on James' code, I made a simpler reproducer below:

```swift
import TensorFlow

@inlinable
@differentiable(reverse, wrt: (.0),
                primal: _primalSoftmaxCrossEntropy,
                adjoint: _adjointSoftmaxCrossEntropy)
func softmaxCrossEntropy(logits: Tensor<Float>, categoricalLabels: Tensor<Int32>) -> Float {
  return Raw.sparseSoftmaxCrossEntropyWithLogits(features: logits,
                                                 labels: categoricalLabels).loss.mean()
}

@inlinable
func _primalSoftmaxCrossEntropy(logits: Tensor<Float>,
                                categoricalLabels: Tensor<Int32>) -> (Tensor<Float>, Float) {
  let (loss, grad) = Raw.sparseSoftmaxCrossEntropyWithLogits(features: logits,
                                                             labels: categoricalLabels)
  return (grad, loss.mean())
}

@inlinable
func _adjointSoftmaxCrossEntropy(logits: Tensor<Float>,
                                 categoricalLabels: Tensor<Int32>,
                                 checkpointedGrad: Tensor<Float>,
                                 originalResult: Float,
                                 seed: Float) -> Tensor<Float> {
  return checkpointedGrad
}

struct LSTMLanguageModel {
  func predict(for input: Tensor<Float>) -> (Tensor<Float>, Tensor<Float>) {
    return (input, input)
  }
}

func autoregressiveLoss(_ model: LSTMLanguageModel, _ input: Tensor<Float>) -> Float {
  let (logits, _) = model.predict(for: input)
  return softmaxCrossEntropy(logits: logits, categoricalLabels: Tensor<Int32>(input))
}

func trainLanguageModel(model: LSTMLanguageModel, input: Tensor<Float>) {
  _ = #valueAndGradient(autoregressiveLoss, wrt: .0)(model, input)
}
```
Crash trace:

```
$ ../build/$rdir/swift-linux-x86_64/bin/swiftc -O -Xllvm -tf-dynamic-compilation test/TensorFlow/lang3.swift
swift: /usr/local/google/home/hongm/ssd_part/git/swift-base/swift/lib/SILOptimizer/Utils/Local.cpp:490: swift::SILValue swift::castValueToABICompatibleType(swift::SILBuilder *, swift::SILLocation, swift::SILValue, swift::SILType, swift::SILType): Assertion `SrcTy.isAddress() == DestTy.isAddress() && "Addresses aren't compatible with values"' failed.
Stack dump:
0.  Program arguments: /usr/local/google/home/hongm/ssd_part/git/swift-base/build/Ninja-ReleaseAssert+stdlib-Release/swift-linux-x86_64/bin/swift -frontend -c -primary-file test/TensorFlow/lang3.swift -target x86_64-unknown-linux-gnu -disable-objc-interop -O -Xllvm -tf-dynamic-compilation -module-name lang3 -o /tmp/lang3-e33abc.o
1.  While running pass #126 SILModuleTransform "TFDeabstraction".
2.  TFDeabstraction::inlineCalls
#0 0x000000000414cf34 PrintStackTraceSignalHandler(void*) (/usr/local/google/home/hongm/ssd_part/git/swift-base/build/Ninja-ReleaseAssert+stdlib-Release/swift-linux-x86_64/bin/swift+0x414cf34)
#1 0x000000000414b090 llvm::sys::RunSignalHandlers() (/usr/local/google/home/hongm/ssd_part/git/swift-base/build/Ninja-ReleaseAssert+stdlib-Release/swift-linux-x86_64/bin/swift+0x414b090)
#2 0x000000000414d0e2 SignalHandler(int) (/usr/local/google/home/hongm/ssd_part/git/swift-base/build/Ninja-ReleaseAssert+stdlib-Release/swift-linux-x86_64/bin/swift+0x414d0e2)
#3 0x00007fe62ac3a0c0 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x110c0)
#4 0x00007fe6134aafcf gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x32fcf)
#5 0x00007fe6134ac3fa abort (/lib/x86_64-linux-gnu/libc.so.6+0x343fa)
#6 0x00007fe6134a3e37 (/lib/x86_64-linux-gnu/libc.so.6+0x2be37)
#7 0x00007fe6134a3ee2 (/lib/x86_64-linux-gnu/libc.so.6+0x2bee2)
#8 0x0000000000ec6515 swift::castValueToABICompatibleType(swift::SILBuilder*, swift::SILLocation, swift::SILValue, swift::SILType, swift::SILType) (/usr/local/google/home/hongm/ssd_part/git/swift-base/build/Ninja-ReleaseAssert+stdlib-Release/swift-linux-x86_64/bin/swift+0xec6515)
#9 0x0000000000e7513b swift::tryDevirtualizeWitnessMethod(swift::ApplySite, swift::OptRemark::Emitter*) (/usr/local/google/home/hongm/ssd_part/git/swift-base/build/Ninja-ReleaseAssert+stdlib-Release/swift-linux-x86_64/bin/swift+0xe7513b)
#10 0x0000000000e7582d swift::tryDevirtualizeApply(swift::ApplySite, swift::ClassHierarchyAnalysis*, swift::OptRemark::Emitter*) (/usr/local/google/home/hongm/ssd_part/git/swift-base/build/Ninja-ReleaseAssert+stdlib-Release/swift-linux-x86_64/bin/swift+0xe7582d)
#11 0x000000000109c465 runOnFunctionRecursively(swift::SILOptFunctionBuilder&, swift::SILFunction*, swift::FullApplySite, llvm::DenseSet<swift::SILFunction*, llvm::DenseMapInfo<swift::SILFunction*> >&, llvm::ImmutableSet<swift::SILFunction*, llvm::ImutContainerInfo<swift::SILFunction*> >::Factory&, llvm::ImmutableSet<swift::SILFunction*, llvm::ImutContainerInfo<swift::SILFunction*> >, swift::ClassHierarchyAnalysis*, swift::SILInliner::InlineKind, std::function<bool (swift::FullApplySite, swift::SILFunction&)> const&) (/usr/local/google/home/hongm/ssd_part/git/swift-base/build/Ninja-ReleaseAssert+stdlib-Release/swift-linux-x86_64/bin/swift+0x109c465)
#12 0x000000000109caf2 runOnFunctionRecursively(swift::SILOptFunctionBuilder&, swift::SILFunction*, swift::FullApplySite, llvm::DenseSet<swift::SILFunction*, llvm::DenseMapInfo<swift::SILFunction*> >&, llvm::ImmutableSet<swift::SILFunction*, llvm::ImutContainerInfo<swift::SILFunction*> >::Factory&, llvm::ImmutableSet<swift::SILFunction*, llvm::ImutContainerInfo<swift::SILFunction*> >, swift::ClassHierarchyAnalysis*, swift::SILInliner::InlineKind, std::function<bool (swift::FullApplySite, swift::SILFunction&)> const&) (/usr/local/google/home/hongm/ssd_part/git/swift-base/build/Ninja-ReleaseAssert+stdlib-Release/swift-linux-x86_64/bin/swift+0x109caf2)
#13 0x000000000109c0f0 swift::inlineForTFDeabstraction(swift::SILOptFunctionBuilder&, swift::SILFunction&, std::function<bool (swift::FullApplySite, swift::SILFunction&)> const&) (/usr/local/google/home/hongm/ssd_part/git/swift-base/build/Ninja-ReleaseAssert+stdlib-Release/swift-linux-x86_64/bin/swift+0x109c0f0)
```
Thanks. Have you tried removing all AD-related operators? If so, it could even be an AD bug. I'll prioritize it this week.
Comment by Mingsheng Hong (JIRA):
This might be an AD bug, because:

1. The gradient function generated by AD looks like this:

```
$ ../build/$rdir/swift-linux-x86_64/bin/swiftc -frontend -O -Xllvm -tf-dump-intermediates -Xllvm -tf-dump-graph -Xllvm -tf-dynamic-compilation=true -emit-sil test/TensorFlow/lang3.swift

--- TFDeabstraction Input: AD__$s5lang318autoregressiveLossySfAA17LSTMLanguageModelV_10TensorFlow0F0VySfGtF__grad_src_0_wrt_0_s_p
// AD__$s5lang318autoregressiveLossySfAA17LSTMLanguageModelV_10TensorFlow0F0VySfGtF__grad_src_0_wrt_0_s_p
sil hidden @AD__$s5lang318autoregressiveLossySfAA17LSTMLanguageModelV_10TensorFlow0F0VySfGtF__grad_src_0_wrt_0_s_p : $@convention(thin) (LSTMLanguageModel, @guaranteed Tensor<Float>, Float) -> (Float, LSTMLanguageModel) {
// %0                                             // users: %8, %4
// %1                                             // users: %8, %4
// %2                                             // user: %8
bb0(%0 : $LSTMLanguageModel, %1 : $Tensor<Float>, %2 : $Float):
  // function_ref AD__$s5lang318autoregressiveLossySfAA17LSTMLanguageModelV_10TensorFlow0F0VySfGtF__primal_src_0_wrt_0
  %3 = function_ref @AD__$s5lang318autoregressiveLossySfAA17LSTMLanguageModelV_10TensorFlow0F0VySfGtF__primal_src_0_wrt_0 : $@convention(thin) (LSTMLanguageModel, @guaranteed Tensor<Float>) -> (@owned AD__$s5lang318autoregressiveLossySfAA17LSTMLanguageModelV_10TensorFlow0F0VySfGtF__Type__src_0_wrt_0, Float) // user: %4
  %4 = apply %3(%0, %1) : $@convention(thin) (LSTMLanguageModel, @guaranteed Tensor<Float>) -> (@owned AD__$s5lang318autoregressiveLossySfAA17LSTMLanguageModelV_10TensorFlow0F0VySfGtF__Type__src_0_wrt_0, Float) // users: %6, %5
  %5 = tuple_extract %4 : $(AD__$s5lang318autoregressiveLossySfAA17LSTMLanguageModelV_10TensorFlow0F0VySfGtF__Type__src_0_wrt_0, Float), 0 // user: %8
  %6 = tuple_extract %4 : $(AD__$s5lang318autoregressiveLossySfAA17LSTMLanguageModelV_10TensorFlow0F0VySfGtF__Type__src_0_wrt_0, Float), 1 // users: %9, %8
  // function_ref AD__$s5lang318autoregressiveLossySfAA17LSTMLanguageModelV_10TensorFlow0F0VySfGtF__adjoint_src_0_wrt_0
  %7 = function_ref @AD__$s5lang318autoregressiveLossySfAA17LSTMLanguageModelV_10TensorFlow0F0VySfGtF__adjoint_src_0_wrt_0 : $@convention(thin) (LSTMLanguageModel, @guaranteed Tensor<Float>, AD__$s5lang318autoregressiveLossySfAA17LSTMLanguageModelV_10TensorFlow0F0VySfGtF__Type__src_0_wrt_0, Float, Float) -> LSTMLanguageModel // user: %8
  %8 = apply %7(%0, %1, %5, %6, %2) : $@convention(thin) (LSTMLanguageModel, @guaranteed Tensor<Float>, AD__$s5lang318autoregressiveLossySfAA17LSTMLanguageModelV_10TensorFlow0F0VySfGtF__Type__src_0_wrt_0, Float, Float) -> LSTMLanguageModel // user: %9
  %9 = tuple (%6 : $Float, %8 : $LSTMLanguageModel) // user: %10
  return %9 : $(Float, LSTMLanguageModel)         // id: %10
} // end sil function 'AD__$s5lang318autoregressiveLossySfAA17LSTMLanguageModelV_10TensorFlow0F0VySfGtF__grad_src_0_wrt_0_s_p'
```

2. If we replace `#valueAndGradient(autoregressiveLoss, wrt: .0)(model, input)` with `autoregressiveLoss(model, input)` above, it does not crash at the same place. Instead, it crashes at IRGen. I can debug that one once we confirm that the SIL code generated by AD above is good.
Comment by Mingsheng Hong (JIRA):
More info from gdb at the point of crash:

```
// AD__$s5lang318autoregressiveLossySfAA17LSTMLanguageModelV_10TensorFlow0F0VySfGtF__adjoint_src_0_wrt_0
sil hidden @AD__$s5lang318autoregressiveLossySfAA17LSTMLanguageModelV_10TensorFlow0F0VySfGtF__adjoint_src_0_wrt_0 : $@convention(thin) (LSTMLanguageModel, @guaranteed Tensor<Float>, AD__$s5lang318autoregressiveLossySfAA17LSTMLanguageModelV_10TensorFlow0F0VySfGtF__Type__src_0_wrt_0, Float, Float) -> LSTMLanguageModel {
// %0                                             // user: %78
// %1                                             // users: %29, %13, %78
// %2                                             // users: %45, %32, %5
// %4                                             // user: %36
bb0(%0 : $LSTMLanguageModel, %1 : $Tensor<Float>, %2 : $AD__$s5lang318autoregressiveLossySfAA17LSTMLanguageModelV_10TensorFlow0F0VySfGtF__Type__src_0_wrt_0, %3 : $Float, %4 : $Float):
  ...
  %67 = witness_method $Tensor<Float>, #VectorNumeric.init!allocator.1 : <Self where Self : VectorNumeric> (Self.Type) -> (Self.ScalarElement) -> Self : $@convention(witness_method: VectorNumeric) <τ_0_0 where τ_0_0 : VectorNumeric> (@in τ_0_0.ScalarElement, @thick τ_0_0.Type) -> @out τ_0_0 // user: %70
  ...
  %70 = apply %67<Tensor<Float>>(%54, %69, %68) : $@convention(witness_method: VectorNumeric) <τ_0_0 where τ_0_0 : VectorNumeric> (@in τ_0_0.ScalarElement, @thick τ_0_0.Type) -> @out τ_0_0
```

When swift::tryDevirtualizeWitnessMethod() processes %70 on the last line above, the crash is triggered. I'll leave this with the AD folks for now.
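For readers less familiar with the SIL shape involved, here is a minimal sketch in plain Swift (no TensorFlow dependency; all names such as `VectorNumericLike` and `makeViaWitness` are hypothetical, chosen only to mirror the `VectorNumeric` init call above). Inside a generic context, a protocol-requirement initializer lowers to a `witness_method` apply whose result is returned indirectly (`@out`), while the concrete witness may return its value directly; devirtualizing such a call must reconcile those conventions, which is what the `isAddress()` assertion in `castValueToABICompatibleType` checks.

```swift
// Hypothetical analogue of the VectorNumeric initializer requirement.
protocol VectorNumericLike {
    associatedtype ScalarElement
    init(_ scalar: ScalarElement)
}

// Concrete witness: the init returns its value directly.
struct FloatBox: VectorNumericLike {
    typealias ScalarElement = Float
    var value: Float
    init(_ scalar: Float) { self.value = scalar }
}

// In this generic function, `T(scalar)` is dispatched through the witness
// table, i.e. a witness_method apply with an indirect (@out T) result --
// the same call pattern as %70 in the SIL dump above.
func makeViaWitness<T: VectorNumericLike>(_ scalar: T.ScalarElement, as type: T.Type) -> T {
    return T(scalar)
}

let box = makeViaWitness(1.0, as: FloatBox.self)
assert(box.value == 1.0)
```

This sketch compiles and runs fine with today's toolchains; the crash reported here was specific to how TFDeabstraction's inliner devirtualized the AD-generated adjoint, not to the source-level pattern itself.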
Thanks a lot for isolating the bug. I'll take this.
Comment by Mingsheng Hong (JIRA):
No problem. Let me know if I can help.
Additional Detail from JIRA
md5: 234cf7049009ab142d351ab073e27b3e
Issue Description:
The LSTM language model example crashes the compiler during the deabstraction pass, towards the end of inlining.
Stacktrace:
Additional LLDB output that may be useful:
I can provide an internal link to the example, or I can paste it publicly if that's better.