
[SR-8589] [blocker] model.updateParameters in the iris tutorial crashes the compiler #51107

Closed
marcrasi mannequin opened this issue Aug 21, 2018 · 16 comments
Labels: bug (A deviation from expected or documented behavior; also expected but undesirable behavior), s4tf, iris tutorial, swift for tensorflow

Comments

@marcrasi (mannequin) commented Aug 21, 2018

Previous ID: SR-8589
Radar: None
Original Reporter: @marcrasi
Type: Bug
Status: Resolved
Resolution: Done

Additional Detail from JIRA:
Votes: 0
Component/s: Swift for TensorFlow
Labels: Bug, s4tf-iris-tutorial
Assignee: bgogul (JIRA)
Priority: Medium

md5: cffdda1ec2c8949bd9538c7f7ce2d691

Issue Description:

Execute the following two cells in Jupyter (or paste them into an LLDB REPL):

import TensorFlow
struct Model : Parameterized {
    @TFParameter var x: Tensor<Float> = Tensor(zeros: [10, 10])
}
var model = Model()
model.updateParameters(withGradients: Model.Parameters(x: Tensor(1))) { $0 -= 0.1 * $1 }

You should get this error:

!!! Compiler bug -- Tensor op builtin __tfop_tfc.scalarToTensor,$in cannot be lowered to LLVM IR !!!

If you run everything in a single cell, or everything in a single Swift script, you won't get the error!
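For readers outside the S4TF toolchain, the update pattern itself is plain Swift and easy to see in miniature. The sketch below is a hypothetical, TensorFlow-free analogue (ToyModel and its updateParameters are made up for illustration; Float stands in for Tensor<Float>) of what the trailing closure `{ $0 -= 0.1 * $1 }` expresses: an in-place gradient-descent step.

```swift
// Hypothetical, TensorFlow-free analogue of the crashing call.
// Float stands in for Tensor<Float>; the closure is the same
// gradient-descent step used in the tutorial.
struct ToyModel {
    var x: Float = 1.0

    // Analogue of Parameterized.updateParameters(withGradients:_:):
    // apply the update closure to each parameter/gradient pair.
    mutating func updateParameters(withGradients gradients: Float,
                                   _ update: (inout Float, Float) -> Void) {
        update(&x, gradients)
    }
}

var toy = ToyModel()
toy.updateParameters(withGradients: 1.0) { $0 -= 0.1 * $1 }
// toy.x is now 1.0 - 0.1 * 1.0 = 0.9
```

The crash is not about these semantics, which are ordinary Swift; it is about the REPL failing to partition the trailing closure, as the comments below work out.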

@rxwei commented Aug 21, 2018

Do you have a minimal reproducer?

@marcrasi (mannequin) commented Aug 21, 2018

I created one and updated the description. It seems to be some interaction with LLDB, because I can't get it to cause an error in a plain Swift script.

@rxwei commented Aug 21, 2018

The problem does not occur in compiler mode or script-interpreter mode; it only occurs in the REPL.

@rxwei commented Aug 21, 2018

Same with

model.allParameters.update(withGradients: Model.Parameters(x: Tensor(1))) { $0 -= 0.1 * $1 }

@rxwei commented Aug 21, 2018

This bug clearly has the highest priority, because parameter aggregates are a strict requirement. bgogul (JIRA User), would you like to take a look? I can provide further support on this.

The first step is to figure out why the trailing closure

{ $0 -= 0.1 * $1 }

... is not getting partitioned.

One quick, simple hack is to make the compiler partition even inlinable functions, as long as they are not generic. This change can be made at TFUtilities.cpp:662.

Comment by Gogul Balakrishnan (JIRA) via @swift-ci

It looks like we have a workaround for the time being. I will take a look once I wrap up the other bugs that are blocking the tutorial.

@rxwei commented Aug 21, 2018

There's no workaround yet.

If you do everything in one cell, or if you do everything in one swift script, then you won't get the error!

Putting everything in one cell would make the tutorial notebook no longer a notebook.

@lattner (mannequin) commented Aug 22, 2018

Does this work if you hard code "-tf-promote-global-variables" to enabled?

Comment by Gogul Balakrishnan (JIRA) via @swift-ci

@lattner, enabling tf-promote-global-variables did not fix the issue. I will dig around a bit more.

Comment by Gogul Balakrishnan (JIRA) via @swift-ci

Just an update based on my investigation: in REPL mode, the compiler fails to inline the call to updateParameters:

model.updateParameters(withGradients: Model.Parameters(x: Tensor(1))) { $0 -= 0.1 * $1 }

The reason is that the attempt to link updateParameters fails. The relevant condition in shouldInlinePredicate is the following fragment:

if (callee.empty() && !site.getModule().linkFunction(&callee, SILModule::LinkingMode::LinkAll)

site.getModule().linkFunction(...) returns false for updateParameters in REPL mode.

@rxwei commented Aug 23, 2018

How about the following?

model.allParameters.update(withGradients: Model.Parameters(x: Tensor(1))) { $0 -= 0.1 * $1 }

Does this cause the same error?

This case should be tested first: `updateParameters` is just a Swift-level wrapper that might obscure what's going on.
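To make the wrapper relationship concrete, here is a minimal sketch. The exact S4TF protocol shapes are assumptions based only on the names used in this thread, and Float stands in for Tensor<Float>; the point is that updateParameters simply forwards to allParameters.update, so both spellings should exercise the same code path.

```swift
// Sketch of the layering under discussion. The exact S4TF protocol
// shapes are assumptions; Float stands in for Tensor<Float>.
struct ToyParameters {
    var x: Float

    // Analogue of ParameterAggregate.update(withGradients:_:).
    mutating func update(withGradients gradients: ToyParameters,
                         _ update: (inout Float, Float) -> Void) {
        update(&x, gradients.x)
    }
}

struct WrappedModel {
    var allParameters = ToyParameters(x: 0.5)

    // updateParameters is a thin wrapper that forwards to
    // allParameters.update, so both entry points exercise the
    // same underlying update and should fail identically.
    mutating func updateParameters(withGradients gradients: ToyParameters,
                                   _ update: (inout Float, Float) -> Void) {
        allParameters.update(withGradients: gradients, update)
    }
}

var wrapped = WrappedModel()
wrapped.updateParameters(withGradients: ToyParameters(x: 1.0)) { $0 -= 0.1 * $1 }
// wrapped.allParameters.x is now 0.5 - 0.1 * 1.0 = 0.4
```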

Comment by Gogul Balakrishnan (JIRA) via @swift-ci

Yes, changing it to allParameters.update(..) causes the same error.

I think I understand the issue now. Basically, the SIL IR is not retained after evaluating a line in REPL mode; only the lowered LLVM IR persists. That is why the deabstraction phase is not able to link the SILFunction in.

@rxwei commented Aug 24, 2018

I don't think that's the case; here's an example:

  2> func foo<T>(x: Tensor<T>) -> Tensor<T> {
  3.     return x
  4. } // I hit enter here
  5> foo(x: Tensor(1))
2018-08-23 16:33:19.642759: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.2 AVX AVX2 FMA
$R0: TensorFlow.Tensor<Double> = 1.0

This means the SIL is being preserved. The compiler cannot partition generic functions.

Comment by Gogul Balakrishnan (JIRA) via @swift-ci

I did a bit more digging. I don't completely understand how the REPL is implemented, but I can see that the SILModule is thrown away after generating the LLVM IR. For instance, SwiftExpressionParser.cpp:1887 calls swift::PerformIRGeneration(), which ultimately releases the SILModule at IRGen.cpp:854.

If that is the case, we need a way of preserving the relevant SIL code throughout the REPL session.

Comment by Gogul Balakrishnan (JIRA) via @swift-ci

I sent out a hacky patch-19502 to deal with this issue for the time being.

@rxwei, could it be that the generic case works because foo has no #tfops, and hence no deabstraction/partitioning is needed? Here is the SIL that was generated for foo in LLDB:

// Foo<A>(x:)
sil @$S13__lldb_expr_33Foo1x10TensorFlow0D0VyxGAG_tAD013AccelerableBydE0RzlF : $@convention(thin) <T where T : AccelerableByTensorFlow> (@guaranteed Tensor<T>) -> @owned Tensor<T> {
// %0                                             // users: %2, %1
bb0(%0 : @guaranteed $Tensor<T>):
  debug_value %0 : $Tensor<T>, let, name "x", argno 1 // id: %1
  %2 = copy_value %0 : $Tensor<T>                 // user: %3
  return %2 : $Tensor<T>                          // id: %3
} // end sil function '$S13__lldb_expr_33Foo1x10TensorFlow0D0VyxGAG_tAD013AccelerableBydE0RzlF' 

@rxwei commented Aug 30, 2018

The function not having any tensor ops is a good point. I intended to add some tensor code there but forgot...

The following case is already being diagnosed, which is great.

  4> func bar<T : Numeric>(x: Tensor<T>) -> Tensor<T> { return x + x }
error: repl.swift:4:6: error: internal error: TensorFlow graph program extraction does not work on generic functions yet
func bar<T : Numeric>(x: Tensor<T>) -> Tensor<T> { return x + x }
     ^

And the following case fails as expected:

  4> @inlinable func bar<T : Numeric>(x: Tensor<T>) -> Tensor<T> { return x + x }
  5> bar(x: Tensor(1))

Your patch makes sense and is a nice hack to unblock progress!

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
This issue was closed.