Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SR-11147] Add withUnsafeElements<T: HomogeneousAggregate, Result>(of:) #53543

Open
atrick opened this issue Jul 16, 2019 · 21 comments
Open

[SR-11147] Add withUnsafeElements<T: HomogeneousAggregate, Result>(of:) #53543

atrick opened this issue Jul 16, 2019 · 21 comments
Labels
compiler The Swift compiler in itself feature A feature request or implementation standard library Area: Standard library umbrella swift evolution proposal needed Flag → feature: A feature that warrants a Swift evolution proposal

Comments

@atrick
Copy link
Member

atrick commented Jul 16, 2019

Previous ID SR-11147
Radar rdar://problem/53160082
Original Reporter @atrick
Type New Feature
Additional Detail from JIRA
Votes 2
Component/s Standard Library
Labels New Feature, LanguageFeatureRequest
Assignee None
Priority Medium

md5: 7c8952a07b1ac467c740324f65861850

is duplicated by:

Issue Description:

Provide the API and compiler features necessary to convert any struct or tuple with elements of common type into an UnsafeBufferPointer.

To do this well, it depends on language features; in particular, protocol conformance for tuples.

It's common to pass a Swift tuple into a C API taking a pointer to the elements. For example, it should be simple and straightforward to import a statically sized C array (as a tuple) and pass it back off to a C api that takes a pointer to the element type. This situation also arises for homogenous structs. Since Swift already guaratees the layout of such structs, we should formalize that guarantee in the new API along with tuple types.

Without this convenience, programmers are forced into misusing the memory binding APIs and potentially introducing undefined behavior. Here real-world code reduced to a trivial example:

// void c_api_takes_fields(int *fields, unsigned int count)
func c_api_takes_fields(_ fields: UnsafeMutablePointer<Int>, _ count: UInt32) { /*...*/ }

struct SomeFields {
  var a: Int
  var b: Int
}

var fields = SomeFields(a: 0, b: 0)
withUnsafeMutableBytes(of: &fields) {
  let buffer = $0.bindMemory(to: Int.self)
  c_api_takes_fields(buffer.baseAddress!, UInt32(buffer.count))
}

This code is bad because:

1. It's an abuse of bindMemory, which should only be called to change the bound memory type in order to reuse a chunk of memory as a different type (e.g. to write a custom allocator in Swift).

2. Taken at face value, it appears to have undefined behavior because it binds the memory of a local variable to a different type. The only reason that is safe in practice is that the call to bindMemory has no actual effect on the memory type because all fields of the struct happen to be Ints. If the bindMemory call actually had any semantic effect, then the local variable would be in an undefined state after the call. There's no enforcement on correct usage and it's misleading to someone else reading the code.

3. The code makes use of an UsafeRawPointer even though it wants all pointer types to be strongly typed. It shouldn't need a raw pointer because it's not working at the level of bytes or reinterpreting values of unrelated types.

Instead, the code should be written as:

// void c_api_takes_fields(int *fields, unsigned int count)
func c_api_takes_fields(_ fields: UnsafeMutablePointer<Int>, _ count: UInt32) { /*...*/ }

struct SomeFields {
  var a: Int
  var b: Int
}

var fields = SomeFields(a: 0, b: 0)
withUnsafeMutableBufferPointer(of: &fields) {
  c_api_takes_fields($0.baseAddress!, UInt32($0.count))

Going back to the common case of importing C arrays, it could now be written as:

const char myCArray[] = {0, 1, 2, 3};

// void c_api_takes_array(const char *array, unsigned int count)
func c_api_takes_array(array: UnsafeMutablePointer<CChar>, _ count: UInt32) { /*...*/ }

withUnsafeBufferPointer(to: myCArray) {
  c_api_takes_array(array: $0, _ count: UInt32($0.count)) {
}

Another real-world example:

var bytes: (CChar, CChar, CChar, CChar) = (0x61, 0x62, 0x63, 0)
let name: String = withUnsafePointer(to: &bytes) { ptr -> String in
  return String(validatingUTF8: UnsafeRawPointer(ptr).assumingMemoryBound(to: CChar.self))!
}

Could be written as:

var bytes: (CChar, CChar, CChar, CChar) = (0x61, 0x62, 0x63, 0)
let name: String = withUnsafeBufferPointer(to: bytes) { buffer in
  return String(validatingUTF8: buffer.baseAddress!)
}

This is a small but important change because

  • it makes a simple, common task, much more discoverable and obvious

  • it avoids the extremely dangerous low-level `assumingMemoryBound` API, which should only be used by experts in the Swift pointer aliasing rules.

Proposed Fix

1. Add a new layout constraint and (builtin) protocol, HomogenousAggregate, to the type system that effectively has an associated type as shown below. The compiler will require that all conforming types are homogenous tuples or structs:

protocol Builtin.HomogenousAggregate {
  associatedtype Element
  var numElements: Int { get }
}

public typealias HomogenousAggregate = Builtin.HomogenousAggregate

Synthesize conformances for any type that needs to meet the layout constraint.

2. Add a compiler Builtin that generates a buffer pointer for any type conforming to the HomogenousAggregate layout constraint:

Builtin.getBufferPointer<T: Builtin.HomogeneousAggregate>(_ value: T)
  -> UnsafeBufferPointer<T.Element>

3. Add two free-standing functions to the standard library:

@inlinable
func withUnsafeBufferPointer<T: HomogeneousAggregate, Result>(to value: T,
  _ body: (UnsafeBufferPointer<T.Element>) throws -> Result)
rethrows -> Result {
  return try body(Builtin.getBufferPointer(value))
}

@inlinable
func withUnsafeMutableBufferPointer<T: HomogeneousAggregate, Result>(to value: inout T,
  _ body: (UnsafeMutableBufferPointer<T.Element>) throws -> Result)
rethrows -> Result {
  return try body(Builtin.getBufferPointer(value))
}

PREREQUISITES

Language support for Tuple types conforming to a builtin protocol. Tuple conformance in general is part of the generics manifesto, so it's something we'd like to support.

Synthesizing conformance to layout constraint protocols. How can it be done lazily without excessive overhead? Many types will potentially conform, but I don't expect the constraint to be used often.

@atrick
Copy link
Member Author

atrick commented Jul 16, 2019

@swift-ci create

@atrick
Copy link
Member Author

atrick commented Jul 16, 2019

I was a little too aggressive assuming layout rules. We don't actually want to guarantee that Swift structs are layout compatible with tuples even if they are homogenous. This will only apply to structs imported from C (because they must follow ABI rules anyway), or Swift structs that are explicitly annotated with a new attribute, like @_layoutCompatible.

@atrick
Copy link
Member Author

atrick commented Jul 16, 2019

Eventually, we may want tuples to automatically conform to Collection. In that case, rather than defining a new Builtin.HomogenousAggregate protocol, we could use HasContiguousStorage protocol shared by other Collections.

@jckarter
Copy link
Member

Is the specific problem we're trying to solve here the usability of imported C array fields in structs? If so, I think we might want to start with how these fields are imported from C to begin with (aside from the "wait until we have real fixed size arrays" option). The homogeneous tuple thing was a hack, and though we'll have to keep the hack around to some degree for source compatibility, maybe we can also provide a better representation for new code to opt in to, since the only ABI we have to worry about is the C ABI. Using accessor coroutines, and the unowned-reference-that-becomes-strong-on-copy trick we'd discussed with @rjmccall previously wrt in-place slice mutations, could we efficiently import C array fields as `ArraySlice<Element>` fields into Swift (with an assertion that mutation doesn't change their size, of course)?

@atrick
Copy link
Member Author

atrick commented Jul 17, 2019

Imported statically sized C arrays are by far the most important and compelling use case. So if we want to really improve usability that's a good one to focus on. However, we still need a fix for the other use cases, like regular Swift tuples and layout-constrained structs. The only alternative today is to abuse memory binding APIs; a practice that we need to put an end to. HasContiguousStorage conformance is the simplest, most direct way to express this. As @airspeedswift points out, this is independent of whether we have a Collection. We also want the conformance for SIMD types.

@jckarter
Copy link
Member

It isn't clear to me when you'd want to use a tuple or struct this way (aside from the SIMD structs, maybe, though @stephentyrone made a point of not making those collections too) in code that didn't already have to deal with the weird clang importer tuples.

@stephentyrone
Copy link
Member

This is pretty similar to the API that I've been converging on for native static array types as implementation details:

  func withUnsafeBufferPointer<R>(
    _ body: (UnsafeBufferPointer<Element>) throws -> R
  ) rethrows -> R
  
  mutating func withUnsafeMutableBufferPointer<R>(
    _ body: (UnsafeMutableBufferPointer<Element>) throws -> R
  ) rethrows -> R

It's pretty useful for working with fixed-size objects (like the words storage for a fixed-size integer type, or a small fixed-size matrix, etc). In cases like these, we don't want the type itself to be a collection, but being able to get a buffer pointer is one of the key use cases.

@atrick
Copy link
Member Author

atrick commented Jul 17, 2019

This proposed API applies to any value that you want to pass to a C function taking a pointer to that value's element type. So, this definitely comes up with tuples, regardless of whether they are defined in C or Swift (that fact that we can't "round trip" a C API is just particularly egregious). It also comes up with structs where the C API assumes layout (based on ABI constraints). For example, we see this in real code using Apple's task_policy.h API. The reduced example is shown above in c_api_takes_fields.

Again, this isn't just about handling common use cases so much as handling all reasonable corner cases while guarding against undefined behavior.

@atrick
Copy link
Member Author

atrick commented Jul 17, 2019

To be clear, I'm strongly in favor of making withUnsafeBufferPointer a member of HasContiguousStorage rather than as a stand-alone function that I proposed above. I only proposed that because I wasn't sure we wanted to generate witness tables and synthesize witnesses for all the conformances.

@jckarter
Copy link
Member

HomogeneousAggregate seems like it should be treated as a layout constraint, like AnyObject or the currently-SIL-only ones used by the specializer like _Trivial. That would naturally mean that it doesn't have witness tables, since the "witness" of a type's conformance is its physical layout. Although we don't allow extensions to add members to AnyObject, that's an artificial restriction. We could make HomogeneousAggregate extensible, and that could let you expose the API you want in terms of some Builtins that operate magically on the layout constraint.

It'd also be straightforward to have default implementations of Collection where Self: HomogeneousAggregate in an extension if we so desired.

@atrick
Copy link
Member Author

atrick commented Jul 17, 2019

@jckarter yeah, that's what I was envisioning to avoid the runtime overhead. I proposed adding a Builtin protocol because AFAICT it's the only way to use it in type position. i.e. this doesn't work foo<T: _Trivial>.

The only thing that makes me sad is that we wouldn't be able to treat all our UnsafeBufferPointer-like things uniformly in generic context. i.e. this doesn't work extension HomogenousAggregate : HasContiguousStorage

@jckarter
Copy link
Member

The fact that `_Trivial` doesn't work in type position is also, in a sense, an artificial limitation. The type checker does know about layout constraints, and we model `AnyObject` as a layout constraint without a real protocol backing it. The way that works is that `AnyObject` is a typealias for a builtin that exposes the existential type with the layout constraint:

public typealias AnyObject = Builtin.AnyObject

Although the existential of `HomogeneousAggregate` is itself of questionable usefulness, it falls out from the other language rules that using an existential in an inheritance clause or generic constraint imposes the generic constraints on the type, in other words, `<T: AnyObject>` means `T` has the AnyObject constraint.

Following the example of AnyObject, we could also expose `HomogeneousAggregate` as a builtin which imposes the layout constraint, and then use extensions to expose the API we want in terms of builtin functionality:

public typealias HomogeneousAggregate = Builtin.HomogeneousAggregate
extension HomogeneousAggregate {
  typealias Element = Builtin.HomogeneousAggregateElement<Self>
  var count: Int { return MemoryLayout<Self>.stride / MemoryLayout<Element>.stride }
}

I think the rules for layout constraints feel like a better fit for this than protocol conformance. A type can't retroactively conform to `AnyObject` and doesn't have to explicitly declare conformance to it, it implicitly conforms by being a class. Similarly, it feels natural to me that homogeneous aggregate types should just work as `HomogeneousAggregate`s in code that can see that they are in fact homogeneous (though in this case, we may also want resilient structs to be able to explicitly declare the constraint, if that's something they want to make into an API promise).

@jckarter
Copy link
Member

A good dry run for this approach might be to try to expose `Trivial` as a user-facing layout constraint too, since it already exists. One thing we'd need to do with either Trivial or HomogeneousAggregate is add type checker support for telling whether a type satisfies the constraint. That might be tricky since we normally use SIL-level information to do those checks, but maybe requestification makes that less of an issue?

@atrick
Copy link
Member Author

atrick commented Jul 17, 2019

We actually need to expose the Trivial layout constraint in order to properly enforce several of our UnsafeRawPointer APIs, which are only valid for trivial types.

@atrick
Copy link
Member Author

atrick commented Jul 18, 2019

Note: If this is going to be a stand-alone function (because layout constraints don't have witnesses) then we may want to call it withUnsafeElements. That way we can introduce a separate standalone function the works on scalars: withUnsafeBufferPointer<T, R>(to: T, (UnsafeBufferPointer<T>) throws -> R) rethrows -> R. This would always produce a buffer pointer with count = 1. This may seem trivial but it's actually important to encourage buffer-pointers instead of withUnsafePointer so capacity doesn't need to be manually computed when subsequently calling withMemoryRebound on that pointer. I see a lot of bad code like this:

var value: Int = 0
withUnsafePointer(to: value) { ptr in
  ptr.withMemoryRebound(to: UInt8.self, capacity: 1) {
    readFourBytes($0)
  }
}

That should be:

var value: Int = 0
withUnsafeBufferPointer(to: value) { ptr in
  ptr.withMemoryRebound(to: UInt8.self) {
    readFourBytes($0)
  }
}

@stephentyrone
Copy link
Member

Why do you want to have two essentially equivalent functions with different names?

@stephentyrone
Copy link
Member

(Answering my own question): to disambiguate between getting a buffer of 1 x self and getting a buffer of n x elements. Still, not totally satisfactory.

@atrick
Copy link
Member Author

atrick commented Jul 18, 2019

@stephentyrone I originally wanted to propose the same function name for both features. Then I noticed how bad the type checker is at inferring closure arguments and providing coherent diagnostics. So I though it best to avoid the overload and ambiguity. I don't feel strongly either way.

@jckarter
Copy link
Member

I agree that overloading is best avoided, both for human understanding and the type checker's sake. Having the operations be top-level functions might also make exposing these type layout constraints easier to implement; thinking about it more, doing name lookup based on type layout constraints might be problematic for type checking, since unlike explicit protocol conformances, there's no explicit indication in AST we should look for members in extensions on type constraints, and there's no context to know we should be checking for type layout constraints given something like randomStruct.withUnsafeBufferPointer { ... }.

@jckarter
Copy link
Member

Similarly, spelling the constraint with a generic argument like HomogeneousAggregate<Element> would avoid us having to hack name lookup for an associated type. (Maybe a construct of nested aggregates could conceivably legitimately be a HomogeneousAggregate of multiple types too? For instance, ((Int, Int), (Int, Int)) could be seen as satisfying both HomogeneousAggregate<(Int, Int)> and HomogeneousAggregate<Int>.)

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
@AnthonyLatsis AnthonyLatsis added swift evolution proposal needed Flag → feature: A feature that warrants a Swift evolution proposal and removed new feature labels Nov 11, 2022
@atrick
Copy link
Member Author

atrick commented May 18, 2023

@tbkka @stephentyrone @jckarter

I suggest supporting a withUnsafeElements-style API only where
MemoryLayout<Element>.size == MemoryLayout<Element>.stride.

This naturally handles all imported C types, and only allows Swift types to participate when it is safe to view the elements as a contiguous region of memory with tail padding (there won't actually be any tail padding, so the illusion is maintained).

For example, this contrived use case is still safe:

var bytes: (CChar, CChar, CChar, CChar) = (0x61, 0x62, 0x63, 0)
let name: String = withUnsafeElements(of: bytes) {
  (buffer: UnsafeBufferPointer<CChar?>) in
  return String(validatingUTF8: buffer.baseAddress!)
}

Viewing types with a size vs. stride mismatch as a pointer is extremely dangerous, particularly when passing the pointer to a C API that expects C layout. It is beneficial to require memory binding when dabbling in that realm.

Note that we may still want to allow subscripting and collection conformance on all homogenuos tuples, regardless of size and stride. This is fine because a subscript setter will only overwrite size bytes.

@atrick atrick changed the title [SR-11147] Add withUnsafeBufferPointer<T: HomogeneousAggregate, Result> [SR-11147] Add withUnsafeElements<T: HomogeneousAggregate, Result>(of:) May 18, 2023
@AnthonyLatsis AnthonyLatsis added the compiler The Swift compiler in itself label May 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler The Swift compiler in itself feature A feature request or implementation standard library Area: Standard library umbrella swift evolution proposal needed Flag → feature: A feature that warrants a Swift evolution proposal
Projects
None yet
Development

No branches or pull requests

4 participants