Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SR-381] API to lookup a Type given a top-level class name #42998

Closed
lhoward opened this issue Dec 25, 2015 · 97 comments
Closed

[SR-381] API to lookup a Type given a top-level class name #42998

lhoward opened this issue Dec 25, 2015 · 97 comments
Assignees
Labels
affects ABI Flag: Affects ABI feature A feature request or implementation mangling Area → compiler: Mangling runtime The Swift Runtime standard library Area: Standard library umbrella

Comments

@lhoward
Copy link
Contributor

lhoward commented Dec 25, 2015

Previous ID SR-381
Radar None
Original Reporter @lhoward
Type New Feature
Status Closed
Resolution Done
Additional Detail from JIRA
Votes 1
Component/s Standard Library
Labels New Feature, AffectsABI, Runtime
Assignee @jckarter
Priority Medium

md5: 13f1c2ae7174aab8983f1dde3a976c53

blocks:

  • SR-377 Implement NSKeyed[Un]Archiver

relates to:

  • SR-412 Linux: Cannot define C function pointer that returns metaclass

Issue Description:

This may be useful to implement NSStringFromClass()-compatible behaviour which is necessary in order to implement NSKeyedArchiver. I am using the attached workaround at the moment but obviously it is very fragile.

@lhoward
Copy link
Contributor Author

lhoward commented Dec 29, 2015

@jckarter
Copy link
Member

Is there a reason it has to be a mangled name? The qualified unmangled name produced by String(T.self) should be unique for any T. I envision us having a corresponding runtime API to parse and look up a type by (unmangled) name as well.

@lhoward
Copy link
Contributor Author

lhoward commented Dec 30, 2015

We need to match NSClassFromString()/NSStringFromClass() behaviour or, more accurately, their analogues in libobjc. For Swift class names deeper than two levels (e.g. a nested class), this returns the mangled type name. See:

lhoward/swift-corelibs-foundation@3949982

(I did originally propose on swift-corelibs-dev what you suggested.)

@jckarter
Copy link
Member

I see, is it expected that archives will be interoperable across the objc and corelibs implementations?

@lhoward
Copy link
Contributor Author

lhoward commented Dec 30, 2015

I don't speak for Apple, but my understanding is yes.

cc @phausler

Alternatively we limit interoperability to Foundation and single-level non-generic Swift classes. I realise that the current implementation is broken for generic classes.

Really it would be nice to have:

  • a variant of swift_getTypeName() that either conforms to the class_getName() behaviour on Darwin, or unconditionally returns the mangled type name

  • a swift_get[Class]MetadataByName() or similar (using dlsym() or whatever heuristics suit)

@phausler
Copy link
Member

Tony and I discussed early on - and cross platform archival is really important in our opinion since it is one of the defining features of Foundation.

@phausler
Copy link
Member

We might be able to get away with not letting nested classes be codable due to class encoding differentials (since id guess that those are the distinct minority) but I would like to have a story for them (even if we have to modify the darwin version to be in parity) since I know that the mangling patterns are subject to change and we should try not to rely on them.

@jckarter
Copy link
Member

Yeah, the mangling might be too brittle for cross-platform and cross-version mangling, especially since we've been discussing potentially using platform-dependent manglings. It might be worth changing the ObjC Foundation behavior to reliably use demangled type names for all types. There's already a swift_getTypeName API libobjc could potentially plug in to.

@phausler
Copy link
Member

actually cross platform works so far for the places we do use it surprisingly enough (it is just our module name that is the rub) the version changes however are VERY brittle; we have had one break so far since the inception of this project already.

I will follow up on the Darwin version to see what we can do for that.

@lhoward
Copy link
Contributor Author

lhoward commented Dec 30, 2015

FWIW @belkadan said this in a post to swift-corelibs-dev:

No, we cannot encode things "non-mangled but with the namespace". For any type other than top-level non-generic class types, using a non-mangled name is not unique. The only correct answer for arbitrary classes is to use mangled names, or something that maps one-to-one with mangled names.

This appears to contradict @jckarter's statement above that "The qualified unmangled name produced by String(T.self) should be unique for any T".

@jckarter
Copy link
Member

Sorry, I meant String(reflecting: T.self), which produces the fully qualified name. That is and should remain unique per-type. There are tradeoffs: The mangled name is more compact, but less human-understandable. It's also more implementation-dependent (though that's not a problem if we successfully finalize the mangling). Being able to generate demangled names from arbitrary types is also a code size increase for the runtime; I think we need to be able to parse and print human-readable demangled type names no matter what, but we don't necessarily need to be able to generate mangled type names.

@phausler
Copy link
Member

I think the goal here is to have it conform to an identity transform since NSClassFromString and NSStringFromClass can round trip (and consequently then of course things that are encoded with NSKeyedArchiver can be decoded via NSKeyedUnarchiver).

I have a hacky (but less hacky than reaching into the stdlib for it's private functions) way of dealing with the naming of Foundation (SwiftFoundation.NSString versus NSString etc) classes that was proposed in the discussion with Jordan. It will end up making the surface functionality work similarly enough for superficial needs but in the end it will be a behavioral difference that will eventually need to be contended with.

@jckarter
Copy link
Member

It should be possible for the Swift runtime to support a round-tripping getTypeByName such that swift_getTypeByName(swift_getTypeName(x)) gives you back the same type, yeah.

@lhoward
Copy link
Contributor Author

lhoward commented Dec 30, 2015

@phausler commit 09f143983 (I think I pushed it) removes support for looking up mangled names in NSCoding, that will still allow Foundation to work without explicitly registering the classname mappings

@lhoward
Copy link
Contributor Author

lhoward commented Dec 31, 2015

Noted regarding the overhead of putting the mangling code into the runtime – however if we're going to leverage the dynamic linker for looking up classes then swift_getTypeByName() will need this. The alternative of generating a name to type dictionary at compile time is also bloat (of undefined size), although it could have the advantage of being able to lookup classes for which there are no public symbols (but I don't consider this a feature).

@jckarter
Copy link
Member

We shouldn't use dlsym. That's really slow, requires the symbols to be public or at least unstripped, and doesn't work at all for JITed code. We did that as a hack to get bare-bones protocol conformance lookup working in the first Swift betas and it was awful. A better approach would be a registration table for reflectable types similar to the protocol conformance table, or perhaps just using the protocol conformance table as is, since most types will conform to at least one protocol, and we can add dummy entries for ones that don't.

@lhoward
Copy link
Contributor Author

lhoward commented Dec 31, 2015

This would work for NSCoding as NSCoding/NSSecureCoding are protocols. Can you give me any hints on where to start?

@jckarter
Copy link
Member

Sure, for a rough cut take a look at swift_conformsToProtocol's implementation. First, it collects the conformance tables from all the currently-loaded images and installs callbacks to pick them up from any future dynamically dlopen-ed ones. To do the lookup itself, it first checks its cache, then falls back to scanning any conformance tables that haven't yet been scanned. A similar approach should get you started with a type name lookup implementation. There are some obvious optimizations that could be added later (for instance, if each image knew its module name, we could key the conformance tables by section to reduce the amount we need to scan).

@lhoward
Copy link
Contributor Author

lhoward commented Dec 31, 2015

Thanks @jckarter! I will take a look.

@lhoward
Copy link
Contributor Author

lhoward commented Dec 31, 2015

OK, I have something simple working that implements:

@warn_unused_result
public // @testable
func _typeByName(name: String) -> Any.Type?

No caching yet but that should be trivial to add.

Of course it only works if the class conforms to a protocol, so it's not a general purpose solution, but it would be good enough for NSKeyedUnarchiver, or perhaps all of Foundation (given the presence of NSObjectProtocol).

Edit: saw your comment above about adding dummy conformance entries for types that do not conform to protocols.

@jckarter
Copy link
Member

An initial non-caching, incomplete implementation sounds like a good starting point to me. We can build from there. Thanks for taking this on!

@lhoward
Copy link
Contributor Author

lhoward commented Dec 31, 2015

A pleasure and thanks for being so responsive (particularly over the holiday break!). It'll be great for object serialization to have the dynamism of the Foundation on Darwin.

@lhoward
Copy link
Contributor Author

lhoward commented Dec 31, 2015

This patch is working OK with NSKeyedUnarchiver, the only issue is that subclasses will not have a protocol conformance entry. So something like:

aClass = _typeByName("Foundation.NSMutableArray", conformingTo: NSCoding.self)

will fail. So it's possible the only approach that will work is to have dummy conformance table entries for every class, or at least duplicate ones for subclasses, at which point maybe we are better off just emitting a table of pointers to the class metadata in each object. That might be beyond me though...

@lhoward
Copy link
Contributor Author

lhoward commented Dec 31, 2015

Here's where I am so far:

master...lhoward:SR-381

@jckarter
Copy link
Member

Cool, feel free to start a pull request. I'll be on real vacation starting tomorrow though so might not be able to comment till next week. IMO, the runtime call should just be getTypeByName without also checking the protocol; you'll have to do an as? cast from Any.Type to the existential metatype anyway, which can do the protocol check.

@phausler
Copy link
Member

It is not un-reasonable for us to mark NSMutableArray to adopt NSCoding if that would work. Not sure if that would help any or not.

@lhoward
Copy link
Contributor Author

lhoward commented Dec 31, 2015

Unfortunately explicitly conforming subclasses to a protocol to which their superclass already conforms triggers the following error:

$ ./swiftc -o lookup lookup.swift
lookup.swift:8:29: error: redundant conformance of 'SubClass' to protocol 'SomeProtocol'
class SubClass : SomeClass, SomeProtocol {
                            ^
lookup.swift:8:7: note: 'SubClass' inherits conformance to protocol 'SomeProtocol' from superclass here
class SubClass : SomeClass, SomeProtocol {
      ^

If we can silence the error it would work for Foundation – but it'll be confusing for end users because the failure (not being able to unarchive) will happen at runtime. But it would be a start I guess.

@lhoward
Copy link
Contributor Author

lhoward commented Dec 31, 2015

I agree we ultimately want a generic getTypeByName API that does not take a protocol. However having an API whose name suggests it will work for all cases, but which only works for a subset, might be confusing.

Edited: I think I see what you mean.

@jckarter
Copy link
Member

I think that's fine, since only working on types with conformances is hopefully a temporary situation. That error is intended language behavior, and I don't think we can reasonably disable it. Can Foundation encode NSMutableStrings as NSStrings? Are there other important subclasses?

@lhoward
Copy link
Contributor Author

lhoward commented Dec 31, 2015

We might be able to get away with disabling subclass encoding for now, I'll look at what the other classes are. If they're just mutable subclasses, it's perhaps more an uncommon scenario than their immutable superiors, and fortunately NSNumbers (which are a subclass of NSValue <NSCoding>) are encoded directly.

(We can't change the encoding format, e.g. to add a mutable flag, because the format needs to be identical with Foundation on Darwin.)

@lhoward
Copy link
Contributor Author

lhoward commented Jan 6, 2016

Doing the caching in the stdlib implementation is fine, I just haven't done it yet.

@lhoward
Copy link
Contributor Author

lhoward commented Jan 6, 2016

In NSStringFromClass(), what's the safest way to check if a class is a generic – is it safe to look for "<" in the unmangled string representation or can that character be escaped?

@lhoward
Copy link
Contributor Author

lhoward commented Jan 6, 2016

Not sure I understand the comment about registration. Are you suggesting we only need a function that provides the mangled name given a type? Or are you suggesting that Foundation only needs to support encoding, not decoding?

Providing classes for a decoding session only applies to secure coding and also does not require that the caller explicitly register a class name to class mapping.

Only providing encoding support runs counter to the goal of Foundation having feature parity on both platforms.

@belkadan
Copy link
Contributor

belkadan commented Jan 6, 2016

I really think you're doing this backwards. Assume mangled names, pattern-match the one safe case, and turn that back into the fully-qualified name. That's what libobjc does (search for copySwiftV1DemangledName).

I don't remember how to get a mangled name for a type but I think that's in the stdlib somewhere. Or at least a bare runtime entry point.

@jckarter
Copy link
Member

jckarter commented Jan 6, 2016

The runtime should handle the core type lookup logic in any case. My concerns with using mangled names is that mangled names are bad API for reflective purposes, since they're unreadable and nontrivial to compose. We're already careful about changing the format produced by String(reflecting:) because of compatibility concerns, and the format should already include a token to represent local scopes (and can be fixed to do so if it doesn't). You can compose by string interpolation fairly easily, which is important if you want "Module.Foo<(typeB)>" or something like it. Substitutions (and soon, compression) make this composition expensive for mangled names. Maybe we can design some sort of abstract type grammar that the typeName/typeByName entry points trade in, but that has its own complications.

@lhoward
Copy link
Contributor Author

lhoward commented Jan 6, 2016

In reply to @belkadan, there is _swift_buildDemanglingForMetadata() in stdlib which the PR for this bug report exposes as _canonicalTypeName().

OK, what you say makes sense. I'll remove _typeByName() and put the transformation for the one safe case into Foundation.

@lhoward
Copy link
Contributor Author

lhoward commented Jan 6, 2016

@jckarter two issues with unmangled names:

  • Can we guarantee the name produced by String(reflecting:) is always unique? See @belkadan's comment about nested classes with different containing types

  • For Foundation, are we OK with breaking compatibility with existing archives that encode the mangled name?

@lhoward
Copy link
Contributor Author

lhoward commented Jan 6, 2016

How about this then:

private func buildCanonicalNameForClass(aClassName: String) -> String? {
    var name : String
    
    if aClassName.hasPrefix("_Tt") {
        return aClassName
    }
    
    var components = aClassName.bridge().componentsSeparatedByString(".")
    if components.count == 1 {
        components = [ _SwiftFoundationModuleName, aClassName ]
    } else if components.count != 2 {
        return nil
    }
    
    if components[0].isEmpty || components[1].isEmpty {
        return nil
    }
    
    name = "_TtC"
    
    if components[0] == "Swift" {
        name += "Ss"
    } else {
        name += String(components[0].length) + components[0]
    }
    
    name += String(components[1].length) + components[1]

    return name
}

internal func NSClassFromString(aClassName: String) -> AnyClass? {
    guard let canonicalName = buildCanonicalNameForClass(aClassName) else {
        return nil
    }
    
    return _typeByCanonicalName(canonicalName) as? AnyClass
}

@belkadan
Copy link
Contributor

belkadan commented Jan 6, 2016

> Providing classes for a decoding session only applies to secure coding

If we only supported secure coding for Linux (at least for now), this would be good enough to build a list of all possible class names you could encounter in the archive. I'm not saying this is an ideal solution, but it gets Foundation unblocked while we (Foundation and Swift runtime) design and figure out _typeForName—which, AFAIK, we do not have an implementation plan for yet. (Using dlsym is not something we want to ship with.)


@jckarter and I talked this morning for quite a while about mangled vs. demangled names, and didn't quite manage to convince each other either way. Things that came up:

  • Mangled names are not great for generic types, because instantiating a generic type requires decomposing the base type and the generic parameters. For a mangled name that means fully demangling the thing, whereas a simpler representation could just split on <>. (Joe already mentioned this above.)
  • Demangled names are pretty stable for plain old nominal types, even generic ones, but not when they are nested in extensions of types from other modules, and not so much for types imported from C. I'd really like to make the demangling there nicer and more explicit, but those sorts of modifications would become forbidden, and we'd be stuck with a bad demangling. (This is one of my primary concerns: the demangling becomes a persistent and possibly parseable format where it's previously been just human text.)
  • We do have other uses of demangled names (such as the interface file case above), but those don't scale well to anything more complicated than "Foo.Bar.Baz". I wouldn't want to manually name a file "MyApp.MyViewController<Swift.Int, () -> (MyApp.MyData)>.xib", especially when using unqualified names or typealiases anywhere would silently fail. This doesn't apply to archiving, though, because class names for archiving aren't written by hand.
  • Both mangled and demangled names are not stable for private types and for local types. We should discourage archiving these types without using a custom name at the NSKeyed(Archiver|Unarchiver) level.
  • We may want to change symbol mangling when we have an ABI break (e.g. moving to a new platform). This doesn't have to change runtime mangled names, though—they're not required to be the same as symbol names.

@lhoward
Copy link
Contributor Author

lhoward commented Jan 6, 2016

Just to address the first comment:

  • I don't really see point of one-way archiving. "All possible class names" is unbounded for third-party applications. A static class mapping for Foundation only is the less worse alternative here (which I am happy to leave to Apple to implement).

  • The current implementation does not use dlsym(), it uses protocol conformance tables which is sufficient for Foundation (as long as the class directly conforms to a protocol, which unless it is a subclass of a codable class it will by virtue of NSCoding). This approach was suggested by @jckarter. (For classes with inherited conformance, explicitly conforming to a dummy protocol is a workaround. The implementation for generic classes does depend on dladdr(), granted.)

See https://github.com/apple/swift/pull/834/files

@lhoward
Copy link
Contributor Author

lhoward commented Jan 6, 2016

Thanks for the analysis above, it all makes sense.

The only thing I would add is: libobjc/Foundation on Darwin uses a particular encoding of class names today (unmangled for top-level classes, mangled otherwise). Corelibs Foundation has an explicit goal of compatibility with Foundation on Darwin, as I understand it. So whilst I understand the desire for a long-term solution perhaps we need something that matches libobjc's behaviour in the interim, with whatever division of responsibility between stdlib and Corelibs Foundation deemed appropriate.

@belkadan
Copy link
Contributor

belkadan commented Jan 6, 2016

Yep, sorry, hadn't kept up. Thanks for the new implementation!

> "All possible class names" is unbounded.

…aargh, I forgot that the root object does not itself have a set of expected classes.

@lhoward
Copy link
Contributor Author

lhoward commented Jan 6, 2016

I added a cache but it uses StringMap which pulls in libLLVMSupport (and libswiftCore.dylib seems to be linked with -all_load). Not ideal: we should probably use a different implementation or somehow pull in StringMap.cpp directly.

@jckarter
Copy link
Member

jckarter commented Jan 6, 2016

We shouldn't link the runtime against any LLVM libraries. Can you use DenseMap instead, or the ConcurrentMap that's implemented in the runtime already?

@lhoward
Copy link
Contributor Author

lhoward commented Jan 6, 2016

Fixed to use ConcurrentMap.

@lhoward
Copy link
Contributor Author

lhoward commented Jan 8, 2016

Until such time we have a stable and unique de-mangled name format:

public // SPI(Foundation)
func _topLevelClassByName(name: String) -> AnyClass?

Edit: we could just call this typeByName and return Any but I think that is misleading, even though that's the eventual API we want.

I have updated the nscoding branch of Foundation to use this. I am still cleaning up the pull request for stdlib with a view to requesting it to be integrated after the 2.2 branch.

@lhoward
Copy link
Contributor Author

lhoward commented Jan 8, 2016

NSKeyedArchiver/NSKeyedUnarchiver Linux/OS X tests pass with this change.

@lhoward
Copy link
Contributor Author

lhoward commented Jan 8, 2016

One more thing: in the interests of minimizing developer surprise, we should find a way for this API to work where the class has no direct protocol conformance. If emitting dummy conformance records for every type is problematic, we could special case NSCoding. (In the Swift 3 timeframe perhaps a different approach to runtime metadata will exist, but for the interim.)

Foundation is using an explicit dummy protocol for codable classes that inherit NSCoding, but expecting developers to know the limitations of NSClassFromString might be unreasonable.

@jckarter
Copy link
Member

jckarter commented Jan 8, 2016

We can add that later. Developer surprise isn't a problem for a work-in-progress.

@lhoward
Copy link
Contributor Author

lhoward commented Jan 9, 2016

Here is a patch to include type metadata in a new section, in the case where it is not already included in the protocol conformance table.

https://github.com/apple/swift/pull/834/files

I also (per an earlier comment from @jckarter) reverted the API consumed by Foundation to _typeByName() -> Any.Type?. On reflection (no pun intended), maybe it's better to commit to a final signature even though the functionality will be explicitly limited for now.

@lhoward
Copy link
Contributor Author

lhoward commented Jan 16, 2016

Thank you to @jckarter for merging this. Confirmed Foundation builds and tests out with the current Swift compiler/runtime from master. @phausler, let me know what I can do to help you get the lhoward/nscoding branch merged.

@lhoward
Copy link
Contributor Author

lhoward commented Jan 16, 2016

Merged to master.

@lhoward
Copy link
Contributor Author

lhoward commented Jan 16, 2016

Foundation NSKeyedArchiver/NSKeyedUnarchiver tests pass on Linux with swift master (thanks to @gribozavr for fixing build regression I accidentally introduced!).

@swift-ci
Copy link
Collaborator

Comment by admin (JIRA)

(updating fields only)

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
@AnthonyLatsis AnthonyLatsis added API request mangling Area → compiler: Mangling and removed new feature labels Nov 11, 2022
@AnthonyLatsis AnthonyLatsis added feature A feature request or implementation and removed API request labels Jan 27, 2023
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects ABI Flag: Affects ABI feature A feature request or implementation mangling Area → compiler: Mangling runtime The Swift Runtime standard library Area: Standard library umbrella
Projects
None yet
Development

No branches or pull requests

6 participants