[SR-7124] Double-rounding in floating-point literal conversion #49672

stephentyrone · 2018-03-05T21:00:39Z


Previous ID	SR-7124
Radar	None
Original Reporter	@stephentyrone
Type	Bug

Additional Detail from JIRA


Votes	2
Component/s	Compiler, Standard Library
Labels	Bug
Assignee	None
Priority	Medium

md5: 54e8ad8dff55322c12e5604790553880

relates to:

SR-920 Re-design builtin compiler protocols for literal convertible types
SR-3970 Rework ExpressibleBy(Integer|Float)Literal protocols

Issue Description:

Floating-point literals are currently converted first to `_MaxBuiltinFloatType`, and then to the actual type. This rounds twice, which necessarily brings up the possibility of double rounding errors. Here's a concrete example of this in action on x86, where we round first to `Float80`:

1> let x = 1.000000000000000111076512571139929264063539449125528335571

x: Double = 1

`1` is incorrect; the "correct" (single-rounding) double result is `1.0000000000000002` (the literal is just barely larger than the halfway point between the two values, but the initial rounding to `Float80` rounds exactly to the halfway point, so the subsequent rounding rounds down to `1`).

How do we fix this? The usual approach is to make the first rounding happen in a special rounding mode, "round to odd", which rounds any non-exact value to the representable number whose least significant bit is set. This solves the problem for any type that's more than one bit smaller than the intermediate type, but would result in us getting the wrong result if they're the same (then we should be rounding to nearest instead).

We could solve this for concrete built-in types (by far the more important thing to fix) by rounding first to an intermediate type with, say, 32b exponent and 128b significand under the round-to-odd rule (this would also let us implement `Float128` in the future without further changes). This would not solve the problem in general for arbitrary user-defined binary floating-point types, since we don't know how large they may be; we should really be providing them with something like a hexadecimal digit sequence in the longer term.

The text was updated successfully, but these errors were encountered:

stephentyrone · 2018-03-05T21:22:42Z

Note that for efficiency in generic contexts, we might want an arbitrary-width floating-point literal type to stash pre-rounded `Float` and `Double` values.

xwu · 2018-03-05T21:33:21Z

I was browsing through Python's documentation for its decimal type. It's got this neat DecimalTuple type (sign, exponent, significand) that representations can use. I wonder if it'd be feasible to ape something like that in a revised protocol.

protocol FloatLiteralProtocol {
  associatedtype RawExponent: UnsignedInteger, _ExpressibleByBuiltinIntegerLiteral
  associatedtype RawSignificand: UnsignedInteger, _ExpressibleByBuiltinIntegerLiteral
  static var radix: Int { get }
  var sign: FloatingPointSign { get set }
  var bias: RawExponent { get set }
  /* Can't use constants in generics,
     but the compiler can set the right bias
     based on RawSignificand.bitWidth. */
  var exponentBitPattern: RawExponent { get set }
  var significandBitPattern: RawSignificand { get set }
}

struct BinaryFloatLiteral<T, U>: FloatLiteralProtocol {
  static let radix = 2
  var sign: FloatingPointSign
  var bias: T
  var exponentBitPattern: T
  var significandBitPattern: U
}

struct DecimalFloatLiteral<T, U>: FloatLiteralProtocol {
  static let radix = 10
  var sign: FloatingPointSign
  var bias: T
  var exponentBitPattern: T
  var significandBitPattern: U
}

protocol ExpressibleByFloatLiteral {
  associatedtype FloatLiteral: FloatLiteralProtocol
  init(floatLiteral: FloatLiteral)
}

extension Float: ExpressibleByFloatLiteral {
  init(floatLiteral: BinaryFloatLiteral<UInt, UInt32>) {
    _sanityCheck(floatLiteral.bias == 127)
    init(
      sign: floatLiteral.sign,
      exponentBitPattern: floatLiteral.exponentBitPattern,
      significandBitPattern: floatLiteral.significandBitPattern)
  }
}

stephentyrone · 2018-03-05T21:48:05Z

I don't see a reason to use a biased exponent for the literal protocols, but broadly speaking something like that could work.

tbkka · 2018-04-13T17:57:34Z

For floating-point printing, I want to expose a `decimalFloatRepresentation` property that returns a `DecimalFloatRepresentation` struct containing the sign, integer exponent, and base-10 sequence of digits. This struct can then be used for a variety of special-purpose formatting needs. For performance, I had planned to store up to 21 digits inline (avoiding the additional heap allocation required for a general `Array` type.

The same struct could be used as an intermediate form for decimal parsing as well, though that would require the struct to support arbitrary numbers of digits.

Of course, for efficiency the compiler would have to know about the standard types. `1.2e34 as Float` should always compile to the correct constant.

stephentyrone · 2018-04-13T18:11:37Z

@tbkka One solution I've thought about is that the literal representation could actually be a struct containing the value correctly rounded to Float, Double, Float80, and the full decimal representation for types that need to do their own parsing.

tbkka · 2018-04-13T20:19:14Z

Another example popped up recently while testing Float printing. The following line of code currently produces different values on different architectures because of this bug:

   let a = 7.038531e-26 as Float

On x86_64 and i386, this produces the correct result because when compiling for those platforms, `_MaxBuiltinFloatType` is Float80 and this particular value happens to round correctly when parsed as Float80 and then truncated to Float.

But on ARM, this assigns a different value to `a` because `_MaxBuiltinFloatType` is Double for ARM targets and this value does not round correctly when parsed as Double and then truncated to Float.

tbkka · 2018-04-13T20:59:38Z

Pruning this problem back a bit: What we want is for the compiler to parse the string `1.234` appearing in source differently depending on whether it's followed by `as Float` or `as Double` or `as Float80` or `as MySpecialType`.

I think we already have the language machinery to do this: Types that conform to `ExpressibleByFloatLiteral` already have to declare their `FloatLiteralType`, so the compiler "just" needs to be taught to handle float tokens as follows:

Identify the token as a float but don't parse it
Identify the target type
Parse the token according to the associated `FloatLiteralType`
Invoke the initializer with the parsed result

In particular, `Float.FloatLiteralType = Float` would then suffice for all Float literals to be correctly parsed.

To allow greater freedom, we could later extend this to allow `FloatLiteralType = String` for types that really just want to parse the raw token at run time as it appeared in the source.

let a = 1.2345e+123456789 as MyBigExponentFloat

extension MyBigExponentFloat: ExpressibleByFloatLiteral {
   associatedtype FloatLiteralType = String
   init(floatLiteral: String) {
      // floatLiteral == "1.2345e+123456789"
   }
}

This also gracefully handles decimal, hexadecimal, and possibly other float literal representations. Basically, the compiler can parse standard forms for you, other cases can be handled by the custom type however it wishes. Broken-down representations such as the proposed `DecimalFloatRepresentation` can then be simple run-time helpers for these custom initializers; the compiler does not need to treat them specially.

stephentyrone · 2018-04-13T21:05:11Z

That's not sufficient, for two reasons:

a. If you look at the plumbing backing FloatLiteralType, you'll find that the compiler first converts to the internal `_MaxBuiltinFloatType`, then to whatever the `FloatLiteralType` is.

b. You need a type for the literal in a context generic over `BinaryFloatingPoint`.

stephentyrone · 2018-04-13T21:07:58Z

If we want to make a quick fix, the best solution is probably something like the following:

public protocol _ExpressibleByBuiltinFloatLiteral {
  init(_builtinFloatLiteral value: _MaxBuiltinFloatType, _roundingInformation: Int)
}

where the extra roundingInformation records if `value` is exact, rounded up, or rounded down.

tbkka · 2018-04-13T21:13:32Z

... the compiler first converts to the internal `_MaxBuiltinFloatType`, then to whatever the `FloatLiteralType` is....

I think that behavior within the compiler is the root cause of this bug. The compiler should always convert the source token text directly to the final `FloatLiteralType`. Ultimately, that means the compiler needs to hold onto the token text until it actually knows the `FloatLiteralType`.

Put slightly more bluntly: The existence of `_MaxBuiltinFloatType` is a bug.

stephentyrone · 2018-04-13T21:14:37Z

> Put slightly more bluntly: The existence of `_MaxBuiltinFloatType` is a bug.

Yeah, it's this bug.

tbkka · 2018-04-13T21:54:08Z

You need a type for the literal in a context generic over `BinaryFloatingPoint`.

I think this might be another bug in the current protocols. Since the `FloatLiteralType` is something that gets acted on by the compiler, the real constraint is that the type must be something into which the compiler can convert a floating-point source token. There's no a priori need for it to conform to any particular numeric type hierarchy.

stephentyrone · 2018-04-13T21:56:20Z

Totally agree (which is, again, why the general literal type should be [Hexa]decimalCharacterSequence, possibly augmented with some additional information for efficiency).

swift-ci transferred this issue from apple/swift-issues Apr 25, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SR-7124] Double-rounding in floating-point literal conversion #49672

[SR-7124] Double-rounding in floating-point literal conversion #49672

stephentyrone commented Mar 5, 2018

stephentyrone commented Mar 5, 2018

xwu commented Mar 5, 2018

stephentyrone commented Mar 5, 2018

tbkka commented Apr 13, 2018

stephentyrone commented Apr 13, 2018

tbkka commented Apr 13, 2018

tbkka commented Apr 13, 2018

stephentyrone commented Apr 13, 2018

stephentyrone commented Apr 13, 2018

tbkka commented Apr 13, 2018

stephentyrone commented Apr 13, 2018

tbkka commented Apr 13, 2018

stephentyrone commented Apr 13, 2018

Navigation Menu

[SR-7124] Double-rounding in floating-point literal conversion #49672

[SR-7124] Double-rounding in floating-point literal conversion #49672

Comments

stephentyrone commented Mar 5, 2018

stephentyrone commented Mar 5, 2018

xwu commented Mar 5, 2018

stephentyrone commented Mar 5, 2018

tbkka commented Apr 13, 2018

stephentyrone commented Apr 13, 2018

tbkka commented Apr 13, 2018

tbkka commented Apr 13, 2018

stephentyrone commented Apr 13, 2018

stephentyrone commented Apr 13, 2018

tbkka commented Apr 13, 2018

stephentyrone commented Apr 13, 2018

tbkka commented Apr 13, 2018

stephentyrone commented Apr 13, 2018