JSONSerialization: Improve parsing of numbers #1657

spevans · 2018-08-08T12:07:15Z

Check the number looks like a JSON number and exit early if not.
Use the native Int64(), UInt64(), Double() parsers to avoid creating
a C string and passing it to strtol()/strtod(). This also eliminates a
memcpy() and removes the 63-digit restriction, which would fail to
parse numbers expressible by Double's full exponent.
For numbers with a leading '-' sign, parse using Int64(), falling
back to Double(); otherwise parse using UInt64(), falling back to
Double().
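
As a rough illustration of the fallback order described above (a minimal sketch, not the code in this PR): `ParsedNumber` and `parseNumber` are hypothetical names, and the input is assumed to have already passed the "looks like a JSON number" check, since `Double()` on its own also accepts strings such as "nan" and "inf".

```swift
// Hypothetical result type for illustration; the PR itself returns NSNumber.
enum ParsedNumber: Equatable {
    case signed(Int64)
    case unsigned(UInt64)
    case floating(Double)
}

// Assumes `text` was already validated as a JSON number.
func parseNumber(_ text: String) -> ParsedNumber? {
    if text.hasPrefix("-") {
        // Negative numbers: try Int64 first.
        if let value = Int64(text) { return .signed(value) }
    } else if let value = UInt64(text) {
        // Non-negative numbers: try UInt64 first.
        return .unsigned(value)
    }
    // Fractions, exponents, and integers too large for 64 bits end up here.
    if let value = Double(text) { return .floating(value) }
    return nil
}
```

Trying `UInt64` for non-negative input keeps values above `Int64.max` exact instead of rounding them through `Double`.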


spevans · 2018-08-08T12:07:36Z

@swift-ci please test

itaiferber

Looks like a reasonable change to me. Do we want to apply our efforts in also trying to parse Decimals here?

itaiferber · 2018-08-10T16:38:27Z

Foundation/JSONSerialization.swift

+        let MINUS = UInt8(ascii: "-")
+
+        var isNegative = false
+        var string = ""


Should we try to preserve the UTF-8 behavior of walking a pointer along, rather than appending a single character at a time? (Given that the vast majority of JSON provided to us is UTF-8, it'd be nice to maintain the performance there.)

I did think about this, but on x86_64/ARM64 strings of up to 15 ASCII characters should fit into the small-string optimization (SSO) and so avoid a memory allocation, which probably covers most numbers to be parsed.

The other issue is that strings passed to Int64(), UInt64() and Double() can't have any invalid trailing characters. Appending only the validated characters avoids creating a string of all the available characters using String(bytesNoCopy:), which I believe would still be validated according to the encoding. That could end up reading through the whole of the rest of the JSON document and then scanning through it to determine the new, shorter count.

As a performance enhancement, when validating the characters it's possible to count the number of digits, look for [.eE], and jump directly to parsing as a Double().

I was thinking more along the lines of String.init(bytesNoCopy:length:encoding:freeWhenDone:) which would allow us to avoid reading to the end of the document and wouldn't necessitate doing any further validation. Might be worth doing some small perf tests, just to see. (Or is this not available in s-cl-f?)

As for looking for [.eE] — we do just this on Darwin: as soon as we encounter one of those characters we avoid parsing as an integer unnecessarily, which does save some time in common situations.
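
To make the [.eE] shortcut concrete, here is a minimal sketch under the assumption that the number's bytes have already been isolated from the document. `NumberPath` and `choosePath(forNumberBytes:)` are hypothetical names, not part of JSONSerialization or of the Darwin implementation.

```swift
// Hypothetical names for illustration only.
enum NumberPath {
    case integer        // worth trying Int64()/UInt64() first
    case floatingPoint  // go straight to Double()
}

func choosePath(forNumberBytes bytes: [UInt8]) -> NumberPath {
    let dot = UInt8(ascii: ".")
    let lowerE = UInt8(ascii: "e")
    let upperE = UInt8(ascii: "E")
    for byte in bytes {
        // The first '.', 'e' or 'E' rules out an integer parse.
        if byte == dot || byte == lowerE || byte == upperE {
            return .floatingPoint
        }
    }
    return .integer
}
```

In a real parser this check would presumably be folded into the validation pass rather than run as a second scan over the bytes.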

Unfortunately from https://github.com/apple/swift-corelibs-foundation/blob/a2b40951e8365da696d5105fd57a19c1f1c220ef/Foundation/NSString.swift#L1237:

    public convenience init?(bytesNoCopy bytes: UnsafeMutableRawPointer, length len: Int, encoding: UInt, freeWhenDone freeBuffer: Bool) /* "NoCopy" is a hint */ {
        // just copy for now since the internal storage will be a copy anyhow
        self.init(bytes: bytes, length: len, encoding: encoding)
        if freeBuffer { // dont take the hint
            free(bytes)
        }
    }

So I don't think it's that useful at the moment. I will look into bypassing the integer parsing where possible.

spevans · 2018-08-10T17:10:22Z

For Decimal do you mean with a change like:

                 return (NSNumber(value: uintValue), index)
             }
         }
+        let decimalNumber = NSDecimalNumber(string: string)
+        if decimalNumber.isFinite {
+            return (decimalNumber, index)
+        }
         if let doubleValue = Double(string) {
             return (NSNumber(value: doubleValue), index)
         }

?

I think #1653 is needed before Decimal works properly, but I can always do it as a follow-up.

itaiferber · 2018-08-10T17:35:40Z

As for Decimal, I was thinking about the heuristic I mentioned briefly in #1655. The string representation of a Double can have at most DBL_DECIMAL_DIG digits of precision (17 for an IEEE Double) before you start to lose precision. This means that if a string is longer than:

1 (if it has a sign) +
1 (if it has a decimal point and starts with a leading 0 like 0.xxxxxxxxxx) +
1 (if it has a decimal point) +
DBL_DECIMAL_DIG +
E (if it has an exponent of length E, max 5 digits for e±308)

then you will lose precision on the parse. For instance,

Strings in the form "xxxxxxxxxx": max length 17
Strings in the form "±xxxxxxxxxx": max length 18
Strings in the form "xxxxx.yyyyy": max length 18
Strings in the form "xxxxx.yyyyye±zzz": max length 23
etc.

If the string is longer than this (we're losing precision) but parsing succeeds (it's a valid Double) and the magnitude would fit in a Decimal, then it might be worth parsing as a Decimal to avoid losing that precision. (This is what we do on Darwin, at least.)
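
The length rule above can be sketched roughly as follows. This is only an illustration of the arithmetic, assuming the input is already a valid JSON number; `maxLosslessLength(of:)` and `mayLosePrecisionAsDouble(_:)` are hypothetical names, and 17 is hard-coded for DBL_DECIMAL_DIG on an IEEE 754 Double.

```swift
// Hypothetical helpers; assumes `text` is a valid JSON number.
func maxLosslessLength(of text: String) -> Int {
    let doubleDecimalDigits = 17  // DBL_DECIMAL_DIG for an IEEE 754 Double
    var limit = doubleDecimalDigits
    var body = Substring(text)

    // 1 if it has a sign.
    if let first = body.first, first == "-" || first == "+" {
        limit += 1
        body = body.dropFirst()
    }

    // E for an exponent of length E (counting the 'e', its sign and its digits).
    var mantissa = body
    if let eIndex = body.firstIndex(where: { $0 == "e" || $0 == "E" }) {
        limit += body[eIndex...].count
        mantissa = body[..<eIndex]
    }

    // 1 for a decimal point, plus 1 more for a leading "0." form.
    if mantissa.contains(where: { $0 == "." }) {
        limit += 1
        if mantissa.starts(with: "0.") { limit += 1 }
    }
    return limit
}

func mayLosePrecisionAsDouble(_ text: String) -> Bool {
    return text.count > maxLosslessLength(of: text)
}
```

A caller could use `mayLosePrecisionAsDouble` to decide whether attempting a Decimal parse is worthwhile, in line with the Darwin behavior described above.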

- Fully validate that the number conforms to the JSON number specification.
- Determine if the number should be parsed as a UInt64, Int64 or Decimal before falling back to Decimal.
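
For reference, validating against the JSON number grammar (optional '-', an integer part with no leading zeros, an optional fraction, an optional exponent) might look like the sketch below. `isValidJSONNumber` is a hypothetical standalone helper that takes a whole string; the actual change validates bytes in place while scanning the document.

```swift
// Hypothetical helper: number = [ "-" ] int [ frac ] [ exp ]
func isValidJSONNumber(_ text: String) -> Bool {
    let bytes = Array(text.utf8)
    var index = 0

    func peek() -> UInt8? {
        return index < bytes.count ? bytes[index] : nil
    }
    func isDigit(_ byte: UInt8?) -> Bool {
        guard let byte = byte else { return false }
        return byte >= UInt8(ascii: "0") && byte <= UInt8(ascii: "9")
    }
    // Consumes a run of digits and returns how many were consumed.
    func skipDigits() -> Int {
        let start = index
        while isDigit(peek()) { index += 1 }
        return index - start
    }

    // Optional minus sign (a leading '+' is not allowed).
    if peek() == UInt8(ascii: "-") { index += 1 }

    // Integer part: a single "0", or a non-empty run of digits.
    if peek() == UInt8(ascii: "0") {
        index += 1
    } else if skipDigits() == 0 {
        return false
    }

    // Optional fraction: '.' followed by at least one digit.
    if peek() == UInt8(ascii: ".") {
        index += 1
        if skipDigits() == 0 { return false }
    }

    // Optional exponent: 'e' or 'E', optional sign, at least one digit.
    if peek() == UInt8(ascii: "e") || peek() == UInt8(ascii: "E") {
        index += 1
        if peek() == UInt8(ascii: "+") || peek() == UInt8(ascii: "-") { index += 1 }
        if skipDigits() == 0 { return false }
    }

    // Anything left over (for example the "1" in "01") makes it invalid.
    return index == bytes.count
}
```

Rejecting input here is what allows the later `Int64()`, `UInt64()` and `Double()` calls to be trusted, since those initializers accept some forms that JSON does not.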

spevans · 2018-08-13T12:55:59Z

@swift-ci please test

spevans · 2018-08-14T15:16:51Z

@swift-ci please test

spevans · 2018-08-14T19:16:57Z

@swift-ci please test

spevans · 2018-08-14T21:15:20Z

@swift-ci please test

spevans · 2018-08-15T04:30:13Z

@swift-ci please test

millenomi · 2018-11-07T22:50:01Z

@swift-ci please test and merge

spevans mentioned this pull request Aug 8, 2018

JSONSerialization: Use NSDecimalNumber for parsing json numbers. #1655

Closed

spevans requested a review from itaiferber August 9, 2018 16:21

itaiferber reviewed Aug 10, 2018

JSONSerialization: Improve parsing of numbers

1e32417

- Fully validate that the number conforms to the JSON number specification.
- Determine if the number should be parsed as a UInt64, Int64 or Decimal before falling back to Decimal.

Update TestJSONEncoder to account for new error message.

c69ec85

swift-ci merged commit bf3ffd2 into swiftlang:master Nov 7, 2018

florianreinhart mentioned this pull request Feb 19, 2019

Floating point numbers lose precision novi/mysql-swift#78

Open

swift-ci mentioned this pull request Nov 27, 2018

[SR-7054] JSONDecoder Decimal precision error #4255

Closed

spevans mentioned this pull request Jun 12, 2020

[SR-12974] JSONEncoder encoding non fractional double value as Int64 #3253

Closed
