JSONSerialization: Improve parsing of numbers #1657
- Check that the number looks like a JSON number and exit early if not.
- Use the native Int64(), UInt64() and Double() parsers to avoid creating a C string and passing it to strtol()/strtod(). This also eliminates a memcpy() and removes the 63-digit restriction, which would fail to parse numbers expressible with Double's full exponent range.
- For numbers with a leading '-' sign, parse using Int64(), falling back to Double(); otherwise parse using UInt64(), falling back to Double().
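A minimal sketch of the sign-based strategy in the final point of the description, assuming the input has already passed the JSON-number check (Swift's integer initializers also accept a leading '+', which JSON does not allow). ParsedNumber and parseJSONNumber are hypothetical names for illustration, not the code in this PR.

```swift
/// A parsed JSON number, using the three native types named in the description.
enum ParsedNumber: Equatable {
    case int(Int64)
    case uint(UInt64)
    case double(Double)
}

/// A leading "-" tries Int64 first, anything else tries UInt64 first;
/// both fall back to Double. Assumes `text` is already a valid JSON number.
func parseJSONNumber(_ text: String) -> ParsedNumber? {
    if text.first == "-" {
        if let value = Int64(text) { return .int(value) }
    } else {
        if let value = UInt64(text) { return .uint(value) }
    }
    // Fractions, exponents, and integers outside the 64-bit ranges end up here.
    if let value = Double(text) { return .double(value) }
    return nil
}
```

No C string is created here: the standard library initializers parse the Swift string directly.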
@swift-ci please test
Looks like a reasonable change to me. Do we want to also try parsing Decimals here?
Foundation/JSONSerialization.swift
```swift
let MINUS = UInt8(ascii: "-")

var isNegative = false
var string = ""
```
Should we try to preserve the UTF-8 behavior of walking a pointer along, rather than appending a single character at a time? (Given that the vast majority of JSON we are given is UTF-8, it'd be nice to maintain that performance.)
I did think about this, but on x86_64/ARM64 strings of up to 15 ASCII characters should fit into the small-string optimization (SSO) storage and so avoid a memory allocation, which probably covers most numbers being parsed.
The other issue is that strings passed to Int64(), UInt64() and Double() can't have any invalid trailing characters, so this approach avoids creating a string of all the available characters using String(bytesNoCopy:), which I believe still gets validated according to the encoding. That could end up reading through the whole of the rest of the JSON document and then scanning through it again to determine the new, shorter count.
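One way to keep the pointer-walking style while respecting the trailing-character constraint is to scan forward over the bytes that can belong to a number and decode only that span. A minimal sketch, assuming the document is available as an UnsafeBufferPointer<UInt8> of UTF-8; isNumberByte and scanNumberText are hypothetical helpers. It uses the standard library's String(decoding:as:), which decodes only the slice it is given, rather than any NSString bytesNoCopy initializer.

```swift
/// Bytes that may appear inside a JSON number: digits, signs, '.', 'e' and 'E'.
func isNumberByte(_ byte: UInt8) -> Bool {
    switch byte {
    case UInt8(ascii: "0")...UInt8(ascii: "9"),
         UInt8(ascii: "-"), UInt8(ascii: "+"),
         UInt8(ascii: "."), UInt8(ascii: "e"), UInt8(ascii: "E"):
        return true
    default:
        return false
    }
}

/// Walks forward from `start` while bytes can belong to a number, then decodes
/// only that span. Grammar validation is a separate step and is not done here.
func scanNumberText(_ bytes: UnsafeBufferPointer<UInt8>, from start: Int) -> (text: String, end: Int) {
    var end = start
    while end < bytes.count && isNumberByte(bytes[end]) {
        end += 1
    }
    // Decode just bytes[start..<end]; the rest of the document is never read.
    let text = String(decoding: bytes[start..<end], as: UTF8.self)
    return (text, end)
}
```

The resulting string is short, so in the common case it should stay within the small-string storage mentioned above.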
As a performance enhancement, while validating the characters it's possible to count the number of digits and look for '.', 'e' or 'E', and jump directly to parsing as a Double().
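A sketch of that fast path, assuming the number's text has already been isolated and validated: one pass counts digits and notes any '.', 'e' or 'E', so integer parsing can be skipped for numbers with a fraction or exponent. NumberKind and classify are hypothetical names.

```swift
/// What a single pass over the number's characters can tell us.
enum NumberKind: Equatable {
    case signedInteger    // leading '-', no '.', 'e' or 'E'
    case unsignedInteger  // no sign, no '.', 'e' or 'E'
    case floatingPoint    // contains '.', 'e' or 'E': skip integer parsing
}

/// Counts digits and looks for '.', 'e' or 'E' in one pass over the UTF-8 bytes.
/// Assumes `text` has already been validated as a JSON number.
func classify(_ text: String) -> (kind: NumberKind, digitCount: Int) {
    var digitCount = 0
    var isFloatingPoint = false
    for byte in text.utf8 {
        switch byte {
        case UInt8(ascii: "0")...UInt8(ascii: "9"):
            digitCount += 1
        case UInt8(ascii: "."), UInt8(ascii: "e"), UInt8(ascii: "E"):
            isFloatingPoint = true
        default:
            break  // '-' or '+'
        }
    }
    if isFloatingPoint {
        return (.floatingPoint, digitCount)
    }
    let isNegative = text.utf8.first == UInt8(ascii: "-")
    let kind: NumberKind = isNegative ? .signedInteger : .unsignedInteger
    return (kind, digitCount)
}
```

The digit count offers a further shortcut: UInt64.max (18446744073709551615) has 20 digits and JSON forbids leading zeros, so an integer with more than 20 digits could go straight to the floating-point path.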
I was thinking more along the lines of String.init(bytesNoCopy:length:encoding:freeWhenDone:), which would allow us to avoid reading to the end of the document and wouldn't require any further validation. It might be worth doing some small perf tests, just to see. (Or is this not available in s-cl-f?)
As for looking for [.eE], we do just this on Darwin: as soon as we encounter one of those characters we skip the unnecessary integer parse, which does save some time in common situations.
Unfortunately, from https://github.com/apple/swift-corelibs-foundation/blob/a2b40951e8365da696d5105fd57a19c1f1c220ef/Foundation/NSString.swift#L1237:

```swift
public convenience init?(bytesNoCopy bytes: UnsafeMutableRawPointer, length len: Int, encoding: UInt, freeWhenDone freeBuffer: Bool) /* "NoCopy" is a hint */ {
    // just copy for now since the internal storage will be a copy anyhow
    self.init(bytes: bytes, length: len, encoding: encoding)
    if freeBuffer { // dont take the hint
        free(bytes)
    }
}
```
So I don't think it's that useful at the moment. I will look into bypassing the integer parsing where possible.
For
? I think #1653 is needed before
As for
then you will lose precision on the parse. For instance,
If the string is longer than this (we're losing precision) but parsing succeeds (it's a valid
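The precision concern can be illustrated with the standard library alone: Double has a 53-bit significand, so not every integer above 2^53 is representable, and 2^53 + 1 rounds to 2^53. This is one reason to try UInt64()/Int64() before Double() for integers. fitsExactlyInDouble is a hypothetical helper built on Double(exactly:).

```swift
let text = "9007199254740993"  // 2^53 + 1

let asInteger = UInt64(text)!  // parsed exactly
let asDouble = Double(text)!   // rounded to the nearest representable Double

print(asInteger)          // 9007199254740993
print(UInt64(asDouble))   // 9007199254740992

/// Hypothetical helper: would this unsigned integer survive a round trip through Double?
/// Returns nil if the text is not a UInt64 at all.
func fitsExactlyInDouble(_ text: String) -> Bool? {
    guard let integer = UInt64(text) else { return nil }
    return Double(exactly: integer) != nil
}
```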
- Fully validate that the number conforms to the JSON number specification.
- Determine if the number should be parsed as a UInt64, Int64 or Double before falling back to Decimal.
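A sketch of full validation against the number grammar in RFC 8259, followed by a choice of which native parser to try first. It validates the longest number at the start of the input and reports its length, assuming the caller then checks that the next byte is a legal delimiter (so "012" yields a one-byte number and leaves "12" for the caller to reject). JSONNumberShape, validateJSONNumber, NativeParser and preferredParser are hypothetical names, and the Decimal fallback mentioned in the commit message is not shown.

```swift
/// Shape of a JSON number, following the grammar in RFC 8259, section 6:
///   number = [ minus ] int [ frac ] [ exp ]
///   int    = zero / ( digit1-9 *DIGIT )
///   frac   = decimal-point 1*DIGIT
///   exp    = e [ minus / plus ] 1*DIGIT
struct JSONNumberShape: Equatable {
    var isNegative = false
    var hasFractionOrExponent = false
    var length = 0  // bytes consumed by the number
}

/// Validates the longest JSON number at the start of `bytes`.
/// Returns nil if no valid number starts there.
func validateJSONNumber(_ bytes: [UInt8]) -> JSONNumberShape? {
    var shape = JSONNumberShape()
    var index = 0

    // Helpers that inspect the byte at `index` without moving it.
    func currentIs(_ ascii: Unicode.Scalar) -> Bool {
        index < bytes.count && bytes[index] == UInt8(ascii: ascii)
    }
    func currentIsDigit() -> Bool {
        index < bytes.count && (UInt8(ascii: "0")...UInt8(ascii: "9")).contains(bytes[index])
    }
    // Consumes one or more digits; fails if there are none.
    func consumeDigits() -> Bool {
        guard currentIsDigit() else { return false }
        while currentIsDigit() { index += 1 }
        return true
    }

    if currentIs("-") {
        shape.isNegative = true
        index += 1
    }
    // int: a lone "0", or a non-zero digit followed by any number of digits.
    if currentIs("0") {
        index += 1
    } else if !consumeDigits() {
        return nil
    }
    // frac: '.' must be followed by at least one digit.
    if currentIs(".") {
        shape.hasFractionOrExponent = true
        index += 1
        guard consumeDigits() else { return nil }
    }
    // exp: 'e' or 'E', an optional sign, then at least one digit.
    if currentIs("e") || currentIs("E") {
        shape.hasFractionOrExponent = true
        index += 1
        if currentIs("+") || currentIs("-") { index += 1 }
        guard consumeDigits() else { return nil }
    }
    shape.length = index
    return shape
}

/// Which native parser a hypothetical implementation would try first.
enum NativeParser: Equatable {
    case uint64, int64, double
}

func preferredParser(for shape: JSONNumberShape) -> NativeParser {
    if shape.hasFractionOrExponent { return .double }
    return shape.isNegative ? .int64 : .uint64
}
```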
@swift-ci please test
@swift-ci please test and merge