Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSONSerialization: Improve parsing of numbers #1657

Merged
merged 3 commits into from
Nov 7, 2018
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
86 changes: 52 additions & 34 deletions Foundation/JSONSerialization.swift
Original file line number Diff line number Diff line change
Expand Up @@ -822,46 +822,64 @@ private struct JSONReader {
]

func parseNumber(_ input: Index, options opt: JSONSerialization.ReadingOptions) throws -> (Any, Index)? {
func parseTypedNumber(_ address: UnsafePointer<UInt8>, count: Int) -> (Any, IndexDistance)? {
let temp_buffer_size = 64
var temp_buffer = [Int8](repeating: 0, count: temp_buffer_size)
return temp_buffer.withUnsafeMutableBufferPointer { (buffer: inout UnsafeMutableBufferPointer<Int8>) -> (Any, IndexDistance)? in
memcpy(buffer.baseAddress!, address, min(count, temp_buffer_size - 1)) // ensure null termination

let startPointer = buffer.baseAddress!
let intEndPointer = UnsafeMutablePointer<UnsafeMutablePointer<Int8>?>.allocate(capacity: 1)
defer { intEndPointer.deallocate() }
let doubleEndPointer = UnsafeMutablePointer<UnsafeMutablePointer<Int8>?>.allocate(capacity: 1)
defer { doubleEndPointer.deallocate() }
let intResult = strtol(startPointer, intEndPointer, 10)
let intDistance = startPointer.distance(to: intEndPointer[0]!)
let doubleResult = strtod(startPointer, doubleEndPointer)
let doubleDistance = startPointer.distance(to: doubleEndPointer[0]!)

guard doubleDistance > 0 else { return nil }
if intDistance == doubleDistance {
return (NSNumber(value: intResult), intDistance)
let ZERO = UInt8(ascii: "0")
let ONE = UInt8(ascii: "1")
let NINE = UInt8(ascii: "9")
let MINUS = UInt8(ascii: "-")

var isNegative = false
var string = ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we try to preserve the UTF-8 behavior here of walking a pointer along, rather than appending a single character at a time here? (Given that the vast majority of JSON provided is given to us in UTF-8, it'd be nice to maintain the performance there.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did think about this but strings on x86_64/ARM64 upto 15 ASCII characters should actually fit into a SSO so avoid a memory allocation which probably covers most numbers to be parsed.

The other issue is that strings passed to Int64(), UInt64() and Double() cant have any invalid trailing characters so this avoids creating a string of all the available characters using String(bytesNoCopy:) which I believe still get validated according to the encoding which could end up reading through the whole of the rest of the JSON document and then scanning through it to determine the new shorter count.

As an performance enhancement, when validating the characters its possible to count the number of integers and look for .eE and directly jump to parsing as a Double()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking more along the lines of String.init(bytesNoCopy:length:encoding:freeWhenDone:) which would allow us to avoid reading to the end of the document and wouldn't necessitate doing any further validation. Might be worth doing some small perf tests, just to see. (Or is this not available in s-cl-f?)

As for looking for [.eE] — we do just this on Darwin: as soon as we encounter one of those characters we avoid parsing as an integer unnecessarily, which does save some time in common situations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately from https://github.com/apple/swift-corelibs-foundation/blob/a2b40951e8365da696d5105fd57a19c1f1c220ef/Foundation/NSString.swift#L1237:

public convenience init?(bytesNoCopy bytes: UnsafeMutableRawPointer, length len: Int, encoding: UInt, freeWhenDone freeBuffer: Bool) /* "NoCopy" is a hint */ {
        // just copy for now since the internal storage will be a copy anyhow
        self.init(bytes: bytes, length: len, encoding: encoding)
        if freeBuffer { // dont take the hint
            free(bytes)
        }
    }

So I don't think its that useful at the moment. I will look into bypassing the integer parsing where possible


// Validate the first few characters look like a JSON encoded number:
// Optional '-' sign at start only 1 leading zero if followed by a decimal point.
var index = input
func nextASCII() -> UInt8? {
guard let (ascii, nextIndex) = source.takeASCII(index),
JSONReader.numberCodePoints.contains(ascii) else { return nil }
index = nextIndex
return ascii
}

guard var ascii = nextASCII() else { return nil }
guard ascii == MINUS || (ascii >= ZERO && ascii <= NINE) else { return nil }
if ascii == MINUS {
string = "-"
isNegative = true
guard let d = nextASCII() else { return nil }
ascii = d
}

if ascii == ZERO {
if let ascii2 = nextASCII() {
if ascii2 >= ZERO && ascii2 <= NINE {
throw NSError(domain: NSCocoaErrorDomain, code: CocoaError.propertyListReadCorrupt.rawValue,
userInfo: ["NSDebugDescription" : "Leading zeros not allowed at character \(input)." ])
}
return (NSNumber(value: doubleResult), doubleDistance)
string.append("0")
ascii = ascii2
}
} else if ascii < ONE || ascii > NINE {
throw NSError(domain: NSCocoaErrorDomain, code: CocoaError.propertyListReadCorrupt.rawValue,
userInfo: ["NSDebugDescription" : "Numbers must start with a 1-9 at character \(input)." ])
}

if source.encoding == .utf8 {
return parseTypedNumber(source.buffer.baseAddress!.advanced(by: input), count: source.buffer.count - input).map { return ($0.0, input + $0.1) }
string.append(Character(UnicodeScalar(ascii)))
while let ascii = nextASCII() {
string.append(Character(UnicodeScalar(ascii)))
}
else {
var numberCharacters = [UInt8]()
var index = input
while let (ascii, nextIndex) = source.takeASCII(index), JSONReader.numberCodePoints.contains(ascii) {
numberCharacters.append(ascii)
index = nextIndex

if isNegative {
if let intValue = Int64(string) {
return (NSNumber(value: intValue), index)
}
} else {
if let uintValue = UInt64(string) {
return (NSNumber(value: uintValue), index)
}
numberCharacters.append(0)

return numberCharacters.withUnsafeBufferPointer {
parseTypedNumber($0.baseAddress!, count: $0.count)
}.map { return ($0.0, index) }
}
if let doubleValue = Double(string) {
return (NSNumber(value: doubleValue), index)
}
return nil
}

//MARK: - Value parsing
Expand Down
99 changes: 99 additions & 0 deletions TestFoundation/TestJSONEncoder.swift
Original file line number Diff line number Diff line change
Expand Up @@ -499,6 +499,104 @@ class TestJSONEncoder : XCTestCase {
}
}

func test_numericLimits() {
struct DataStruct: Codable {
let int8Value: Int8?
let uint8Value: UInt8?
let int16Value: Int16?
let uint16Value: UInt16?
let int32Value: Int32?
let uint32Value: UInt32?
let int64Value: Int64?
let intValue: Int?
let uintValue: UInt?
let uint64Value: UInt64?
let floatValue: Float?
let doubleValue: Double?
}

func decode(_ type: String, _ value: String) throws {
var key = type.lowercased()
key.append("Value")
_ = try JSONDecoder().decode(DataStruct.self, from: "{ \"\(key)\": \(value) }".data(using: .utf8)!)
}

func testGoodValue(_ type: String, _ value: String) {
do {
try decode(type, value)
} catch {
XCTFail("Unexpected error: \(error) for parsing \(value) to \(type)")
}
}

func testErrorThrown(_ type: String, _ value: String, errorMessage: String) {
do {
try decode(type, value)
XCTFail("Decode of \(value) to \(type) should not succeed")
} catch DecodingError.dataCorrupted(let context) {
XCTAssertEqual(context.debugDescription, errorMessage)
} catch {
XCTAssertEqual(String(describing: error), errorMessage)
}
}


var goodValues = [
("Int8", "0"), ("Int8", "1"), ("Int8", "-1"), ("Int8", "-128"), ("Int8", "127"),
("UInt8", "0"), ("UInt8", "1"), ("UInt8", "255"), ("UInt8", "-0"),

("Int16", "0"), ("Int16", "1"), ("Int16", "-1"), ("Int16", "-32768"), ("Int16", "32767"),
("UInt16", "0"), ("UInt16", "1"), ("UInt16", "65535"), ("UInt16", "34.0"),

("Int32", "0"), ("Int32", "1"), ("Int32", "-1"), ("Int32", "-2147483648"), ("Int32", "2147483647"),
("UInt32", "0"), ("UInt32", "1"), ("UInt32", "4294967295"),

("Int64", "0"), ("Int64", "1"), ("Int64", "-1"), ("Int64", "-9223372036854775808"), ("Int64", "9223372036854775807"),
("UInt64", "0"), ("UInt64", "1"), ("UInt64", "18446744073709551615"),
]

if Int.max == Int64.max {
goodValues += [
("Int", "0"), ("Int", "1"), ("Int", "-1"), ("Int", "-9223372036854775808"), ("Int", "9223372036854775807"),
("UInt", "0"), ("UInt", "1"), ("UInt", "18446744073709551615"),
]
} else {
goodValues += [
("Int", "0"), ("Int", "1"), ("Int", "-1"), ("Int", "-2147483648"), ("Int", "2147483647"),
("UInt", "0"), ("UInt", "1"), ("UInt", "4294967295"),
]
}

let badValues = [
("Int8", "-129"), ("Int8", "128"), ("Int8", "1.2"),
("UInt8", "-1"), ("UInt8", "256"),

("Int16", "-32769"), ("Int16", "32768"),
("UInt16", "-1"), ("UInt16", "65536"),

("Int32", "-2147483649"), ("Int32", "2147483648"),
("UInt32", "-1"), ("UInt32", "4294967296"),

("Int64", "9223372036854775808"), ("Int64", "9223372036854775808"), ("Int64", "-100000000000000000000"),
("UInt64", "-1"), ("UInt64", "18446744073709600000"), ("Int64", "10000000000000000000000000000000000000"),
]

for value in goodValues {
testGoodValue(value.0, value.1)
}

for (type, value) in badValues {
testErrorThrown(type, value, errorMessage: "Parsed JSON number <\(value)> does not fit in \(type).")
}

// Leading zeros are invalid
testErrorThrown("Int8", "0000000000000000000000000000001", errorMessage: "The operation could not be completed")
testErrorThrown("Double", "-.1", errorMessage: "The operation could not be completed")
testErrorThrown("Int32", "+1", errorMessage: "The operation could not be completed")
testErrorThrown("Int", ".012", errorMessage: "The operation could not be completed")
}


// MARK: - Helper Functions
private var _jsonEmptyDictionary: Data {
return "{}".data(using: .utf8)!
Expand Down Expand Up @@ -1089,6 +1187,7 @@ extension TestJSONEncoder {
("test_codingOfDouble", test_codingOfDouble),
("test_codingOfString", test_codingOfString),
("test_codingOfURL", test_codingOfURL),
("test_numericLimits", test_numericLimits),
]
}
}