Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add GeneratorEngine module with tests and macros #31

Merged
merged 14 commits into from
Nov 3, 2023
Merged

Conversation

MaxDesiatov
Copy link
Contributor

@MaxDesiatov MaxDesiatov commented Oct 25, 2023

We'd like to avoid redundant rebuilding and regeneration for inputs that don't change. The new GeneratorEngine module achieves that by introducing a Query abstraction, which can be hashed and matched to its cached outputs if any are available.

The new Engine actor can run such Query and immediately return a cached result file if one is present and its hash matches the recorded one.

This required introducing new @Query and @CacheKey macros for making hashing easy and consistent on QueryProtocol types and their inputs. We can't use the standard Hashable protocol, as that produces new hashes on every process launch due to randomized hashing seeds. With Swift Crypto's HashFunction we can generate consistent hashes and record them in SQLiteBackedCache.

To make the caching engine testable, I've introduced a new FileSystem actor protocol with VirtualFileSystem and LocalFileSystem actor implementations.

In subsequent PRs I'd like to introduce a SQLite-backed cache for storing hashes of generated artifacts. That will help us in avoiding redundant work and will make generator significantly quicker for re-runs that change only a few arguments.
@MaxDesiatov MaxDesiatov added the enhancement New feature or request label Oct 25, 2023
@MaxDesiatov MaxDesiatov self-assigned this Oct 25, 2023
@MaxDesiatov
Copy link
Contributor Author

@swift-ci test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vendored from TSC with minimal changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend to use Vapor's sqlite-nio over this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like this code to be easily portable to Windows, and IIUC sqlite-nio will make that much harder for me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vendored from TSC with minimal changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vendored from TSC with minimal changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you really need file locks? Those have very bad and annoying properties

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be a good alternative for that?

Base automatically changed from maxd/sqlite to main October 26, 2023 13:15
@MaxDesiatov
Copy link
Contributor Author

@swift-ci test

@MaxDesiatov
Copy link
Contributor Author

@swift-ci test

@MaxDesiatov
Copy link
Contributor Author

@swift-ci test

@MaxDesiatov
Copy link
Contributor Author

@swift-ci test

@MaxDesiatov
Copy link
Contributor Author

@swift-ci test

@MaxDesiatov MaxDesiatov requested a review from weissi October 26, 2023 21:46
@MaxDesiatov
Copy link
Contributor Author

MaxDesiatov commented Oct 26, 2023

Curious to hear @weissi's opinion on this too, specifically QueryProtocol, CacheKeyProtocol, and Engine, the rest of the PR is probably not too interesting. This has no proper CAS yet, just hashing of files produced by queries, but I'm making changes here as incremental as possible. We can get a local CAS-like storage eventually with more generator code rewritten with a query-based approach.


extension FilePath: CacheKeyProtocol {
public func hash(with hashFunction: inout some HashFunction) {
self.description.hash(with: &hashFunction)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.description.hash(with: &hashFunction)
self.string.hash(with: &hashFunction)


extension Optional: CacheKeyProtocol where Wrapped: CacheKeyProtocol {
public func hash(with hashFunction: inout some HashFunction) {
if let self {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's not ideal. You want to do hash

hashOf(0 or false) if nil. And hashOf(1 or true) ++ hashOf(self) if not nil.

Otherwise you'll get hash unnecesary hash collisions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MaxDesiatov Does the code here address @weissi 's comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why it doesn't show "outdated" here, I've updated it this way to address the comment:

extension Optional: CacheKeyProtocol where Wrapped: CacheKeyProtocol {
  public func hash(with hashFunction: inout some HashFunction) {
    "Swift.Optional".hash(with: &hashFunction)
    if let self {
      true.hash(with: &hashFunction)
      self.hash(with: &hashFunction)
    } else {
      false.hash(with: &hashFunction)
    }
  }
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that might be it, but I was having trouble following the PR history to check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend to use Vapor's sqlite-nio over this

}

deinit {
try! resultsCache.close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is fraught with peril. Not guaranteed to happen, might block etc. Would very much recommend to make the lifecycle explicit and to assert in deinit that it has been shut down.

get async throws {
var hashFunction = SHA512()
query.hash(with: &hashFunction)
let key = hashFunction.finalize().description
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.description's output is usually not API. Is this truly the right thing to do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed SQLiteBackedCache type to take some Sequence<UInt8> as keys to improved the situation. But there are still types like FileCacheRecord that are encoded into JSON and we need to serialize hashes as strings for that. But that stems from the fact that SQLiteBackedCache is a NoSQL store at this stage. I'll normalize it to make it properly relational in a future PR.

return nil
}

return .init(buffer[0..<bytesRead])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this allocates and copies

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you probably want to removeLast(n) instead

}

deinit {
try! fileDescriptor.close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a very bad idea. Managing file descriptors with deinit is all but guaranteed to break everything.

Make the lifecycle explicit or you'll suffer hard. Also close should be async.

I'd recommend try await withOpenFile(...) { file in try await file.write(...) } such that you can't forget the closes.

func read<V: Decodable>(_ path: FilePath, as: V.Type) async throws -> V {
let fileStream = try await self.read(path)
var bytes = [UInt8]()
for try await chunk in fileStream {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd ask the user to provide a limit here and maybe default it to 10 MiB or something. Limitless memory ballooning is never a good API.

bytes += chunk
}

return try decoder.decode(V.self, from: .init(bytes))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • this allocates and copies from Array -> Data. Would recommend to accumulate in Data
  • would recommend to never use .init, it obfuscates the code a lot. Some .inits allocate & copy, others don't, very very hard to see here what's going on.


func write(_ path: FilePath, _ value: some Encodable) async throws {
let data = try encoder.encode(value)
try await self.write(path, .init(data))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allocate & copy here too (might be hard to avoid)

@MaxDesiatov
Copy link
Contributor Author

MaxDesiatov commented Oct 29, 2023

macOS CI currently blocked by swiftlang/swift#69485 due to an older version of Xcode installed on CI that doesn't support extension macros, so we'll have to either force merge or disable the macOS check temporarily

@MaxDesiatov
Copy link
Contributor Author

@swift-ci test

@MaxDesiatov
Copy link
Contributor Author

@swift-ci test

@MaxDesiatov
Copy link
Contributor Author

@swift-ci test

Copy link
Contributor

@euanh euanh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've built and tested an SDK with this changeset. The non-Docker case works, and not rebuilding lld is a welcome speedup.

I'm having problems with --with-docker SDKs, both for RHEL and Ubuntu:

% swift build --experimental-swift-sdk 5.9-RELEASE_ubuntu_jammy_x86_64
Building for debugging...
warning: Could not read SDKSettings.json for SDK at: /Users/euanh/Library/org.swift.swiftpm/swift-sdks/5.9-RELEASE_ubuntu_jammy_x86_64.artifactbundle/5.9-RELEASE_ubuntu_jammy_x86_64/x86_64-unknown-linux-gnu/ubuntu-jammy.sdk
<unknown>:0: warning: glibc not found for 'x86_64-unknown-linux-gnu'; C stdlib may be unavailable
<unknown>:0: warning: glibc not found for 'x86_64-unknown-linux-gnu'; C stdlib may be unavailable
warning: Could not read SDKSettings.json for SDK at: /Users/euanh/Library/org.swift.swiftpm/swift-sdks/5.9-RELEASE_ubuntu_jammy_x86_64.artifactbundle/5.9-RELEASE_ubuntu_jammy_x86_64/x86_64-unknown-linux-gnu/ubuntu-jammy.sdk
error: link command failed with exit code 1 (use -v to see invocation)
ld.lld: error: cannot open Scrt1.o: No such file or directory
ld.lld: error: cannot open crti.o: No such file or directory
ld.lld: error: unable to find library -lc
ld.lld: error: cannot open crtn.o: No such file or directory
clang-13: error: linker command failed with exit code 1 (use -v to see invocation)
error: fatalError
[5/6] Linking hello-world

However I can reproduce these problems on main, even going back to 321e22a. There might be something wrong with my local setup, but it certainly doesn't seem to be a regression caused by this change. I'll continue investigating but I don't think it should block merging.

@MaxDesiatov MaxDesiatov merged commit a54c573 into main Nov 3, 2023
@MaxDesiatov MaxDesiatov deleted the maxd/engine branch November 3, 2023 10:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants