Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SR-10374] URLCache: init method and first time sqlite database setup implemented #2000

Conversation

karthikkeyan
Copy link
Contributor

@karthikkeyan karthikkeyan commented Mar 13, 2019

  • init method implemented
  • URLCache.shared singleton object created with 4MB of memory space and 20MB of disk space
  • Directory and database file Cache.db file created under the local directory
  • Sqlite Tables and Indices created in the database
  • Unit tests added for URLCache to verify directory, file, tables, and indices

Bug Link: https://bugs.swift.org/browse/SR-10374

@karthikkeyan karthikkeyan force-pushed the karthik/urlcache-database-setup branch from 2ccc7c0 to 7793c62 Compare March 14, 2019 21:35
@karthikkeyan karthikkeyan changed the title URLCache: init method and first time sqlite database setup implemented [SR-10374] URLCache: init method and first time sqlite database setup implemented Apr 9, 2019
@karthikkeyan karthikkeyan force-pushed the karthik/urlcache-database-setup branch from 7793c62 to be57ae3 Compare April 9, 2019 21:49
@pushkarnk
Copy link
Member

@karthikkeyan Apologies for the very late response on this PR. I think implementations of URLCache and URLCredentialStorage have been long pending and this looks like a great start! However, I've a few questions about this pull request:

  1. Is SQLite available on Linux and Windows? I guess not.
  2. Have we considered other storage alternatives?
    I came across this old discussion on URLCache . You might find it relevant.

@karthikkeyan
Copy link
Contributor Author

@karthikkeyan Apologies for the very late response on this PR. I think implementations of URLCache and URLCredentialStorage have been long pending and this looks like a great start! However, I've a few questions about this pull request:

  1. Is SQLite available on Linux and Windows? I guess not.
  2. Have we considered other storage alternatives?
    I came across this old discussion on URLCache . You might find it relevant.

I have failed to give a fair bit of a thought process about the Linux platform. Thanks for reminding of it, I will follow the discussion and see how best we can move this PR forward.

@karthikkeyan karthikkeyan force-pushed the karthik/urlcache-database-setup branch from be57ae3 to ca9d675 Compare April 26, 2019 21:00
@karthikkeyan
Copy link
Contributor Author

@pushkarnk I have reimplemented store and retrieve logic using NSKeyedArchiver and NSKeyedUnarchiver. This implementation is not perfect but I think it would be a good start.

@karthikkeyan karthikkeyan force-pushed the karthik/urlcache-database-setup branch 2 times, most recently from 8875d8f to 0f21c7c Compare April 26, 2019 21:07
@spevans
Copy link
Contributor

spevans commented Apr 28, 2019

@swift-ci test

Copy link
Contributor

@drodriguez drodriguez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Full disclosure: I didn’t know about this PR, and I started working on my own URLCache (that's the reason for #2245). I have some parts that are present in this version (with some differences), but I focused in the memory cache first. I have ideas for the disk cache, but that's more complicated. I can also have a lot more feedback about things I know that will need to be handle (for example skipping some protocols from being cached). Are you still planning in working on this?


sharedSyncQ.sync {
// Remove previous data before resetting new URLCache instance
URLCache.sharedCache?.persistence?.deleteAllCaches()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the original implementation do this? Feels weird removing “user” data (even if it is a cache), just by assigning another value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not.


let path: String

var cacheSizeInDesk: Int {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: cacheSizeInDisk

let attributes = try FileManager.default.attributesOfItem(atPath: path.appending(fileName))
total += (attributes[.size] as? Int) ?? 0
}
return total
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there plan to keep this value updated as cached responses are added and evicted? Feels like with a big cache and a slow disk, each of these calls might take a while.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me add a FIXME comment, I will create a separate PR for optimizing this. My main focus in this PR is to create an initial setup and kick start the implementation. Hope that's ok.

func setupCacheDirectory() {
do {
if FileManager.default.fileExists(atPath: path) {
try FileManager.default.removeItem(atPath: path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn’t this delete the previous cache contents?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed.

}

private func urlForFileIdentifier(_ identifier: String) -> URL {
return URL(fileURLWithPath: path + "/" + identifier)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to use URL(fileURLWithPath: path, isDirectory: true).appendingPathComponent(identifier) to handle platforms with other directory separators.

}

let method = httpMethod ?? "GET"
return urlString + method
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is used as part of the path, shoulnd’t the output be escaped or hashed somehow?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed.

@karthikkeyan
Copy link
Contributor Author

Full disclosure: I didn’t know about this PR, and I started working on my own URLCache (that's the reason for #2245). I have some parts that are present in this version (with some differences), but I focused in the memory cache first. I have ideas for the disk cache, but that's more complicated. I can also have a lot more feedback about things I know that will need to be handle (for example skipping some protocols from being cached). Are you still planning in working on this?

@drodriguez Sorry for the delay response, yes I will be working on this PR. I was just waiting for someone to review this. Please give me your suggestions.

Copy link
Contributor

@drodriguez drodriguez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will see if I submit my memory cache side, since the two are complementary (but we overlap in some points in which we already differ).

} else {
let fourMegaByte = 4 * 1024 * 1024
let twentyMegaByte = 20 * 1024 * 1024
let bundleIdentifier = Bundle.main.bundleIdentifier ?? UUID().uuidString
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That UUID at the end is a little bit worrying. The app will fill caches, and will never reuse the cached results, filling the disk of garbage. I can think at least of ProcessInfo.processInfo.processName as a good fallback, but it is not unique (but the bundle identifier doesn’t have to be either). Maybe if no bundle identifier is found the best thing is not to have a disk cache at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this point to the temporary directory with a fixed name? (A sanitized process name does sound good to me!)

let fourMegaByte = 4 * 1024 * 1024
let twentyMegaByte = 20 * 1024 * 1024
let bundleIdentifier = Bundle.main.bundleIdentifier ?? UUID().uuidString
let cacheDirectoryPath = FileManager.default.urls(for: .cachesDirectory, in: .userDomainMask).first?.path ?? "\(NSHomeDirectory())/Library/Caches"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same idea here. If we don’t find a good URL for .cachesDirectory maybe it is better to not have disk cache.

The fallback has the problem of being very Darwin specific. For example NSHomeDirectory() in Android is the root directory, where one cannot write at all. Also, it might want to use appendingPathComponent(…).appendingPathComponent(…) to be nicer to Windows.

(There's another case below where the POSIX path separator is used instead of a higher level function).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The home directory should be in a persistent writable path everywhere, much as possible. I know that Android is really weird about this, but that looks like incorrect mapping to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, please do not fall back like this. If FileManager does not return a caches URL, consider not caching at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will remove the fallback if FileManager for cachesDirectory returned nill.

self.memoryCapacity = memoryCapacity
self.diskCapacity = diskCapacity

if let _path = path {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might also want to check that diskCapacity is more than zero.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Darwin indeed does accept 0 arguments. My assumption is that memory capacity 0 means that as much as possible no data is kept in memory, and disk capacity 0 that no data is written to disk.


fileprivate extension URLRequest {

var cacheFileIdentifier: String {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a slightly different design here (I use a URLCacheKey type, instead of String), but one important thing is that generating a key from the URLRequest depends on some details from the request beside the URL itself. So my equivalent method returns an optional.

The first thing is the protocol. Protocols like data: or file: should not be cached for obvious reasons. I also think unknown protocols should not be cached either.

Also, for known protocols like http: or https: (and maybe for ftp:, but I don’t know that protocol that well), we should do a couple more things. For example:

  • I don’t think you can do a HTTP request without a method, so that's a return nil in the case the protocol is HTTP.
  • I think some headers should be taken into account. The spec refers to User-Agent, but others like Accept-Language and Range might be interesting.
  • I think the hash of the body contents should also be used.
  • I also think one should not try to cache request with body streams, because that will consume it, which is not good.
  • Finally I would hash everything and return an hexadecimal string, to have a valid file name.

} else {
let fourMegaByte = 4 * 1024 * 1024
let twentyMegaByte = 20 * 1024 * 1024
let bundleIdentifier = Bundle.main.bundleIdentifier ?? UUID().uuidString
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this point to the temporary directory with a fixed name? (A sanitized process name does sound good to me!)

let fourMegaByte = 4 * 1024 * 1024
let twentyMegaByte = 20 * 1024 * 1024
let bundleIdentifier = Bundle.main.bundleIdentifier ?? UUID().uuidString
let cacheDirectoryPath = FileManager.default.urls(for: .cachesDirectory, in: .userDomainMask).first?.path ?? "\(NSHomeDirectory())/Library/Caches"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, please do not fall back like this. If FileManager does not return a caches URL, consider not caching at all.


sharedSyncQ.sync {
// Remove previous data before resetting new URLCache instance
URLCache.sharedCache?.persistence?.deleteAllCaches()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not.

self.memoryCapacity = memoryCapacity
self.diskCapacity = diskCapacity

if let _path = path {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Darwin indeed does accept 0 arguments. My assumption is that memory capacity 0 means that as much as possible no data is kept in memory, and disk capacity 0 that no data is written to disk.

func setupCacheDirectory() {
do {
if FileManager.default.fileExists(atPath: path) {
try FileManager.default.removeItem(atPath: path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed.

}

let method = httpMethod ?? "GET"
return urlString + method
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed.

@millenomi
Copy link
Contributor

I want to shepherd this patch into the repository. Do you need help with changes here?

@karthikkeyan
Copy link
Contributor Author

I want to shepherd this patch into the repository. Do you need help with changes here?

@millenomi I am addressing all the comments added by you and Danial. I will let you know if I need any help. Thanks Lily.

@karthikkeyan karthikkeyan force-pushed the karthik/urlcache-database-setup branch from e1aa5c5 to cd746f3 Compare May 25, 2019 23:28
@karthikkeyan
Copy link
Contributor Author

karthikkeyan commented May 25, 2019

@millenomi @drodriguez I have addressed all the comments. I still have some doubts about the implementation of the property cacheFileIdentifier, please let me know if there are any changes needed.

@millenomi
Copy link
Contributor

@swift-ci please test

return nil
}

var hashString = "\(scheme)_\(method)_\(urlString)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The collision space of the default hash is smaller than you probably think. Can we at least check when we load a request that this is actually the request we were looking for?


let url = urlForFileIdentifier(fileIdentifier)
guard let data = try? Data(contentsOf: url),
let response = NSKeyedUnarchiver.unarchiveObject(with: data) as? CachedURLResponse else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use secure coding here.

Copy link
Contributor

@millenomi millenomi May 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To use secure coding, just invoke archivedData(withRootObject:requiringSecureCoding:) and unarchivedObject(of:from:) instead. You should've gotten deprecation warnings for these calls.

}

func saveCachedResponse(_ response: CachedURLResponse, for request: URLRequest) {
let archivedData = NSKeyedArchiver.archivedData(withRootObject: response)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is slightly less urgent, but I'd like you to use secure coding wherever you use coding.

do {
try FileManager.default.createDirectory(atPath: path, withIntermediateDirectories: true)
} catch {
fatalError("Unable to create directories for URLCache: \(error.localizedDescription)")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not introduce fatal errors for things the user may not be able to control. As I asked elsewhere: consider not loading if we do not have a suitable directory.

@millenomi
Copy link
Contributor

I'm going to commandeer this patch; I may drop the SQLite dependency, though.

@karthikkeyan
Copy link
Contributor Author

I'm going to commandeer this patch; I may drop the SQLite dependency, though.

@millenomi I have pushed one more commit addressing the comments related to secure archiving. Also, I have updated the hashing mechanism for generating cacheFileIdentifier.

The one thing I am struggling though is to write a unit test for the internal property cacheFileIdentifier

* init method implemented
* URLCache.shared singleton object created with 4MB of memory space and 20MB of disk space
* Directory and database file Cache.db file created under local directory
* Sqlite Tables and Indices created in database
* Unit tests added for URLCache to verify directory, file, tables and indices
@karthikkeyan karthikkeyan force-pushed the karthik/urlcache-database-setup branch from d8e1009 to b5dafd2 Compare June 1, 2019 04:47
@karthikkeyan
Copy link
Contributor Author

Closing this PR as URLCache is implemented in this PR #2401

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants