diff --git a/README.md b/README.md
index 7e52d39..610883c 100644
--- a/README.md
+++ b/README.md
@@ -417,29 +417,11 @@ For the time being, the Chrome built-in AI team is moving forward more aggresive
 
 ## Privacy considerations
 
-### General concerns about language-model based APIs
+Please see [the specification](https://webmachinelearning.github.io/writing-assistance-apis/#privacy).
 
-If cloud-based language models are exposed through this API, then there are potential privacy issues with exposing user or website data to the relevant cloud and model providers. This is not a concern specific to this API, as websites can already choose to expose user or website data to other origins using APIs such as `fetch()`. However, it's worth keeping in mind, and in particular as discussed in our [Goals](#shared-goals), perhaps we should make it easier for web developers to know whether a cloud-based model is in use, or which one.
+## Security considerations
 
-If on-device language models are updated separately from browser and operating system versions, this API could enhance the web's fingerprinting service by providing extra identifying bits. Mandating that older browser versions not receive updates or be able to download models from too far into the future might be a possible remediation for this.
-
-Finally, we intend to prohibit (in the specification) any use of user-specific information that is not directly supplied through the API. For example, it would not be permissible to fine-tune the language model based on information the user has entered into the browser in the past.
-
-### Detecting available options
-
-The [`availability()` API](#testing-available-options-before-creation) specified here provide some bits of fingerprinting information, since the availability status of each option and language can be one of four values, and those values are expected to be shared across a user's browser or browsing profile. In theory, this could be up to ~6.6 bits for the current set of summarizer options, plus an unknown number more based on the number of supported languages, and then this would be roughly tripled by including writer and rewriter.
-
-In practice, we expect the number of bits to be much smaller, as implementations will likely not have separate, independently-downloadable pieces of collateral for each option value. (For example, in Chrome's case, we anticipate having a single download for all three APIs.) But we need the API design to be robust to a variety of implementation choices, and have purposefully designed it to allow such independent-download architectures so as not to lock implementers into a single strategy.
-
-There are a variety of solutions here, with varying tradeoffs, such as:
-
-* Grouping downloads to reduce the number of bits, e.g. by ensuring that downloading the "formal" tone also downloads the "neutral" and "casual" tones. This costs the user slightly more bytes, but hopefully not many.
-* Partitioning downloads by top-level site, i.e. repeatedly downloading extra fine-tunings or similar and not sharing them across all sites. This could be feasible if the collateral necessary to support a given option is small; it would not generally make sense for the base language model.
-* Adding friction to the download with permission prompts or other user notifications, so that sites which are attempting to use these APIs for tracking end up looking suspicious to users.
-
-We'll continue to investigate the best solutions here. And the specification will at a minimum allow user agents to add prompts and UI, or reject downloads entirely, as they see fit to preserve privacy.
-
-It's also worth noting that a download cannot be evicted by web developers. Thus the availability states can only be toggled in one direction, from `"downloadable"` to `"downloading"` to `"available"`. And it doesn't provide an identifier that is very stable over time, as by browsing other sites, users will gradually toggle more and more of the availability states to `"availale"`.
+Please see [the specification](https://webmachinelearning.github.io/writing-assistance-apis/#security).
 
 ## Stakeholder feedback
 
diff --git a/index.bs b/index.bs
index aeee361..5a96ebd 100644
--- a/index.bs
+++ b/index.bs
@@ -425,6 +425,8 @@ The inputQuota getter steps are to return
 
 The summarization should conform to the guidance given by |type|, |format|, and |length|, in the definitions of each of their enumeration values.
 
+ The summarization process must conform to the guidance given in [[#privacy]] and [[#security]].
+
 If |outputLanguage| is non-null, the summarization should be in that language. Otherwise, it should be in the language of |input| (which might not match that of |context| or |sharedContext|). If |input| contains multiple languages, or the language of |input| cannot be detected, then either the output language is [=implementation-defined=], or the implementation may treat this as an error, per the guidance in [[#summarizer-errors]].
 
 1. While true:
@@ -621,7 +623,7 @@ When summarization fails, the following possible reasons may be surfaced to the
All other scenarios, or if the user agent would prefer not to disclose the failure reason. +
All other scenarios, including if the user agent believes it is necessary to fail to meet the requirements given in [[#privacy]] or [[#security]]. Or, if the user agent would prefer not to disclose the failure reason.
This table does not give the complete list of exceptions that can be surfaced by the summarizer API. It only contains those which can come from certain [=implementation-defined=] steps. @@ -972,6 +974,8 @@ The inputQuota getter steps are to return [=th The written output should conform to the guidance given by |tone|, |format|, and |length|, in the definitions of each of their enumeration values. + The writing process must conform to the guidance given in [[#privacy]] and [[#security]]. + If |outputLanguage| is non-null, the writing should be in that language. Otherwise, it should be in the language of |input| (which might not match that of |context| or |sharedContext|). If |input| contains multiple languages, or the language of |input| cannot be detected, then either the output language is [=implementation-defined=], or the implementation may treat this as an error, per the guidance in [[#writer-errors]]. 1. While true: @@ -1130,7 +1134,7 @@ When writing fails, the following possible reasons may be surfaced to the web de
All other scenarios, or if the user agent would prefer not to disclose the failure reason. +
All other scenarios, including if the user agent believes it is necessary to fail to meet the requirements given in [[#privacy]] or [[#security]]. Or, if the user agent would prefer not to disclose the failure reason.
This table does not give the complete list of exceptions that can be surfaced by the writer API. It only contains those which can come from certain [=implementation-defined=] steps. @@ -1481,6 +1485,8 @@ The inputQuota getter steps are to return [= The rewritten output should conform to the guidance given by |tone|, |format|, and |length|, in the definitions of each of their enumeration values. + The rewriting process must conform to the guidance given in [[#privacy]] and [[#security]]. + If |outputLanguage| is non-null, the rewritten output text should be in that language. Otherwise, it should be in the language of |input| (which might not match that of |context| or |sharedContext|). If |input| contains multiple languages, or the language of |input| cannot be detected, then either the output language is [=implementation-defined=], or the implementation may treat this as an error, per the guidance in [[#rewriter-errors]]. 1. While true: @@ -1643,7 +1649,7 @@ When rewriting fails, the following possible reasons may be surfaced to the web
All other scenarios, or if the user agent would prefer not to disclose the failure reason. +
All other scenarios, including if the user agent believes it is necessary to fail to meet the requirements given in [[#privacy]] or [[#security]]. Or, if the user agent would prefer not to disclose the failure reason.
This table does not give the complete list of exceptions that can be surfaced by the rewriter API. It only contains those which can come from certain [=implementation-defined=] steps. @@ -1808,6 +1814,22 @@ Every [=interface=] [=interface/including=] the {{DestroyableModel}} interface m :: 1. If |availability| is "{{Availability/downloadable}}", then: + 1. If |realm|'s [=realm/global object=] does not have [=transient activation=], then: + + 1. [=Queue a global task=] on the [=AI task source=] given |realm|'s [=realm/global object=] to [=reject=] |promise| with a "{{NotAllowedError}}" {{DOMException}}. + + 1. Abort these steps. + + 1. [=Consume user activation=] given |realm|'s [=realm/global object=]. + + 1. The user agent may display a user interface to the user to confirm that they want to perform the download operation given by |startDownload|, or to show the progress of the download. Alternately, the user agent may decide to deny the ability to perform |startDownload| based on implicit signals of the user's intent, including the considerations in [[#security-disk-space]]. If the user explicitly or implicitly signals that they do not want to start the download, then: + + 1. [=Queue a global task=] on the [=AI task source=] given |realm|'s [=realm/global object=] to [=reject=] |promise| with a "{{NotAllowedError}}" {{DOMException}}. + + 1. Abort these steps. + +
The case where the user cancels the download after it starts is handled later, as part of the download loop. + 1. Let |startDownloadResult| be the result of performing |startDownload| given |options|. 1. If |startDownloadResult| is false, then: @@ -1828,6 +1850,8 @@ Every [=interface=] [=interface/including=] the {{DestroyableModel}} interface m
This prevents the web developer-perceived progress from suddenly jumping from 0% to 90%, and then taking a long time to go from 90% to 100%. It also provides some protection against the (admittedly not very powerful) fingerprinting vector of measuring the current download progress across multiple sites. + If the actual number of bytes necessary to download is 0, but the user agent is faking a download for the reasons described in [[#privacy]], then set this number to an [=implementation-defined=] value that helps with the download faking. + 1. Let |lastProgressFraction| be 0. 1. Let |lastProgressTime| be the [=monotonic clock=]'s [=monotonic clock/unsafe current time=]. @@ -1836,13 +1860,13 @@ Every [=interface=] [=interface/including=] the {{DestroyableModel}} interface m 1. While true: - 1. If downloading has failed, then: + 1. If downloading has failed, or the user has canceled the download, then: 1. [=Queue a global task=] on the [=AI task source=] given |realm|'s [=realm/global object=] to [=reject=] |promise| with a "{{NetworkError}}" {{DOMException}}. 1. Abort these steps. - 1. Let |bytesSoFar| be the number of bytes downloaded so far. + 1. Let |bytesSoFar| be the number of bytes downloaded so far. (Or the number of bytes fake-downloaded so far, if the user agent is faking the download.) 1. [=Assert=]: |bytesSoFar| is greater than or equal to 0, and less than or equal to |totalBytes|. @@ -1852,7 +1876,7 @@ Every [=interface=] [=interface/including=] the {{DestroyableModel}} interface m 1. Let |progressFraction| be [$floor$](|rawProgressFraction| × 65,536) ÷ 65,536. -
We use a fraction, instead of firing a progress event with the number of bytes downloaded, to avoid giving precise information about the size of the model or other material being downloaded.
|progressFraction| is calculated from |rawProgressFraction| to give a precision of one part in 2<sup>16</sup>. This ensures that over most internet speeds and with most model sizes, the {{ProgressEvent/loaded}} value will be different from the previous one that was fired ~50 milliseconds ago.
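Reviewer note (illustrative only, not part of the patch): the rounding step in this hunk can be sketched in TypeScript roughly as follows. The function name `quantizeProgress` is hypothetical, and the sketch assumes that |rawProgressFraction| is simply |bytesSoFar| ÷ |totalBytes|, which this hunk does not show.

```typescript
// Hypothetical helper mirroring the spec step
//   progressFraction = floor(rawProgressFraction × 65,536) ÷ 65,536.
// Assumption: rawProgressFraction = bytesSoFar / totalBytes.
function quantizeProgress(bytesSoFar: number, totalBytes: number): number {
  // Mirrors the spec's assertion that 0 <= bytesSoFar <= totalBytes.
  // totalBytes must be positive; the patch has the user agent pick a
  // non-zero value even when it is faking a download.
  if (totalBytes <= 0 || bytesSoFar < 0 || bytesSoFar > totalBytes) {
    throw new RangeError("bytesSoFar must be within [0, totalBytes]");
  }
  const rawProgressFraction = bytesSoFar / totalBytes;
  // Round down to a multiple of 1/65,536 (one part in 2^16), so the exact
  // byte counts are not recoverable from the reported fraction.
  return Math.floor(rawProgressFraction * 65536) / 65536;
}
```

Under these assumptions, `quantizeProgress(1, 3)` yields 21,845 ÷ 65,536, slightly below one third, while `quantizeProgress(1, 2)` stays exactly 0.5 because one half is already a multiple of 1 ÷ 65,536.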
@@ -1961,6 +1985,8 @@ Every [=interface=] [=interface/including=] the {{DestroyableModel}} interface m 1. Abort these steps. +The user agent should not actually cancel the underlying download, as explained in [[#privacy-availability-cancelation]]. It could fulfill this requirement by "pausing" the download, such that future calls to |getAvailability| given |options| return "{{Availability/downloading}}" instead of "{{Availability/downloadable}}", but this "pause" must persist even across user agent restarts. + 1. [=Initialize and return an AI model object=] given |promise|, |options|, a no-op algorithm, |initialize|, and |create|. @@ -2258,6 +2284,8 @@ Every [=interface=] [=interface/including=] the {{DestroyableModel}} interface m 1. Let |availability| be the result of |compute| given |options|. + 1. If |availability| is "{{Availability/available}}" or "{{Availability/downloading}}", the user agent should set |availability| to "{{Availability/downloadable}}" if doing so would improve the user's privacy, per the discussions in [[#privacy-availability-masking]]. + 1. [=Queue a global task=] on the [=AI task source=] given |global| to perform the following steps: 1. If |availability| is null, then [=reject=] |promise| with an "{{UnknownError}}" {{DOMException}}. @@ -2380,3 +2408,110 @@ A quota exceeded error information is a [=struct=] with the fo