Commit ffd5aec

Update token-counting and context overflow API surface

This is mostly to align with webmachinelearning/writing-assistance-apis#31:

* Use `TooManyTokens` when appropriate, instead of a `"QuotaExceededError"` `DOMException`.
* Rename `tokensLeft`/`tokensSoFar` to `tokensAvailable`/`tokenCount`.
* Rename `countPromptTokens()` to `countTokens()`.
* Remove `maxTokens`.
* Rename `"contextoverflow"` to `"overflow"`, since the vocabulary "context" is not used elsewhere in the API and there's only one relevant type of overflow for the session.

1 parent 2dd11f5
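
In calling code, the renames amount to roughly the following (a sketch; `session` is assumed to come from the `ai.languageModel` factory shown in the Web IDL at the bottom of the diff):

```js
const session = await ai.languageModel.create();

// Old surface (removed by this commit):
//   await session.countPromptTokens("Hello");
//   session.tokensSoFar / session.tokensLeft / session.maxTokens
//   session.addEventListener("contextoverflow", () => {});

// New surface:
await session.countTokens("Hello");                       // was countPromptTokens()
console.log(session.tokenCount, session.tokensAvailable); // were tokensSoFar / tokensLeft; maxTokens is gone
session.addEventListener("overflow", () => {});           // was "contextoverflow"
```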

File tree: 1 file changed, +23 −33 lines

README.md (+23 −33)
@@ -81,7 +81,7 @@ console.log(await session.prompt("What is your favorite food?"));
 
 The system prompt is special, in that the language model will not respond to it, and it will be preserved even if the context window otherwise overflows due to too many calls to `prompt()`.
 
-If the system prompt is too large (see [below](#tokenization-context-window-length-limits-and-overflow)), then the promise will be rejected with a `"QuotaExceededError"` `DOMException`.
+If the system prompt is too large, then the promise will be rejected with a `TooManyTokens` exception. See [below](#tokenization-context-window-length-limits-and-overflow) for more details on token counting and this new exception type.
 
 ### N-shot prompting
 
@@ -114,7 +114,7 @@ const result2 = await predictEmoji("This code is so good you should get promoted
 Some details on error cases:
 
 * Using both `systemPrompt` and a `{ role: "system" }` prompt in `initialPrompts`, or using multiple `{ role: "system" }` prompts, or placing the `{ role: "system" }` prompt anywhere besides at the 0th position in `initialPrompts`, will reject with a `TypeError`.
-* If the combined token length of all the initial prompts (including the separate `systemPrompt`, if provided) is too large, then the promise will be rejected with a `"QuotaExceededError"` `DOMException`.
+* If the combined token length of all the initial prompts (including the separate `systemPrompt`, if provided) is too large, then the promise will be rejected with a [`TooManyTokens` exception](#tokenization-context-window-length-limits-and-overflow).
 
 ### Customizing the role per prompt
 
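
The too-large case behaves like the `systemPrompt` sketch above; the `TypeError` case looks roughly like this (a sketch of a deliberately invalid call):

```js
try {
  await ai.languageModel.create({
    initialPrompts: [
      { role: "user", content: "Hi" },
      { role: "system", content: "You are a poet." }, // invalid: must be at position 0
    ],
  });
} catch (e) {
  console.assert(e instanceof TypeError);
}
```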
@@ -295,31 +295,36 @@ Note that because sessions are stateful, and prompts can be queued, aborting a s
 A given language model session will have a maximum number of tokens it can process. Developers can check their current usage and progress toward that limit by using the following properties on the session object:
 
 ```js
-console.log(`${session.tokensSoFar}/${session.maxTokens} (${session.tokensLeft} left)`);
+console.log(`${session.tokenCount} tokens used. ${session.tokensAvailable} tokens still left.`);
 ```
 
-To know how many tokens a string will consume, without actually processing it, developers can use the `countPromptTokens()` method:
+To know how many tokens a string will consume, without actually processing it, developers can use the `countTokens()` method:
 
 ```js
-const numTokens = await session.countPromptTokens(promptString);
+const numTokens = await session.countTokens(promptString);
 ```
 
 Some notes on this API:
 
 * We do not expose the actual tokenization to developers since that would make it too easy to depend on model-specific details.
 * Implementations must include in their count any control tokens that will be necessary to process the prompt, e.g. ones indicating the start or end of the input.
-* The counting process can be aborted by passing an `AbortSignal`, i.e. `session.countPromptTokens(promptString, { signal })`.
+* The counting process can be aborted by passing an `AbortSignal`, i.e. `session.countTokens(promptString, { signal })`.
 
-It's possible to send a prompt that causes the context window to overflow. That is, consider a case where `session.countPromptTokens(promptString) > session.tokensLeft` before calling `session.prompt(promptString)`, and then the web developer calls `session.prompt(promptString)` anyway. In such cases, the initial portions of the conversation with the language model will be removed, one prompt/response pair at a time, until enough tokens are available to process the new prompt. The exception is the [system prompt](#system-prompts), which is never removed. If it's not possible to remove enough tokens from the conversation history to process the new prompt, then the `prompt()` or `promptStreaming()` call will fail with an `"QuotaExceededError"` `DOMException` and nothing will be removed.
+It's possible to send a prompt that causes the context window to overflow. That is, consider a case where `session.countTokens(promptString) > session.tokensAvailable` before calling `session.prompt(promptString)`, and then the web developer calls `session.prompt(promptString)` anyway. In such cases, the initial portions of the conversation with the language model will be removed, one prompt/response pair at a time, until enough tokens are available to process the new prompt. The exception is the [system prompt](#system-prompts), which is never removed.
 
-Such overflows can be detected by listening for the `"contextoverflow"` event on the session:
+Such overflows can be detected by listening for the `"overflow"` event on the session:
 
 ```js
-session.addEventListener("contextoverflow", () => {
+session.addEventListener("overflow", () => {
   console.log("Context overflow!");
 });
 ```
 
+If it's not possible to remove enough tokens from the conversation history to process the new prompt, then the `prompt()` or `promptStreaming()` call will fail with a `TooManyTokens` exception and nothing will be removed. A `TooManyTokens` exception is a new type of exception, which subclasses `DOMException`, and adds the following additional properties:
+
+* `tokenCount`: how many tokens the input consists of
+* `tokensAvailable`: how many tokens were available (which will be less than `tokenCount`, and equal to the value of `session.tokensAvailable` at the time of the call)
+
 ### Multilingual content and expected languages
 
 The default behavior for a language model session assumes that the input languages are unknown. In this case, implementations will use whatever "base" capabilities they have available for the language model, and might throw `"NotSupportedError"` `DOMException`s if they encounter languages they don't support.
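
Putting this hunk together, a usage sketch (assuming an existing `session`, as in the README's other examples):

```js
session.addEventListener("overflow", () => {
  // Fired when old prompt/response pairs are evicted to make room.
  console.log("Context overflow!");
});

const promptString = "Summarize our conversation so far.";
if (await session.countTokens(promptString) > session.tokensAvailable) {
  // prompt() will either evict history (firing "overflow") or, if eviction
  // can't free enough room, reject with TooManyTokens and remove nothing.
  console.log("This prompt will not fit in the remaining context window.");
}
const result = await session.prompt(promptString);
console.log(`${session.tokenCount} tokens used, ${session.tokensAvailable} left.`);
```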
@@ -398,33 +403,19 @@ It is also nicely future-extensible by adding more events and properties to the
 Finally, note that there is a sort of precedent in the (never-shipped) [`FetchObserver` design](https://github.com/whatwg/fetch/issues/447#issuecomment-281731850).
 </details>
 
+### Too-large inputs
+
 ## Detailed design
 
 ### Full API surface in Web IDL
 
 ```webidl
-// Shared self.ai APIs
+// Shared self.ai APIs:
+// See https://webmachinelearning.github.io/writing-assistance-apis/#shared-ai-api for most of them.
 
-partial interface WindowOrWorkerGlobalScope {
-  [Replaceable, SecureContext] readonly attribute AI ai;
-};
-
-[Exposed=(Window,Worker), SecureContext]
-interface AI {
+partial interface AI {
   readonly attribute AILanguageModelFactory languageModel;
 };
-
-[Exposed=(Window,Worker), SecureContext]
-interface AICreateMonitor : EventTarget {
-  attribute EventHandler ondownloadprogress;
-
-  // Might get more stuff in the future, e.g. for
-  // https://github.com/webmachinelearning/prompt-api/issues/4
-};
-
-callback AICreateMonitorCallback = undefined (AICreateMonitor monitor);
-
-enum AICapabilityAvailability { "readily", "after-download", "no" };
 ```
 
 ```webidl
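
The IDL removals don't change how the factory is reached; a sketch of the surface that remains defined here, with the download-progress monitoring that now lives in the shared spec (event shape per the README's earlier examples):

```js
const session = await ai.languageModel.create({
  monitor(m) {
    // AICreateMonitor and its "downloadprogress" event are now defined in
    // the shared writing-assistance-apis surface.
    m.addEventListener("downloadprogress", e => {
      console.log(`Downloaded ${e.loaded} of ${e.total} bytes.`);
    });
  },
});
```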
@@ -442,16 +433,15 @@ interface AILanguageModel : EventTarget {
   Promise<DOMString> prompt(AILanguageModelPromptInput input, optional AILanguageModelPromptOptions options = {});
   ReadableStream promptStreaming(AILanguageModelPromptInput input, optional AILanguageModelPromptOptions options = {});
 
-  Promise<unsigned long long> countPromptTokens(AILanguageModelPromptInput input, optional AILanguageModelPromptOptions options = {});
-  readonly attribute unsigned long long maxTokens;
-  readonly attribute unsigned long long tokensSoFar;
-  readonly attribute unsigned long long tokensLeft;
+  Promise<unsigned long long> countTokens(AILanguageModelPromptInput input, optional AILanguageModelPromptOptions options = {});
+  readonly attribute unsigned long long tokensAvailable;
+  readonly attribute unsigned long long tokenCount;
 
   readonly attribute unsigned long topK;
   readonly attribute float temperature;
   readonly attribute FrozenArray<DOMString>? expectedInputLanguages;
 
-  attribute EventHandler oncontextoverflow;
+  attribute EventHandler onoverflow;
 
   Promise<AILanguageModel> clone(optional AILanguageModelCloneOptions options = {});
   undefined destroy();
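
A final sketch exercising the renamed IDL members, including the event handler attribute form (again, `e.name === "TooManyTokens"` is an assumed detail):

```js
const session = await ai.languageModel.create();

session.onoverflow = () => console.log("Oldest prompt/response pairs were dropped."); // was oncontextoverflow

const veryLongPrompt = "lorem ipsum ".repeat(1_000_000); // hypothetical overflowing input
try {
  await session.prompt(veryLongPrompt);
} catch (e) {
  if (e.name === "TooManyTokens") {
    console.log(`Needed ${e.tokenCount} tokens; only ${e.tokensAvailable} were available.`);
  }
}
```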
