README.md (+21 -20)
@@ -81,7 +81,7 @@ console.log(await session.prompt("What is your favorite food?"));
 The system prompt is special, in that the language model will not respond to it, and it will be preserved even if the context window otherwise overflows due to too many calls to `prompt()`.

-If the system prompt is too large, then the promise will be rejected with a `TooManyTokens` exception. See [below](#tokenization-context-window-length-limits-and-overflow) for more details on token counting and this new exception type.
+If the system prompt is too large, then the promise will be rejected with a `QuotaExceededError` exception. See [below](#tokenization-context-window-length-limits-and-overflow) for more details on token counting and this new exception type.

 ### N-shot prompting
@@ -114,7 +114,7 @@ const result2 = await predictEmoji("This code is so good you should get promoted
 Some details on error cases:

 * Using both `systemPrompt` and a `{ role: "system" }` prompt in `initialPrompts`, or using multiple `{ role: "system" }` prompts, or placing the `{ role: "system" }` prompt anywhere besides at the 0th position in `initialPrompts`, will reject with a `TypeError`.
-* If the combined token length of all the initial prompts (including the separate `systemPrompt`, if provided) is too large, then the promise will be rejected with a [`TooManyTokens` exception](#tokenization-context-window-length-limits-and-overflow).
+* If the combined token length of all the initial prompts (including the separate `systemPrompt`, if provided) is too large, then the promise will be rejected with a [`QuotaExceededError` exception](#tokenization-context-window-length-limits-and-overflow).

 ### Customizing the role per prompt
@@ -389,35 +389,36 @@ Note that because sessions are stateful, and prompts can be queued, aborting a s
 A given language model session will have a maximum number of tokens it can process. Developers can check their current usage and progress toward that limit by using the following properties on the session object:

 ```js
-console.log(`${session.tokenCount} tokens used. ${session.tokensAvailable} tokens still left.`);
+console.log(`${session.inputUsage} tokens used, out of ${session.inputQuota} tokens available.`);
 ```

-To know how many tokens a string will consume, without actually processing it, developers can use the `countTokens()` method:
+To know how many tokens a string will consume, without actually processing it, developers can use the `measureInputUsage()` method:
 * We do not expose the actual tokenization to developers since that would make it too easy to depend on model-specific details.
 * Implementations must include in their count any control tokens that will be necessary to process the prompt, e.g. ones indicating the start or end of the input.
-* The counting process can be aborted by passing an `AbortSignal`, i.e. `session.countTokens(promptString, { signal })`.
+* The counting process can be aborted by passing an `AbortSignal`, i.e. `session.measureInputUsage(promptString, { signal })`.
+* We use the phrase "input usage/quota" in the API, to avoid being specific to the current language model tokenization paradigm. In the future, even if we change paradigms, we anticipate some concept of usage and quota still being applicable.
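As a rough sketch of the `AbortSignal` bullet above, the helper below measures a prompt but gives up after a deadline. The names `measureWithTimeout` and `fakeSession` are hypothetical and exist only for illustration; the only call taken from the text is `measureInputUsage(promptString, { signal })`. The stand-in session counts whitespace-separated words rather than real tokens, so that the snippet can run outside a browser.

```js
// Hypothetical helper: measure a prompt's input usage, aborting the
// measurement if it has not finished after `ms` milliseconds.
async function measureWithTimeout(session, promptString, ms) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return await session.measureInputUsage(promptString, { signal: controller.signal });
  } finally {
    // Always clear the timer so it cannot fire after the call settles.
    clearTimeout(timer);
  }
}

// Stand-in session for illustration only (an assumption, not the real API
// object): counts words instead of tokens and rejects if already aborted.
const fakeSession = {
  async measureInputUsage(promptString, { signal } = {}) {
    signal?.throwIfAborted();
    return promptString.split(/\s+/).filter(Boolean).length;
  },
};
```

With a real session, the same helper would presumably be called as `await measureWithTimeout(session, promptString, 500)`.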

-It's possible to send a prompt that causes the context window to overflow. That is, consider a case where `session.countTokens(promptString) > session.tokensAvailable` before calling `session.prompt(promptString)`, and then the web developer calls `session.prompt(promptString)` anyway. In such cases, the initial portions of the conversation with the language model will be removed, one prompt/response pair at a time, until enough tokens are available to process the new prompt. The exception is the [system prompt](#system-prompts), which is never removed.
+It's possible to send a prompt that causes the context window to overflow. That is, consider a case where `session.measureInputUsage(promptString) > session.inputQuota - session.inputUsage` before calling `session.prompt(promptString)`, and then the web developer calls `session.prompt(promptString)` anyway. In such cases, the initial portions of the conversation with the language model will be removed, one prompt/response pair at a time, until enough tokens are available to process the new prompt. The exception is the [system prompt](#system-prompts), which is never removed.
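The overflow condition described above can be checked before prompting. The following is a minimal sketch, assuming the renamed `measureInputUsage()`, `inputQuota`, and `inputUsage` members behave as the text describes; `fitsWithoutOverflow` is a hypothetical helper name, not part of the proposal.

```js
// Hypothetical helper: resolves to true if the prompt fits in the remaining
// quota, i.e. sending it should not evict earlier prompt/response pairs.
async function fitsWithoutOverflow(session, promptString) {
  // measureInputUsage() returns a promise, so it must be awaited before comparing.
  const needed = await session.measureInputUsage(promptString);
  const remaining = session.inputQuota - session.inputUsage;
  return needed <= remaining;
}
```

A caller might use this to shorten or summarize a prompt up front instead of letting older history be dropped.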

-Such overflows can be detected by listening for the `"overflow"` event on the session:
+Such overflows can be detected by listening for the `"quotaoverflow"` event on the session:

 ```js
-session.addEventListener("overflow", () => {
-  console.log("Context overflow!");
+session.addEventListener("quotaoverflow", () => {
+  console.log("We've gone past the quota, and some inputs will be dropped!");
 });
 ```

-If it's not possible to remove enough tokens from the conversation history to process the new prompt, then the `prompt()` or `promptStreaming()` call will fail with a `TooManyTokens` exception and nothing will be removed. A `TooManyTokens` exception is a new type of exception, which subclasses `DOMException`, and adds the following additional properties:
+If it's not possible to remove enough tokens from the conversation history to process the new prompt, then the `prompt()` or `promptStreaming()` call will fail with a `QuotaExceededError` exception and nothing will be removed. This is a proposed new type of exception, which subclasses `DOMException`, and replaces the web platform's existing `"QuotaExceededError"` `DOMException`. See [whatwg/webidl#1465](https://github.com/whatwg/webidl/pull/1465) for this proposal. For our purposes, the important part is that it has the following properties:

-* `tokenCount`: how many tokens the input consists of
-* `tokensAvailable`: how many tokens were available (which will be less than `tokenCount`, and equal to the value of `session.tokensAvailable` at the time of the call)
+* `requested`: how many tokens the input consists of
+* `quota`: how many tokens were available (which will be less than `requested`, and equal to the value of `session.inputQuota - session.inputUsage` at the time of the call)
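To illustrate how the `requested` and `quota` properties listed above might be consumed, here is a hedged sketch. `promptOrExplain` is a hypothetical helper, and matching on `e.name === "QuotaExceededError"` is an assumption about how the proposed exception would be identified; the proposal linked above is the authority on its final shape.

```js
// Hypothetical helper: prompt the session; if the input cannot fit even after
// history is dropped, report how far the request exceeds the quota.
async function promptOrExplain(session, promptString) {
  try {
    return { ok: true, text: await session.prompt(promptString) };
  } catch (e) {
    // Assumption: the proposed exception keeps the name "QuotaExceededError".
    if (e.name === "QuotaExceededError") {
      return {
        ok: false,
        requested: e.requested,
        quota: e.quota,
        excess: e.requested - e.quota,
      };
    }
    throw e; // Not a quota problem, so let it propagate.
  }
}
```

Returning the excess lets the caller decide how much to trim before retrying, rather than guessing.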

 ### Multilingual content and expected languages
@@ -533,8 +534,6 @@ It is also nicely future-extensible by adding more events and properties to the
 Finally, note that there is a sort of precedent in the (never-shipped) [`FetchObserver` design](https://github.com/whatwg/fetch/issues/447#issuecomment-281731850).