Support for guidance/structured output with prompt API #35

sushraja-msft · 2024-08-22T19:32:32Z

To aid programmability, reduce compatibility risk from the API returning different results across browser, avoid challenges in updating a shipping model in the browser (Google Model V1 to Google Model V2), please consider adding techniques like guidance, structured outputs as an integral part of the prompt API.

Problem Illustration

Consider the following web developer scenarios, where a developer is:

Classifying product review as the user types, to ask follow-up questions.
Building a chat bot and would like to programmatically detect if a question should be routed a particular way.
Building a reading comprehension assistive extension, that poses questions based on the web page content.

Web developers who attempt to parse the response are going to have a hard time writing code that is model/browser agnostic.

Constraining Output

One way to solve this problem is to use guidance or techniques like it. At a high level these techniques work by restricting the next allowed token from the LLM to conform to a grammar. Guidance works on top of a model, is model agnostic and only changes logits from the last layer of a model before sampling. There is an additional implementation detail within guidance in that information about all possible tokens prefixed with the next possible token is required for it to function (explanation).

With guidance (demo) we get better consistency across models and responses that are immediately parseable with JavaScript.

Proposal

The proposal is to add responseJsonSchema to the AIAssistantPromptOptions.

dictionary AIAssistantPromptOptions { AbortSignal signal; DomString? responseJsonSchema; };

JSON schema is familiar to web developers. However, JSON schema is a super set of what techniques like guidance can achieve today. For example, parts of the schema to enforce JSON schema constraints like dependent required cannot be enforced.
Either the API can state that only Property Name, Value Type, Enum, Arrays would be enforced, or Prompt API should validate the response with a JSON schema validator and indicate that the response is non conformant. Slight preference to the first option because of its simplicity.

Other Approaches

Llama.cpp supports GBNF as a grammar to restrict LLM response - link
Open AI recently added JSON schema support to restrict output. [Structured Outputs] (https://openai.com/index/introducing-structured-outputs-in-the-api/)
Exposing LogProbs and Token information from the model, I would like to acknowledge that this issue is in similar vein to Option to get logprobs #20. The difference being the consideration to make output containment a part of the API.

The text was updated successfully, but these errors were encountered:

domenic · 2024-08-30T20:00:14Z

In general we're excited about exploring this. Minor API surface nitpicks:

There's no need to have it be nullable; all dictionary entries are already optional.
Per https://w3ctag.github.io/design-principles/#casing-rules it should be something like responseJSONSchema, not responseJsonSchema
I think providing JSON as a string is pretty unusual, even though I understand it makes sense theoretically. I would suggest we take it as an object and then post-process it. Probably we would do the equivalent of: JSON.stringify(providedObject) -> pass the resulting JSON string to some JSON schema library. This feels a bit roundabout but I suspect for developer ergonomics it's way better.

So to summarize: object responseJSONSchema in the dictionary.

rhys101 · 2024-11-05T16:16:08Z

Agree with @sushraja-msft that having structured output really helps with ensuring a correct response.

For reference, we've implemented a parallel implementation of the Prompt API as an extension based on llama.cpp, available at GitHub. We've exposed a grammer object within the implementation that can be passed into the create function.

An example use:

sess = await window.aibrow.languageModel.create({
 grammar:{
  "type": "object",
  "properties": {
    "first_name": {
      "type": "string"
    },
    "last_name": {
      "type": "string"
    },
    "country": {
      "type": "string"
    }
  }
}
})
const stream = await sess.promptStreaming("Extract data from the following text: 'John Doe is an innovative software developer with a passion for creating intuitive user experiences. Based in the heart of England, John has spent the past decade refining his craft, working with both startups and established tech companies. His deep commitment to quality and creativity is evident in the numerous award-winning apps he has developed, which continue to enrich the digital lives of users worldwide. Beyond his technical skills, John is admired for his collaborative spirit and mentorship, always eager to share his knowledge and inspire the next generation of tech enthusiasts.'");
for await (const chunk of stream) {
  console.log(chunk)
}

Having experienced quite a few inconsistencies before when trying to "plead with the prompt" to get it to only output JSON (where it often tries to wrap it in markdown), a constraining structured output seems like the best approach.

domenic added enhancement New feature or request interop Potential concerns about interoperability among multiple implementations of the API labels Oct 9, 2024

domenic mentioned this issue Dec 3, 2024

JSON mode #65

Closed

sushraja-msft mentioned this issue Jan 27, 2025

Add structured outputs #78

Merged

domenic closed this as completed in cbd111e Feb 10, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support for guidance/structured output with prompt API #35

Support for guidance/structured output with prompt API #35

sushraja-msft commented Aug 22, 2024 •

edited

Loading

domenic commented Aug 30, 2024

rhys101 commented Nov 5, 2024

Support for guidance/structured output with prompt API #35

Support for guidance/structured output with prompt API #35

Comments

sushraja-msft commented Aug 22, 2024 • edited Loading

Problem Illustration

Constraining Output

Proposal

Other Approaches

domenic commented Aug 30, 2024

rhys101 commented Nov 5, 2024

sushraja-msft commented Aug 22, 2024 •

edited

Loading