```
<start><user>Generate some JSON<assistant>{"foo":"Donec porttitor consectetur nulla vel feugiat.","bar":"Aliquam non elementum quam.","baz":"Praesent consectetur
```
Would it be possible to allow partial matches to LLGuidance grammars, starting from a specified token?
By way of example, see the partially generated prompt at the top of this post (produced under a JSON grammar).
Right now, as far as I can tell, grammars are always evaluated relative to the first generated token. What I'm looking for is a way to say, "Start evaluating the grammar after `<assistant>`." The idea is to fast-forward through the schema using the existing tokens, picking up with the next token to be generated.

Using the above example, it would be great if it were possible to "complete" the `baz` field.

This would be useful for, e.g., resuming an interrupted response with a large, complicated JSON schema, or using the LLM to "fill in" the rest of a partially formed JSON object.

I'm specifically interested in using this with LLGuidance, but maybe it would also be generalizable to llama.cpp's built-in grammar system.
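To make the request concrete, here is a toy, character-level stand-in for a grammar engine showing the fast-forward-then-mask flow. Nothing below is LLGuidance's actual API: `PrefixedConstraint`, `allowed_next`, and the hard-coded `{"baz":"..."}` template are all hypothetical, chosen only to illustrate the behavior being asked for.

```python
# Hypothetical sketch: a "grammar" as a function from matched-so-far text to
# the set of characters allowed next, plus a wrapper that can be
# fast-forwarded through text that already exists before sampling resumes.

TEMPLATE = '{"baz":"'
LETTERS = set("abcdefghijklmnopqrstuvwxyz ")

def allowed_next(text):
    """Return the set of characters this toy grammar permits after `text`."""
    if len(text) < len(TEMPLATE):
        return {TEMPLATE[len(text)]}   # still inside the fixed prefix
    tail = text[len(TEMPLATE):]
    if tail.endswith('"}'):
        return set()                   # object complete; nothing allowed
    if tail.endswith('"'):
        return {"}"}                   # string closed; must close the object
    return LETTERS | {'"'}             # inside the string value

class PrefixedConstraint:
    """Consumes already-present text, then masks future generation."""

    def __init__(self, allowed_next_fn):
        self.text = ""
        self.allowed_next = allowed_next_fn

    def fast_forward(self, existing):
        # Validate and advance through text that is already present
        # (everything after "<assistant>") instead of sampling it.
        for ch in existing:
            if ch not in self.allowed_next(self.text):
                raise ValueError(f"existing text violates grammar at {ch!r}")
            self.text += ch

    def mask(self):
        # What the sampler would be allowed to emit next.
        return self.allowed_next(self.text)

# Resume an interrupted response mid-value, then inspect the mask:
c = PrefixedConstraint(allowed_next)
c.fast_forward('{"baz":"praesent')
assert '"' in c.mask() and "}" not in c.mask()
c.fast_forward('"')
assert c.mask() == {"}"}
```

In a real integration, `fast_forward` would operate on tokens rather than characters, and existing tokens that violate the grammar would need a defined failure mode (reject vs. truncate), which is part of what this feature request would have to pin down.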