What if, in addition to text-string prompts, document-object-model elements and/or document fragments could be used as prompts? This would enable model-independent multimodal prompting in a manner intuitive to Web developers.
Such multimodal prompts could utilize <p>, <img>, <picture>, <audio>, and <video> elements; perhaps <table> and its related elements; perhaps <html>, <head>, <meta>, <link>, and <body> elements; and, perhaps, <a> elements.
For example, to provide an <img> element in a prompt, one could utilize the data URI scheme:
Alternatively, the <img> element could specify a URL.
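The two variants mentioned above, an <img> element whose source is either a data URI or an ordinary URL, might look roughly like the following sketch. The helper names `toDataUri` and `buildImagePrompt` are hypothetical, as is the idea of a `session.prompt()` call that accepts a document fragment; that is what this issue proposes, not an existing API.

```javascript
// Encode raw bytes as a base64 data: URI (RFC 2397).
// `toDataUri` is a hypothetical helper name used only for this sketch.
function toDataUri(mimeType, bytes) {
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  // btoa exists in browsers and recent Node.js; fall back to Buffer otherwise.
  const base64 =
    typeof btoa === "function"
      ? btoa(binary)
      : Buffer.from(binary, "binary").toString("base64");
  return `data:${mimeType};base64,${base64}`;
}

// Build a DocumentFragment containing a <p> with the text and an <img>.
// `doc` is any object implementing the DOM Document interface,
// for example `window.document` in a browser.
// `imageSrc` may be a data URI or a regular URL.
function buildImagePrompt(doc, text, imageSrc, altText) {
  const fragment = doc.createDocumentFragment();
  const p = doc.createElement("p");
  p.textContent = text;
  const img = doc.createElement("img");
  img.src = imageSrc;
  img.alt = altText;
  fragment.append(p, img);
  return fragment;
}

// Hypothetical browser usage (assumes a `session` object whose prompt()
// accepts a fragment, which is the proposal here, not a shipped feature):
//
//   const viaDataUri = buildImagePrompt(
//     document, "Describe this image.", toDataUri("image/png", pngBytes), "");
//   const viaUrl = buildImagePrompt(
//     document, "Describe this image.", "https://example.com/photo.png", "");
//   const answer = await session.prompt(viaDataUri);
```

Passing the document in as a parameter keeps the helper independent of any global, so the same function works with a browser document or a DOM implementation used in tests.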
P.S.: Other considered approaches for enabling multimodal prompts include:
The MediaStream interface could be of use for enabling voice capabilities (mentioned in #40).
The Clipboard interfaces or DataTransfer interfaces could be of use.