[FR] Document Object Model Integration #70

AdamSobieski · 2025-01-09T04:22:11Z

What if, in addition to text-string prompts, document-object-model elements and/or document fragments could be used as prompts? This would enable model-independent multimodal prompting in a manner intuitive to Web developers.

Such multimodal prompts could utilize <p>, <img>, <picture>, <audio>, and <video> elements; perhaps <table> and its related elements; perhaps <html>, <head>, <meta>, <link>, and <body> elements; and, perhaps, <a> elements.

For example, to provide an <img> element in a prompt, one could utilize the data URI scheme:

const fragment = document.createDocumentFragment();
const img = document.createElement("img");
img.setAttribute("src", "data:image/png;base64,...");
fragment.append(img);

// ...

const result = await session.prompt(fragment);

or specify a URL:

const fragment = document.createDocumentFragment();
const img = document.createElement("img");
img.setAttribute("src", "http://www.example.com/images/123.png");
fragment.append(img);

// ...

const result = await session.prompt(fragment);
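
To make the idea more concrete, here is a minimal sketch of how a fragment might be flattened into an ordered list of prompt parts. Everything in it is an assumption made for illustration: `collectPromptParts`, the `{ type, value }` part shape, and the `MEDIA_TYPES` mapping are hypothetical names, not part of any existing or proposed API. It relies only on standard DOM members (`childNodes`, `nodeType`, `localName`, `textContent`, `getAttribute`).

```javascript
// Hypothetical helper, for illustration only; not part of the Prompt API.
// Node type values as defined by the DOM standard.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Assumed mapping from media element names to a modality label.
const MEDIA_TYPES = new Map([
  ["img", "image"],
  ["audio", "audio"],
  ["video", "video"],
]);

// Walks the tree in document order and collects text and media parts.
function collectPromptParts(root, parts = []) {
  for (const node of root.childNodes) {
    if (node.nodeType === TEXT_NODE) {
      const text = node.textContent.trim();
      if (text) parts.push({ type: "text", value: text });
    } else if (node.nodeType === ELEMENT_NODE) {
      const type = MEDIA_TYPES.get(node.localName);
      if (type) {
        // Media elements contribute their src (a data URI or a URL).
        parts.push({ type, value: node.getAttribute("src") });
      } else {
        // Other elements (<p>, <table>, ...) are descended into.
        collectPromptParts(node, parts);
      }
    }
  }
  return parts;
}
```

Calling `collectPromptParts(fragment)` on either fragment above would yield a single image part carrying the `src` value. A real implementation would also need to decide how to handle elements such as `<picture>` or `<a>`, which this sketch simply descends into.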

P.S.: Other approaches considered for enabling multimodal prompts include:

1. The MediaStream interface could be of use for enabling voice capabilities (mentioned in #40).
2. The Clipboard interfaces or DataTransfer interfaces could be of use.
   1. Capabilities for exchanging media streams could be added to existing capabilities for exchanging data and files.

domenic added a commit that referenced this issue Jan 20, 2025

Add image and audio prompting API

ff96dc3

Closes #40. Somewhat helps with #70.

domenic added a commit that referenced this issue Jan 20, 2025

Add image and audio prompting API

2a9f391

Closes #40. Somewhat helps with #70.

domenic mentioned this issue Jan 20, 2025

Add image and audio prompting API #71

Merged

domenic added the enhancement label Jan 23, 2025

domenic added a commit that referenced this issue Feb 25, 2025

Add image and audio prompting API

331914a

Closes #40. Somewhat helps with #70.
