Preventing fingerprinting via detecting the presence of downloaded language packs #3

domenic · 2024-04-24T05:40:18Z

This issue is for discussing questions noted in the explainer about preventing fingerprinting via detecting the presence of downloaded language packs.

The Chrome team is currently investigating whether an approach such as the following would work for a "v0": each origin can see at most two languages as supported, the user's language and the site's detected language. We tell the truth about these two languages, but this should not give significant fingerprinting info, especially since it will vary across sites.
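A minimal sketch of the "v0" behavior described above, to make the discussion concrete. It is not taken from the explainer or from any browser implementation: `OriginContext`, `visibleSupport`, and the exact-match comparison of language tags are all assumptions made for illustration.

```typescript
// Hypothetical sketch: none of these names exist in the Translation API.
type LanguageTag = string;

interface OriginContext {
  userLanguage: LanguageTag;         // assumed: the user's (browser) language
  siteLanguage: LanguageTag | null;  // assumed: the language detected for the site, if any
}

// `installed` stands for the language packs actually on the device,
// which is the sensitive set that must not be enumerable.
function visibleSupport(
  installed: ReadonlySet<LanguageTag>,
  ctx: OriginContext,
  queried: LanguageTag,
): boolean {
  const allowed = new Set<LanguageTag>([ctx.userLanguage]);
  if (ctx.siteLanguage !== null) {
    allowed.add(ctx.siteLanguage);
  }
  // Outside the allowed pair the answer is always "not supported",
  // regardless of what is really installed.
  if (!allowed.has(queried)) {
    return false;
  }
  // Inside the pair, tell the truth.
  return installed.has(queried);
}
```

With packs for `en`, `fr`, and `ja` installed, an origin whose context is `{ userLanguage: "en", siteLanguage: "fr" }` would learn about `en` and `fr` but always see `ja` as unsupported.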

The details of making this abuse-resistant can be a bit tricky, e.g. to avoid an origin creating one invisible iframe for each language. But we think it should be doable. It might involve expanding from origin- to site-scoping.
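To illustrate why site-scoping blunts the one-iframe-per-language trick, here is a rough sketch under loud assumptions. The per-site cap and the `SiteScopedLanguages` class are hypothetical, and `naiveSiteKey` simply takes the last two host labels, which is wrong for suffixes such as `co.uk`; a real implementation would need the Public Suffix List.

```typescript
// Hypothetical sketch: not part of any spec or implementation.
interface Origin {
  scheme: string;
  host: string;
}

// NAIVE assumption: the registrable domain is the last two labels.
// Real code must consult the Public Suffix List instead.
function naiveSiteKey(o: Origin): string {
  const labels = o.host.split(".");
  return `${o.scheme}://${labels.slice(-2).join(".")}`;
}

class SiteScopedLanguages {
  private readonly bySite = new Map<string, Set<string>>();
  private readonly maxPerSite: number;

  constructor(maxPerSite: number) {
    this.maxPerSite = maxPerSite;
  }

  // Returns true if `lang` may be revealed to this origin. All origins
  // that share a site draw from the same set of revealed languages.
  tryExpose(o: Origin, lang: string): boolean {
    const key = naiveSiteKey(o);
    let seen = this.bySite.get(key);
    if (seen === undefined) {
      seen = new Set<string>();
      this.bySite.set(key, seen);
    }
    if (seen.has(lang)) {
      return true;
    }
    if (seen.size >= this.maxPerSite) {
      return false;
    }
    seen.add(lang);
    return true;
  }
}
```

With a cap of two, frames on `a.example.com`, `b.example.com`, and `c.example.com` share one allowance, so the third distinct language is refused. This does nothing against frames hosted on genuinely different sites.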

This is quite limiting for some use cases, so we're interested in using some of the other mitigations mentioned in the explainer to expand beyond that. But we're waiting to see how much feedback we get on these limitations first.

annevk · 2024-05-07T06:36:48Z

As it's quite easy to get many unique sites using various free hosting services, wouldn't this be easily circumvented?

domenic · 2024-05-07T08:15:33Z

Those sites would need to coordinate among each other within the same browsing session. I think that should not be possible if we restrict to top-level navigables with no opener. (E.g. by requiring COOP.) That's a pretty strong restriction however, in particular disallowing iframed widgets, so we might want to explore other ways of loosening it when possible...
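The restriction mentioned above can be written as a simple predicate. This is only a sketch: `NavigableInfo` and `mayQueryLanguageSupport` are hypothetical names, and the two flags are assumed to be supplied by the browser (in page script they correspond roughly to `window.top === window` and `window.opener !== null`).

```typescript
// Hypothetical sketch: not part of any spec or implementation.
interface NavigableInfo {
  isTopLevel: boolean; // assumed: true when the document is not inside an iframe
  hasOpener: boolean;  // assumed: true when another browsing context opened this one
}

// Only a top-level navigable with no opener would get truthful answers;
// iframes and opener-linked popups would be refused.
function mayQueryLanguageSupport(n: NavigableInfo): boolean {
  return n.isTopLevel && !n.hasOpener;
}
```

Under this rule an embedded widget (`isTopLevel: false`) is always refused, which is the cost noted above.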

annevk · 2024-05-07T10:43:17Z

Can't they just navigate and include the information in the URL?

domenic · 2024-05-07T23:24:48Z

But that doesn't allow them to join user identity; there's no way to say whether it's the same user navigating or a different user.

annevk · 2024-05-08T06:32:38Z

That statement has a lot of underlying assumptions that don't currently hold I think.

sandstrom · 2025-01-17T12:03:51Z

You've probably thought about it, but on desktop the user agent language isn't always "the user's language". For example, many companies will install English Chrome, but users may actually speak French or Mandarin. The OS may also be in one language and the browser in another.

On mobile devices, the OS language (user-agent language) tends to map 1:1 to the language actually spoken.

Detecting the site's language (e.g. <html lang=…) should be robust, in the sense that the web developer usually knows better what language the user speaks (by asking via a dropdown etc.), and will have configured lang appropriately.

Maybe other packs will still be downloadable, and in that case this won't be an issue. But just wanted to mention it, in case the plan was also to limit translation to this detected pair.

(from "the user's language and the site's detected language" above)

Overall I'm very positive about a Web Translation API! 🎉

This includes a couple updates to the algorithms, for download status masking and avoiding actual download cancelation, plus extensive discussion of those mitigations and others in the new "Privacy considerations" section. See webmachinelearning/translation-api#3 and webmachinelearning/translation-api#10.

domenic · 2025-03-24T05:05:38Z

I've posted webmachinelearning/writing-assistance-apis#47, which goes into more detail on the current mitigations we're planning, and mandates most of them as "should"-level requirements. Direct link to PR preview.

Even though it is located in the writing assistance APIs spec, these privacy and security considerations are written to apply across all built-in AI APIs, including translator, and to tackle this issue specifically. So, after that gets reviewed and merged, the plan is to close this issue by updating the translator spec to refer to that section.

Any thoughts are appreciated!

domenic mentioned this issue Apr 24, 2024

Web Translation API w3ctag/design-reviews#948

Closed

1 task

This was referenced May 7, 2024

Highlighting Privacy and User Consent Concerns #5

Closed

Web Translation API WebKit/standards-positions#339

Open

martinthomson mentioned this issue May 16, 2024

Fake downloads, fingerprinting, cross-site covert channels #10

Open

josephrocca mentioned this issue Nov 30, 2024

Usage in iframes #25

Closed
