
Preventing fingerprinting via detecting the presence of downloaded language packs #3

Open
domenic opened this issue Apr 24, 2024 · 7 comments


@domenic
Collaborator

domenic commented Apr 24, 2024

This issue is for discussing questions noted in the explainer about preventing fingerprinting via detecting the presence of downloaded language packs.

The Chrome team is currently investigating whether an approach such as the following would work for a "v0": each origin can see at most two languages as supported, the user's language and the site's detected language. We tell the truth about these two languages, but this should not give significant fingerprinting info, especially since it will vary across sites.

The details of making this abuse-resistant can be a bit tricky, e.g. to avoid an origin creating one invisible iframe for each language. But we think it should be doable. It might involve expanding from origin- to site-scoping.
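A minimal TypeScript sketch of how a browser might enforce the two-language rule described above. Everything in it is hypothetical: the `LanguageRevealPolicy` class, the `Availability` strings, and the choice to report masked languages as "downloadable" are illustrative assumptions, not Chrome's implementation or the spec's algorithm. State is keyed by site rather than origin (the caller is assumed to have already computed the site key, e.g. scheme plus registrable domain), and only the first detected language per site is remembered, so a page cannot enumerate installed packs by creating one invisible iframe per language.

```typescript
// Hypothetical availability states, for illustration only.
type Availability = "available" | "downloadable";

// Hypothetical policy object: for each site, remembers the (at most two)
// languages whose true download status that site may observe.
class LanguageRevealPolicy {
  private readonly userLanguage: string;
  private readonly installedPacks: ReadonlySet<string>;
  // Site key -> languages whose real status may be revealed to that site.
  private readonly revealed = new Map<string, Set<string>>();

  constructor(userLanguage: string, installedPacks: ReadonlySet<string>) {
    this.userLanguage = userLanguage;
    this.installedPacks = installedPacks;
  }

  // Called when the browser detects a page's language on `site`.
  // Only the first detected language that differs from the user's language
  // is remembered; later ones are ignored, so extra frames or pages on the
  // same site cannot unlock additional languages.
  noteDetectedLanguage(site: string, detected: string): void {
    const allowed = this.revealed.get(site) ?? new Set([this.userLanguage]);
    if (allowed.size < 2) {
      allowed.add(detected);
    }
    this.revealed.set(site, allowed);
  }

  // What `site` is told about `language`.
  availability(site: string, language: string): Availability {
    const allowed = this.revealed.get(site) ?? new Set([this.userLanguage]);
    if (!allowed.has(language)) {
      // Masked: report "not yet downloaded" whether or not a pack is installed.
      return "downloadable";
    }
    // One of the (at most) two revealed languages: tell the truth.
    return this.installedPacks.has(language) ? "available" : "downloadable";
  }
}

// Usage: the user speaks English and has English, French and Japanese packs.
const policy = new LanguageRevealPolicy("en", new Set(["en", "fr", "ja"]));
policy.noteDetectedLanguage("https://example.com", "fr");
policy.noteDetectedLanguage("https://example.com", "ja"); // ignored: both slots used
console.log(policy.availability("https://example.com", "fr")); // → available
console.log(policy.availability("https://example.com", "ja")); // → downloadable (masked)
console.log(policy.availability("https://other.example", "fr")); // → downloadable (masked)
```

Locking in the first detected language is only one possible answer to the abuse-resistance question; a real design would also have to decide what to do for sites whose pages legitimately use several languages.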

This is quite limiting for some use cases, so we're interested in using some of the other mitigations mentioned in the explainer to expand beyond that. But we're waiting to see how much feedback we get on these limitations first.

@annevk

annevk commented May 7, 2024

As it's quite easy to get many unique sites using various free hosting services, wouldn't this be easily circumvented?

@domenic
Collaborator Author

domenic commented May 7, 2024

Those sites would need to coordinate with one another within the same browsing session. I think that should not be possible if we restrict to top-level navigables with no opener (e.g. by requiring COOP, Cross-Origin-Opener-Policy). That's a pretty strong restriction, however, in particular disallowing iframed widgets, so we might want to explore other ways of loosening it when possible...

@annevk

annevk commented May 7, 2024

Can't they just navigate and include the information in the URL?

@domenic
Collaborator Author

domenic commented May 7, 2024

But that doesn't allow them to join user identity; there's no way to say whether it's the same user navigating or a different user.

@annevk

annevk commented May 8, 2024

That statement has a lot of underlying assumptions that don't currently hold I think.

@sandstrom

sandstrom commented Jan 17, 2025

You've probably thought about it, but on desktop the user agent language isn't always "the user's language". For example, many companies install English Chrome, while their users may actually speak French or Mandarin. The OS may also be in one language and the browser in another.

On mobile devices, the OS language (user-agent language) tends to map 1:1 to the language the user actually speaks.

Detecting the site's language (e.g. from <html lang=…>) should be robust, since the web developer usually knows better what language the user speaks (e.g. by asking via a dropdown) and will have set lang accordingly.

Maybe other packs will still be downloadable, in which case this won't be an issue. I just wanted to mention it, in case the plan was also to limit translation to this detected pair.

(from "the user's language and the site's detected language" above)

Overall I'm very positive to a Web Translation API! 🎉

domenic added a commit to webmachinelearning/writing-assistance-apis that referenced this issue Mar 19, 2025
This includes a couple updates to the algorithms, for download status masking and avoiding actual download cancelation, plus extensive discussion of those mitigations and others in the new "Privacy considerations" section.

See webmachinelearning/translation-api#3 and webmachinelearning/translation-api#10.
domenic added a commit to webmachinelearning/writing-assistance-apis that referenced this issue Mar 24, 2025
@domenic
Collaborator Author

domenic commented Mar 24, 2025

I've posted webmachinelearning/writing-assistance-apis#47, which goes into more detail on the current mitigations we're planning, and mandates most of them as "should"-level requirements. Direct link to PR preview.

Even though it is located in the writing assistance APIs spec, these privacy and security considerations are written to apply across all built-in AI APIs, including translator, and to tackle this issue specifically. So, after that gets reviewed and merged, the plan is to close this issue by updating the translator spec to refer to that section.

Any thoughts are appreciated!

domenic added a commit to webmachinelearning/writing-assistance-apis that referenced this issue Mar 25, 2025
domenic added a commit to webmachinelearning/writing-assistance-apis that referenced this issue Mar 27, 2025

3 participants