Does GitHub Copilot use any code from individual users to train GitHub's model (or any successor model)? #152229

dlindmark · 2025-02-24T06:44:37Z

dlindmark
Feb 24, 2025

I have read the FAQ and I think that it is very clear when it comes to Business and Enterprise users. It is not as clear when it comes to Pro and Free users.

For example, the question "Does GitHub Copilot use any of your code to train GitHub's model (or any successor model)?" is answered with "No. GitHub uses neither Copilot Business nor Enterprise data to train the GitHub model." But it says nothing about Free or Pro users.

When I look at my own Copilot policies (I am a Pro user)

(clicking on that link does not take me to any privacy policy, just to some random doc page)

and read the Copilot documentation here, it seems VERY clear that data (such as prompts, which could include IP data) is NEVER used to train any AI model.

However, this article explicitly says

Copilot for Business also does not use your code to train the Azure OpenAI model

and

GitHub Copilot for Individual users, however, can opt in and explicitly provide consent for their code to be used as training data. User engagement data is used to improve the performance of the Copilot Service; specifically, it’s used to fine-tune ranking, sort algorithms, and craft prompts.

Then it sounds like it is possible for individual users to opt in for their code (which should mean prompts?) to be used as training data (to train the Azure OpenAI model), even though the Copilot documentation says that is never done.

So my questions include:

What model is the Azure OpenAI model? Is that the model I choose to chat with, or is it something else?
As an individual user, is it at any time possible that GitHub or third parties use my prompts or suggestions to train any AI model?

abdeliibrahim · 2025-03-06T16:00:48Z

abdeliibrahim
Mar 6, 2025

https://docs.github.com/en/copilot/managing-copilot/managing-copilot-as-an-individual-subscriber/managing-copilot-policies-as-an-individual-subscriber#model-training-and-improvements

1 reply

dlindmark Mar 10, 2025
Author

Absolutely. I link to the same section twice in my question. What confuses me is the contradictory statements from other official GitHub sources.

znorris · 2025-03-14T15:53:06Z

znorris
Mar 14, 2025

I just came here looking for the same answers. Why isn't Pro listed on the trust page? Why would I receive fewer privacy protections for being an individual who purchases the same product?

haiqu91 · 2025-03-15T14:09:48Z

haiqu91
Mar 15, 2025

I'll be honest, I'm already at the point that I simply don't trust these tools on principle, and the launch of a new license type without comprehensive documentation updates explicitly referring to the new license type feels like GitHub, MS, or both (yes, I know MS owns GH) are attempting to purposefully obfuscate their policies so they have an escape hatch when someone realizes that their local code is, in fact, not being protected.

If our data is being kept private, it should not be difficult to explicitly say so. Since GitHub cannot seem to do that, we have to assume that Copilot is harvesting just as much data from Pro licensees as Free ones. And until the official documentation makes the policy explicit and consistent from document to document, I can only conclude that it is either misleading entirely on purpose, or that GitHub is not run well enough to keep such information accurate.

Either way, it's an unacceptable state of affairs, and genuinely makes me reconsider hosting anything here (not that I have much as it is).

Get it together, GitHub. This is not how such an important pillar in the development world should be run.

SilverFuzz · 2025-03-17T22:09:58Z

SilverFuzz
Mar 17, 2025

Very important question that requires an answer. If all code is harvested, the days of credited algorithm invention are over. If you invent FFT, bubble sort, backprop, or some other game-changing algorithm, it will be trivially available to all of humanity with zero credit to the inventor.

1 reply

dabockster Jun 21, 2025

What about filing for your own patent and/or promoting it somewhere super visible? Like on YouTube or Instagram?

I was on the fence about using online AI for a long time. But after hearing how Meta supposedly pirated a bunch of books to train their Llama series of AI models, I realized that you have effectively ZERO control over anything you put online if it's on a system other than your own. Now this doesn't mean you should self-host everything and become a recluse, but it does mean that you need to plan for that loss of control should you choose to use cloud services in general (especially cloud AI). And if you do self-host on your own infra, be prepared to develop blocks or even countermeasures against entities (beyond the obvious stuff like robots.txt) that you don't want to have access to your stuff.

Since I'm not building ACM/IEEE tier algorithms or a massive multi billion dollar company, I came to the conclusion that the benefits outweighed the level of control I'd effectively lose. But that was my own personal decision, and you all will have to make your own decisions as well.

queenofcorgis · 2025-05-09T19:33:11Z

queenofcorgis
May 9, 2025
Maintainer

Hi from GitHub here 👋🏼 ,

As a GitHub Copilot Pro user you can opt-out of having your prompts and suggestions used for AI training. The Managing Copilot doc walks through how to disable that setting.

Also, I wanted to call out this passage from the doc and confirm that it is accurate, despite conflicting information you may have seen elsewhere:

By default, GitHub, its affiliates, and third parties will not use your data, including prompts, suggestions, and code snippets, for AI model training. This is reflected in your personal settings for GitHub Copilot and cannot be enabled.

The Azure OpenAI model mentioned in the article is out of date. We now host a variety of models and if you have opted out as a GitHub Copilot Individual user of your prompts being used to train the model, it will apply to all the models that we offer today.

1 reply

steinhh Nov 3, 2025

Paranoid, but:

if you have opted out as a GitHub Copilot Individual user of your prompts being used to train the model, it will apply to all the models that we offer today

It seems you're saying "you cannot opt in" ("...and cannot be enabled") but also "you can opt out".

Is the latter supposed to be about "Allow GitHub to use my data for product improvements", a setting which can be turned off?

SomeOne-0o0 · 2025-06-21T21:22:12Z

SomeOne-0o0
Jun 21, 2025

No, GitHub Copilot does not use your private code to train its models (unless you explicitly opted in). As of now, models like Codex and its successors are trained on publicly available data, including public GitHub repos.

If you're using Copilot in a private repo, your code might be used to improve the product, but not for training the underlying model, unless you’ve opted in through your settings.

You can double-check or change that under:
Settings > GitHub Copilot > Data Sharing

Hope that clears it up! 😊 @dlindmark

2 replies

rafamerlin Jul 14, 2025

Could you elaborate on the "improve the product" aspect? Is it just knowing whether we accepted or rejected a suggestion from GitHub, or something else besides that?

nuttyartist Jul 16, 2025

@SomeOne-0o0 Can you please clarify "your code might be used to improve the product"? What product, and in what way? Is there a way to opt out?

Thanks.

Vaccano · 2025-07-21T18:25:16Z

Vaccano
Jul 21, 2025

This page needs an authoritative response from someone who is clearly associated with the project.

Though what is REALLY needed is for the FAQ to be updated to include the Pro and Free versions of the product. (Something official from the product to let me know it is safe to purchase the Pro version or use the free version.)

(Lack of this is stopping GitHub Copilot adoption at my company.)

haiqu91 · 2025-07-21T18:39:07Z

haiqu91
Jul 21, 2025

This thread was opened in February and absolutely nothing has changed. It’s absolutely unacceptable but unsurprising given how much these models have relied on theft to begin with. Appalling.

ichDaheim · 2025-08-04T23:00:46Z

ichDaheim
Aug 4, 2025

Well - let me put it this way: the documentation is quite clear about the privacy of Business and Enterprise users' code - for every individual pricing plan it is not so direct.

So please assume you are a manager on the run to earn billions of dollars with your product and you know two things:

Your product will only earn money if you meet the needs of the companies who are willing to pay (maybe a lot) for it.
Individuals are weak and have no legal power (or money/will to enforce their rights).

What would be your conclusion if your paycheck at the end of the year is bound to the earnings of the product?

Are you clear and honest about the terms and conditions of your product, and risk losing valuable data that would make your product even more valuable to (business) customers?
Or do you obfuscate the facts as much as possible to earn your bonus and climb the career ladder?

Disclaimer: This is, of course, only a hypothetical consideration. Any similarity to possible reasons given by any existing personalities is purely coincidental and unintentional.

kyle-chine-leismore · 2025-08-14T03:16:32Z

kyle-chine-leismore
Aug 14, 2025

Another question is:

If I save my database username and password in some file in my codebase, will this file be uploaded and shared with someone else?

ichDaheim · 2025-08-19T21:06:39Z

ichDaheim
Aug 19, 2025

If you are not a business or enterprise user - the answer is most likely: it’s already part of the training data.
