Does GitHub Copilot use any code from individual users to train GitHub's model (or any successor model)? #152229
Replies: 11 comments 5 replies
I just came here looking for the same answers. Why isn't Pro listed on the trust page? Why would I receive weaker privacy protections as an individual who buys the same product?
I'll be honest: I'm already at the point where I simply don't trust these tools on principle. Launching a new license type without comprehensive documentation updates that explicitly refer to it feels like GitHub, MS, or both (yes, I know MS owns GH) are purposely obfuscating their policies so they have an escape hatch when someone realizes that their local code is, in fact, not being protected. If our data is being kept private, it should not be difficult to say so explicitly. Since GitHub cannot seem to do that, we have to assume that Copilot is harvesting just as much data from Pro licensees as from Free ones. Until the official documentation makes the policy explicit and consistent from document to document, I can only conclude that it is either deliberately misleading or that GitHub is not run well enough to keep such information accurate. Either way, it's an unacceptable state of affairs, and it genuinely makes me reconsider hosting anything here (not that I have much as it is). Get it together, GitHub. This is not how such an important pillar of the development world should be run.
Very important question that requires an answer. If all code is harvested, the days of credited algorithm invention are over. If you invent FFT, bubble sort, backprop, or some other game-changing algorithm, it will be trivially available to all of humanity with zero credit to the inventor.
Hi from GitHub here 👋🏼, as a GitHub Copilot Pro user you can opt out of having your prompts and suggestions used for AI training. The Managing Copilot doc walks through how to disable that setting. I also wanted to quote from the doc and confirm that this is accurate, despite conflicting information you may have seen elsewhere:
The Azure OpenAI model mentioned in the article is out of date. We now host a variety of models and if you have opted out as a GitHub Copilot Individual user of your prompts being used to train the model, it will apply to all the models that we offer today.
No, GitHub Copilot does not use your private code to train its models (unless you explicitly opted in). As of now, models like Codex or its successors are trained on publicly available data, including public GitHub repos. If you're using Copilot in a private repo, your code might be used to improve the product, but not to train the underlying model, unless you've opted in through your settings. You can double-check or change that under: Hope that clears it up! 😊 @dlindmark
This page needs an authoritative response from someone who is clearly associated with the project. That said, what is REALLY needed is for the FAQ to be updated to cover the Pro and Free versions of the product. (I need something official from the product team telling me it is safe to purchase the Pro version or use the Free version.) (The lack of this is blocking GitHub Copilot adoption at my company.)
This thread was opened in February and absolutely nothing has changed. It's absolutely unacceptable, but unsurprising given how much these models have relied on theft to begin with. Appalling.
Well, let me put it this way: the documentation is quite clear about the privacy of Business and Enterprise users' code, but every other, individual pricing plan is not addressed so directly. So imagine you are a manager out to earn billions of dollars with your product, and you know 2 things.
What conclusions would you draw if your paycheck at the end of the year were tied to the product's earnings?
Disclaimer: This is, of course, only a hypothetical consideration. Any similarity to possible reasons given by any existing personalities is purely coincidental and unintentional.
Another question: if I saved my database username and password in some file in my codebase, will that file be uploaded and shared with someone else?
If you are not a Business or Enterprise user, the answer is most likely: it's already part of the training data.
I have read the FAQ, and I think it is very clear when it comes to Business and Enterprise users, but not as clear when it comes to Pro and Free users.
For example: "Does GitHub Copilot use any of your code to train GitHub's model (or any successor model)?" Answer: "No. GitHub uses neither Copilot Business nor Enterprise data to train the GitHub model." But there is nothing about Free or Pro users.
When I look at my own Copilot policies (I am a Pro user)
(clicking that link does not take me to any privacy policy, just to some random doc page)
and read the Copilot documentation here, it seems VERY clear that data (such as prompts, which could include IP) is NEVER used to train any AI model.
However, this article explicitly says
and
So it sounds like individual users can opt in to having their code (which presumably means prompts?) used as training data (to train the Azure OpenAI model), even though the Copilot documentation says that is never done.
So my questions include