OpenAI’s GPT-3 has taken the world of text generation by storm in recent years and has quickly become a great tool for automating text production, but a common question we hear at EML is:
Does it plagiarize?
GPT-3 generally does not plagiarize: the neural network draws on its whole training set and builds sentences piece by piece, predicting one token at a time rather than copying text. However, in our testing, we found that GPT-3 sometimes takes sentences outright from the web.
That second claim may be controversial, so below we’ll show a test run through detection software showing that GPT-3 does, in fact, plagiarize at times and that its output will show up in detection software.
Let’s jump in.
Understanding How GPT-3 Was Trained
Remember, GPT-3 is a massive machine learning system created by OpenAI.
It has been trained to understand natural language and produce coherent answers to questions it is asked.
To do this, this machine learning model has been fed BILLIONS of pieces of data from various sources, including websites, books, Wikipedia, forums, etc.
With over a billion websites on the internet, we can assume that this huge collection of information was scraped from millions of websites, allowing GPT-3 to learn a vast range of language patterns and meanings.
What’s even better is that the model improves each time it is retrained on newer data.
The combined datasets (old and new) enable GPT-3 to get smarter over time and learn more about language and the world.
As of today, here is the training data cutoff for each GPT-3 model:
Model | Training data up to |
davinci | Jun 2021 |
curie | Oct 2019 |
babbage | Oct 2019 |
ada | Oct 2019 |
However, scraping the entire internet does create some problems.
So Does GPT-3 Plagiarize?
GPT-3 generally does not plagiarize, because the neural network draws on its whole training set and builds sentences piece by piece instead of copying them. However, through testing, we found that GPT-3 sometimes takes sentences outright from the web.
Below, I generated 800 words (the length of a one- to two-page paper) and ran them through a paid plagiarism detection tool.
The essay is on dog walking, which I randomly picked to get the point across.
Let’s see how we did.
Here’s a link to the software we used in case you want to do your own testing.
For our 800-word paper, the detector flagged 46 words (6.3% of the total) for possible plagiarism.
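As a quick sanity check on those numbers, here is a minimal Python sketch of the arithmetic. The function names are ours, chosen for illustration. Note that 46 of exactly 800 words works out to 5.75%, so the 6.3% figure suggests the percentage was computed against a count of roughly 730 words. That is our inference from the numbers, not something the tool states.

```python
def flagged_share(flagged_words: int, total_words: int) -> float:
    """Return flagged words as a percentage of the total word count."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 100.0 * flagged_words / total_words


def implied_total(flagged_words: int, reported_percent: float) -> float:
    """Return the word count that a reported percentage would imply."""
    if reported_percent <= 0:
        raise ValueError("reported_percent must be positive")
    return flagged_words * 100.0 / reported_percent


# 46 flagged words measured against a full 800-word paper
print(round(flagged_share(46, 800), 2))  # 5.75

# Word count implied by the reported 6.3% figure
print(round(implied_total(46, 6.3)))  # 730
```

Either way, the share of flagged text lands in the 5% to 6% range.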
The detector software shows exactly where these words were taken from. The first (and most egregious) match was a pet website, where the similarity was nearly 5% (meaning we used about 5% of their article).
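To give a feel for how a match like this could be found, here is a minimal sketch of one common approach: comparing overlapping word sequences (n-grams) between two texts. This is an assumption for illustration, not the method the detector we used actually implements, and the function names and sample sentences are made up.

```python
import re


def word_ngrams(text: str, n: int = 5) -> set:
    """Return the set of lowercase word n-grams found in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_percent(candidate: str, source: str, n: int = 5) -> float:
    """Percentage of the candidate's n-grams that also appear in the source."""
    candidate_grams = word_ngrams(candidate, n)
    if not candidate_grams:
        return 0.0
    shared = candidate_grams & word_ngrams(source, n)
    return 100.0 * len(shared) / len(candidate_grams)


# Hypothetical sentences, not taken from our actual test
generated = "Walking your dog every day keeps it healthy and happy"
web_page = "Experts agree that walking your dog every day keeps it healthy"

# 6 of the 8 three-word sequences in the generated text appear in the source
print(overlap_percent(generated, web_page, n=3))  # 75.0
```

A real detector would compare your text against a huge index of web pages rather than a single source, but the idea of looking for shared runs of words is the same.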
While the logic behind GPT-3 is solid, there does seem to be some copying when you use GPT-3 to write your essays and papers.
AI plagiarism is a serious problem. Some prompt engineering may help avoid it, but current software can flag GPT-3 writing.
Can Turnitin detect GPT-3?
As we’ve seen from the testing above, detection software can flag parts of your writing as matching existing sources. While definitively identifying text as AI-written is tough, you’ll most likely be flagged for plagiarism before getting caught for using an AI writing tool.
While using this text “raw” without any editing may pose some problems, you could lessen your chances of plagiarizing by putting your own spin on the generated text.
One of my favorite ways to clean up GPT-3 text is by using Grammarly.
Grammarly will scan your text, detect common spelling and sentence structure mistakes, and automatically fix them for you.
If you’re not a big fan of additional software, adding your own spin on the text can also help make it seem more original, possibly allowing it to pass plagiarism tools such as Turnitin and Copyleaks.
For example, if the GPT-3 output suggests using a certain phrase or sentence structure, you could read it, understand it, then add your expertise and human experience to the sentence.
This may include swapping in synonyms, changing sentence structure, and adding a personal story to make the text more unique. Plagiarism software then has far less to flag, because you’re no longer copying someone else’s writing.
With the right editing processes in place – including Grammarly and other software, if needed – you can get your GPT-3 writing through a plagiarism check.
GPT-3 Watermark: What to Watch For
In my opinion, GPT-3 is already effectively watermarked. Even though OpenAI says that GPT-3 is not watermarked, its text carries such obvious signs of AI generation that common software should be able to detect it.
Anyone who has seen enough AI-generated text will quickly be able to tell that the text sitting in front of them isn’t quite human writing.
Not only that, but OpenAI has heavily discussed a firm-wide watermark on all of its products.
It seems they plan to introduce similar measures soon with ChatGPT – a bigger brother to GPT-3 – meaning watermarking is already on their radar.
What Prompts Help GPT-3 Not Plagiarize?
In my experience, prompts don’t affect whether GPT-3 output gets flagged for plagiarism.
And trust me, I’ve tried thousands of prompts.
I mean, this makes sense.
Imagine if someone told you to “write a story as if you’re not you.” Even while trying to be someone else, you’d still end up being “you”.
Despite what some marketers and internet gurus may tell you, prompts are not a reliable way to dodge plagiarism checks.
Prompts can be helpful as reminders or guides for writers when crafting content; however, they won’t be able to prevent plagiarized content from slipping through the cracks.
I mean, the text has to come from somewhere!
Your best bet is to use a reliable checking tool and review the text thoroughly before submitting it.
If you’re really worried about this, you can run your text through a couple of tools before turning in your essay.
That way, you can be sure you’re not accidentally stealing someone else’s work.
Taking a few extra steps will help ensure that your originality stands out – no matter how clever someone’s attempt at automation is.
Other Articles in our GPT-3 Series:
GPT-3 can be pretty confusing. To combat this, we have a full-fledged series that will help you understand it at a deeper level.
Those articles can be found here:
- Is GPT-3 Deterministic?
- Stop Sequence GPT-3
- GPT-3 Vocabulary Size
- Is GPT-3 Self-Aware?
- GPT-3 For Text Classification
- GPT-3 vs. Bloom
- GPT-3 For Finance
- Does GPT-3 Have Emotions?