You are the best AI content detector (but you don't know it yet)

17.12.2024

Creating lots of content is a good way to increase traffic to your site. It can also be a one-way ticket to the online abyss if you're not careful.

Since the release of ChatGPT, generating content has become much easier. You simply provide instructions, click "submit" and wait for the magic to happen. But is this a sensible strategy for increasing online visibility? As Google's March 2024 core update showed, not really.

Thanks in large part to this update, more and more people have begun to look for ways to detect AI-generated content.

Why? Because Google began reducing the visibility of websites that abused AI tools to flood the Web with mediocre articles.

Eventually, many people came to the conclusion that content written by artificial intelligence should be avoided. As a result, AI content detectors have grown in popularity. Unfortunately, these tools are also imperfect.

In this article, I argue that humans are better at identifying AI content than machines - at least in the ways that matter for SEO. What does that mean? Read on to find out.

Why do we care about content generated by artificial intelligence?

We care about it because Google cares about it. That's basically it. But here's the catch - Google only cares about the quality of content, and quality is often poor when content is created on autopilot.

I have nothing against content created with the help of artificial intelligence. Big or small - AI's contribution doesn't matter, as long as the result is good.

So what makes up a good result? Here are some tips to follow:

  • Create people-oriented content. When you focus on tracking your results, it's easy to lose perspective. Remember, it's worth creating content that others can benefit from, not just to increase your views.
  • Demonstrate the features of E-E-A-T. E-E-A-T is a group of signals that Google uses to determine the quality of content. The more of the right signals you send, the better your chances of making it to the first page of search results.
  • Talk directly about the use of artificial intelligence. Among the elements of creating helpful, reliable, people-first content, Google lists self-assessing your content in terms of "Who, How and Why." When you publish content online, clearly identify who authored it, how it was created and why it was written in the first place. So if you use artificial intelligence at any stage of the process, don't beat around the bush - admit it.
  • Carefully vet everything you publish. While it may seem trivial, it's good practice to look at content with more than one pair of eyes before publishing it online. The temptation to skip a few steps and publish your article as soon as possible will always be there, but think about it, is it really worth it?
  • Familiarize yourself with the guidelines for quality assessment. In its Search Quality Rater Guidelines, Google shares several insights. Frankly, everyone who wants to rank high on Google should know them.

AI content does not equal bad content

Based on our current knowledge and what Google says, there is no reason to believe that it is penalizing websites with artificial intelligence-generated content.

It's just that a lot of low-quality content is created by large language models (LLMs). And frankly, this is not surprising: LLMs are designed to serve a wide range of potential uses, so they are trained to provide generic answers.

This is the reason why almost every health article you generate with ChatGPT will contain at least one sentence saying something like "For more information, consult your doctor."

It's not a bug. It's a feature.

What we can do as copywriters is to find reliable sources and present them to the language model as examples. Or, better yet, write some content ourselves and heavily edit what the machine provides, using it only as a helpful outline.

There is also the issue of writing concise, simple prompts that can push the language model in the right direction. Creating such prompts is considered by many to be an art form.

Although not easy, doing this correctly can significantly improve the quality of results obtained from language models.
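The grounding approach described above can be sketched in code. The function below is a hypothetical illustration, not an API call to any particular model: the source names, excerpts and instructions are placeholders, and in practice you would pass the resulting string to whichever language model you use.

```python
# Hypothetical sketch: assembling a prompt that grounds the model in
# reliable sources instead of letting it fall back on generic answers.
# All names, excerpts and instructions here are illustrative placeholders.

def build_grounded_prompt(topic, sources, outline):
    """Combine source excerpts and a human-written outline into one prompt."""
    source_block = "\n\n".join(
        f"Source {i + 1} ({s['name']}):\n{s['excerpt']}"
        for i, s in enumerate(sources)
    )
    return (
        f"Write a draft about: {topic}\n\n"
        f"Base every claim on the sources below and cite them by name.\n\n"
        f"{source_block}\n\n"
        f"Follow this outline exactly:\n{outline}\n"
        f"Do not add generic filler such as 'consult your doctor'."
    )

prompt = build_grounded_prompt(
    topic="Benefits of vitamin D supplementation",
    sources=[{"name": "NIH fact sheet",
              "excerpt": "Vitamin D promotes calcium absorption..."}],
    outline="1. What vitamin D does\n2. Recommended intake\n3. Risks of overuse",
)
print(prompt)
```

The point of the sketch is that the sources and the outline come from you, so the model fills in a structure you control rather than inventing one of its own.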

So we are not completely helpless when it comes to making sure the content is in line with our expectations.

Given this context, saying that all AI-generated content is bad content is an oversimplified description of the problem.

We don't need to eliminate AI content. We need to eliminate bad AI content.

Should we bother with AI detection?

In most cases, determining whether we are dealing with artificial intelligence-generated content is a wise move. After all, the more information we have, the better choices we can make - at least in theory.

But before we can eliminate bad AI content, we must first identify it. And this is an area where humans outperform AI content detectors.

Yes, an AI content detection tool is likely to produce better results when it comes to identifying whether we are dealing with AI-generated or human-written content.

However, it will not be able to distinguish quality content from bad content.

The Internet is full of posts in which people ask about reliable AI content detectors like this one:

Source:https://www.reddit.com/r/ChatGPT/comments/13yo3xr/which_ai_content_detector_is_reliable/

Usually there are many answers published by other users, each recommending a different tool.

There are also comments in which users mention that no tool is foolproof.

In addition to providing anecdotal evidence, they sometimes go further and provide screenshots in which text undoubtedly written by a human is labeled as generated by artificial intelligence, or vice versa.

Source: https://www.reddit.com/r/ChatGPT/comments/13yo3xr/comment/l09888o/

But is this discussion in any way relevant? Why are we so obsessed with finding the best AI content detection tool?

I think this has to do with our tendency to look for the simplest solutions to complex problems.

Many site owners see that sites with low-quality AI-generated content perform poorly in search and come to similar conclusions. Since it takes time and effort to determine whether AI-generated content is good or not, why not just ban it?

This approach may seem right at first glance, but I consider it another case of throwing the baby out with the bathwater.

Mindlessly publishing AI-generated content is irresponsible, but banning all AI content outright is just as bad.

I won't go into detail and list all the advantages of automation in the content creation process. Instead, let's go back to the claim I made at the beginning of this article.

Are humans better at detecting bad AI content?

Most people are not familiar enough with AI-generated content to identify it correctly. They can become better at it - provided they have some examples to use as training data.

This is even more true when it comes to detecting bad AI content.

Since AI detectors only evaluate whether a text has been written by an artificial intelligence, we cannot expect them to provide us with answers regarding text quality.

That's why I suggest treating AI detectors as useful tools, not obstacles that authors must overcome to get their articles published on the site.

Let's face it - people are not perfect.

According to worldmetrics.org, human error:

  • causes about 90% of all data breaches;
  • is responsible for about 80% of workplace accidents;
  • causes up to 400,000 deaths in the healthcare sector each year in the US.

These statistics do not paint a pretty picture.

The situation looks even worse when we consider the description of a course posted on the official website of the University of Texas at Austin. We learn from it that:

"people make at least three mistakes (usually 5-7) per hour when they are awake, and the number increases to 11-15 per hour when they are extremely stressed or tired."

As humans, we make many mistakes. However, do we mistakenly mark AI-generated texts as written by a human? And if so, how often does this happen?

What does the research say on the subject?

A study conducted in 2021 by researchers at the University of Washington showed that evaluators without training distinguished GPT-3 text from human-written text at random-chance level (they were about 50% correct).

At the same time, their confidence in giving the correct answer remained quite high (around 50%) in all tasks.

In short, the evaluators underestimated the capabilities of modern language models and overestimated their ability to distinguish GPT-3 from texts written by humans.

Here are some of the explanations they gave to prove their point:

As you can see, there are two logical interpretations of the same text, which lead to very different conclusions. Only after training on a number of examples were the evaluators able to significantly improve their results.

Does this mean that the average person has no chance of discovering the truth? No, but guessing what is and is not generated by artificial intelligence without training is just that - guessing. It's more akin to flipping a coin than performing any technical evaluation or analysis.

What can we do to improve these numbers?

First of all, get acquainted with content generated by artificial intelligence.

Many of us are already doing this - albeit unintentionally.

As I mentioned at the beginning of this article, the Internet is filling up with AI-generated content at an alarming rate. Google - quite reasonably - decided to take action against this, and now everyone is suddenly avoiding AI-generated texts like the plague.

But Google is not perfect, and bad AI-generated content sometimes manages to climb to the top of search results. These are the moments when we have a chance to confront them and learn from them. An example is BNN Breaking.

BNN Breaking was a news service that quickly gained popularity. In its first two years of existence, it managed to build quite a reputation. Well-known websites such as The Washington Post and The Guardian linked to it. It was also promoted by Google News and MSN, a web portal owned by Microsoft.

The site was removed after an incident in which a photo of well-known Irish talk-show host Dave Fanning was mistakenly added by artificial intelligence to an article describing a case of sexual abuse.

After a thorough investigation, it turned out that the site was filled with AI-generated content. The New York Times detailed the case in this article.

Why do I mention this? Because this is a perfect example of using artificial intelligence in an irresponsible way that nevertheless worked and brought 10 million visitors a month.

If we put just a little more effort into checking the data and thoroughly editing the results generated by artificial intelligence, we can use the same tools in a much better and more reliable way.

Familiarizing yourself with AI content

You don't have to rely on random chance to find examples of articles written by artificial intelligence. You can use ChatGPT for free and generate lots of texts in just a few minutes. Then, all you have to do is read through these articles, looking for recurring patterns.

Remember the study conducted by researchers at the University of Washington? Training with examples improved overall accuracy at distinguishing machine-generated from human-written texts from 50% to 55%.

The researchers note that "the significant difference in overall performance is mainly due to the storytelling domain" and has to do with eliminating the bias that machines cannot generate "creative" text.

Nonetheless, becoming familiar with texts generated by artificial intelligence has its advantages.

Here are some warning signs that you are dealing with machine-generated content:

  • The paragraphs are long and very similar in length;
  • The structure of the article remains similar, regardless of the topic;
  • There are many complex, technical terms in the text;
  • The text is very predictable, i.e., it has low perplexity.

Of course, this list is not exhaustive, but it should give you an idea of what to look for.
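One of these warning signs - paragraphs that are suspiciously uniform in length - is easy to check mechanically. The sketch below is a rough heuristic, not a detector: the 0.25 threshold on the coefficient of variation is an arbitrary value chosen purely for illustration.

```python
import statistics

def paragraph_length_red_flag(text, max_cv=0.25):
    """Flag text whose paragraphs are suspiciously uniform in length.

    Returns True when the coefficient of variation (stdev / mean) of
    paragraph word counts falls below max_cv - a threshold picked here
    purely for illustration, not a validated cutoff.
    """
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    if len(paragraphs) < 3:
        return False  # too little text to judge
    lengths = [len(p.split()) for p in paragraphs]
    cv = statistics.stdev(lengths) / statistics.mean(lengths)
    return cv < max_cv

uniform = "\n\n".join(["word " * 50] * 5)   # five near-identical paragraphs
varied = "\n\n".join("word " * n for n in (10, 80, 35, 160, 5))
print(paragraph_length_red_flag(uniform))   # True: uniform lengths
print(paragraph_length_red_flag(varied))    # False: human-like variation
```

A flag like this is only a prompt to read the text more carefully, not proof of anything - which is exactly how the checklist above should be used too.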

Fixing poor content created by AI

Ok, so you've identified poor content created by artificial intelligence. How can you fix it?

There are many ways to improve these texts. The whole process is similar to what can be done when editing poor human-written content.

What factors are worth paying special attention to?

  • AI-generated titles can be uninventive. Change the title to give your article more personality.
  • As mentioned above, artificial intelligence tends to stick to one structure whenever possible. To alleviate this problem, provide the language model with your own structure, or heavily edit the text, adding, changing and removing headings as needed.
  • Some AI content detectors have a problem with certain words or phrases that they consider signs of AI use. If your article will be checked by these tools and you want to avoid detection, paraphrasing tools can be useful. They will replace problematic words with synonyms without changing the meaning.
  • Check the data and conduct your own research on the topic. If you find reliable sources, cite them in your article to improve its quality.
  • Refrain from blindly accepting everything the language model gives you. Mix and match text at will to create something really interesting. Add some tables and lists to make the article easier to read.
  • Give some real-life examples. Artificial intelligence doesn't have real-life experiences it can describe, but you do! Use this to your advantage and earn extra points by demonstrating one of the elements of E-E-A-T.
  • Remove repetitive sections and replace them with something new. Write new ones, or ask the language model to describe a specific point that fits well with the overall theme of the article.

Using artificial intelligence in a responsible way

If I were to condense the message I want to convey in this article into a single thought, it would be as follows:

There is nothing wrong with using artificial intelligence to improve the content generation process. However, we need to do it responsibly.

This includes both using a language model to generate text and using an AI content detector to determine whether it is worth publishing.

In both cases, we have the chance to work with tools that can do our work for us. However, I don't think that handing over our task to artificial intelligence is a good solution.

AI tools are not so advanced that we should leave them unattended or treat what they give us as 100% real. We still have to put in some effort ourselves.

Even editing text generated by artificial intelligence is a step in the right direction.

The folks at Originality.ai still consider such texts to be AI-generated, but I disagree with that assessment.

Source: https://originality.ai/blog/ai-content-detection-accuracy

As in many other areas of SEO, whether or not we can still consider text to be AI-generated depends on various factors.

First of all, on how many changes we made to the article.

At what point or after how many changes can we claim that an article is no longer generated by artificial intelligence, but by a human? It's hard to say.

Some may argue that after a few improvements we are no longer dealing with editing, but with rewriting an article. But again - the line between the two is blurred.

When can we safely assume that what we are doing is rewriting an article, not editing it? After we have changed 50% of the article? Less? More?

Who can decide and resolve this issue?

As you can see, there are more questions here than answers.

Concluding thoughts

The bottom line is that AI content detectors - like all AI tools - cannot be treated as infallible.

While humans make mistakes, using artificial intelligence is not a magic shortcut to help us avoid mistakes.

In the words of JC Denton, the protagonist of Deus Ex (a video game released in 2000):

"Human beings may not be perfect, but a computer program with language synthesis is not the answer to the world's problems."

I feel the same is true of all the artificial intelligence tools that are so popular these days. They are not the answer to our problems, but they can be a significant help in achieving the desired result.

We just need to be smart about how we use them and for what purpose.