- 1. How Search Engines Work
- 1.1 Search engines use crawlers (a type of bot) to scrape content from webpages across the network
- 1.2 An indexer turns the collected webpages into tables and stores them in a database
- 1.3 When a user searches on Google, a search algorithm displays relevant webpages as search results
- 2. Clues to High-Quality Content: Google's Quality Rater Guidelines
- 2.1 Page Quality (PQ) Evaluation
- 2.2 Needs Met (NM) Evaluation
- 2.3 Conclusion: Search Engines "Strive" to Mimic Humans
Sometimes human intuition doesn't align with the logic of a meticulous algorithm. For example, for now it's not that difficult to tell a human-made drawing from an AI-generated one, or a human-written article from an AI-written one. We instinctively feel that human-written text has "more sincerity." No matter how long or well-organized an article is, AI-written text often feels soulless. That's a human judgment.
Recently, with the emergence of VEO3, AI-generated video production has also taken off. Sometimes on Instagram Reels you see "accident videos" that make you wonder how such a thing could possibly happen, only to find comments saying, "This is a fake video made with AI." The moment of shock quickly fades. Nothing has changed physically, but our perception of whether it's AI or not changes the power and status of the content.
Today, AI-generated works are prevalent on YouTube and in Google's search results. So, is it "unreasonable" for AI-generated content to get picked up by the algorithm? Today, I'd like to think about this topic.
1. How Search Engines Work
Search engines use crawlers to scrape content from webpages across the network
A crawler is a type of bot, and there are various crawling methods. For example, if I create a website, its domain has to be registered and resolvable through DNS (the Domain Name System) unless the site only runs locally on my computer. A Google crawler might then discover my website's address through DNS, or it might follow a link to my site that someone else has placed. If it has a history of crawling my website, it might come back based on that record, or it might find it through a sitemap I submitted. In short, it "mobilizes all sorts of methods" to collect web documents that are reachable on the network.
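To make the link-following part concrete, here is a minimal, illustrative crawler sketch in Python (standard library only; the seed URL is a placeholder, and a real crawler like Googlebot is vastly more sophisticated):

```python
# Minimal link-following crawler sketch (illustrative only; a real crawler
# also handles politeness, scheduling, deduplication, rendering, etc.).
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_urls, max_pages=10):
    frontier = list(seed_urls)      # URLs waiting to be fetched
    seen = set(seed_urls)           # avoid visiting the same URL twice
    pages = {}                      # url -> raw HTML

    while frontier and len(pages) < max_pages:
        url = frontier.pop(0)
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "ignore")
        except Exception:
            continue                # skip pages that fail to load
        pages[url] = html

        extractor = LinkExtractor()
        extractor.feed(html)
        for link in extractor.links:
            absolute = urljoin(url, link)   # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return pages


# Hypothetical seed URL for illustration:
# pages = crawl(["https://example.com/"])
```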
The robots.txt file tells crawlers what they may and may not collect. The website owner writes this file and places it in the root directory of the website.
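As a minimal sketch of the crawler-side check (the site URL and user-agent name are placeholders), Python's standard library even ships a robots.txt parser:

```python
# Respecting robots.txt before fetching a page (sketch).
from urllib.robotparser import RobotFileParser

robots = RobotFileParser()
robots.set_url("https://example.com/robots.txt")  # placeholder site
robots.read()  # downloads and parses the robots.txt file

# A well-behaved crawler checks permission for its user agent per URL.
if robots.can_fetch("MyCrawlerBot", "https://example.com/private/page"):
    print("Allowed to crawl")
else:
    print("Disallowed by robots.txt")
```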
An indexer turns the collected webpages into tables and stores them in a database
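One common way to picture those "tables" is an inverted index that maps each term to the documents containing it. Here is a toy version with made-up documents, stored in an in-memory SQLite database (real search indexes also store positions, frequencies, link data, and much more):

```python
# Toy inverted index stored in a database (illustrative only).
import re
import sqlite3

docs = {
    1: "how search engines crawl and index the web",
    2: "google quality rater guidelines explained",
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inverted_index (term TEXT, doc_id INTEGER)")

for doc_id, text in docs.items():
    for term in set(re.findall(r"[a-z]+", text.lower())):
        conn.execute("INSERT INTO inverted_index VALUES (?, ?)", (term, doc_id))

# Which documents mention "index"?
rows = conn.execute(
    "SELECT doc_id FROM inverted_index WHERE term = ?", ("index",)
).fetchall()
print(rows)  # -> [(1,)]
```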
When a user searches on Google, a search algorithm displays relevant webpages as search results
The search algorithm itself can be seen as three main steps. First, it analyzes the user's query through morphological analysis and tokenization to grasp its core meaning. Next, it quickly searches Google's index database for candidate documents related to those keywords. Finally, it applies numerous ranking algorithms to sort the results, factoring in personalization signals such as the user's location, device, and search history before showing the final list. This is why searching on your own device can yield different results than searching through a VPN that exits from another country's IP. The pages at the top of the final results are not simply those with the most keyword matches, but pages that have been comprehensively evaluated for reliability, quality, and user engagement.
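To make those three steps concrete, here is a toy pipeline in Python; the index contents, quality scores, and personalization boost are all invented for illustration and have nothing to do with Google's actual signals:

```python
# Toy version of the three-step query pipeline (all numbers are invented).
import re

# Step 0: a tiny pre-built inverted index and some per-page "quality" scores.
inverted_index = {
    "coffee": {1, 2},
    "shop": {2, 3},
    "london": {3},
}
page_quality = {1: 0.4, 2: 0.9, 3: 0.7}        # pretend ranking signals
user_location_boost = {3: 0.2}                  # pretend personalization

def search(query):
    # Step 1: analyze the query (here, just lowercase word tokenization).
    tokens = re.findall(r"[a-z]+", query.lower())

    # Step 2: gather candidate documents that contain any query term.
    candidates = set()
    for token in tokens:
        candidates |= inverted_index.get(token, set())

    # Step 3: rank candidates with a combined score, including personalization.
    def score(doc_id):
        keyword_hits = sum(doc_id in inverted_index.get(t, set()) for t in tokens)
        return keyword_hits + page_quality[doc_id] + user_location_boost.get(doc_id, 0.0)

    return sorted(candidates, key=score, reverse=True)

print(search("coffee shop"))  # -> [2, 3, 1] with these made-up numbers
```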
2. Clues to High-Quality Content: Google's Quality Rater Guidelines
"I've seen articles with bad content that still rank high. The idea that quality matters is a lie."
While we can't see inside the algorithm itself, there is a document that gives us a glimpse of what Google "pursues" in its search engine: the Google Quality Rater Guidelines. Its contents are roughly as follows.
Page Quality (PQ) Evaluation
Search quality raters (humans) evaluate how well a page achieves its purpose. It's impossible for them to evaluate every single webpage, so they work with a selected sample. Ratings range from Lowest to Highest.
- Highest Rating: Given when a page has a beneficial purpose and achieves that purpose exceptionally well.
- Lowest Rating: Given to pages with a harmful, deceptive, or malicious purpose that could harm people or society.
- Medium Rating: Can be given when a page achieves a beneficial purpose but falls short of the highest rating, or has a mix of good and low-quality characteristics.
- Low Rating: Given when a page has a beneficial purpose but is seriously lacking in important aspects.
Raters consider the **E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness)** criteria to judge the quality of a page.
- Experience: Does the creator have direct experience with the topic?
- Expertise: Is the creator an expert in the field?
- Authoritativeness: How authoritative are the creator, the content, and the website?
- Trustworthiness: Is the content accurate, honest, safe, and reliable?
Especially strict quality standards are applied to YMYL (Your Money or Your Life) topics that can significantly impact a person's life, such as health and finance. The rumor you sometimes hear that articles about health or stocks struggle to rank well in search results stems from this very section of the guidelines.
Needs Met (NM) Evaluation
This evaluation focuses on how well a search result satisfies the user's need.
- Understanding User Intent: Raters determine user intent based on the query and user location (e.g., when searching for "coffee shops in London").
- Ratings: Ratings run from 'Fully Meets' through 'Highly Meets,' 'Moderately Meets,' and 'Slightly Meets,' down to 'Fails to Meet.'
- Example: If the query is "batman" and the user is in the US, a search result for the city of 'Batman' in Turkey would not fulfill the user's intent and would receive a 'Fails to Meet' rating.
Conclusion: Search engines "strive" to mimic humans
In conclusion, Google's search engine goes beyond simple keyword matching and tries to mimic a human's criteria for good content. Through countless algorithm updates and evaluation processes, it strives to surface the most useful and trustworthy information, even in today's digital environment where it's hard to tell whether an article was written by a human or an AI. The Google Quality Rater Guidelines are not the algorithm itself; they are a document showing how a group of people called raters works to improve the search engine. So if someone asks, "How do you explain low-quality articles ranking high?", the only answer is that the algorithm is not yet perfect. The algorithm aggregates numerous 'Ranking Signals,' and rather than directly 'understanding' the quality of the content, it 'infers' quality through indirect signals.
For example, if an article receives many backlinks from other trustworthy websites, or if people click on the page and spend a long time on it, the algorithm might mistakenly assume that the article is "useful." In other words, the algorithm only draws conclusions based on a large amount of indirect data, and it still has limitations in making qualitative judgments like a human. "What is high quality?" is a purely human expression and actually has nothing to do with the algorithm itself.
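As a cartoon of that "inference from indirect signals" (the signal names and weights below are entirely hypothetical, not Google's formula), a ranking score might be little more than a weighted sum that never looks at the content itself:

```python
# Hypothetical weighted sum of indirect ranking signals (not Google's formula).
SIGNAL_WEIGHTS = {
    "backlinks_from_trusted_sites": 0.5,
    "avg_dwell_time_seconds": 0.01,
    "click_through_rate": 2.0,
}

def quality_estimate(signals):
    """Infers 'quality' purely from proxy signals, never from the text itself."""
    return sum(SIGNAL_WEIGHTS[name] * value for name, value in signals.items())

# Two pages the algorithm cannot actually 'read':
genuinely_useful_page = {"backlinks_from_trusted_sites": 3,
                         "avg_dwell_time_seconds": 40,
                         "click_through_rate": 0.05}
clickbait_page = {"backlinks_from_trusted_sites": 1,
                  "avg_dwell_time_seconds": 90,   # people lingering, confused
                  "click_through_rate": 0.30}

print(quality_estimate(genuinely_useful_page))  # 2.0
print(quality_estimate(clickbait_page))         # 2.0 -- indistinguishable
```

With these made-up numbers, a genuinely useful page and a confusing clickbait page come out identical, which is exactly the kind of blind spot described above.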
Going forward, I plan to experiment with all the concepts related to SEO and record what changes occur in the blog I operate (a different one from this one).