Why didn’t I think of this sooner?

Many people get frustrated wondering, “Why isn’t my new blog post getting indexed?” I was the same. I felt like Google was ignoring my blog entirely, and that I was just fighting an impossibly lonely battle in the vast desert of the web. 

Then, while studying, I realized that crawling simply means that Googlebot is sending a request to fetch pages from my site, and those requests are saved on my server as logs. Those logs contain everything—from the timestamp of the request to how my server responded to it.

Alright, let’s go.

How to Check If Googlebot Has Visited My Website

I use Ubuntu as my operating system and Nginx as my web server. Other well-known web servers include Apache and Caddy.

To check Googlebot’s requests to my server, I ran this command in the terminal:

sudo grep -i "Googlebot" /var/log/nginx/access.log
  

It prints out tons of logs, and since they’re hard to read, I copied everything and asked Gemini to summarize it...
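If you don’t have a live server handy, here’s a self-contained sketch of what that filter does. The log lines below are made up, but they follow Nginx’s default “combined” format: IP, timestamp, request, status code, response size, referrer, and user agent.

```shell
# Made-up sample lines in Nginx's default "combined" log format.
cat > /tmp/sample_access.log <<'EOF'
66.249.66.1 - - [10/May/2025:08:12:01 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
203.0.113.7 - - [10/May/2025:08:16:02 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"
66.249.66.1 - - [10/May/2025:08:20:44 +0000] "GET /some-old-post/ HTTP/1.1" 404 154 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
EOF

# Same filter as above, pointed at the sample file:
# only the two lines with Googlebot in the user agent match.
grep -i "Googlebot" /tmp/sample_access.log
```

The user-agent field at the end of each line is what the grep actually matches, which is why a plain case-insensitive search for “Googlebot” is enough.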

[Screenshot: A summary of Googlebot access logs generated by Gemini]

I honestly thought Googlebot wasn’t visiting my site at all, but it turns out it had been visiting consistently, even right before I checked the log. Googlebot was coming and going nonstop.

Pages Googlebot Frequently Checks 

robots.txt

The most frequently requested page was robots.txt. This file tells crawlers which pages they shouldn’t collect, so it’s checked very regularly.
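For context, robots.txt is just a plain-text file at the site root. A minimal one might look like this (the Disallow path and sitemap URL are illustrative examples, not my actual file):

```text
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
```

Since crawlers are expected to re-read this file often to pick up rule changes, it’s no surprise it tops the request count.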

Older Posts and Resources 

The second most common requests were for previously indexed pages. Many of these pages no longer exist, so my server returned 404. Since Googlebot already knows these URLs, it keeps retrying them from time to time.
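You can get a quick tally of how Googlebot’s requests were answered by counting the status-code field (field 9 in the combined format). Against the real log you’d run `sudo grep -i "Googlebot" /var/log/nginx/access.log | awk '{print $9}' | sort | uniq -c`; here’s the same pipeline as a self-contained sketch with made-up sample lines:

```shell
# Made-up Googlebot requests: one hit on robots.txt, two on deleted pages.
cat > /tmp/googlebot_sample.log <<'EOF'
66.249.66.1 - - [10/May/2025:08:12:01 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "Googlebot/2.1"
66.249.66.1 - - [10/May/2025:08:13:21 +0000] "GET /deleted-post/ HTTP/1.1" 404 154 "-" "Googlebot/2.1"
66.249.66.1 - - [10/May/2025:08:14:30 +0000] "GET /another-old-page/ HTTP/1.1" 404 154 "-" "Googlebot/2.1"
EOF

# Tally requests per status code: prints a count next to each code,
# so the 404s from deleted pages stand out immediately.
grep -i "Googlebot" /tmp/googlebot_sample.log | awk '{print $9}' | sort | uniq -c
```

A pile of 404s here isn’t necessarily a problem; it’s just Googlebot rechecking URLs it discovered in the past.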

Recent Posts 

So what about the recent posts whose existence Googlebot doesn’t yet know about? If you check yesterday’s logs with this command, 

sudo grep -i "Googlebot" /var/log/nginx/access.log.1

you can see logs like this:

[Screenshot: Googlebot requesting a sitemap XML file from access.log.1]

It requested one of the XML files from my sitemap—the one containing the post list for a specific language. 

This means that even though Google doesn’t check the sitemap constantly, it does occasionally use it to discover whether new posts exist.
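For reference, a multi-language blog often splits its sitemap into an index that points at one file per language, which would explain why Googlebot fetched a single language-specific XML file. A hypothetical layout (the URLs are made up) might be:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap-en.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemap-ko.xml</loc></sitemap>
</sitemapindex>
```

Under that structure, Googlebot can refresh one language’s post list without re-fetching the whole sitemap.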

In reality, recent posts aren't crawled "in order from oldest to newest." Instead, crawling spreads out slowly from whatever link Googlebot *happens* to touch first—completely by chance. 

I usually post in six languages, and among my latest posts, Googlebot had only crawled certain languages for certain posts. It’ll probably crawl the rest little by little. Once a post is crawled and analyzed, it’ll get indexed unless there’s an issue, and then it will finally show up in search.


I theoretically *knew* all of this, but actually seeing it with my own eyes made me ridiculously excited. So this is how it works...