How to check which URLs have been indexed without upsetting Google: A follow-up

Back in October 2016, I wrote about how you can use a Python script to determine whether a page has been indexed by Google in the SERPs. As it turns out, Google’s webmaster trends analyst Gary Illyes wasn’t too happy with the technique the script used, so I cannot endorse this method:

I’ll just leave this here: https://t.co/NO4s6JbSfJ https://t.co/qRhIGXcG7g

— Gary Illyes ᕕ( ᐛ )ᕗ (@methode) October 5, 2016

Shortly after, Sean Malseed and his team at Greenlane SEO built a similar tool in Google Sheets (among other awesome tools like InfiniteSuggest), and Googler John Mueller expressed reservations:

@greenlaneseo Is this a blackhat tool or does it abide by the webmaster guidelines & robots.txt? (just curious)

— John ☆.o(≧▽≦)o.☆ (@JohnMu) December 14, 2016

How could I learn which pages weren’t indexed by Google, and do it in a way that didn’t break Google’s rules? Google Search Console doesn’t indicate whether a page has been indexed, Google won’t let us scrape search results to get the answer, and it isn’t keen on us getting the answer indirectly from an undocumented API. (That was Sean Malseed’s clever workaround to scraping.) Let’s explore some solutions.

[Read the full article on Search Engine Land.]



About The Author

Paul Shapiro is Director of Strategy and Innovation for Catalyst in Boston. Paul loves to get down and dirty with innovative SEO strategies. He also enjoys watching old horror movies, programming, collecting ancient artifacts, and writing about SEO on his blog, Search Wilderness.


 
