Google is one of the best hacking tools out there. It may sound incredible, but relatively simple searches in Google and other search engines can dig out sensitive or even dangerous information about your site, your servers and your company.
You want Google to index your site and make you visible and searchable. That part is good for you. But if you have been careless, Google can also index more sensitive information that was never meant to be public, and can therefore be a useful tool for hackers if they want to probe your site for vulnerabilities. This is often called Google hacking. (To be fair, other search engines can be used as well, so it could just as well be called search engine hacking.)
There is a lot of information that can be found on Google thanks to careless (or clueless) administration of websites:
- Various usernames and passwords (both encrypted and in plain text)
- Internal documents
- Internal site statistics
- Intranet access
- Database access
- Mail server access
- And much, much more
Needless to say, a lot of this information can be used as a starting point for breaking into your systems.
Some examples
A huge number of search strings for finding sensitive information have been published around the web. A collection exists at the Google Hacking Database (it currently has 1,423 entries), which is where we found the two following examples.
The image below shows the result of two different searches: one for SQL insert statements that use encryption functions for passwords, and another for INC files containing PHP code with unencrypted usernames, passwords and addresses of the corresponding databases.
Both these searches show a number of results where passwords and usernames have been indexed and cached by Google.
Not all of the results will be relevant, but this is just the tip of the iceberg.
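To make the shape of such searches a bit more concrete, here is a minimal sketch in Python of how a query can be combined with the `site:` operator so that it only matches a domain you own. The two query strings are our own illustrative assumptions built from common search operators, not entries copied from the Google Hacking Database, and `example.com` is a placeholder.

```python
from urllib.parse import urlencode

# Illustrative queries only (assumptions, not taken from the Google Hacking Database).
EXAMPLE_QUERIES = [
    'filetype:sql "insert into" password',
    "filetype:inc intext:mysql_connect",
]


def restrict_to_site(query: str, domain: str) -> str:
    """Prefix a query with the site: operator so it only matches one domain."""
    return f"site:{domain} {query}"


def search_url(query: str) -> str:
    """Build a search URL that can be opened in a browser for manual review."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # example.com is a placeholder; use a domain you are responsible for.
    for q in EXAMPLE_QUERIES:
        print(search_url(restrict_to_site(q, "example.com")))
```

The sketch only prints URLs for you to open by hand. It does not send any automated requests to a search engine.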
How common is it?
We searched for “google hack” in Google Trends, and as you can see in the graph, that search term is becoming increasingly popular.
This isn’t all that scientific, but it could give a hint that this kind of information gathering is increasing.
As a side note, it is also interesting to see that a lot of the searches seem to be coming from Asia, and especially South-East Asia. The top countries for this search term are Indonesia, Vietnam, Malaysia and the Philippines.
Is Google doing anything to prevent this?
Google is not standing idly by. The company seems to proactively try to block some usage of these “hacker searches”, at least via direct search links (which can easily be used by automated applications). Some of the queries we tried gave us this result:
What can you do to prevent “Google hacks” on your site?
There is no way we can list every single security measure out there, but we hope you will find this to be a useful starting point.
- First, if at all possible, keep all your sensitive information off the internet, i.e. on storage that isn’t even connected to it.
- Be careful how you write your scripts and access your databases. There are numerous examples where a database access error text shows up on a website and contains way too much information. If you are unlucky and Google crawls your site at that moment, the information becomes public (and is cached by Google).
- Use robots.txt to let Google know which parts of your website it is OK to index. Note, however, that even this information can be used by hackers: if you specify which parts of the website are “off limits”, anyone targeting your site specifically will of course look there to see what is so sensitive. And don’t forget that someone who wants to scan your website themselves, without the help of Google, won’t care about robots.txt or the other conventions that only well-behaved search robots like Googlebot respect.
- Make sure that the directory permissions on your web server are in order, i.e. only allow public access to the bare minimum of directories that are necessary for your site to function. This precaution isn’t really specific to Google hacking, but it is worth mentioning.
- Monitor your site for common errors. You can set up monitoring (for example with Pingdom, or another website monitoring service) that checks for text that should not exist on the page, for example part of a PHP script error message. Then you will know right away if and when your site has “messed up”, and can take the necessary precautions (changing passwords or whatever else may be suitable).
- “Google hack” your own website. Try out the various searches listed in the Google Hacking Database on your own site.
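As a rough illustration of the “text that should not exist on the page” check mentioned in the list above, here is a minimal Python sketch that scans page content for strings resembling leaked error output. The patterns and the `find_leaks` helper are hypothetical examples of ours and would need to be adapted to the error formats your own stack produces.

```python
import re
from typing import List

# Hypothetical signatures; extend them with the error formats your own stack emits.
ERROR_PATTERNS = [
    re.compile(r"(Warning|Fatal error|Parse error|Notice)\s*:.+ on line \d+", re.IGNORECASE),
    re.compile(r"You have an error in your SQL syntax", re.IGNORECASE),
    re.compile(r"mysql_connect\(\)", re.IGNORECASE),
]

TAG_RE = re.compile(r"<[^>]+>")


def find_leaks(page_html: str) -> List[str]:
    """Return snippets of the page that look like leaked error output."""
    text = TAG_RE.sub("", page_html)  # crude tag stripping, good enough for a sketch
    hits = []
    for pattern in ERROR_PATTERNS:
        for match in pattern.finditer(text):
            hits.append(match.group(0))
    return hits


if __name__ == "__main__":
    sample = (
        "<p>Welcome</p><b>Warning</b>: mysql_connect(): Access denied "
        "in <b>/var/www/db.php</b> on line <b>12</b>"
    )
    for hit in find_leaks(sample):
        print("Possible leak:", hit)
```

In practice you would fetch each page you care about (for example with `urllib.request.urlopen`) on a schedule and raise an alert whenever `find_leaks` returns anything.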
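The robots.txt caveat from the list above is easy to see for yourself. The following Python sketch, a simplified parser of our own that ignores user-agent grouping, extracts every `Disallow` path from a robots.txt file, which is exactly the list a curious visitor would read. The sample content and paths are made up.

```python
from typing import List


def disallowed_paths(robots_txt: str) -> List[str]:
    """Collect the paths named in Disallow lines (user-agent groups are ignored)."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            if path:  # an empty Disallow means "nothing is disallowed"
                paths.append(path)
    return paths


if __name__ == "__main__":
    # Made-up sample content for illustration.
    sample = """User-agent: *
Disallow: /admin/
Disallow: /backup/  # old dumps
Disallow:
"""
    print(disallowed_paths(sample))
```

If the output contains anything you would not want a stranger to open, protect it with proper access control rather than relying on robots.txt.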
Find out more
- Old but good article explaining search engine hacking.
- The Google Hacking Database has more Google search queries for sensitive information than you could ever imagine.
- The Google Hack Honeypot, which tries to find new searches that are being used.
- More about robots.txt. The always-useful Wikipedia has a good entry about robots.txt as well.
(Google really does have more search tricks than you can shake a stick at.)