collusioni.st

Google Keywords

Monday, October 20 2003

I’m always amazed at where I show up in Google rankings for certain keywords. More often than not, people searching for extremely bizarre things end up here, and I get a real kick out of looking through my server’s log files for Google hits.


A while back, I figured out a way to isolate just the Google hits into a log file, and when I recently rebuilt my webserver, I completely lost that trick. I will document it here so I never forget again.


First off, I only know how to do this in Apache 1.3 and up (the env= conditional logging used here needs 1.3.5 or later). If you’re using IIS, you might be out of luck (all the more reason to switch to Apache).


In your httpd.conf file, add the following somewhere:


LogFormat "%{Referer}i -> %U" google_search


Keen eyes might notice that this is the exact same format for tracking referers. I’ve split the two in case I ever want to format the log differently in the future.


Within a VirtualHost, add the following (it might be different for you, but I host multiple domains on a single IP, so VirtualHost is the way to go for me):


SetEnvIfNoCase Referer google google_ref
CustomLog "|/usr/local/sbin/cronolog /logs/michaelbuffington/google/%Y%m%d.log" google_search env=google_ref


The SetEnvIfNoCase directive, in plain English, says “If the word google shows up anywhere in the Referer field, in upper or lower case, set a variable called google_ref for that particular hit.” Keep in mind, the word “google” in this case is a regular expression (a very simple one). It will match any bit of text with the word google in it, including bogoogle (something Australians do), or pugoogleck (an Eskimo word for “punch in the eye”). I might accidentally track incoming users that aren’t coming from Google. That’s interesting to me too, so I don’t mind so much.


Then, I get tricky. Normally, the CustomLog directive just tells Apache to write a log file to a particular location in a particular format (the format being the one I specified in LogFormat earlier, google_search). The env=google_ref on the end limits the log to hits where that variable has been set. But in my case, I’m doing a bit more here. I use a program called cronolog that splits my logs up into separate files. I’ve set it up to split them by day. This works well for me, but it can be changed to suit your needs.
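
If daily files aren’t the right granularity, the template handed to cronolog is the only thing that needs to change. As a rough sketch, assuming cronolog’s strftime-style specifiers (%Y, %m, %d) and the same binary and log directory as above, a monthly split might look something like this:

```apache
# Sketch only: monthly rather than daily files.
# Dropping %d from the cronolog template means one file per month,
# e.g. 200310.log. Binary path and log directory are the ones used above.
CustomLog "|/usr/local/sbin/cronolog /logs/michaelbuffington/google/%Y%m.log" google_search env=google_ref
```

This would stand in for the daily CustomLog line rather than sit alongside it, unless you really want both sets of files.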


I do almost the same thing for my Referers log file as well, except I simply exclude blank referers, as well as referers coming from my own site. I am, if you haven’t gathered, fond of outside attention, so I have new referers streamed to my wristwatch using satellites, lasers, and nanotechnology.
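
The referers setup isn’t spelled out here, so treat the following as one way it could be done rather than the actual config. The variable name skip_ref, the format nickname referers, the referers log directory, and the example.com domain are all placeholders made up for illustration; swap in your own.

```apache
# Sketch only: log every referer except blank ones and our own site.

# Flag hits that arrive with an empty Referer.
SetEnvIf Referer "^$" skip_ref

# Flag hits referred from the site itself (example.com is a placeholder).
SetEnvIfNoCase Referer "example[.]com" skip_ref

# Same format as the Google log, under its own nickname.
LogFormat "%{Referer}i -> %U" referers

# env=!skip_ref means "log only when skip_ref is NOT set".
CustomLog "|/usr/local/sbin/cronolog /logs/michaelbuffington/referers/%Y%m%d.log" referers env=!skip_ref
```

The idea is the mirror image of the Google log: instead of flagging the hits to keep, it flags the hits to throw away and logs everything else.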


I hope that you too will be able to take advantage of this, but don’t give me credit; that belongs to Apache. You may, however, give me cash.


Update:
I created a page that shows the live Google stats. Hopefully this serves to produce even stranger Google searches, since I’ve heard Google does weird ranking stuff if you list your Google stats.


Also, I changed the pattern from google to ([.]+)google([.]+), which looks really funny if you don’t get the concept of regular expressions. It basically says “match any bit of text that has one or more full stops (the [.]+ part), followed by the word google, followed by one or more full stops again.” I probably don’t need parens around [.]+, but then again, I don’t really need to wear socks; they’re just more comfortable. With this fancy new pattern, I should get a lot more actual Google matches. I fixed this because the Live Google Stats page was showing up in the stats, which, as you might appreciate, is a bad thing.
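
Dropped into the directive from earlier, the new pattern would presumably look something like this (a sketch; the quoting around the pattern is an assumption, the rest is as before):

```apache
# Sketch only: set google_ref only when "google" has a full stop
# on both sides, as in http://www.google.com/search?q=...
SetEnvIfNoCase Referer "([.]+)google([.]+)" google_ref
```

One side effect worth knowing about: a referer from plain http://google.com/ with no www in front has no full stop before the word google, so this pattern would skip it.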