Tuesday, June 21, 2011

Puzzles of the Linked Web

In the course of poking around to see who is referring to my blog posts, downloading stuff from my web page, and the like, I've encountered a couple of puzzles, and it occurred to me that some of my readers may know much more about the subject than I do.

The first came when I followed the link at the bottom of my web page to usage statistics provided by my ISP. One of the tables is "Top 100 of 4635 Total Referrers." Reading down it, I noticed a lot of what looked like pornographic URLs, mixed in with less exotic and more obviously relevant ones:

http://hot-asses.brunettes4u.com/
http://daviddfriedman.blogspot.com/2011/06/murray-rothbard-on-me-and-vice-versa.html
http://www.google.co.in/url
http://www.youngblackpussy.info/black-booty/
http://my-naked-ex-girlfriend.hotnakedgirlfriends.com/
http://www.google.com/m/search
http://nudecelebsfree.info/

Naturally, I clicked on some of them. And ended up looking not at pictures of hot naked girlfriends but at a respectable, innocuous, indeed dull page offering information on online education. I tried clicking on some more and ended up, each time, in the same place. This happens only with the pornographic URLs, not with the ones that look as though they actually belong there.

My only guess so far is that traffic is being steered to the respectable page by the use of what look like unrespectable URLs, which are somehow forwarded to the respectable URL when you click on them. But surely there must be a more plausible explanation.

My second puzzle is less exotic. I did a Google search for the title of our POD cookbook and got a surprisingly large number of hits. But when I clicked on one and searched the page for a word from the title, it was not there; there was no reference at all to the book. That was not true of all the hits, but it was of a surprisingly large number.

The pattern was not random. The pages I was finding were not pages likely to talk about a medieval and Renaissance cookbook. But they were pages likely to link to my blog: largely libertarian or libertarian-related pages. In some cases, the date of the page predated the cookbook by several years.

My current guess is that those pages had links to something that had once appeared on my blog. Somehow, when I started posting to the blog about the cookbook, something changed, sending Google a (mistaken) signal that the links were going to those posts. 

Can anyone offer a clearer or more plausible explanation of what I'm seeing? Is there a good webbed discussion of this stuff somewhere?

6 Comments:

At 6:13 AM, June 21, 2011, Blogger daryl jensen said...

Looks like some sort of link farm.

http://en.wikipedia.org/wiki/Link_farm

 
At 6:26 AM, June 21, 2011, Anonymous Giles said...

Hmmm, those links now seem to go to the kind of pages you'd expect them to. Lucky we have a fairly relaxed attitude to "SFW" here...

One possibility is that the URLs were initially set up to look respectable but have since been switched over to the sites they were aimed at. This could have been deliberate (perhaps they thought you might be checking them and decided to semi-hide their true nature while you did) or accidental. When you re-target a domain (say, foo.com) from one IP address (essentially, a specific computer) to another, it takes time for the change of address to filter through the network (technically, DNS propagation takes a noticeable time). So perhaps the girlfriend site was originally set up on one IP address and then moved somewhere else, while the education site took over the original address. You could have accessed the sites before the change of address had filtered through to you.

Possibly relevantly, the porn domains you list are all owned by the same person. edu.com is registered under a different name.

Regarding the Google hits -- there are a lot of spam pages out there that take RSS feeds from reputable sites and republish them as if they were original content, a kind of steganography to hide the spam links. Perhaps that was happening with the pages you found, and the "borrowed" content that linked to you had simply dropped off the front page?
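The DNS-propagation point in the comment above can be illustrated with a toy model. This is a minimal sketch, assuming a resolver that simply caches each answer for the record's TTL (time to live) and ignores everything else real resolvers do; the class names, the domain name and the IP addresses are all hypothetical (the addresses come from ranges reserved for documentation).

```python
from dataclasses import dataclass


@dataclass
class CachedRecord:
    ip: str
    expires_at: float  # time (in seconds) after which the cached answer is stale


class CachingResolver:
    """Toy model of a caching DNS resolver that honors record TTLs."""

    def __init__(self, authoritative):
        # authoritative: dict mapping hostname -> (ip, ttl_seconds), standing in
        # for what the domain's own name servers currently say (hypothetical).
        self.authoritative = authoritative
        self.cache = {}

    def resolve(self, name, now):
        rec = self.cache.get(name)
        if rec is not None and now < rec.expires_at:
            return rec.ip  # still within TTL: answer from cache, even if outdated
        ip, ttl = self.authoritative[name]
        self.cache[name] = CachedRecord(ip, now + ttl)
        return ip


# Hypothetical domain and documentation-range IP addresses.
zone = {"girlfriends.example": ("192.0.2.10", 3600)}
resolver = CachingResolver(zone)

print(resolver.resolve("girlfriends.example", now=0))     # 192.0.2.10
zone["girlfriends.example"] = ("198.51.100.20", 3600)     # site moves to a new address
print(resolver.resolve("girlfriends.example", now=1800))  # 192.0.2.10 (stale cached answer)
print(resolver.resolve("girlfriends.example", now=3600))  # 198.51.100.20 (TTL expired)
```

Until the cached entry expires, anyone whose resolver asked earlier keeps getting the old address, which is one way two people following the same link at different times could land on different sites.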

 
At 10:17 AM, June 21, 2011, Blogger David Friedman said...

Giles:

The problem with your conjecture on the Google hits is that at least some of the pages in question were real, respectable blogs, with coherent discussions of the sorts of things that would be of interest to people interested in my posts.

 
At 11:03 AM, June 21, 2011, Blogger Breno said...

The porn links are referrer spam. Someone set up software to send HTTP requests with a spoofed referrer header pointing to their site.

Some statistics pages are public, so the referrer gets a free link that helps its Google ranking, plus clicks from site owners curious about it.
(They have now also got a free link in your blog post.)

I tried all the porn links in your post and all point to a porn site.

The Google hits look normal for me: searching with quotes gives 238 results.

Maybe some of the blogs are republishing your RSS feed. Try using Google's cached version of the page.
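The referrer-spam explanation above turns on the fact that the referrer a server logs is just the Referer header (the HTTP spec's own spelling) that the client chose to send. A minimal sketch of how a stats page might build a "top referrers" table, assuming the server writes Apache-style Combined Log Format (an assumption; the ISP's actual software isn't known), shows that nothing checks the value. The function name `top_referrers` and the sample log lines are hypothetical.

```python
import re
from collections import Counter

# Apache/NCSA "Combined Log Format":
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def top_referrers(lines, n=100):
    """Count the Referer field across log lines, most frequent first."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # skip lines not in Combined Log Format
        ref = m.group("referer")
        if ref and ref != "-":  # "-" means the client sent no Referer header
            counts[ref] += 1
    return counts.most_common(n)


# Hypothetical log lines. The Referer value is copied verbatim from the
# client's request, so a spam bot can claim any URL it likes.
sample = [
    '203.0.113.5 - - [21/Jun/2011:06:00:00 -0700] "GET / HTTP/1.1" 200 5120 '
    '"http://spam.example/" "Mozilla/5.0"',
    '203.0.113.5 - - [21/Jun/2011:06:05:00 -0700] "GET / HTTP/1.1" 200 5120 '
    '"http://spam.example/" "Mozilla/5.0"',
    '198.51.100.7 - - [21/Jun/2011:07:00:00 -0700] "GET /cookbook.html HTTP/1.1" 200 2048 '
    '"http://blog.example/post.html" "Mozilla/5.0"',
    '198.51.100.8 - - [21/Jun/2011:07:10:00 -0700] "GET / HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0"',
]

print(top_referrers(sample))
# [('http://spam.example/', 2), ('http://blog.example/post.html', 1)]
```

A bot that repeats requests with the same fake Referer climbs such a table quickly, and if the table is public, each entry is a link back to the spammer's site.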

 
At 12:22 PM, June 21, 2011, Anonymous Laura Lindsay said...

First time I've ever laughed out loud at your blog... Usually some pretty heady stuff. Best line ever-- "Naturally, I clicked on some of them."

Ha ha ha ha!

Thanks so much. I enjoy reading it. :D

 
At 5:58 AM, June 24, 2011, Anonymous Giles said...

David -- interesting. Perhaps the pages that appeared in your referrer logs were time-dependent and the link had just dropped off to the next page, in the same way as the link to the piece by Rothbard will drop off your own front page shortly.

 
