Monday, January 20, 2014

Wanted: A Better Way to Egosearch

One of the things I like about the Internet is the ability to spot people talking about me and, if necessary, respond. In the old days of Usenet, I could do it using the DejaNews search engine. I used to describe the situation as the winds of the world blowing any mention of me to my ears and blowing my response back to the ears of everyone who heard the mention—in the form of a post by me on the same thread of the same newsgroup.

Unfortunately, the DejaNews archive was taken over by Google, which proceeded to make its Usenet search engine year by year less workable. At this point, so far as I can tell, the Google Groups engine is almost entirely useless for searching Usenet. That would be a serious problem if Usenet still contained the bulk of the relevant conversations, but it doesn't. What I most want to search now is the web.

Google's Internet search engine lets me do that. I can filter out most references to other David Friedmans (it is, unfortunately, a pretty common name) by including in my search string an appropriate collection of ors (Economist OR Libertarian OR Anarchist OR ...) and nots (-Basketball -Concerned -Ironic -...). But that leaves me with hits that are mostly pages linking to my blog, my web page, or YouTube videos of talks I have given. While I am happy to know that people are linking to my material, none of those is a conversation or requires a response.

What I want is some way of doing a search that will ignore my name in links and report only pages where someone is actually saying something about me. Any suggestions from those more expert in the relevant technology than I am?

(And yes, for any Usenet veterans out there, I know that the proper term is kibozing.)

7 comments:

Joey said...

In case you didn't know, you can "-" actual web pages as well:

economist OR libertarian "david friedman" -daviddfriedman.blogspot.com

And go to "search tools" -> "time" -> "past week" so you'll only see recently updated pages.

Additionally, you can attach a signature (PGP if you worry about imposters) to also "-" from your searches like a breadcrumb trail.

Joey said...

Additionally, you can attach a signature (PGP if you worry about imposters) to also "-" from your searches like a breadcrumb trail.

By that, I mean attach a signature to new comments that you write on other websites so you can -signature or "signature" as needed.

You can use a timestamp as the plaintext in the sig so you can search for responses to your comments, etc.

I'm sure there are more sophisticated ways of solving these problems though...

Anonymous said...

Negate entire domains with
"-site:daviddfriedman.blogspot.com
-site:daviddfriedman.com"
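The pieces suggested so far (OR terms, an exact-phrase name, "-" exclusions, and "-site:" exclusions) can be assembled into one search string. The following is only a minimal sketch: `build_query` and its parameter names are hypothetical helpers made up for illustration, and it only builds the text to paste into the search box, it does not talk to Google.

```python
def build_query(name, any_of=(), none_of=(), skip_sites=()):
    """Assemble a search string from the operators discussed above.

    Hypothetical helper for illustration; it only formats text.
    """
    parts = []
    if any_of:
        # Alternatives joined with OR, as in the post.
        parts.append(" OR ".join(any_of))
    # Exact-phrase match on the name.
    parts.append(f'"{name}"')
    # Words that should not appear on the page.
    parts.extend(f"-{word}" for word in none_of)
    # Whole domains to leave out of the results.
    parts.extend(f"-site:{site}" for site in skip_sites)
    return " ".join(parts)


query = build_query(
    "david friedman",
    any_of=["economist", "libertarian"],
    none_of=["basketball"],
    skip_sites=["daviddfriedman.blogspot.com", "daviddfriedman.com"],
)
print(query)
```

Note that excluding whole domains this way only removes pages hosted on those sites; it does nothing about pages elsewhere that merely link to them.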

David Friedman said...

I normally search for only the previous day's posts, unless I have for some reason been away from the net for a while.

I don't want to negate web pages that have a link to my blog or web site, because they are web pages that might also mention me. I want to negate web pages where the only reference to me is the link.

Daublin said...

Have you tried Google Blog Search? It seems to sort by time of post.

If you use a blog reader such as Liferea, you can search within the blogs you read for anything with a given term.

Google News can make an RSS feed for you with any search term you like, but unfortunately it seems to limit itself to traditional news sites.

Daublin said...

Oh, and if you use something like Google Plus or Facebook, then every time you are mentioned the software automatically notifies you.

Tibor said...

Well, as long as what is posted is an actual URL address and not hyperlink text, you can fix it by searching for the exact phrase (as well as the other parameters). If your name is only mentioned in the URL, then the result will be ignored, since the space character between your names cannot be a part of a URL. An even better search would be
"david friedman" OR " friedman " OR " friedman" OR "friedman "

as some commenters may only include your last name. I'm not sure whether Google search ignores spaces that come immediately after the opening quotation mark or immediately before the closing one, with nothing but the quotation mark (or other spaces) next to them. I'm sure it does not ignore a space that sits between other characters.

If they use a hyperlink like
David Friedman

then it gets a bit more complicated, but you could still write a script that would first do the regular search, then download the source code of every web page in the results, then go through each page's source and search for the string "Friedman" or "David", ignoring every occurrence that is preceded by ">" and followed by "<", and finally print the search results that were not filtered out that way.

It should be quite simple to write code like that. I'm not sure it is worth the trouble though. I guess it depends on how large a proportion of the results are hyperlink-only, and so how much time it takes you to go through them only to find they include nothing to reply to. Also, if your name is only mentioned in a hyperlink and nowhere else, but is then followed by commentary from the poster, that page will be filtered out too. I guess it would be a bit harder to fix that.
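The filtering step of such a script might look roughly like the sketch below. It is a sketch under assumptions, not a finished tool: it uses Python's standard `html.parser` to skip text inside `<a>` tags instead of the literal ">" and "<" check described above, the names `TextOutsideLinks` and `mentioned_outside_links` are made up for illustration, and the steps that run the search and download each page are left out.

```python
import re
from html.parser import HTMLParser


class TextOutsideLinks(HTMLParser):
    """Collect page text, skipping anything inside <a>...</a>."""

    def __init__(self):
        super().__init__()
        self.link_depth = 0   # greater than 0 while inside a link
        self.chunks = []      # text found outside links

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth > 0:
            self.link_depth -= 1

    def handle_data(self, data):
        if self.link_depth == 0:
            self.chunks.append(data)


def mentioned_outside_links(html, pattern=r"\bfriedman\b"):
    """Return True if the pattern appears in text that is not link text."""
    parser = TextOutsideLinks()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return re.search(pattern, text, re.IGNORECASE) is not None


# Two made-up pages: one only links, one actually says something.
link_only = '<p>See <a href="http://example.com">David Friedman</a> for more.</p>'
discussion = '<p>I think Friedman is wrong. <a href="http://example.com">link</a></p>'

print(mentioned_outside_links(link_only))
print(mentioned_outside_links(discussion))
```

The limitation mentioned above still applies: a page that names you only in the link text and then comments on you without repeating the name would be dropped.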