Saturday, January 31, 2009

The danger of algorithms

Sometimes it's very easy to write algorithms that have very odd side effects, especially when doing data analysis and classification. I remember, from when I used to work on classification, how many crazy things my algorithm did, like calling a "3 Person Extra-Comfortable Couch - Mango" a mango (the fruit). So this picture posted by newobj was great:



By the time I tried it, it was already fixed, so it probably didn't last very long. But it's easy to see why Google could classify itself as a harmful site: it dynamically links to all sorts of different places, including harmful sites. It's reasonable to believe that a site with a very low proportion of harmful content, but still some, might be classified as harmful. So if they don't remember to filter out content-less search engines, they will always end up saying that Google, or any other search engine, is dangerous to you.
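
To make that failure mode concrete, here is a minimal sketch in Python. It is purely hypothetical and is not how Google or any real system classifies sites: the `Site` class, the `is_search_engine` flag, the `KNOWN_HARMFUL` blocklist, and the zero threshold are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import List

# Made-up blocklist for illustration only.
KNOWN_HARMFUL = {"malware.example", "phish.example"}


@dataclass
class Site:
    name: str
    outbound_links: List[str]
    # Assumed flag; a real system would have to work this out somehow.
    is_search_engine: bool = False


def harmful_fraction(site: Site) -> float:
    """Fraction of a site's outbound links that point at known-harmful hosts."""
    if not site.outbound_links:
        return 0.0
    bad = sum(1 for link in site.outbound_links if link in KNOWN_HARMFUL)
    return bad / len(site.outbound_links)


def naive_is_harmful(site: Site, threshold: float = 0.0) -> bool:
    """Flag the site if its harmful fraction exceeds the threshold."""
    return harmful_fraction(site) > threshold


def filtered_is_harmful(site: Site, threshold: float = 0.0) -> bool:
    """Same rule, but content-less search engines are exempted first."""
    if site.is_search_engine:
        return False
    return naive_is_harmful(site, threshold)


# A search engine linking to 999 harmless pages and a single harmful one.
search = Site(
    name="search.example",
    outbound_links=["news.example"] * 999 + ["malware.example"],
    is_search_engine=True,
)

print(naive_is_harmful(search))     # True: one bad link out of 1000 is enough
print(filtered_is_harmful(search))  # False: the search engine is filtered out
```

With a threshold of zero, a single harmful link is enough to flag the whole site, which is exactly the trap a search engine falls into unless it is exempted up front.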

Maybe the solution is to just stop using them and rely only on controlled sources like Wikipedia or Mahalo or Twine or any of the millions of other sites and products out there that try to help you discover things in a more human-controlled fashion.

I do believe that's part of the future of the web (and you can argue that, since the early days of Yahoo, it has been part of its past too): a sea of information with personalized filters. The personalized filter is aware of your social network (i.e., it trusts information approved by your friends more than information approved by complete strangers) and your current state (if I'm on my cellphone in a city I've never been to before, searching for "gas station", it should be very straightforward what to return). Little by little, hardware and software are converging there.
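
The "trust your friends more than strangers" part of such a filter can be sketched in a few lines of Python. This is only a toy under stated assumptions: the weights, the names, and the `trust_score` and `rank` helpers are invented here, and it ignores the "current state" half (location, device) entirely.

```python
from typing import Dict, List, Set

# Assumed weights: an approval from a friend counts three times as much
# as an approval from a complete stranger.
FRIEND_WEIGHT = 3.0
STRANGER_WEIGHT = 1.0


def trust_score(approvers: List[str], friends: Set[str]) -> float:
    """Sum the approvals for one result, weighting friends more heavily."""
    return sum(
        FRIEND_WEIGHT if person in friends else STRANGER_WEIGHT
        for person in approvers
    )


def rank(results: Dict[str, List[str]], friends: Set[str]) -> List[str]:
    """Order result names from most to least trusted."""
    return sorted(
        results,
        key=lambda name: trust_score(results[name], friends),
        reverse=True,
    )


friends = {"alice", "bob"}
results = {
    "station_a": ["stranger1", "stranger2", "stranger3"],  # score 3.0
    "station_b": ["alice", "bob"],                         # score 6.0
}

print(rank(results, friends))  # ['station_b', 'station_a']
```

Two friends outweigh three strangers here only because of the weights I picked; a real filter would need to choose or learn them.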

And there I went off on a complete tangent from my original post. I probably just don't want to do what I need to do: work. But today is my only real chance. Tomorrow I have a Super Bowl party, which will consume most of the day.
