This month I decided to collect some statistics about spam detection on all my mail accounts (i.e. how many false positives and false negatives I get on Yahoo, Gmail and Hotmail). So I'm building a spreadsheet of how many spam messages I get per day on each of those services. With this much visibility, it's interesting to see some odd patterns. For example, so far today I have received roughly double my daily average (which is around 17 messages; today I'm at 33 and the day is not over yet).
What have I found so far besides that? Well, I can't really tell. I receive way too much spam on my Gmail account and very little on my other accounts, so I don't have statistically significant numbers for the other accounts. The current numbers are:
Gmail:
True positives: 159
False positives: 1
False negatives: 0
Hotmail:
True positives: 3
False positives: 0
False negatives: 1
Yahoo:
True positives: 25
False positives: 0
False negatives: 1
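As a rough sketch, the counts above can be turned into rates with a few lines of Python. The function names and the choice of denominators (flagged messages for false positives, actual spam for false negatives) are assumptions made for illustration, not something the spreadsheet itself defines.

```python
# Counts copied from the list above:
# (true positives, false positives, false negatives)
counts = {
    "Gmail": (159, 1, 0),
    "Hotmail": (3, 0, 1),
    "Yahoo": (25, 0, 1),
}


def false_positive_share(tp, fp):
    """Fraction of messages flagged as spam that were actually legitimate."""
    flagged = tp + fp
    return fp / flagged if flagged else 0.0


def miss_rate(tp, fn):
    """Fraction of actual spam that slipped through to the inbox."""
    spam = tp + fn
    return fn / spam if spam else 0.0


for service, (tp, fp, fn) in counts.items():
    print(
        f"{service}: {false_positive_share(tp, fp):.3%} wrongly flagged, "
        f"{miss_rate(tp, fn):.3%} missed"
    )
```

With these definitions, Gmail's single false positive is 1 out of 160 flagged messages, Hotmail's single miss is 1 out of 4 spam messages, and Yahoo's is 1 out of 26.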
If I were not statistically inclined, I would say that, as I was expecting, Gmail seems to err on the side of classifying things as spam when they are not spam, which was the biggest problem with early spam detection systems (which had a much higher false positive rate than the 0.6%, 1 out of 160 flagged messages, that I'm seeing here). Hotmail and Yahoo learned from this bad experience and decided to miss more spam messages so that people can use the "empty" button to get rid of all their spam messages without having to go through each of them.
But, as I am "statistically inclined", I'm not going to conclude anything yet and will just wait until I have more evidence.
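To get a feel for how little the small samples say, here is a minimal sketch that puts a confidence interval around a rate. It uses the Wilson score interval with z = 1.96 (roughly 95%); that choice of method, and the helper name `wilson_interval`, are assumptions for illustration and not part of the spreadsheet.

```python
import math


def wilson_interval(errors, total, z=1.96):
    """Approximate Wilson score interval for the proportion errors/total.

    z=1.96 corresponds to roughly 95% confidence (an assumed choice).
    """
    if total == 0:
        return (0.0, 1.0)  # no data: the rate could be anything
    p = errors / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


# Missed spam (false negatives) out of all actual spam, from the counts above
samples = {"Gmail": (0, 159), "Hotmail": (1, 4), "Yahoo": (1, 26)}
for service, (missed, spam) in samples.items():
    low, high = wilson_interval(missed, spam)
    print(f"{service}: {missed}/{spam} missed, interval {low:.1%} to {high:.1%}")
```

With only four spam messages on Hotmail, the interval covers a very wide range, so the point estimate by itself says very little.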
Oh, yes, and this is my 400th post. Interesting number. This blog has been alive for approximately 4 years and 3 months. My previous blog, which lived for 3 years and 8 months, had 1,049 posts. Just a small difference, huh?
Thursday, January 08, 2009