Black Magic Code

Tuesday, May 23, 2006

But I know it when I see it...

A Swedish investigative journalism "show" called "Insider" is currently running an investigation into the money behind the porn industry. I put "show" in quotes because the program is not entirely unbiased journalism. The first episode was mostly background: the history of the industry and the first steps of following the money. One of its points was that Internet Service Providers (ISPs) make a bundle out of it. I could almost buy that point if it weren't for their "expert", Per Hellquist. I'll translate a quote for you: "It is possible to screen pictures for skin-colored pixels, and from there determine whether a picture is pornographic or not!" He said this to argue that it is feasible for ISPs to intercept porn traffic with this method. This is wrong on so many levels.

To start off, Per works at Symantec in Sweden. Nothing wrong with working at Symantec (or is there?), but it raises the question of how he got his job in the first place. He lacks some fundamental knowledge about how the Internet works and, let's not forget, some common sense.

Let's try to break down what is wrong with the statement.

1. When you transfer a file over the Internet, the file is divided into chunks called packets, which are routed through the network!

This means that to analyze a picture, all packets of that picture must be captured (and a packet re-requested if it was corrupted in transit) and then reassembled into the file; only after that can you start analyzing the picture. But if you only capture copies of the packets as they pass by, they have already been forwarded to the customer by the time the picture is reassembled, so it is too late to stop the transfer! To solve these problems we must somehow proxy the transfer. If by some luck we succeed in capturing and analyzing the content of the image, the next problem rears its ugly head.

2. Picture analysis is not cheap!

It takes enormous amounts of memory and processing power to keep up with just the HTTP traffic of a single connection. But these companies are usually rich, and they could always charge a slightly higher fee for a slower connection just to filter it for their customers. Which customer would not like to pay more and get less content???

3. Picture analysis is hard!

Really, that's why it is so computationally expensive. First of all, what is skin color? The shade differs with race. That makes it a little harder, but let's assume we can solve this problem. How would the computer determine whether an image comes from a lingerie web shop or a porn page? By the number of skin-colored pixels? The problem here is actually very well known: a human can tell what is smut and what is not (although where to draw the line is a matter of personal taste), but a computer has a very hard time "seeing the difference".

Wednesday, May 03, 2006

How about C...

Recently I started studying the deep stuff in C++, things like template metaprogramming (TMP). But I also realized the problem with the cool stuff. In my previous post I described programmers who are not proficient enough in C++ and why I'd like to see them move on to managed land. You should not use TMP in environments where people are not ready to learn it, because you may not be around until the end of life of that particular code to maintain it. I know only a few people I would dare let near such code unless it was really carefully crafted.

Maybe we should go back to C instead. Don't get me wrong, C++ is an extremely powerful and versatile programming language, but maybe it has too much power. C has fewer features, but it still has plenty of power. Fewer features means less worrying about something biting your fellow programmers in the *ss.

Maybe someone will say: "This is 2006 and C is very old!"... Yes, it is 2006 and C is old, but what is wrong with that? I'll tell you what's wrong with that! Nothing... There is no expiry date on programming languages. C is fairly portable, has a stable standard and has plenty of power.