Monday, April 19, 2010

One graph to conquer it all

I received a link from a friend of mine to a very interesting challenge:

Graph Theory II: CONTEST for Geekgold

It's an interesting contest in which people decomposed a board game into a graph and then you need to identify the board game looking at the graph.

This reminded me of an old discussion I had at work. But before I get there I have to remind people that my Ph.D. research was on graph-structured databases, so I have used a lot of graphs in my life. I don't even know how many different graph "frameworks" I have implemented, including a very restricted but highly efficient graph database. I like graphs, but I also learned their limitations.

Back to my story: my old manager once had a vision of how to solve every problem. He thought that if we could build a giant graph that recorded everything and how everything related to everything you could solve all problems. And we had many meeting discussing this vision, which was never implemented, because nobody really believed in it beyond that manager.

This vision had two different problems:

The first one, which is the easiest to explain, is scale. It's very hard to build something that has an arbitrary level of connectivity and allows for queries that could be of any length. If you add to this the need to build something on a large fleet of small and "unreliable" hardware, so requiring redundancy and robustness to failure, you basically would have a very hard time to keep it to any reasonable level of performance. That I have personal experience with, as I did implement a graph structured database that, in order to achieve any meaningful performance characteristics it required (1) to pre-calculate all the "joins" and (2) make them read-only.

The second one is harder without explicit examples, but it's related to the curse of dimensionality: if you add too much to your graph, soon you can't conclude anything from it, because little noise in many dimensions will overwhelm all your signal. Just saying that in the air is hard to convince anybody. New techniques to deal with large datasets with large number of dimensions are more and more successful at identifying "low-hanging fruit" at scale, i.e. whatever has a very large signal compared to the rest of the noise. If you have a lot of data, those algorithms have been able to scale so well, that it's possible to apply them in all the data and find a lot of different "high signal" patterns. It's not that the filters are getting better, it's just that we have been doing a better job at looking at a lot of data at once.

In any way, going back to the board game graph puzzle, it's an interesting challenge. I don't know enough board games to be able to recognize almost any of them, but I had fun just trying to decipher the graphs and relate them to a possible game. Enjoy!
blog comments powered by Disqus