Tuesday, May 10, 2011

Book Review: Programming Collective Intelligence

I've been working my way through Programming Collective Intelligence: Building Smart Web 2.0 Applications by Toby Segaran. It is a very interesting read, and I like the way the book tells the story of building collective intelligence into your applications. I was particularly impressed with the sample Python algorithms and how well grounded they are in theory and mathematical formulas. Two that are used early in the book and referenced often are Euclidean distance and the Pearson correlation. Seeing both of these formulas in code, and put into the context of Web 2.0 and social correlation, helped deepen my understanding of the math behind measuring collective intelligence.
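To give a flavor of what those two measures look like in code, here is a minimal sketch of my own (not the book's listing). It assumes each person's ratings are stored as a dictionary of item name to score; the function names `euclidean_similarity` and `pearson`, and the sample data, are made up for illustration.

```python
from math import sqrt


def euclidean_similarity(a, b):
    """Similarity in (0, 1] based on Euclidean distance over shared items.

    Assumption: a and b are dicts mapping item name -> numeric rating.
    Returns 0.0 when the two people have rated nothing in common.
    """
    shared = [item for item in a if item in b]
    if not shared:
        return 0.0
    distance = sqrt(sum((a[item] - b[item]) ** 2 for item in shared))
    # Invert the distance so that identical ratings score 1.0.
    return 1.0 / (1.0 + distance)


def pearson(a, b):
    """Pearson correlation coefficient over the items both people rated.

    Returns 0.0 when there is no overlap or when either person's
    shared ratings have no variance (the coefficient is undefined there).
    """
    shared = [item for item in a if item in b]
    n = len(shared)
    if n == 0:
        return 0.0
    mean_a = sum(a[item] for item in shared) / n
    mean_b = sum(b[item] for item in shared) / n
    covariance = sum((a[item] - mean_a) * (b[item] - mean_b) for item in shared)
    spread_a = sum((a[item] - mean_a) ** 2 for item in shared)
    spread_b = sum((b[item] - mean_b) ** 2 for item in shared)
    denominator = sqrt(spread_a * spread_b)
    if denominator == 0:
        return 0.0
    return covariance / denominator


# Hypothetical ratings: the second person always scores twice as high.
alice = {"Movie A": 1.0, "Movie B": 2.0, "Movie C": 3.0}
bob = {"Movie A": 2.0, "Movie B": 4.0, "Movie C": 6.0}

print(euclidean_similarity(alice, bob))
print(pearson(alice, bob))
```

The sample data shows why both measures are worth having: the two people's scores are far apart in absolute terms, so the distance-based similarity is low, yet their tastes move in lockstep, so the Pearson correlation is at its maximum.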

I hesitated when I started reading the chapter on search and how to build a search engine. I felt it was included just as filler (a few chapters to add 100 pages or so), and I figured there are already too many books on how to build search engines. I am glad I pushed through, for once I started reading I began to see the point of describing and building a search engine. The chapter introduces descriptions (with source code) of how collective intelligence algorithms can be integrated into search. This is a huge and important topic when you think about social search, what the social graph contributes, and the impact collective intelligence could have on search.

The next few chapters get into programming many aspects of collective intelligence, using APIs, existing data services and open source code to get you to completion more quickly. The sections and code of most interest to me deal with the social graph, making intelligent decisions for and about groups of people, and utilizing data sets, decision trees and other intelligent services via programmable APIs. Particularly important in these programming-focused chapters are the examples and end-of-chapter exercises; working through these exercises provides the hands-on experience that deepens understanding.

As I got into the later chapters I began to realize just how comprehensive the book's coverage of programming collective intelligence is. I was particularly interested in the material on support vector machines and genetic programming. Even though I do not commonly find myself programming in any depth, I found these rather heady subjects well described and easy to understand. I also appreciated the closing chapters of the book, which review the different algorithms, how they are best utilized, and their strengths and weaknesses. References to all the third-party libraries are collected in a single appendix, as are all the mathematical formulas.

Going through this book cover to cover, reading the examples and diving into a few (or all) of the exercises gives the reader a very good understanding of how much data and how many intelligent services are already available via APIs on the Internet. Programming against these services and creating new services is a big part of the Internet as a research, knowledge and learning environment. I strongly recommend this book to anyone interested in developing these new services, consuming the existing ones, or understanding the mathematics behind collective intelligence.