Seth Stephens-Davidowitz
I’m a huge fan of Asimov’s Foundation series. Hari Seldon, the central character in the series, develops psychohistory, an algorithmic science that helps him predict the future of large populations (not individuals), though only in terms of probability. As I read this book, I began to wonder if data would actually help us get to that level at some point.
The premise of the book is that though everybody lies (to their friends, spouse, colleagues and most definitely to themselves), many of their actions, such as what they search for and what they click on, reveal their true nature. With the sheer amount of data that is being generated, data scientists are able to gather insights on our thinking, and potentially use that for the welfare of humanity.
The book uses a bunch of examples early on to show how data can help distinguish between what people say and what they actually do. Trivia: India gets called out early enough for being #1 in people who search for “my husband wants me to breastfeed him”! A large section of the first half is full of p*rn data. Reveals much!
I not only got some validation of my views on human behaviour, but also realised that some of my perspectives were not really true. For instance, I had thought that the web was now largely getting segregated into filter bubbles. Data shows otherwise! It also shows the clear possibility that many of our core beliefs and attitudes could be explained by the random year of our birth and what was going on in the key years of our upbringing. One observation I could not really agree with was “it does not matter which school you go to.” While one study does show that, I can see it play out differently around me, and perhaps there are psychological effects that do not come out in a study. Or it could be affected by “the curse of dimensionality” that the author brings up: if you test enough variables, one, by random chance, will be statistically significant.
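To make the "test enough variables" point concrete, here is a minimal Python sketch (not taken from the book). It assumes 1,000 hypothetical variables that are pure noise, each a fair coin flipped 1,000 times, and checks each one with a simple normal-approximation test at the usual 5% level. The function name, the sample sizes and the 1.96 cutoff are illustrative choices of this sketch.

```python
import random


def count_false_positives(num_variables=1000, num_samples=1000,
                          z_threshold=1.96, seed=0):
    """Count pure-noise variables that look 'significant' by chance.

    Each variable is a fair coin flipped num_samples times, so there is
    no real effect to find. Any variable that passes the test is a
    false positive.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    false_positives = 0
    for _ in range(num_variables):
        heads = sum(rng.random() < 0.5 for _ in range(num_samples))
        # z-score of the observed head count under the "fair coin" hypothesis:
        # mean = 0.5 * n, standard deviation = sqrt(0.25 * n)
        z = (heads - 0.5 * num_samples) / (0.25 * num_samples) ** 0.5
        if abs(z) > z_threshold:  # roughly a two-sided 5% test
            false_positives += 1
    return false_positives


hits = count_false_positives()
print(f"{hits} of 1000 pure-noise variables look 'significant' at the 5% level")
```

With settings like these, roughly one variable in twenty should clear the bar even though none of them means anything, which is why a single "significant" finding among many tested variables deserves suspicion.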
The last portion of the book offers a counterbalance to the case made for data thus far. It covers the overemphasis on what is measurable, the limits of data, and the ethics of data usage by private companies or the government.
But the potential of data to cause a social sciences revolution remains well argued. However, just having data is not really enough. One needs to be curious (what data needs to be looked at) and creative (what’s the best way to frame the data or sets of data, and build hypotheses) to make the best use of it. Some of what the author has done in the book is precisely that. Can data be misused? Yes, it can, but that’s the risk with every new science. That doesn’t take away from the exciting possibilities it has to offer.