Monday, November 13, 2017

Everybody Lies by Seth Stephens-Davidowitz

Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are by Seth Stephens-Davidowitz was a fairly interesting book on a fascinating topic.

The title refers to Stephens-Davidowitz's work as an internet researcher and to the idea that what people search for is more representative of them than what they say about themselves. The introduction mentions Google Trends, a tool that shows how frequently a word has been searched for in different locations and at different times. The author also writes quite a bit about large data sets and how they let someone zoom in on data with very particular characteristics while keeping that subset large enough to still be statistically significant. Also noted about big data sets is the curse of dimensionality: test enough variables and some will look significant purely by chance.

One thing I particularly liked from the book was mention of the doppelgänger concept that I've written about a couple of times, and how, given a large enough set of people, you should be able to find someone similar to you, your doppelgänger. This idea is noted as working in medicine as well, with the site PatientsLikeMe as an example. There's also quite a bit in the book about A/B testing and about how data can take the form of words, with the particular words used telling a particular story, such as how the usage in print of "the United States is..." vs "the United States are..." shifted over time after the Civil War.

Another thing that stood out to me was mention of how New Data is great in fields where the existing data and methods are incomplete or outmoded. It's noted that the field of finance is advanced enough that there's not much room for innovation, but in contrast to this, the story of Jeff Seder, a champion racehorse evaluator, is told. He helped identify future Triple Crown winner American Pharoah based on the enlarged left ventricle of the horse's heart, with that size serving as a predictor of success, assuming no contradictory data points.

The book brought to mind others I found compelling on similar topics, and while it was not one of my favorites in the area, it was an interesting and fast read.