Today we discuss a book review of "Data-ism" by Steve Lohr, a HarperCollins publication. Technology has an unimaginable capacity to generate data. This raises concerns about the idolatry of data and the consequent replacement of wisdom with quantification - a tendency some have called post-humanist. The author describes the possibilities of big data before turning to these concerns.
He starts off with the example of the McKesson Corporation, which distributes about one third of all pharmaceuticals in the US to roughly 26,000 customer locations, shipping around 240 million pills a day.
The author chooses this example because the company accumulates data (pills, prices and shipment miles) that is plentiful, stable and reliable. This has enabled IBM to build a sort of flight simulator for decision making. Presumably the author is referring to operational data and data warehousing; he goes on to add that this capability works in two ways - one, to provide profit and loss figures for every product and supplier, and the other, to serve as a tool for analytics and prediction.
One outcome of such analytics was the decision to centralize the distribution of very expensive drugs. Centralization requires costly air shipments to customers, but IBM's modeling software predicted that the savings from lower inventory levels for certain drugs would more than make up for the higher air freight expenses. McKesson tested this with a pilot project and the proposal was vindicated. The software gave McKesson the clarity and confidence to go ahead. This is an example where quantification trumps "best guesses, gut feel, experience and intuition" - and it is this replacement of "wisdom" that causes concern.
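To see the shape of that tradeoff, here is a minimal back-of-the-envelope sketch in Python. Every number and variable name below is an illustrative assumption, not a figure from the book or from McKesson; the point is only that pooled safety stock can save more in carrying costs than the extra air freight adds.

# Illustrative only: compare regional vs centralized distribution of a costly drug.
# All numbers are made-up assumptions for the sake of the example.

annual_demand_units = 120_000          # units shipped per year
unit_cost = 900.0                      # cost per unit of the drug ($)
holding_rate = 0.25                    # annual carrying cost, as a fraction of unit cost

regional_safety_stock = 6_000          # safety stock spread across many regional warehouses
central_safety_stock = 2_500           # pooled safety stock at one central site

regional_freight_per_unit = 1.50       # cheaper ground shipping ($ per unit)
central_freight_per_unit = 4.00        # more expensive air shipping ($ per unit)

def annual_cost(safety_stock, freight_per_unit):
    holding = safety_stock * unit_cost * holding_rate
    freight = annual_demand_units * freight_per_unit
    return holding + freight

regional = annual_cost(regional_safety_stock, regional_freight_per_unit)
central = annual_cost(central_safety_stock, central_freight_per_unit)
print(f"regional: ${regional:,.0f}  central: ${central:,.0f}  savings: ${regional - central:,.0f}")

With these assumed figures the pooled inventory saves far more in holding costs than the air freight adds, which is the kind of result the modeling software is described as surfacing.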
The author can be credited not only with highlighting the benefits but, as a journalist, with striving to see all sides of the issue. As an example, he cites Tom Mitchell, chairman of the machine-learning department at Carnegie Mellon, who gave an example involving two sentences:
The girl caught the butterfly with the spots.
The girl caught the butterfly with the net.
Humans understand that girls don't have spots and butterflies don't carry nets, but machines miss what can be termed "context".
The role of context can be understood with, say, the number 39, which by itself means nothing; add "Celsius" and it suggests temperature, even hot weather; attach it to a person's name and it suggests illness.
Context today is said to be achieved in two ways - correlation and association. Fortunately, correlation is not new, and data mining has long helped here. For example, Walmart discovered about ten years earlier that consumers stock up on strawberry Pop-Tarts and beer before a hurricane.
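As a minimal sketch of the kind of correlation mining involved, the snippet below computes a Pearson correlation between a hurricane-warning flag and daily sales of one product. The data and names are invented for illustration; real retail mining runs over millions of transaction rows.

# Illustrative sketch: correlate a hurricane-warning flag with daily sales.
import numpy as np

# 1 = hurricane warning in effect that day, 0 = normal day (made-up data)
warning = np.array([0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1])
# daily units sold of a single product, e.g. strawberry Pop-Tarts (made-up data)
sales   = np.array([40, 38, 42, 95, 110, 41, 39, 88, 43, 37, 102, 97])

# Pearson correlation coefficient between the two series
r = np.corrcoef(warning, sales)[0, 1]
print(f"correlation between hurricane warnings and sales: {r:.2f}")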
Another example is that of ZestFinance, which reduces the risk of lending to payday borrowers by including data points such as whether the borrower has a cell phone and whether he or she types their name in all upper case. Although it works, not knowing "why" a correlation exists invites debate. Lohr notes that the authors of Big Data insist that big data overturns the self-congratulatory illusion that comes with identifying causal mechanisms.
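A sketch of what such correlation-driven scoring can look like is below, assuming a simple logistic regression over two unconventional features. The feature names, training data and model choice are hypothetical and are not ZestFinance's actual method; the point is only that the model learns predictive weight for features without any causal story.

# Illustrative credit-scoring sketch with unconventional, correlational features.
# Features and labels are hypothetical; this is not ZestFinance's model.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [has_cell_phone, typed_name_in_all_caps]
X = np.array([
    [1, 0], [1, 0], [0, 1], [0, 1],
    [1, 1], [0, 0], [1, 0], [0, 1],
])
# 1 = borrower defaulted, 0 = repaid (made-up labels)
y = np.array([0, 0, 1, 1, 1, 1, 0, 1])

model = LogisticRegression().fit(X, y)
# Score a new applicant who has a cell phone and typed their name in all caps
print("estimated default probability:", round(model.predict_proba([[1, 1]])[0, 1], 2))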
Data enthusiasts say the "why" can be answered by pairing models with measurements. However, measurement seems to have led us to, say, the housing financial crisis, where models analyzed data that was plentiful and recent while ignoring earlier financial crises, for which the data sets were sparse and messy.
Therefore correlation alone is not sufficient. But intuition has its own problems. The author cites an example from Daniel Kahneman's "Thinking, Fast and Slow", where participants were asked whether a man described as "meek" was more likely to be a librarian or a farmer. While the vast majority replied "librarian", the data showed that there are twenty times more farmers than librarians.
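The point is about base rates, and a small worked calculation makes it concrete. The conditional probabilities below are illustrative assumptions, not figures from Kahneman or from the book; only the 20-to-1 base rate comes from the example.

# Illustrative base-rate calculation (Bayes' rule) for the librarian/farmer question.
p_librarian = 1 / 21           # base rate: 20 farmers for every librarian
p_farmer = 20 / 21

p_meek_given_librarian = 0.60  # assumed: "meek" fits the librarian stereotype well
p_meek_given_farmer = 0.10     # assumed: fewer farmers are described as meek

p_meek = p_meek_given_librarian * p_librarian + p_meek_given_farmer * p_farmer
p_librarian_given_meek = p_meek_given_librarian * p_librarian / p_meek
print(f"P(librarian | meek) = {p_librarian_given_meek:.2f}")  # about 0.23: farmer still more likely

Even with a strong stereotype in the librarian's favour, the base rate keeps "farmer" the better bet, which is exactly what intuition misses.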
Since both man and machine have weaknesses, the author says that each can help remove the other's blind spots. However, he warns against forming a habit where, instead of computers assisting humans, humans end up assisting computers.
Another frontline of this interdependence is a medical center where doctors and data-driven software are playing "an information game". Doctors no longer reign supreme over monitoring; Lohr points out that the two can augment each other.