Mining “big data”

An interesting article by Ariana Eunjung Cha on how financiers, politicians, and researchers are mining data from Twitter, Google, Facebook, and the like to identify trends and forecast the future:

From a trading desk in London, Paul Hawtin monitors the fire hose of more than 340 million Twitter posts flying around the world each day to try to assess the collective mood of the populace.

The computer program he uses generates a global sentiment score from 1 to 50 based on how pessimistic or optimistic people seem to be from their online conversations. Hawtin, chief executive of Derwent Capital Markets, buys and trades millions of dollars of stocks for private investors based on that number: When everyone appears happy, he generally buys. When anxiety runs high, he sells short.

Hawtin has seen a gain of more than 7 percent in the first quarter of this year, and his method shows the advantage individuals, companies and governments are gaining as they take hold of the unprecedented amount of data online. Traders such as Hawtin say analyzing mathematical trends on the Web delivers insights and news faster than traditional investment approaches.

The explosion in the use of Google, Facebook, Twitter and other services has resulted in the generation of some 2.5 quintillion bytes each day, according to IBM.

“Big data,” as it has been dubbed by researchers, has become so valuable that the World Economic Forum, in a report published last year, deemed it a new class of economic asset, like oil.

“Business boundaries are being redrawn,” the report said. Companies with the ability to mine the data are becoming the most powerful, it added.

While the human brain cannot comprehend that much information at once, advances in computer power and analytics have made it possible for machines to tease out patterns in topics of conversation, calling habits, purchasing trends, use of language, popularity of sports, spread of disease and other expressions of daily life.

“This is changing the world in a big way. It enables us to watch changes in society in real time and make decisions in a way we haven’t been able to ever before,” said Gary King, a social science professor at Harvard University and a co-founder of Crimson Hexagon, a data analysis firm based in Boston.

The Obama campaign employs rows of people manning computers that monitor Twitter sentiment about the candidates in key states. Google scientists are working with the Centers for Disease Control and Prevention to track the spread of flu around the world by analyzing what people are typing in to search. And the United Nations is measuring inflation through computers that analyze the price of bread advertised in online supermarkets across Latin America.

Many questions about big data remain unanswered. Concerns are being raised about personal privacy and how consumers can ensure that their information is being used fairly. Some worry that savvy technologists could use Twitter or Google to create false trends and manipulate markets.

Even so, sociologists, software engineers, economists, policy analysts and others in nearly every field are jumping into the fray.

via ‘Big data’ from social media, elsewhere online take trend-watching to new level – The Washington Post.

That’s very impressive, to be sure, but do you think all of this “data” is really equivalent to a natural resource? The stock trader who buys when the Twitter traffic is happy and sells when it’s sad has been making money, but why not buy when people are sad (picking up bargains when people are giving up on the world and dumping their stocks for cheap) and sell when they are happy (taking advantage of their irrational exuberance)? That is to say, is his data mining resulting in an application that is all that scientific? And in what sense is a Twitter tweet necessarily equivalent to hard data? Can one control for irony, sarcasm, and jokes? I’m not denying that there may be some very useful information amidst all of the clutter, but still. . . .

About Gene Veith

Professor of Literature at Patrick Henry College, the Director of the Cranach Institute at Concordia Theological Seminary, a columnist for World Magazine and TableTalk, and the author of 18 books on different facets of Christianity & Culture.

  • Klasie Kraalogies

    The idea is valid, but I am not sure of his application. I would actually agree with Gene, in that often a bargain is to be had when people are negative. That is how the Rockefeller fortune was made – buy when everybody is selling, sell when everybody is buying.

    Also, the short-term interests of today’s traders are destroying economic value. Especially when it comes to Natural Resources – that is why gold stocks, and other mining stocks, lagged even when gold prices were high, and profits were good. The traders very often simply do not understand the ins-and-outs of what they are dealing with. And I’m concerned that this would be the case in this scenario as well.

    For instance, on the Chicago Commodities exchange, there used to be two separate sessions: one in the morning, after which trading shut down so that people could absorb information, follow trends, learn some more, etc., followed by an afternoon session. However, recently, they took the break away, meaning that mindless trading and quick reactions (often leading to over-reactions) can now be the order of the day. This is an example of what one could call chaotic, or laissez-faire, capitalism instead of ordered trading – to be free, one needs order. Thus, Ordo-liberalism, to beat my old drum….

  • WebMonk

    As to whether or not they can control for irony, sarcasm and jokes – yes they can. Not perfectly, but pretty darned well. Natural language processing for individual tweets combined with analysis of individuals’ streams of messages can give a very solid indication of irony and sarcasm.

    Is big data “equivalent” to a natural resource? Not quite, but in many ways it behaves like one. It is a “natural product” of the billions of communications being passed around through social media, and so it’s not something that a company has to specifically manufacture. It’s an ocean of data/information that is just sitting there. There are other ways in which it is not like a typical natural resource, but it has enough similarities that it can be treated as such in some ways.

    As to what sort of selling and buying you do based on sentiment analysis, it depends on how fast you are. If you are reacting long after those sentiments are put into action, then you trade differently than if you are able to react before the general sentiment is put into action.

    Putting trades into action based on social media sentiments is a fast-reaction game. The trader knows the sentiment is moving in a certain direction and probably will be for a while if he was fast enough to catch the trend. So he buys, “knowing” that he is early enough in the trend that it will continue for a while, going higher in price. The same thing is true of selling based on negative sentiments – if he is fast enough, he can get some short positions that will take advantage of the negative sentiment trend.

    If the trader is slow, and only picking up on the trends after they have been happening for a while, then he might want to try a counter-trend trading strategy like Dr. Veith suggested. Maybe.

    Financial trading is a bit of an arms race in this regard – the first couple of people to develop good financial sentiment analysis tools will have a significant advantage, but pretty soon everyone develops those tools, and the advantage is lost. Trading firms will need to continue using sentiment analysis to avoid being at a disadvantage, but won’t garner advantages over other firms.

    So they are continually searching for newer and/or faster insights into trading activity and direction.
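
To make the fast-versus-slow distinction in the comment above concrete, here is a minimal sketch in Python. It is only an illustration, not the method used by any trader mentioned in the article: the `Signal` fields, the `choose_position` function, and the 30-minute cutoff are all hypothetical names and values invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    """A toy sentiment reading (hypothetical structure, for illustration only)."""
    sentiment_change: float   # > 0: mood improving, < 0: mood worsening
    trend_age_minutes: float  # how long the trend had been running when noticed


def choose_position(signal: Signal, early_cutoff_minutes: float = 30.0) -> str:
    """Return 'long', 'short', or 'flat' under a toy rule.

    Assumption: a trend noticed within `early_cutoff_minutes` is "early"
    and is followed; a trend noticed later is faded (counter-trend).
    The cutoff is an arbitrary placeholder, not an empirical figure.
    """
    if signal.sentiment_change == 0:
        return "flat"

    improving = signal.sentiment_change > 0

    if signal.trend_age_minutes <= early_cutoff_minutes:
        # Fast trader: ride the trend while it is still young.
        return "long" if improving else "short"

    # Slow trader: the move may be mostly over, so bet on a reversal.
    return "short" if improving else "long"


if __name__ == "__main__":
    print(choose_position(Signal(sentiment_change=0.4, trend_age_minutes=5)))
    print(choose_position(Signal(sentiment_change=0.4, trend_age_minutes=120)))
```

The same improving mood leads to opposite positions depending only on how late the trader noticed it, which is the point of the "Maybe" in the comment: whether the late, counter-trend branch pays off is an open question, not something this sketch settles.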

  • Klasie Kraalogies

    Webmonk – true enough. But I’ve seen enough trading, press releases, and ups and downs in the market to know that it would really help the traders to try and understand WHAT they are dealing with, instead of just HOW people are thinking. A market run on emotions seems to be the last hurrah of a free market economy.

  • WebMonk

    Yes, the HOW people are thinking is a (relatively) simple thing to discover – the low-hanging fruit is always picked first. When the low-hanging fruit gives traders good returns there is little incentive to work other, more difficult, angles. And as emotions and instant-trends become more dominant, there is less and less incentive to use other analysis tools.

    And it gets even more depressing – why did modern trading on trending emotions and broad-swath sentiment analysis work at all? I would posit that most trading markets have run primarily on emotion for the last two hundred years.

    You can look at the history of trading and see clear examples of emotionally driven trading by broad swathes of the trading population all the way back in the early 1800s. Emotion-driven markets aren’t a suddenly developed thing; they’re a spectrum going back hundreds of years as emotion slowly becomes more and more influential.

    As soon as trades happen faster than X, emotions will be major influences, and as things happen faster and faster, emotions become more and more influential, at least in the short term. And the “X” isn’t particularly fast. Any decision that needs to be made without research relies, by necessity, on an individual’s existing knowledge and on-the-spot decision making process. Those decisions are ALWAYS heavily influenced by emotion. That “X” was crossed a long time ago.

    So, all that being said, I would disagree with your last sentence – a market run on emotions has been the reality for two hundred years, so probably isn’t a “last hurrah”.

    As an interesting other-effect (not necessarily “counter”, just other), the relatively recent development of computerized trading algorithms avoids emotion-influenced trading, except that the computerized analysis takes identifiable emotional trends into account.

  • http://www.facebook.com/mesamike Mike Westfall

    The difference between information and natural resources is that natural resources can be transformed into actual wealth, such as houses, yachts, dishwashers, cars, network infrastructure, etc. All information can do is maybe inform you of which of those durable goods might be in demand or not. You still need the actual natural resources, plus labor to create wealth.

  • Steven Bauer

    Perhaps Douglas Adams wasn’t so far off after all.

  • Michael B.

    @WebMonk

    “As to whether or not they can control for irony, sarcasm and jokes – yes they can. Not perfectly, but pretty darned well. Natural language processing for individual tweets combined with analysis of individuals’ streams of messages can give a very solid indication of irony and sarcasm.”

    I studied NLP in a course in college, and I was amazed at how easy it was to throw off a program. Just translating something into another language was difficult. Consider simple idioms, like the phrase “working around the clock” — you translate that into French and you get a ridiculous result.

  • WebMonk

    Michael B. – NLP is exactly what I work with in my job. I don’t know how long ago your college was or what sort of software you were using, but I suspect things have moved a long way since then.

    Machine translation is always a tricky thing, but top-notch programs are getting very, very good.

    Like I said, it is definitely possible to fool them, but it takes purposeful effort to fool good NLP systems. In the sort of data we work with (social media and media) the people aren’t writing their statements with the design of throwing off NLP systems, and so it comes across pretty well.

    We’ve done several tests of our systems – pull in a random sampling of a few hundred messages and analyze them manually. The modern NLP systems give accurate results well over 95% of the time. With foreign languages accuracy dropped into the 80s (percent), but that is still good enough to be extremely useful – we just realize that there are error bars on the confidence of the sentiment analysis results.
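
The spot check described above (manually label a random sample, then compare with the system's output) can be turned into an accuracy estimate with error bars. The sketch below is one standard way to do that, using a Wilson score interval; it is not the commenter's actual tooling, and the function name `accuracy_with_interval` and the sample counts in the usage lines are hypothetical.

```python
import math


def accuracy_with_interval(correct: int, total: int, z: float = 1.96):
    """Return (accuracy, lower, upper) using a Wilson score interval.

    `correct` is the number of sampled messages where the system's
    sentiment label matched the manual label; `total` is the sample size.
    z = 1.96 corresponds to roughly a 95% confidence level.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")

    p = correct / total
    z2 = z * z
    denom = 1 + z2 / total
    center = (p + z2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    return p, center - half_width, center + half_width


if __name__ == "__main__":
    # Hypothetical counts: 285 of 300 sampled messages labeled correctly.
    acc, low, high = accuracy_with_interval(285, 300)
    print(f"accuracy={acc:.3f}, interval=({low:.3f}, {high:.3f})")
```

With only a few hundred messages in the sample, the interval is several percentage points wide, which is one way to read the "error bars" the comment mentions. A larger sample narrows it.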

  • Contact

    Recent studies have revealed that decision making is one of the most complicated business processes. This process is usually undertaken by managers. Rarely does this process take into account the views of junior workers in an organization. …

