Mining “big data”

An interesting article by Ariana Eunjung Cha on how financiers, politicians, and researchers are mining data from Twitter, Google, Facebook, and the like to identify trends and forecast the future:

From a trading desk in London, Paul Hawtin monitors the fire hose of more than 340 million Twitter posts flying around the world each day to try to assess the collective mood of the populace.

The computer program he uses generates a global sentiment score from 1 to 50 based on how pessimistic or optimistic people seem to be from their online conversations. Hawtin, chief executive of Derwent Capital Markets, buys and trades millions of dollars of stocks for private investors based on that number: When everyone appears happy, he generally buys. When anxiety runs high, he sells short.

Hawtin has seen a gain of more than 7 percent in the first quarter of this year, and his method shows the advantage individuals, companies and governments are gaining as they take hold of the unprecedented amount of data online. Traders such as Hawtin say analyzing mathematical trends on the Web delivers insights and news faster than traditional investment approaches.

The explosion in the use of Google, Facebook, Twitter and other services has resulted in the generation of some 2.5 quintillion bytes each day, according to IBM.

“Big data,” as it has been dubbed by researchers, has become so valuable that the World Economic Forum, in a report published last year, deemed it a new class of economic asset, like oil.

“Business boundaries are being redrawn,” the report said. Companies with the ability to mine the data are becoming the most powerful, it added.

While the human brain cannot comprehend that much information at once, advances in computer power and analytics have made it possible for machines to tease out patterns in topics of conversation, calling habits, purchasing trends, use of language, popularity of sports, spread of disease and other expressions of daily life.

“This is changing the world in a big way. It enables us to watch changes in society in real time and make decisions in a way we haven’t been able to ever before,” said Gary King, a social science professor at Harvard University and a co-founder of Crimson Hexagon, a data analysis firm based in Boston.

The Obama campaign employs rows of people manning computers that monitor Twitter sentiment about the candidates in key states. Google scientists are working with the Centers for Disease Control and Prevention to track the spread of flu around the world by analyzing what people are typing in to search. And the United Nations is measuring inflation through computers that analyze the price of bread advertised in online supermarkets across Latin America.

Many questions about big data remain unanswered. Concerns are being raised about personal privacy and how consumers can ensure that their information is being used fairly. Some worry that savvy technologists could use Twitter or Google to create false trends and manipulate markets.

Even so, sociologists, software engineers, economists, policy analysts and others in nearly every field are jumping into the fray.

via ‘Big data’ from social media, elsewhere online take trend-watching to new level – The Washington Post.

That’s very impressive, to be sure, but do you think all of this “data” is really equivalent to a natural resource?  The stock trader who buys when the Twitter traffic is happy and sells when it’s sad has been making money, but why not do the opposite: buy when people are sad (picking up bargains when they are giving up on the world and dumping their stocks for cheap) and sell when they are happy (taking advantage of their irrational exuberance)?  That is to say, is his data mining really all that scientific?  And in what sense is a tweet equivalent to hard data?  Can one control for irony, sarcasm, and jokes?  I’m not denying that there may be some very useful information amidst all of the clutter, but still. . . .
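
To make the question concrete, here is a toy sketch in Python of the two opposite rules. It is not Hawtin’s program, whose workings the article does not describe; the cutoffs of 20 and 30 on the 1-to-50 scale, and all of the names, are assumptions made up for illustration.

```python
from enum import Enum


class Action(Enum):
    BUY = "buy"
    SELL_SHORT = "sell short"
    HOLD = "hold"


# Hypothetical cutoffs on the article's 1-to-50 scale; the real ones are not published.
LOW_MOOD = 20
HIGH_MOOD = 30


def follow_the_mood(score: int) -> Action:
    """Buy when the crowd seems happy, sell short when it seems anxious."""
    if not 1 <= score <= 50:
        raise ValueError("score must be between 1 and 50")
    if score >= HIGH_MOOD:
        return Action.BUY
    if score <= LOW_MOOD:
        return Action.SELL_SHORT
    return Action.HOLD


def contrarian(score: int) -> Action:
    """The opposite bet: buy from the gloomy, sell to the exuberant."""
    flipped = {Action.BUY: Action.SELL_SHORT, Action.SELL_SHORT: Action.BUY}
    action = follow_the_mood(score)
    # HOLD has no opposite, so it passes through unchanged.
    return flipped.get(action, action)
```

With these made-up cutoffs, a score of 45 means “buy” under the first rule and “sell short” under the second. Nothing in the code says which bet is right, and that is the point of the question.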

Hacking into the rest of our technology

One of those darn kids invented a monster.  It is called Shodan.  And it threatens everything connected to the internet, which is now pretty much everything:

It began as a hobby for a ­teenage computer programmer named John Matherly, who wondered how much he could learn about devices linked to the Internet.

After tinkering with code for nearly a decade, Matherly eventually developed a way to map and capture the specifications of everything from desktop computers to network printers to Web servers.

He called his fledgling search engine Shodan, and in late 2009 he began asking friends to try it out. He had no inkling it was about to alter the balance of security in cyberspace.

“I just thought it was cool,” said Matherly, now 28.

Matherly and other Shodan users quickly realized they were revealing an astonishing fact: Uncounted numbers of industrial control computers, the systems that automate such things as water plants and power grids, were linked in, and in some cases they were wide open to exploitation by even moderately talented hackers.

Control computers were built to run behind the safety of brick walls. But such security is rapidly eroded by links to the Internet. Recently, an unknown hacker broke into a water plant south of Houston using a default password he found in a user manual. A Shodan user found and accessed the cyclotron at the Lawrence Berkeley National Laboratory. Yet another user found thousands of unsecured Cisco routers, the computer systems that direct data on the networks.

“There’s no reason these systems should be exposed that way,” Matherly said. “It just seems ludicrous.”

The rise of Shodan illuminates the rapid convergence of the real world and cyberspace, and the degree to which machines that millions of people depend on every day are becoming vulnerable to intrusion and digital sabotage. It also shows that the online world is more interconnected and complex than anyone fully understands, leaving us more exposed than we previously imagined.

via Cyber search engine exposes vulnerabilities – The Washington Post.

The new religion of Kopimism

A new religion, born of the internet age, is seeking legal recognition:

A Swedish religion whose dogma centers on the belief that people should be free to copy and distribute all information—regardless of any copyright or trademarks—has made its way to the United States.

Followers of so-called “Kopimism” believe copying, sharing, and improving on knowledge, music, and other types of information is only human—the Romans remixed Greek mythology, after all, they say. In January, Kopimism—a play on the words “copy me”—was formally recognized by a Swedish government agency, raising its profile worldwide.

“Culture is something that makes people feel much better and makes people appreciate their world in a different way. Knowledge is also something we should copy regardless of the law,” says Isak Gerson, the 20-year-old founder of Kopimism. “It makes us better when we share knowledge and culture with each other.”

More than 3,500 people “like” Kopimism on Facebook, and thousands more practice its sacred ritual of file sharing. According to its manifesto, private, closed-source software code and anti-piracy software are “comparable to slavery.” Kopimist “Ops,” or spiritual leaders, are encouraged to give counsel to people who want to pirate files, are banned from recording and should encrypt all virtual religious service meetings “because of society’s vicious legislative and litigious persecution of Kopimists.”

Official in-person meetings must happen in places free of anti-Kopimist monitoring and in spaces with the Kopimist symbol—a pyramid with the letter K inside. To be initiated new parishioners must share the Kopimist symbol and say the sacred words “copied and seeded.”

The gospel of the church has begun to spread, with Kopimist branches in 18 countries.

An American branch of the religion was recently registered with Illinois and is in the process of gaining federal recognition, according to Christopher Carmean, a 25-year-old student at the University of Chicago and head of the U.S. branch.

“Data is what we are made of, data is what defines our life, and data is how we express ourselves,” says Carmean. “Forms of copying, remixing, and sharing enhance the quality of life for all who have access to them. Attempts to hinder sharing are antithetical to our data-driven existence.”

About 450 people have registered with his church, and about 30 of them are actively practicing the religion, whose symbols include Ctrl+C and Ctrl+V—the keyboard shortcuts for copy and paste.

via Kopimism, Sweden’s Pirate Religion, Begins to Plunder America – US News and World Report.

We see, of course, what the Kopimists are doing: seeking the legal protections given to religion so that they can pirate music, movies, and the like with impunity.  And when they are prosecuted for internet piracy, they can claim religious persecution!

And yet, isn’t this the pattern for the way many people approach religion today?  Their theology is based on what they “like.”  (People don’t like the concepts of sin, judgment, and Hell, or anything else that would restrict their behavior, so they don’t believe in them.)  The Kopimists are simply reasoning backwards, starting with what they like to do and building a religion around it.

What might be some other religions people could construct as a way to justify their bad habits?

How could courts distinguish between these bogus religions and legitimate ones?

The internet strike may have worked

The Wikipedia blackout and other internet protests against the SOPA bill may have done some good.  Congressmen, including former sponsors in the House and the Senate, are now running away from the bill.  President Obama has also come out against the bill as it stands, provoking Hollywood moguls to threaten to withdraw their financial support of his campaign.

Wikipedia is on strike today

If you try looking something up today on Wikipedia, you won’t be able to.  The ubiquitous online encyclopedia is shutting down as a way to protest the Stop Online Piracy Act (SOPA) currently before Congress.  Other sites, such as Reddit and Boing Boing, are also joining the strike.  Google and others will not shut down, but they will put up messages decrying the attempt at internet “censorship.”  Here are some details:

Though the Stop Online Piracy Act has the support from the likes of Hollywood, the music industry, and the U.S. Chamber of Commerce, many Silicon Valley firms say it effectively amounts to censorship. To show their opposition to the bill, some sites are planning a service blackout on Jan. 18. Hayley Tsukayama reports:

Wikipedia, Reddit and Boing Boing are planning to black out their services Wednesday to protest the Stop Online Piracy Act and the Protect IP Act by showing users the bill’s effect on Web companies. These companies object to language in the bills, which are aimed at stopping online piracy on foreign Web sites, that grant the U.S. government the right to block entire Web sites with copyright-infringing content on them from the Internet.

Wikipedia will block all of its English-language pages, the first time since the encyclopedia’s 2001 launch that it has ever restricted access to those pages as a form of protest.

“[It’s] a decision that wasn’t lightly made,” the company said on its blog Monday. The decision to take down the free encyclopedia’s English pages was made with the input of 1800 Wikipedia users who voted overwhelmingly in favor of the blackout, according to a statement from the Wikimedia Foundation. . . .

via SOPA protests planned by Google, Wikipedia and others on Jan. 18 – The Washington Post.

What would SOPA do?  As I understand it, the targets are sites that pirate movies and music.  But the bill would allow court orders that could actually take down sites (including those in other countries) by delisting their domains, ordering search engines to stop listing them, and requiring service providers to block access to them.  From “Everything You Need to Know about Congress’s Online Piracy Bills”:

At a basic level, SOPA — and its Senate analogue, the Protect IP Act — would enable copyright holders and the Justice Department to get court orders against sites that “engage in, enable, or facilitate” copyright infringement. That could include, say, sites that host illegal mp3s or sites that link to such sites (the revised House bill focuses primarily on foreign sites like, oh, Pirate Bay). Courts could bar advertisers and payment companies such as PayPal from doing business with the offending sites in question, order search engines to stop listing the accused infringers, or even require Internet service providers to block access entirely. The bills contain other provisions, too, like making it a felony to stream unauthorized content online. . . .

Why are tech start-ups so vehemently opposed? These companies have argued that the bills are tantamount to Internet censorship. Rather than receiving a notification for copyright violations, sites now face immediate action — up to and including being taken down before they have a chance to respond. Intermediary sites like YouTube and Flickr could lose their “safe harbor” protections. Nonprofit or low-budget sites might not have the resources to defend themselves against costly lawsuits. And, meanwhile, larger companies like Google and Facebook could be forced to spend considerable time and money policing their millions of offerings each day for offending material.

Do these online piracy bills threaten free speech? Plenty of law professors, including Harvard’s Laurence Tribe, think so. The original version of the bill would have allowed copyright holders to block advertising and payment services for an accused Web site before a judicial hearing even took place. The new version of the House bill would require a hearing first, but, as Julian Sanchez notes, the bill “still makes it far too easy for U.S. corporations to effectively destroy foreign Internet sites based on a one-sided proceeding in U.S. courts.” Other critics have worried that the bill’s language is far too broad, threatening all sorts of potentially benign Internet uses. What’s more, the Electronic Frontier Foundation worries that the bill cracks down on electronic tools to circumvent government blacklists that are essential to human rights activists and political dissidents around the world.

Could the bills actually “break” the Internet? Many tech experts think so. The bills would give courts the power to order rogue sites to be de-listed from the Domain Name System — basically, the Internet’s phone directory. U.S. service providers would be tasked with acting as if the site didn’t exist at all (although the newly revised House bill gives a little bit of flexibility here). A big potential pitfall here is that the Internet is global, and it’s possible that users could seek out foreign DNS servers to access blacklisted sites. Some experts have raised security concerns about this splintering of the Internet’s architecture.
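
For readers who want to see the “phone directory” point spelled out, here is a toy model in Python. It is only a sketch under loud assumptions: the directory is a plain dictionary, and the site names, the addresses, and the `Resolver` class are all invented for illustration. It does not describe how real DNS software works or what the bills would literally require.

```python
from typing import Optional

# Toy stand-in for the Domain Name System: names mapped to addresses.
# The names and addresses are made up (reserved documentation ranges).
DIRECTORY = {
    "example.com": "192.0.2.10",
    "rogue-site.example": "203.0.113.7",
}


class Resolver:
    """A lookup service that may be under orders to ignore some names."""

    def __init__(self, blocked: frozenset = frozenset()):
        self.blocked = blocked

    def resolve(self, name: str) -> Optional[str]:
        if name in self.blocked:
            return None  # act as if the site did not exist at all
        return DIRECTORY.get(name)


# A U.S. provider following a court order, and a foreign one that is not bound by it.
us_resolver = Resolver(blocked=frozenset({"rogue-site.example"}))
foreign_resolver = Resolver()
```

In this sketch the blocked site has not gone anywhere. The U.S. resolver simply declines to answer, while the foreign resolver still hands back the address, which is the workaround the article describes.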

I’m curious: who in Congress is pushing for this?  Democrats or Republicans or both?

Do you think this is much-needed protection of intellectual and creative property?  Or are the methods too heavy-handed, with unintended consequences that could damage the internet as a whole?

Land rush for domain names

A host of new domain suffixes is coming to the internet, with unintended consequences:

There’s been a scramble to snap up domain names for the Internet’s newest designation — .xxx — but not necessarily from those you’d expect. Adult sites have reserved their spot in the newly labeled section of the Web, but so have companies, charities, celebrities and politicians.

Try “barackobama.xxx,” “angelinajolie.xxx” or “redcross.xxx” and you’ll find yourself faced with a black screen with gray type stating: “This domain has been reserved from registration.” In other words, someone’s made sure those brand names are protected from the association with porn.

Companies, the rich and famous and regulators in Washington now are worried that the rush to defensively buy Web addresses will only worsen — and grow more costly — as the organization in charge of doling out real estate on the Internet prepares to unleash an infinite number of Web suffixes to add to the familiar .com, .net and .edu. Some experts say the move will change the landscape of the Internet forever.

In January, the Internet Corporation for Assigned Names and Numbers (ICANN), the nonprofit association tasked with managing the Internet’s addresses, known as domain names, will begin taking applications from anyone with $185,000 and a desire to reserve their own suffix on the Web. The group oversaw the launch of .xxx last week. Coming after ICANN’s review process could be .god, .abortion, .sex and .georgetown, as well as thousands of others. . . .

The expansion of suffixes may also compel anyone with a brand name to buy multiple Web addresses to protect its image and prevent customers from being tricked by artfully misspelled sites. ICANN, for instance, handed over .xxx to ICM Registry, which has been charging $200 to trademark holders for each Web address they want to reserve.

The National Retail Federation, an industry trade group, has sent letters to Congress criticizing the rollout of the domain names for lacking transparency — and for the potential cost. Besides buying Web sites to prevent themselves from being associated with a .xxx or a .sex suffix, companies may have to fork over $185,000 to ICANN, plus legal fees, to control a suffix of their own. Plus they would have to maintain useless domains at a cost of $50,000 to $100,000 annually, the NRF said.

“It’s a little bit like the Oklahoma land rush,” said Mallory Duncan, NRF general counsel. “You come in now and pay a quarter of a million dollars or forever hold your peace. That’s not a prudent way to run a business.”

via ICANN is ready for battle over expansion of Web suffixes – The Washington Post.

