The sinister side of big data

Hopes are high for big data. GE declares in an online video that the industrial Internet, a.k.a. the Internet of Things, will bring us “a faster, safer, cleaner, more productive world. And it will be greater than what we’ve ever done before.”


But there is also a growing awareness that important concerns have to be addressed if these hopes are to be realized. If the four V’s (volume, velocity, variety and veracity) define what big data is, then four P’s (practicality, privacy, power and privilege) define the hurdles that big data must clear in the race to achieve a sustainable future.

Practical challenges are the ones likely to be solved soonest. The primary practical issue to emerge from the conference on “Sustainability in the Age of Big Data,” hosted by Wharton’s Initiative for Global Environmental Leadership (IGEL), is, ironically, a lack of brainpower.

As Paul Rogers, chief development officer at GE, said in his closing presentation at the IGEL conference, “big data exists today in a way that is extremely difficult to understand.” Since much of the industrial Internet data is specific to particular types of machinery, it is often intelligible only to those who designed and built the equipment. It takes deep expertise to use such data to solve problems and find efficiencies. And it requires the additional expertise of computer scientists and others to create software that can render such data useful to non-experts in the future.

The immediate concern is that there simply are not enough experts—engineers, big data analysts and computer scientists—to cope with the huge amount of data that is rapidly accumulating. With the right expertise, big data can be used to dramatically increase efficiency, enhancing both sustainability and commercial value. But as Alyssa Farrell, director of global sustainability at SAS, said at the Wharton conference, “In order to capitalize on opportunities, companies need more analytical talent in the pipeline.”

According to Rogers, “The question is not, How do we generate more data? The question is, Is most of the data we have being used for anything meaningful? And the answer is no.”

Also speaking at the conference, Mark Headd, chief data officer for the city of Philadelphia, pointed to other real-world barriers to the release of data. Much of the historic government data that exists, he pointed out, is inconsistent and incompatible with current databases. “Most of these systems were never designed to release data external to government,” he said, “so you need a bridge between the legacy environment and the data environment.”

And in government, as in business, concerns about the quality of data often mask control issues. The fact that information is stored in silos guarded by employees who don’t want to give up control makes the job harder, Headd said. Department heads, for example, often resist directives to release city data, objecting that the data is not “clean, up to date or suitable for release.” According to Headd, “Getting over the apprehension that data is messy is a real obstacle—there’s entropy involved.”

Another practical issue: How costly and cumbersome it currently is to transmit huge amounts of data wirelessly. The cost is likely to come down as big data applications increase and new technology is developed, but for now the terabyte of data generated by jet engines during a flight has to be downloaded by a technician who connects the onboard system to computers on the ground after the plane lands. The problem, says Rogers, is that the wireless “transfer of that data is extremely expensive.”

Privacy concerns

Privacy concerns are all too familiar in the popular press. There have been frequent reports about the U.S. government engaging in massive electronic surveillance of its own citizens and of foreign governments hacking into supposedly secure government and corporate systems.

The New York Times reported recently, “A Russian crime ring has amassed the largest known collection of stolen Internet credentials, including 1.2 billion user name and password combinations and more than 500 million email addresses.” This after Eastern European hackers stole 40 million credit card numbers from Target and Vietnamese data thieves got away with “as many as 200 million personal records, including Social Security numbers, credit card data and bank account information from Court Ventures, a company now owned by the data brokerage firm Experian.”

Privacy and security are also concerns in the world of sustainability. David Parker, vice president for big data at SAP, said, “Obviously, data privacy is the biggest big-ticket issue, and big data sharing can be undertaken for the greater good, or with wrong intentions.” He said that SAP’s lobbying of government regulators aims to allow greater access to and use of data, but with an understanding that lines need to be drawn.

Potential abuses

The power of big data to advance commerce and sustainability can also be abused.

In one example of the concerns about how big data will be used, the Farm Bureau Federation is pushing for tighter controls on the use of data that farmers supply to companies they work with. According to Farm Bureau economist Matt Erickson, the worry is that groups opposed to specific practices, such as the use of GMOs, will gain access to supposedly anonymous data, tie it back to specific farms (just as hackers recently linked anonymous Netflix data to specific customers) and use the data against individual farmers.

Michael Lewis wrote a bestseller, Flash Boys, about how high-speed traders illegally profited by shaving a few milliseconds off the length of time it took data to transmit from New York to New Jersey. Nothing so high-tech is suspected in commodity markets, but Erickson is concerned that big data from farmers could be used to manipulate those markets.

Companies with massive amounts of data about everything from fertilizer use to crop yields could use such information to play the market. “If I had all that data I could easily predict the market,” says Erickson. “It hasn’t happened, but without question it could happen.”

Other, subtler abuses of big data are also possible. During his conference-opening keynote, Parker related a hypothetical use of customer data that is now possible using data gleaned from a retailer’s website and a customer’s mobile phone. The retailer, said Parker, could send him a text about a shirt he was looking at online, saying, “Mr. Parker, we now have that shirt in your color, in your size, in a branch local to you; and we understand that you’re only a two-minute walk away from that branch.” The retailer might go on to use Real Time Offer Management (RTOM) to follow up this message with a text offering a $5 discount if the purchase were to be made within the next 20 minutes.

This service benefits the retailer, the customer and the environment (no packaging, no shipping and no car trip to the local mall), but as Parker noted in passing, it can seem “a little bit Big Brotherish.”

While the example Parker offered was an “opt-in/opt-out” service, there is the potential for such strategies to be exploited without permission and to move from serving customers into manipulating them—pushing them to buy or use more than they otherwise would, for example.

As Gary Survis, CMO of big data company Syncsort and an IGEL senior fellow, indicated in an IGEL blog, “Clearly we are embarking on a journey to a new era where there will be an epic battle between those that will use data for good and those that will seek to control it for evil purposes.”

The danger of manipulation

A related concern surfaced around the idea of using big data to motivate sustainable behavior. Speaking about “gamification,” Wharton legal studies and business ethics professor Kevin Werbach said games can be used to encourage R&D (a company is likely to generate a lot more research by announcing a competition to invent a more sustainable light bulb, for example, than by simply publishing an RFP).

In a similar way, municipalities can increase recycling rates by making the activity into a kind of game: the town tracks how much a resident recycles and awards points that ultimately lead to a prize of some sort. But one of the dangers is that such strategies can be used to motivate people in unethical ways.

As Werbach noted, “It’s easy to use gamification to be manipulative. Do this because it’s fun, when there’s really some objective that does not necessarily coincide with the player’s interests. So it’s critical in ethical gamification design to be transparent about those objectives.” The challenge facing gamification is how to ensure that the power of big data is used to support and not coerce targeted behavior. “It’s really important to long-term success,” said Werbach, “that people participating feel it’s in their best interests and understand the nature of the system, as opposed to it being done without their knowledge.”

Privileged access

Privileged access to big data is one of the most difficult challenges facing those in the sustainability space. As Rogers noted, commerce and sustainability both benefit from efficiency. But in many areas of the world, commerce is sparse and markets are too weak to attract serious investment. Yet efficiency and sustainability are even more critical in these areas than they are in the developed world, not simply as ways to improve life, but literally to sustain it.

Virtually all the population growth predicted in the coming decades will take place in developing areas where food and energy are desperately needed, and where big data could play a vital role. The ultimate challenge is ensuring that the high hopes for big data are realized on a global scale.

It is only natural for difficulties to surface once the initial enthusiasm for a new concept peaks. Gartner’s Hype Cycle calls it the “trough of disillusionment” that follows on the heels of “inflated expectations.” The issues of practicality, privacy, power and privilege that are now being raised about big data are a useful antidote to those inflated expectations, and once they are resolved they will lead, in all likelihood, to greater enlightenment and ultimately to a more sustainable world.

Source: Knowledge@Wharton