Drinking through a firehose: has data mining gone too far?

In a world of endless information sharing, consumers have become the product. Platforms such as Google, Facebook, Foursquare and Twitter are the new factory floor, and online users, who leave digital crumbs as they browse the Web and tap into social networks, generate data that can be bought and sold. Every tweet tweeted, badge unlocked, Website searched and "Like" button clicked adds to the growing inventory of user information. Data miners then sort it, package it and market it, and companies use it to better target customers.

“Even traditional companies have discovered they can generate totally new lines of business by collecting and using their customers’ information,” says Andrea Matwyshyn, professor of legal studies and business ethics at Wharton. From grocers to gas stations, retailers offer loyalty cards that track purchases. Visa has filed a patent for a method to deliver targeted online ads to consumers, based in part on their offline credit card spending. And through social media Websites like Foursquare, people bring information about their offline spending back into the online world.

Privacy advocates say that the collection of data has gone too far, exploiting consumers who have less and less control over how their personal information is doled out. Recently, the drumbeat for protection of digital privacy has been growing louder.

Federal Trade Commission chairman Jon Leibowitz in October denounced online data collectors as “cyberazzi” and called for a “Do Not Track” mechanism that would help consumers better control the online information they share. Facebook is nearing a settlement with the U.S. government over charges that it made “material retroactive changes” to its privacy policies in December 2009, rendering information about subscribers public by default without their consent.

Individuals and groups have sued tech firms, search engines and social media companies, alleging privacy violations, and lawmakers have introduced several pieces of legislation to make data collection more transparent.

Others argue that the data trove breaks down barriers and opens doors, giving companies unprecedented insight into what customers want and helping them deliver more of what consumers need. According to Anindya Ghose, co-director of New York University’s Center for Digital Economy Research, consumers are benefiting more than ever from the free flow of information. “Companies can use my interaction with my friends on Facebook to personalize and highly customize products and services for me,” he says. Such personalization would take into account not just one individual’s Web-browsing behavior, but also the behaviors of connections within that person’s social network.

Moreover, “social media is democratizing marketing,” Ghose argues, because the viral spread of information is forcing companies to communicate better with their customers and is giving consumers a greater voice. The richer this two-way communication, the better companies can meet consumer demand. “There is very little explicit selling of data going on. It’s more like the data is out there; it’s stored and assembled, and it’s ready for intermediaries who can use it effectively. I think the right word is that there is a lot of sharing of information.”

That’s good news for companies that cater to online behavioral advertisers, who attempt to target ads based on browsing habits, interests and shopping behavior.

Internet advertising is growing faster than advertising in any other medium—an average of 14.6 percent per year, according to ZenithOptimedia, a media services agency. Online advertising will overtake newspapers to capture 18.9 percent of the global ad market by 2013, becoming the world’s second largest advertising medium next to television, the agency forecasts.

In 2011, Facebook will likely amass $3.8 billion in global ad revenues, predicts digital intelligence firm eMarketer. The social media giant now gathers data from more than 800 million active users who post in 70 languages. And search giant Google, which pulled in $9.72 billion in advertising revenues in the third quarter alone, makes 96 percent of its revenues from advertising. In regulatory filings, Google attributes its healthy profit to “the relevance and quality of both our search results and the advertisements displayed.”

Old hat, new color

Viewing consumer information as a product is nothing new, says Wharton marketing professor Jonah Berger, who calls the discussion about online data mining “an old hat in a new color.” Radio, television, magazines and newspapers have long leveraged their audiences to woo advertisers. For years, direct marketers and catalogue companies assembled phone and mailing lists that could be sold. “Companies have been mining consumer data for a long time,” notes Berger. Today, he adds, there is simply more data.

The data keeps expanding as increasing numbers of online users forgo privacy in exchange for social connections. A June 2011 survey by Consumer Reports National Research Center found that 34 percent of Facebook users shared their full birth date online and 21 percent shared their children’s names and photos. Roughly one in five didn’t bother using Facebook’s privacy controls.

“Consumers expect personalization and expect to be social and share information, often in public, with others online,” says Shawndra Hill, Wharton operations and information management professor. “As target marketing and recommendation engines become more a part of our daily experience, users are becoming more aware that their personal and behavioral information is being used by firms for advertising. Some people care about their digital privacy. But it is often the case that consumers are willing to give up their data liberties in return for something useful, desirable and assumed free. So in essence, consumers pay for services with personal data.”

A vast ecosystem of companies has evolved to capture, aggregate, manage and distribute that digital data. Consumers leave information about their online behavior through cookies, small text files sent by a Website’s server and stored on the user’s computer or mobile device. Cookies enable Websites to recognize the browser on return visits, making it possible for users to store profile information on favorite sites and maintain homepage preferences. Cookies also may capture behavioral information, which data aggregators, ad agencies and ad networks use to develop targeted advertising. A single Website often has relationships with a range of third parties that collect cookie data for various purposes.
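For readers curious about the mechanics, the cookie exchange described above can be sketched in a few lines of Python using the standard library's http.cookies module. This is only an illustration under stated assumptions: the cookie name visitor_id, the 30-day lifetime and the helper function names are hypothetical, not taken from any site or company mentioned here.

```python
from http.cookies import SimpleCookie
import uuid

# Hypothetical cookie name and lifetime, chosen only for illustration.
COOKIE_NAME = "visitor_id"
THIRTY_DAYS = 60 * 60 * 24 * 30


def issue_cookie(visitor_id=None):
    """Build the Set-Cookie header value a server might send on a first visit."""
    cookie = SimpleCookie()
    # A random identifier stands in for whatever ID a real site would assign.
    cookie[COOKIE_NAME] = visitor_id or uuid.uuid4().hex
    cookie[COOKIE_NAME]["path"] = "/"
    cookie[COOKIE_NAME]["max-age"] = THIRTY_DAYS
    return cookie[COOKIE_NAME].OutputString()


def read_visitor_id(cookie_header):
    """Parse the Cookie header a browser sends back on a return visit."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(COOKIE_NAME)
    return morsel.value if morsel else None


# First visit: the server hands the browser an identifier to store.
header_value = issue_cookie("abc123")

# Return visit: the browser sends the stored cookie back, and the
# server recognizes it as the same browser.
returning_id = read_visitor_id("visitor_id=abc123; theme=dark")
```

The identifier carries no name or email address; it simply lets a server tie separate visits to the same browser, which is what makes both saved preferences and behavioral profiles possible.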

The online advertising industry says it does not buy and sell information that identifies individuals personally, such as names or email addresses, but aggregates anonymous data to build profiles of consumer groups. “The good actors in the industry do not trade in personally identifiable information,” notes Sherrill Mane, senior vice president of industry services at the Interactive Advertising Bureau, an association of more than 500 media and technology companies that churn out 86 percent of online advertising in the United States. The objective is to develop more robust audience segmentation, she says, “not to get people’s names and hound them.”

That doesn’t mean other parties couldn’t use digital data to target individuals, according to Lee Tien, a senior staff attorney at the Electronic Frontier Foundation, a San Francisco nonprofit that works to defend digital rights. “Advertisers are only interested in us as potential consumers and buyers, but there’s very rich data out there. Other agencies might be more interested in you more personally,” he says, pointing out that many popular Websites host an average of 65 third-party trackers. Employers or insurance companies, for example, could potentially deny applicants employment or health insurance based on information they find in the data collected. “It’s not the advertising side of it” that is concerning, he notes. “We worry about people being tracked because we don’t know what that data is going to be used for.”

Creep-out factor

Studies show most online users suspect their behavior is being tracked, but either don’t care or don’t know how to prevent it. A 2011 survey by New York-based market research firm Harris Interactive and TRUSTe, a company that helps clients such as Facebook, Microsoft and Apple comply with privacy requirements, found that 30 percent of consumers believe advertisers are obtaining personal information such as phone numbers, email and physical addresses without their consent. More than half (52 percent) assumed that their online browsing behavior had been shared with advertisers without their approval.

Yet while 94 percent of consumers consider online privacy important, only 37 percent consistently take steps to protect their personal information online, the survey also found. Only one in four regularly opts out of online tracking, and fewer than one in five (19 percent) said they had ever downloaded and used a tracking blocker product.

Despite the growing awareness of online tracking, there’s a “creep-out factor” when consumers see an advertisement on one site that reflects browsing on another, especially if the computer user “had no idea the two Websites were talking to each other,” says Matwyshyn. “There’s a broader feeling from consumers of progressively losing control over their digital identities and information.” The growing tension stems in part from “a world-view conflict that’s happening between privacy-conscious consumers and radical, transparency-supporting entrepreneurs,” she notes. Tech entrepreneurs start off with a different mindset than the average consumer: They believe unlimited information sharing is good. Since computer code tends to carry the values of its creators, they naturally build their views into their platforms. “Without full disclosure of how information is being shared, most consumers won’t be able to understand how the code works on the Websites they’re using, and will feel that they are essentially being unfairly turned into a product.”

Unwieldy, non-negotiable user agreements don’t help. Many online agreements allow companies to alter contracts even after the user has signed, giving consumers little real choice about how their data is shared. “I don’t know anyone who can take the time to read a 58-page agreement on a 3-inch smartphone screen, even assuming they have spectacularly good eyesight,” Matwyshyn says. Meanwhile, the industry is pushing self-regulation. The Digital Advertising Alliance, a coalition of advertisers and media companies, is adopting a set of principles and has developed an “Ad Choices” icon that can be displayed on sites to signify to consumers that they have the ability to opt out of targeted ads.

Some doubt whether opt-outs will work. A recent study from Carnegie Mellon titled “Why Johnny Can’t Opt Out: A Usability Evaluation of Tools to Limit Online Behavioral Advertising” showed that consumers who tried to limit tracking with a variety of tools were unable to understand them well enough to make them work. “Our results suggest that the current approach for advertising industry self-regulation through opt-out mechanisms is fundamentally flawed,” the study notes. “Users’ expectations and abilities are not supported by existing approaches. Even with additional education and better user interfaces, it is not clear whether users are capable of making meaningful choices about trackers.”

Back to the customer

Wharton marketing professor Peter Fader believes much of the controversy around data collection stems from sloppy marketing. “Taking the data too literally draws misleading conclusions and puts out messages that are not only ineffective but almost creepy,” says Fader, co-director of the Wharton Customer Analytics Initiative. “All of this fuss about privacy would go away if companies could learn the lessons of history.”

In the past, data was scarce, so companies carefully analyzed what they had, Fader notes. Firms like Pillsbury and Procter & Gamble used in-house experts to track repeat purchases, augmenting what they found with demographic data and results from expensive surveys. Direct marketers compiled lists that they tested and re-tested. Insurance companies learned how to classify their customers into high- and low-risk groups. “Companies were very shrewd with their data because it was so hard to get,” says Fader. “Today, it’s drinking from a fire hose. We have everything, a million variables and the data miners are trying to use all of it.”

In the rush to collect and connect, he adds, the data itself has become the focus, not the customer. Customers make many online purchases for random reasons that don’t contribute to a meaningful profile, Fader argues. “How granular should our messaging get? The one-to-one is going too far. Just because we have two or three data points on a customer doesn’t mean we’ve nailed them down.”

But the world is moving toward increasing segmentation, not less, says Joseph Turow, a professor and associate dean for graduate studies at the University of Pennsylvania’s Annenberg School for Communication. Author of a forthcoming book, The Daily You: How the New Advertising Industry Is Defining Your Identity and Your Worth, Turow sees a world where not only advertising, but also other forms of content, will be targeted to consumers based on their data profile. “Eventually as we move forward, news and entertainment will be varied, even on television, based upon the constructions of consumers that advertisers, marketers and publishers have built,” he notes. “We’re only at the beginning of this transformation.”

Standard arguments about privacy miss the point, Turow adds. People usually focus on personally identifiable information such as names and addresses, and feel their privacy has been violated only if that type of data is shared. But even if they remain anonymous, Turow points out, “people are having their reputations constructed. If a company knows 100 data points about me in the digital environment, and that affects how that company treats me in the digital world, what’s the difference if they know my name or not?”