If you use the Internet or are one of the 3 billion people in the world who use a smartphone, you are generating data.

Every use of an app that provides directions, every phone call, every email and even how long you spend on social media is tracked and becomes data valuable to someone. Businesses pay social media apps such as Facebook and Instagram for data that reveals users' personal interests so they can place relevant ads in users' feeds. For example, home chefs will see ads for kitchen tools and cookware, meal delivery services and the like, and those businesses make money when the user simply scrolls by the ad and even more if the user clicks through to learn more about the product or buy it. While this may be the new way to do business, it raises concerns about privacy and ethics. Who owns the data? What can others do with my personal data? How can I protect my data? Since this technology changes almost daily, the answers to these questions and many others remain unclear.


Data Machines

5 billion people have a mobile connection*
9: number of apps used per user per day**
30: number of apps used per user per month**
* VentureBeat 2017
** TechCrunch 2016

Wearable health and fitness trackers have jumped in use among U.S. consumers***
9% in 2014
33% in 2018
*** Business Insider


A new industrial revolution

The exponential increase in computer-processing power, speed and storage, as well as the massive volume of data being created every second by millions of people and their devices, is described as another industrial revolution. From simple Google searches to online shopping to the 50 million smart speakers in use in the U.S., such as Amazon's Alexa and Echo and Google Home, each of these creates data. Some of this data benefits society, for example by providing access to nearly any kind of information from anywhere and by improving health care, but it also has a dark side: Cybercriminals can use your data to open credit card accounts or learn your running route. In addition, smart home devices generate data about your living habits, such as what times of day you use the most electricity and water, which gives clues about when you are at home and when you are not. Late last year, fitness app Strava discovered that its global heat map designed to show where it had the most users also revealed the locations of U.S. military bases worldwide, putting the military and its operations at risk.

At the 40th International Conference of Data Protection and Privacy Commissioners in Brussels last October, Apple CEO Tim Cook called for digital privacy laws in the United States to regulate the personal data companies collect from users. Cook said our personal information is being "weaponized against us with military efficiency," which has severe consequences.

“This is surveillance,” he said at the conference. “And these stockpiles of personal data serve only to enrich the companies that collect them. This should make us very uncomfortable. It should unsettle us.”

Ironically, Apple has a 14 percent worldwide market share of smartphones and holds one of the world's largest repositories of data. But there are concerns about how the tech giants use their data and how it is secured.

In April, Facebook CEO Mark Zuckerberg testified before Congress for five hours after British political consulting firm Cambridge Analytica gained access to nearly 87 million users' data from a Facebook app in an effort to sway political views. Zuckerberg told Congress that Facebook does not sell data to advertisers but allows advertisers to tell Facebook who they want to reach; Facebook then places ads in front of users who would be interested based on their personal data. In late November, news media reported that Facebook had discussed charging developers $250,000 annually for access to its platform application programming interfaces (APIs) to make apps that can ask users for access to their data. Facebook denied the claims. In mid-December, Facebook revealed it had given developers too much access to its users' photos for six days in September, affecting about 6.8 million users.

In October, Google revealed that between 2015 and March 2018, a software bug potentially exposed Google+ user information, such as name, email address, occupation, gender and age, to developers. Google later revealed that it chose not to inform its Google+ members fearing reputation damage and would only estimate the number of users potentially affected to be 500,000. But several weeks later, it revealed that an additional bug had exposed user data from about 52.5 million accounts for about six days.

On Nov. 30, in what may be one of the largest known data breaches, hotel chain Marriott revealed that up to 500 million guests at its Marriott Starwood hotels worldwide may have had their personal identifying information compromised. Since it was a global compromise, it also could be the first breach under the European Union's General Data Protection Regulation (GDPR), which establishes rules for how companies manage and share personal data and levies a substantial fine.

Automaker Tesla collects image and video data from its Model S and Model X vehicles to help it improve its autonomous-vehicle capabilities. The company did ask users for permission to collect the anonymized data, and it built a cloud-based data infrastructure to process and use it. McKinsey & Co. predicts that car data monetization will be as high as $750 billion globally by 2030.

With such leaks from the largest holders of data, can anything good come from collecting it? Computer scientists say it can.


Ning Zhang

Using data for good

Ning Zhang, assistant professor of computer science & engineering in the McKelvey School of Engineering, says users should have contextual privacy for their data.

"Users should be able to control when they share, what they share, with whom they share and under what conditions," he said. "I think that's what's being guaranteed by our work, which is called PrivacyGuard. This context is the key to enabling people's will and to incentivize them to share.

"Such transparency and confidentiality assurance will lead to a broader share of data to accomplish higher good to society."

Health care is one area in which big data has already been beneficial.

"In medical school, physicians learn theoretical analysis and rules," said Yixin Chen, professor of computer science & engineering. "Now we can collect a massive amount of data in the hospital and learn from it. With precision medicine, we can apply different doses and different treatment plans to different individuals. The future is to combine the strength of both ways."

Yixin Chen

In the United States, our medical data is protected by the Health Insurance Portability and Accountability Act (HIPAA), but even that is not failsafe. Most recently, the Centers for Medicare & Medicaid Services announced Nov. 15 that the data of nearly 94,000 patients had been compromised by an attack on the website healthcare.gov in October. In 2017 alone, there were 359 health care data breaches reported. Stolen medical records can sell for up to $1,000 on the dark web, a hidden network of websites making up about 3 percent of the internet that requires special access. Criminals often buy and sell stolen data via the dark web, which uses encryption to keep their identities and locations hidden.

Breaches like these make it difficult for researchers such as Chen to share data with researchers at other institutions. He is working on a project to predict liver fibrosis in patients, but needs more data.

"We are trying to collaborate with other institutions that have more data so we could combine them into one big database," he said. "We see that the performance of a model will increase as you have more data, but we are facing some obstacles because of regulatory issues that prevent them from sharing their data."

Sanmay Das

Sanmay Das, associate professor of computer science & engineering, is using data to better understand human behavior to improve society.

For example, he is collaborating with Patrick Fowler, associate professor in the Brown School, on a project that uses artificial intelligence to better match homelessness services to households at the point of entry into the homelessness system. The project uses data from calls to a homelessness hotline in St. Louis and demonstrates the potential for reductions in future homelessness through better targeting of interventions.

Das is director of the university's new interdisciplinary Division of Computational and Data Sciences (DCDS) doctoral program designed to train students interested in disciplines that apply data and computing to some of today's most important societal problems. In addition to the Department of Computer Science & Engineering, participants in the program include the Brown School, and the departments of political science and of psychological & brain sciences, both in Arts & Sciences.

"The motivation for DCDS is partly that we're leaving digital traces and generating huge amounts of data about people and everything that we do," Das said.

“This ranges from vast amounts of data from neuroimaging studies to all the data collected through our online behavior and interactions. This goes well beyond online shopping and Facebook. For example, people trying to recover from opioid addiction may interact in online communities, giving us a unique lens on what kinds of strategies may work and how to support recovery.”

Ethical concerns also will be in the curriculum, Das said.

"What is the good and the bad of using this kind of data?" Das asked. "Could we be really infringing on people's rights by having sentencing decisions being made on an algorithmic basis? I think to answer these questions you really need a transdisciplinary approach and need to understand all of these questions from the perspective of the people who are going to be designing, using and thinking about this kind of algorithmic decision-making in society as a whole."

Yevgeniy Vorobeychik

Can big data be controlled?

Short of discontinuing use of the internet, smartphones and social media altogether, individuals have a few options for protecting their privacy, said Yevgeniy Vorobeychik, associate professor of computer science & engineering and an expert on security and privacy. Changing settings to determine who can see our activity is one way, but there is little we can do to prevent companies such as Facebook from using our data.

"Their entire business model is around getting highly detailed data about what you do on Facebook and off of Facebook and using that to help advertisers find you," Vorobeychik said. "That's how they make their money, and they're going to presumably fight tooth and nail to preserve their business model because of it."

Regulations similar to GDPR may not be the answer, Vorobeychik said. At the regulatory level, there has to be balance, he said.

"If you take GDPR to an extreme where companies have to ask permission to use every bit of your data and can't share it without your express permission, that could prevent good things from happening, for example, clinical research," Vorobeychik said. "Secondary analysis of clinical data could be valuable. Someone could use it and show that there's a particular drug that happens to be associated with preventing cancer. That is unambiguously a good thing that you could potentially prevent by having very strict data regulations."

As individuals, we can opt out of spam email, Vorobeychik said, but this assumes compliance on the part of the sender. From a social perspective, though, what should be regulated and how?

"You may be OK with Facebook using your data, but you may be a lot less OK with Facebook selling your data to a health insurance provider who can subsequently use it to indirectly discriminate against you in some way," he said. "We have no idea if this is happening, because people are not transparent about what they are selling and to whom."

83% of enterprise workloads will be living in the cloud by 2020*
41% will be on public cloud platforms offered by companies such as Amazon, Google, IBM and Microsoft*

* Forbes 2018


The future

In November, an Amazon Echo smart speaker was called as a witness to a double homicide in New Hampshire. A state judge ordered Amazon to turn over the device's recordings, but Amazon has not yet determined whether it will comply. In a separate murder case in Arkansas in 2015, Amazon initially objected to police's request for Amazon Echo recordings but eventually conceded.

Cases such as these raise even more questions about the future of big data, privacy, security, policy and ethics. One question still unanswered is who owns our data: Does it belong to each individual, or does it belong to the holder? Sometimes the details are in the company's privacy agreements.

“Once your data is uploaded to a cloud, often the data is owned by whoever runs it,” Zhang said. “Oftentimes it’s not transparent to us, so transparency is also a key item in privacy. But if we are able to achieve transparency, which leads to broader share of data to accomplish higher society good, then there is this economic value drive that may be able to far offset the drop out of implementing the system.”

Das said the world is becoming more of an algorithmic decision-making society, as evidenced by China's dystopian Social Credit System, a mass surveillance network that monitors the country's 1.4 billion citizens and uses the data it collects to assign individual trust scores to people and businesses. The system's 200 million closed-circuit cameras are connected to a facial-recognition system as well as financial, legal and medical records to make decisions about a person's or business's social credit.

Ultimately, making the most of big data will require balance.

"There is a sense in which we can try to understand how to make better policy by looking at existing data to see how current policy works and think there are counterfactuals about how policy would've worked on the goals that we're trying to achieve otherwise," Das said. "The question becomes how do we balance this with privacy, security and fairness. I'm bullish on the possibility for data to achieve good outcomes, but I'm a bit worried about the unfettered nature of it, especially when you've got powerful players who could really benefit by doing things that perhaps we as a society don't want to see happen."