Data scraping from public websites is legal. That’s the result of a ruling by the Ninth Circuit Court of Appeals earlier this week.
LinkedIn lost its legal battle with data analytics firm hiQ after arguing that it was illegal under the federal Computer Fraud and Abuse Act (CFAA) for hiQ to “scrape” user profile data to analyze employee turnover rates.
Tiffany Li, a technology lawyer and professor of law at the University of New Hampshire, joins “Marketplace Tech” host Meghan McCarty Carino to discuss the shape of the CFAA, which dates back to 1986, and how the law could be updated.
Tiffany Li: So CFAA is something that goes to the heart of a lot of cybersecurity allegations, a lot of allegations of data theft, website hacking, etc. What is important to us here is that the CFAA has a specific limitation that says you cannot intentionally access a computer without authorization, or exceed authorized access and then obtain information from that computer. The CFAA came into existence before what we consider, perhaps, the modern Internet. So maybe we didn’t have as many cases of these big websites full of millions of people’s data and these big commercial scraping companies that rely on that data. A lot of lawyers don’t like this law because it’s not necessarily super up-to-date. But that’s what we have.
Meghan McCarty Carino: In your mind, what should a CFAA update look like?
Li: I think there are two ways to update the CFAA. The first is in the text of the law. We could try to clarify specific terms like “access.” We can also clarify terms like “computer,” which is actually an interesting legal definition, because computers have really changed over the past few decades. We now need to look at questions like, for example, whether or not the memory of a small mobile device counts as a computer, or whether or not a website’s storage counts as a computer. In the future, we may have to think about something very futuristic, like what happens if you have data embedded in DNA, for example. Or if you have different cloud applications, how would you define and delineate IT systems when something is highly distributed? So it’s a bit futuristic and far-fetched, but I think eventually we’ll have to clarify what “computer” means.
McCarty Carino: So who might be cheering this decision? And why? I mean, I know journalists, for example, often use scraped data for investigations and things like that. Who else might be affected?
Li: So people might be happy with the decision if they’re, say, freedom of information advocates: people who care about being able to freely access information and freely use data, such as artists, journalists, writers, and people who are involved in libraries or archives. Some people might also be upset with the decision, as there could be privacy issues involved. Perhaps this decision does not protect the privacy of LinkedIn users.
McCarty Carino: For the average person who interacts with the internet, you know, who has a profile on LinkedIn, Facebook, social media, etc., what does this decision mean? Does this change at all how we understand our interactions with these services?
Li: Your information may not be as private as you think. Anyone’s social media profile can be used by just about any business. And it could be someone acting in relatively good faith, you know, like hiQ probably is. Or it could be a company like Clearview AI. They have collected millions, if not billions, of face photos to use in facial recognition technology, and we don’t know exactly who they are selling this technology to. But other than that, in general, there is no direct immediate impact. But we should consider these privacy issues.
Related Links: More From Meghan McCarty Carino
TechCrunch points out that the judgment in the LinkedIn case is a notable victory for archivists and researchers. It is also useful for long-running projects to archive websites that have been taken offline, as well as efforts to use publicly available web data for academic research.
As Tiffany Li noted, scraped data may have more controversial applications, such as its use by facial recognition software company Clearview AI, which The New York Times dug into in 2020.
Clearview uses billions of photos scraped from Facebook, YouTube and dozens of other sites to train its facial recognition AI. According to information from The Times, it was used by more than 600 law enforcement agencies and several private companies for security purposes.
As for the Computer Fraud and Abuse Act, which Professor Li said could probably use an update, it has come up in the current discourse about Netflix and its password-sharing problem.
The practice could apparently be considered a federal crime under the CFAA. A decision from 2016, widely interpreted as a threat to streaming freeloaders, found that sharing employee login information violated the law.
Numerous legal analyses, like this piece from cybersecurity professor Josephine Wolff at Slate, threw cold water on this alarmism. Wolff pointed out, however, that when it comes to the outdated CFAA, the legal questions almost never have clear answers.
So, streaming account moochers… you have been warned.