Scraping public pages is legal in the US (2024)
Disclaimer: SerpApi, including any of its employees or contractors, does not provide legal advice. The content of this blog post is for informational purposes only and should not be considered as legal counsel. For specific legal issues, consult with a qualified attorney.
You can legally scrape public pages in the United States
The legality of web scraping in the US has been a source of stress and confusion for many people and companies that could use web scraping to their advantage. While the United States doesn’t have clear laws on the subject or a bright line defining when web scraping is legal, we can use court cases and the precedents they set as a guide. There are ways to legally and ethically navigate the world of web scraping, and looking at the most important statutes and cases concerning the subject is the perfect place to start.
Laws and Statutes to Consider
1. Computer Fraud and Abuse Act (CFAA)
The CFAA is the statute most often at issue in web scraping cases, and it is usually the first law the defendant is alleged to have violated, so we will discuss it in relation to the cases below. The CFAA was adopted in 1986 to address hacking. It rests on the distinction between authorized and unauthorized access to a protected computer or computer system: knowingly accessing and obtaining information from a computer without authorization, or by “exceeding authorized access,” is a criminal offense. However, the main problem with the CFAA is that these terms are not clearly defined within the statute itself. Therefore, the interpretation of the CFAA now relies on precedent set by the Supreme Court, namely in Van Buren v. United States (discussed later in this blog post).
2. California Penal Code Section 502
California Penal Code Section 502 is often brought up in conjunction with the CFAA when it applies. This is because this penal code has similar implications to the CFAA. California Penal Code Section 502 describes the problems and punishments for accessing a computer, computer system or network without authorization. This includes not only accessing data without permission, but could also be altering data, assisting others in accessing data, or causing damage to a computer system or network.
3. Controlling the Assault of Non-Solicited Pornography and Marketing Act of 2003 (CAN-SPAM Act of 2003)
The CAN-SPAM Act of 2003 is not explicitly a web scraping law, but is referenced later in this article, so I would like to offer a brief explanation. For the purposes of this article, the CAN-SPAM Act of 2003 deals primarily with a large amount of commercial electronic mail being sent and whether or not the information is materially false or misleading. When the header or body of the text is materially false or misleading, or deceptive, the emails violate the CAN-SPAM Act of 2003.
Court Cases that Defined the Legality of Scraping Public Pages
1. Craigslist Inc. v. 3taps Inc. et al (2013)
In Craigslist v. 3taps, the court acknowledged that 3taps was able to scrape Craigslist, as it was a publicly accessible website, until Craigslist revoked authorization with its cease-and-desist letter.
The most applicable part of Craigslist v. 3taps is 3taps’ motion to dismiss Craigslist's claims under the CFAA and California Penal Code Section 502. 3taps was scraping data from Craigslist and marketing a Craigslist API. Craigslist wanted to stop the scraping, so it sent 3taps a cease-and-desist letter and blocked 3taps’ IPs. The cease-and-desist letter even stated that 3taps was no longer authorized to access Craigslist's site and services “for any reason.” Instead of stopping its scraping activity, 3taps used new IPs and rotating proxies and continued scraping the site.
After 3taps ignored the cease-and-desist letter and circumvented Craigslist’s IP block, Craigslist sued 3taps on many different claims, one of which was that 3taps had violated the CFAA. 3taps disputed this claim and filed a motion to dismiss it on the grounds that Craigslist is a public website. While this is true, Craigslist argued that the cease-and-desist letter was a clear revocation of authorization, and that 3taps continued to “intentionally access(es) a computer without authorization.” 3taps claimed that, as a public website, Craigslist cannot revoke authorization to access the site.
After examining the wording and meaning of the CFAA, the court decided that Craigslist was able to revoke authorization. It rested this on the precedent set by LVRC Holdings LLC v. Brekka, a case about an employee who accessed a computer and sent confidential information to himself to use in his own competing business. The court explained that “‘[t]he plain language of the statute . . . indicates that ‘authorization’ depends on actions taken by the employer.’” (Craigslist Inc. v. 3Taps inc, et al, 2013). In LVRC Holdings LLC v. Brekka, the party in the position to grant or revoke authorization was LVRC Holdings LLC; in Craigslist v. 3taps, it was Craigslist. Since Craigslist took the action of sending the cease-and-desist letter and blocking 3taps’ IPs, this was a valid way of revoking authorization, and 3taps therefore violated the CFAA.
2. Facebook, Inc v. Power Ventures, Inc (2016)
At first glance, it may appear that Power Ventures was able to perform actions on Facebook’s computers because it had consent from the users. However, after Facebook sent a cease-and-desist letter, Power Ventures was no longer authorized to access Facebook’s data. Note that this case doesn't affect scraping public pages.
Facebook sued Power Ventures for accessing Facebook user data and sending form messages from Facebook’s platform for a promotion Power was running. Power allowed Facebook users to spread the word about Power through different Facebook actions, which in turn sent an email to their friends from Facebook or sent internal Facebook messages, depending on the user’s settings. After receiving a cease-and-desist letter and an IP block, Power found a way around the block and continued its promotion, acknowledging the use of Facebook data with authorization. Facebook accused Power of violating the Controlling the Assault of Non-Solicited Pornography and Marketing Act of 2003 (“CAN-SPAM”), the Computer Fraud and Abuse Act of 1986 (“CFAA”), and California Penal Code section 502. The district court originally granted summary judgment to Facebook on all three claims, but the appeals court later upheld some of these rulings and reversed others. We will mainly discuss the appeal, as it is the most recent decision in this case.
The first claim to address is the violation of the Controlling the Assault of Non-Solicited Pornography and Marketing Act of 2003, or the “CAN-SPAM” Act. Although this promotion caused over 60,000 emails and an undetermined number of internal messages to be sent, a large number of messages does not by itself violate this act. Rather, the question is whether the header or body of the text is “materially false” or “materially misleading.” In this case, the external emails sent from Facebook were not materially false or misleading. Under the statute's definition, who initiated the message matters. Given the sequence of events, both Power and Facebook could be considered to have initiated the messages, and one of them was correctly identified in the header and body text. Additionally, Power users gave Power consent to share the promotion through event invitations. The internal messages were also not misleading, as Power was identified and Power users consented to the messages being sent. Therefore, the appeals court reversed the district court’s decision regarding the CAN-SPAM Act.
The next statute to consider is the CFAA. First, because of Power Ventures’ actions, Facebook suffered a loss of over $5,000 in employee time spent responding to this issue. A loss this substantial gave Facebook a private right of action under the CFAA. When it comes to accessing Facebook’s computers, it could at first be argued that Power was able to do so because Power users (who were also Facebook users) gave Power permission to access their Facebook information, so Power was not violating the CFAA. However, Facebook then sent the cease-and-desist letter, which clearly revoked any authorization or permission Power had to access Facebook’s computers. When it ignored the cease-and-desist letter and continued its campaign, Power Ventures violated the CFAA. The appeals court affirmed the district court’s decision that Power Ventures violated the CFAA.
Lastly, we must review the decision on California Penal Code section 502. Section 502 addresses liability if a party “‘[k]nowingly accesses and without permission takes, copies, or makes use of any data from a computer…’” (Facebook Inc v. Power Ventures Inc, DBA and Steven Suraj Vachani, 2016) or computer network. The same reasoning that applied to the CFAA in the preceding paragraph applies here, so the appeals court affirmed the district court’s decision on this claim as well.
3. HiQ Labs, Inc v. LinkedIn Corporation (2022)
The hiQ Labs v. LinkedIn case highlights that scraping publicly accessible data is legal and does not violate the CFAA.
LinkedIn sent hiQ Labs a cease-and-desist letter for scraping public profiles on LinkedIn, stating that continued scraping would violate the Computer Fraud and Abuse Act (CFAA). LinkedIn also said that it had blocked hiQ Labs’ IPs. hiQ Labs then sought a preliminary injunction. The case made it all the way to the U.S. Supreme Court (SCOTUS), but was sent back to the Ninth Circuit, which, after re-examining the case, reaffirmed its original decision and granted hiQ Labs the preliminary injunction. The court deemed the injunction appropriate because the IP block significantly harmed hiQ Labs’ business, hiQ Labs had not violated the CFAA or other laws, and the public interest favored hiQ Labs. The court found that hiQ Labs did not violate the CFAA because all of these LinkedIn profiles were publicly accessible. It’s also important to note that LinkedIn did not own this information. When it comes to the CFAA and web scraping, authorization is not required if the information is publicly accessible.
The idea that no authorization is required for publicly accessible data under the CFAA draws on a precedent set by the SCOTUS case Van Buren v. United States, which clarified the CFAA’s “exceeds authorized access” clause. This provision concerns files that a person is not entitled to access; obtaining information from them would “exceed authorized access” and violate the CFAA.
Van Buren v. United States also introduces a gates-up-or-down inquiry, which is applicable to this case. This inquiry concerns authorization. If authorization is required and has been given, the gates are up, granting access. If authorization is required and has not been given, the gates are down, blocking access. However, because public websites don’t limit access, authorization is not required, so there is no gate to operate, and the question of authorization does not apply (Van Buren v United States, 2021).
4. Meta Platforms, Inc v. Bright Data Ltd. (2024)
The Meta Platforms v. Bright Data case exemplifies that logged-off scraping of publicly accessible data is legal and did not violate Meta’s Terms of Service.
Meta v. Bright Data was an interesting web-scraping case that Meta brought against Bright Data. It was unique because, while Bright Data was moving for summary judgment, Meta was cross-moving for partial summary judgment on breach of contract and tortious interference with contract.
Bright Data collects and sells datasets through web scraping and also offers proxies to its customers. Two of the sites it scraped were Facebook and Instagram, social media sites that are both owned by Meta. Meta alleged that scraping its sites and selling the information is against both Facebook and Instagram policies and that Bright Data refused to stop its scraping activity when Meta demanded that it do so.
One of the main issues in this case rests on the term “user” and, consequently, “use.” Meta alleged that anyone who accesses its sites “uses” their data under its contract, while Bright Data argued that a “user” must be someone who has an account and that, until an account is created, a visitor is not bound by the terms. It is also important to note that Bright Data had Facebook and Instagram accounts used for marketing purposes when the lawsuit was originally filed. However, as of December 4th, 2023 (about two months before this decision), Bright Data had terminated all of its Facebook and Instagram accounts. Aside from those accounts, Bright Data claimed that all of its data came from logged-off scraping, meaning that everything scraped was public and not protected by a password. Because of this, it argued, the scraping had nothing to do with its marketing accounts, and it was therefore not bound by the terms and conditions because it was not a “user.” This is further supported by a change in Facebook’s terms and conditions. In 2009, the terms included a clause that said: “‘accessing or using our website . . . signif[ies] that you . . . agree to be bound by these Terms . . ., whether or not you are a registered member of Facebook.’” (Meta Platforms, Inc. v. Bright Data Ltd., 2024). Under that clause, Bright Data would have been bound by the terms. However, because the clause has since been deleted, Meta seems to have knowingly changed its terms so that visitors, those who merely access the website, no longer count as users.
Additionally, Bright Data argued that logged-off scraping couldn’t be bound by the terms and conditions, as these terms are accessible only when you’re signing up for an account and are only hyperlinked. Therefore, by being on the publicly accessible site, you would neither come across the terms and conditions nor be asked to agree to them. Meta also contended that Bright Data circumvented its access restrictions by various methods, including the use of technology for CAPTCHA solving. However, the use of automation for CAPTCHA solving was distinguished from accessing password-protected content by this court in the earlier case hiQ Labs v. LinkedIn. Despite its best efforts, Meta was unable to meet its burden of proof that Bright Data scraped logged-in data.
Upon examination of the terms, there was some ambiguity, and because of this, both Meta and Bright Data were allowed to submit extrinsic evidence. Meta chose not to submit extrinsic evidence, saying its terms were not ambiguous. In contrast, Bright Data referenced the 2009 change in terms, a 2009 statement by a Facebook employee that “‘A putative Facebook user cannot become an actual Facebook user unless and until they have clicked through the registration page where they acknowledge they have read and agreed to Facebook's terms of use’” (Meta Platforms, Inc. v. Bright Data Ltd., 2024), and a quote from the help center saying “‘You have to create a Facebook account in order to use Facebook.’” (Meta Platforms, Inc. v. Bright Data Ltd., 2024)
After examining the terms of both Facebook and Instagram, the course of events between Meta and Bright Data, and the extrinsic evidence, the court found that logged-off scraping is not prohibited by the terms and conditions and denied Meta’s motion for partial summary judgment. Because logged-off scraping was not prohibited by the terms, the court also ruled in favor of Bright Data and granted its motion for summary judgment on breach of contract. After this, the claim of tortious interference against Bright Data still stood. However, Meta has since dropped the case.
Legality of Web Scraping at SerpApi
Here at SerpApi, we only perform web scraping on search engines providing publicly available data. The API calls replicate real-time searches with no login or authorization required. In this context, SerpApi’s web scraping is completely legal, as shown by the cases discussed above, such as hiQ Labs v. LinkedIn.
We are committed to maintaining legal and ethical web scraping practices and providing our customers with peace of mind. This is one reason that we offer our Legal US Shield on all plans from the production plan upwards. Although web scraping is legal in the US, we provide this coverage in case any issues ever arise. Our Legal US Shield covers legal questions regarding only the scraping and parsing of data, not the use of said data. We continue to offer this protection to ease any concerns our customers may have regarding web scraping.
Sources
California Penal Code section 502. (n.d.-b). https://www.calpers.ca.gov/docs/ca-penal-code-502.pdf
Computer fraud and abuse act (CFAA). NACDL. (n.d.). https://www.nacdl.org/Landing/ComputerFraudandAbuseAct
Craigslist, Inc., v 3 Taps Inc. et al., law.justia.com (United States District Court for the Northern District of California August 16, 2013). Retrieved April 18, 2024, from https://law.justia.com/cases/federal/district-courts/california/candce/3:2012cv03816/257395/101/.
Craigslist, Inc v. 3Taps, Inc et al, no. 3:2012CV03816 - document 101 (N.D. Cal. 2013). Justia Law. (n.d.). https://law.justia.com/cases/federal/district-courts/california/candce/3:2012cv03816/257395/101/
Dilmegani, C. (2024, January 5). Is web scraping legal? ethical web scraping guide in 2024. AIMultiple. https://research.aimultiple.com/web-scraping-ethics/
Facebook, Inc. v Power Ventures, Inc., DBA. (n.d.-b). https://cdn.ca9.uscourts.gov/datastore/opinions/2016/07/12/13-17102.pdf
Facebook, Inc. v Power Ventures, Inc, DBA and Steven Suraj Vachani, cdn.ca.uscourts.gov (United States Court of Appeals for the Ninth Circuit July 12, 2016). Retrieved April 18, 2024, from https://cdn.ca9.uscourts.gov/datastore/opinions/2016/07/12/13-17102.pdf.
HiQ Labs, Inc. v. LinkedIn Corp. (n.d.-b). https://cdn.ca9.uscourts.gov/datastore/opinions/2022/04/18/17-16783.pdf
HiQ Labs, Inc. v. LinkedIn Corporation, cdn.ca9.uscourts.gov (United States Court of Appeals for the Ninth Circuit April 18, 2022). Retrieved April 18, 2024, from https://cdn.ca9.uscourts.gov/datastore/opinions/2022/04/18/17-16783.pdf.
Legal Information Institute. (n.d.). 18 U.S. Code § 1030 - fraud and related activity in connection with computers. Legal Information Institute. https://www.law.cornell.edu/uscode/text/18/1030
Lim, J. (2024, March 4). This is why Meta lost the scraping legal battle to Bright Data. Proxycurl Blog | Read our stories on data, scraping, APIs. https://nubela.co/blog/meta-lost-the-scraping-legal-battle-to-bright-data/
Meta Platforms, Inc. v. Bright Data, Ltd., Casetext.com (United States District Court, Northern District of California January 23, 2024). Retrieved April 18, 2024, from https://casetext.com/case/meta-platforms-inc-v-bright-data-ltd-6.
Meta Platforms, inc. v. Bright Data Ltd., 23-CV-00077-EMC | casetext search + citator. Casetext.com. (n.d.). https://casetext.com/case/meta-platforms-inc-v-bright-data-ltd-6
19-783 van buren v. United States (06/03/2021). (n.d.). https://www.supremecourt.gov/opinions/20pdf/19-783_k53l.pdf
Public law 108–187 108th Congress an act - govinfo.gov. govinfo.gov. (n.d.). https://www.govinfo.gov/content/pkg/PLAW-108publ187/pdf/PLAW-108publ187.pdf
Quinnemanuel. (2023, April 28). The legal landscape of web scraping. Quinn Emanuel Trial Lawyers - Quinn Emanuel Urquhart & Sullivan, LLP. https://www.quinnemanuel.com/the-firm/publications/the-legal-landscape-of-web-scraping/
Urban, O. (2024, March 7). Is web scraping legal? Apify Blog. https://blog.apify.com/is-web-scraping-legal/#what-is-personal-data-information-anyway
Van Buren v United States (Supreme Court of the United States June 3, 2021).
Whittaker, Z. (2022, April 18). Web scraping is legal, US appeals court reaffirms. TechCrunch. https://techcrunch.com/2022/04/18/web-scraping-legal-court/