Algorithmic Content Moderation and Copyright Law

by Abhimanyu Agarwal

Introduction

This paper examines how copyright infringement should be dealt with in the context of algorithmic content moderation. It first explains content moderation and the copyright problems associated with its algorithmic form. It then attempts to identify the process that should be used to frame legislation governing online content moderation so as to protect against copyright infringement, drawing on prevailing case law, tests and legal provisions.

1. What is content moderation: A look at algorithmic content moderation and its functioning

Content moderation is the key to managing user-generated content. It is one of the many ways to maintain a positive online reputation and a good relationship with customers. James Grimmelmann, a professor at Cornell Law School who teaches the regulation of content platforms, has laid out a broad definition of content moderation as “the governance mechanisms that structure participation in a community to facilitate co-operation and prevent abuse”^[1]. Moderation allows the administrators of a site to remove content or exclude users, and to design rules of engagement that organise the way members communicate with each other. Algorithmic moderation can be defined as systems that classify user-generated content, by either matching it against known material or predicting whether it breaches a platform’s community guidelines, and that lead to a decision and an outcome such as removal of the content or takedown of the account. A good example of algorithmic content moderation at work is Wikipedia, which allows contributors to write pages while keeping the content in check. Its articles draw on contributions and sources from all over the globe, so there is often a chance that some of the information will turn out to be false. To ensure accuracy, it relies on human moderators who check content against various sources while still leaving room for the subjectivity of truth. It also has automated “bot” moderators that monitor articles for nonsensical edits and remove terms that a human moderator has entered for the bot. For example, for an article about an esteemed personality, a human can list terms such as “F*** you”, which the bot will then remove automatically; this can be configured for particular articles.
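The bot behaviour described above can be illustrated with a minimal sketch. It is not Wikipedia’s actual bot code: the function name `scrub_edit`, the `BLOCKLISTS` table and the article title are all hypothetical, and the sketch only assumes that a human moderator supplies a list of terms per article.

```python
import re

# Hypothetical per-article blocklists entered by human moderators.
BLOCKLISTS = {
    "Example Person": ["idiot", "fraud"],
}


def scrub_edit(article_title, text, blocklists=BLOCKLISTS):
    """Return (cleaned_text, removed_terms) for a proposed edit."""
    removed = []
    cleaned = text
    # Only the terms configured for this particular article are checked.
    for term in blocklists.get(article_title, []):
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        if pattern.search(cleaned):
            removed.append(term)
            cleaned = pattern.sub("", cleaned)
    # Collapse the whitespace left behind by removed terms.
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return cleaned, removed
```

A filter of this kind is purely a matching system: it has no notion of context, so it would remove a listed term even where it appears in a legitimate quotation.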

The importance of content moderation can be seen from the Christchurch incident in 2019, when a man strapped a camera to his chest and began a Facebook live stream as he entered a mosque, where he murdered more than fifty people with an assault rifle. The incident subsequently became the highest-profile test for the members of an organisation called the Global Internet Forum to Counter Terrorism (“GIFCT”). Facebook, Google, Twitter and Microsoft created the GIFCT to combat illegal online hate speech, as part of a commitment to increase industry collaboration under the European Commission’s code of conduct.^[2] The GIFCT uses a “hash database”^[3], which stores digital fingerprints (hashes) of known terrorist and other illicit content (images, video, audio, etc.). When an item is uploaded, its hash is computed and checked against the database; if it matches, the item is blocked.
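The matching step of a hash database can be sketched as follows. This is an illustration under stated assumptions, not the GIFCT’s implementation: the class `HashDatabase` and the function `fingerprint` are hypothetical names, and SHA-256 is used only because it is available in the standard library.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Compute a digital fingerprint of a file's raw bytes (SHA-256 here)."""
    return hashlib.sha256(data).hexdigest()


class HashDatabase:
    """Hypothetical store of fingerprints of known illicit content."""

    def __init__(self):
        self._known = set()

    def add(self, data: bytes) -> None:
        # Only the fingerprint is stored, not the content itself.
        self._known.add(fingerprint(data))

    def is_blocked(self, data: bytes) -> bool:
        # An upload is blocked when its fingerprint is already known.
        return fingerprint(data) in self._known
```

A cryptographic hash such as SHA-256 matches only byte-for-byte identical files, so re-encoding or trimming a video would defeat this sketch. Systems used in practice are generally described as relying on perceptual hashes, which are designed to tolerate such small changes.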

Platform companies such as YouTube and Facebook have moved from community moderation^[4] towards what is termed commercial content moderation or platform moderation. Where bulletin boards and forums were previously managed by dedicated administrators, they are now increasingly managed by automated systems that check for particular keywords and kinds of content that may be banned from a community platform. For instance, people leaving a cruel or insensitive comment on Facebook may end up facing a “cruelty checkpoint”.^[5] This means that a moderator or a tool may ask them to consider removing the comment, and their accounts could be closed if they persist. Algorithmic content moderation thus involves a range of techniques drawn from statistics and computer science^[6]. These range from the primitive, restricted to matching certain keywords, to more complex algorithms that can identify a piece of content on the basis of its properties, a task that earlier required the application of the human mind.

2. Algorithmic content moderation and copyright infringement

In the aftermath of the Napster debacle^[7] and the multitude of file-sharing controversies in the early 2000s, which coincided with the rise of content hosting and sharing platforms, copyright licensing of content became important. Premier among these platforms was YouTube, as industry lobbies and other right-holders wanted to curb the unlicensed distribution of their content. Copyright law allows third parties to create excerpts of, or otherwise use, protected content under “fair use standards”, which vary across jurisdictions. Under these standards, important exceptions are created for the use of copyrighted content for purposes such as education and parody, which will not result in copyright violation. In the U.S. in particular, works of commentary, criticism, research, teaching or news reporting can be considered fair use, depending on the situation. The key point is that the author’s idea is not copied outright or manipulated to such an extent that it goes against what the author initially stated (this holds true particularly for commentaries). The problem, however, is that the contextual factors used to assess fair use cannot readily be programmed into automated systems, so the process still requires human oversight. Keeping up with the constantly changing law on what falls under fair use is arduous, especially given the amount of content available and accessible on these platforms. This can also result in over-blocking of content on social media platforms, as the right-holders seeking blocks usually have greater resources and power. Big players like Facebook and Twitter have appeared in recent controversies regarding content moderation.^[8]

The remedy procedures that YouTube and similar platforms provide for users who wish to challenge takedowns are slow and resource-intensive for the challengers, who are already at a major disadvantage compared to right-holders. Users whose content is removed or de-monetised, even in cases of fair use, can end up requiring months of legal help and expert knowledge of copyright law to make a successful fair use claim^[9]. Despite the large scale of takedowns in recent times, pressure from regulatory bodies and stakeholders has incentivised platforms to make takedown the easier path rather than to conduct ex-ante investigations. However, there are cases that could benefit from copyright exemptions. Examples include musicians who upload covers and compare them with the original work, where music houses strictly enforce copyright in order to cover up flaws in the original piece, and artists who use music for ambience in videos whose actual aim is to showcase their makeup skills^[10]. This imbalance (between identifying content that blatantly uses copyrighted material without adding substance, and other forms of content) would be enshrined in legislation like the European Union Copyright Directive. Such legislation would affect platforms’ monitoring obligations for content uploaded by users and would lead to greater deployment of matching systems^[11] at the point of upload. This prejudices content creators, as falling within the gaps of the fair use standard could leave their content de-monetised.

A key decision on the question of fair use on platforms was Lenz v. Universal Music Corp.^[12] In this case, decided by the United States Court of Appeals for the Ninth Circuit, a woman posted a clip on YouTube of her baby dancing. A song by a popular artist was playing distantly in the background. The recording label detected this unauthorised use of the music and demanded the clip’s removal under the Digital Millennium Copyright Act (“DMCA”). The legal question was whether Universal (the recording label) had formed a “good faith belief” that the clip was infringing, and in particular whether it had considered fair use, before demanding its removal. The court held that fair use must be considered before a demand for removal of online content is made. However, keeping the use of automated detection in mind, the court further stated,

“We note, without passing judgment, that the implementation of computer algorithms appears to be a valid and good faith middle ground for processing a plethora of content while still meeting the DMCA’s requirements to somehow consider fair use.”

3. The problems while formulating an effective legislation to deal with copyright infringement online

Surprisingly or unsurprisingly, depending on your perspective, the court later withdrew this passage of dicta from the published opinion. This raises a particularly pertinent question: can old-style legal personalisation be translated into data-driven, machine-mediated personalisation with respect to online content?

With reference to the article ‘Algorithmic Fair Use’ by Dan L. Burk^[13], it should be noted that fair use suffers from ex-ante uncertainty. The four factors that guide fair use are: how much of the work was taken; what was done with it; what kind of work was taken; and the effect on the likely market for that work.^[14] The problem is that no one knows in advance how a court will apply these factors. The uncertainty can be seen from the aforementioned judgment. There are various nuances to be taken into account when dealing with fair use, and the checkbox-based community guidelines that most platforms adopt are not sufficient to apply them judiciously. As a result, risk-averse users of content are unable to predict confidently how their activities will be judged, and thus end up forgoing benefits that the law makes available to them.

We should recognise that fair use is not a static concept, which raises another question: does an automated instantiation of fair use freeze the standard as it stood at the time it was encoded? To answer this, it must be stated at the outset that algorithms can be continuously updated. This is a double-edged sword: instead of promoting convergence between the law and the algorithm, updates can lead to severe divergences. This is a source of inconsistency, as judicial determinations on fair use usually apply similar standards drawn from previous judgments^[15]. In deciding how to approach these cases, it also needs to be kept in mind that initiatives such as the GIFCT are given paramount importance. It can therefore result in a situation where the right answer is in front of us but is not applicable, because there are different standards and different players in the online sphere. While a settled jurisprudence on accommodating fair use standards in line with community guidelines is possible, it is not available right now owing to a lack of effective legislation for online forums. The remainder of this article sets out recommendations on how to formulate a standard law to deal with this.

4. A look at America’s “safe harbour” provisions for online copyright regulation

When looking at laws pertaining to copyright, it is also essential to look at the United States’ Online Copyright Infringement Liability Limitation Act (“Act”), codified at Section 512 of Title 17 (Copyrights) of the United States Code. The Act offers online service providers “immunity” from liability for copyright infringement in user-uploaded content. This showcases a key facet not previously addressed in this article: immunity for service providers, and not just for users. This is no blanket immunity; it comes with exceptions. There is an onus on the provider to remove material that is the subject of a notice of infringement.^[16] The protection of the section is lost when providers with knowledge of infringing activity do not take steps to prevent it, or when they have both a “financial benefit directly attributable to” the infringement and the “right and ability to control” it^[17]. Moderators’ duties here are limited, as they are not required to “affirmatively seek facts” indicating infringing activity.^[18] Section 512 also specifies particular moderation strategies that a provider must use, including ex-post deletion when it receives notice or knowledge of infringement, and ex-post exclusion of repeat infringers. The importance of the financial benefit test is that, as a restriction on pricing, it rules out moderation models that are “directly” linked with infringing material, i.e. where the moderator has a direct connection to the infringing material, for instance by being the infringer or by knowing the infringer. It thus recognises that some moderators would allow infringement for their personal needs or gains.

Also, the choice of ex-post over ex-ante moderation simplifies the moderation task, as the ex-post model has specific events that trigger the requisite actions, making for a more systematic, criteria-based approach. YouTube stands out as an entity that has built algorithms around the ex-ante/ex-post distinction in Section 512: its Content ID^[19] system blocks uploaded videos that match an extensive list of copyrighted works. The problem a court would face in such situations is deciding whether the matching algorithm was too aggressive in a given case. However, it cannot be denied that the general principle applied in such algorithms is far more reasonable than that of the earlier brute-force content moderation algorithms.
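The difference between ex-ante matching and ex-post takedown, and the question of when a matching algorithm becomes “too aggressive”, can be made concrete with a small sketch. It does not describe how Content ID works internally. The `Platform` class, the `shingles` and `similarity` helpers, the use of Jaccard similarity over token windows, and the threshold values are all assumptions chosen purely for illustration.

```python
def shingles(tokens, k=3):
    """Return the set of k-token windows in a sequence."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def similarity(upload, reference, k=3):
    """Jaccard similarity between the shingle sets of two token sequences."""
    a, b = shingles(upload, k), shingles(reference, k)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


class Platform:
    """Hypothetical platform with ex-ante matching and ex-post takedown."""

    def __init__(self, reference_works, threshold):
        self.reference_works = reference_works  # title -> token list
        self.threshold = threshold              # lower means more aggressive
        self.published = {}

    def upload(self, video_id, tokens):
        """Ex-ante: compare against reference works before publishing."""
        for title, ref in self.reference_works.items():
            if similarity(tokens, ref) >= self.threshold:
                return f"blocked: matches {title}"
        self.published[video_id] = tokens
        return "published"

    def notice(self, video_id):
        """Ex-post: remove only once a notice of infringement arrives."""
        if self.published.pop(video_id, None) is not None:
            return "removed"
        return "not found"
```

In this sketch an upload that reuses only a short excerpt of a reference work scores a low similarity. Whether it is blocked depends entirely on the threshold the platform picks, which is the kind of design choice a court would have to assess, and nothing in the score reflects whether the excerpt is fair use.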

5. How to make the law while also looking from the perspective of the internet service providers?

When making a generalised law for various online community platforms, lawmakers must try to answer several questions. For instance, should platforms such as Facebook, Instagram and Tinder be considered competitors? The answer leads to a divergence of opinion, as the relevant market for the three is actually very different: the relevant good or service ranges from providing an avenue to upload photos and opinions to finding a match for dating. Similarly, with respect to copyright infringement, the classification of infringing material itself depends on a reasonable person standard that differs across platforms. This can lead policymakers to avoid designing a common mechanism for solving the classification problem and instead to induce the platforms to solve it themselves. The blanket law that the U.S. deems fit in such a situation is Section 230 of the Communications Decency Act, which protects “internet service providers” from derivative civil liability for the acts of their users. This covers claims under laws such as defamation law, civil rights law and consumer protection law, among others. Intellectual property claims, including copyright infringement, are expressly excluded by Section 230(e)(2) and are instead governed by the Section 512 safe harbour discussed above.

A leading case that laid out a test which can inform the removal of sexually explicit, violence-inducing and copyright-infringing content is Miller v. California^[20], although the case itself concerned obscenity. The Miller test, set out below, provides a reasonable model for laying down legislation in the future:

“The basic guidelines for the trier of fact must be: (a) whether ‘the average person, applying contemporary community standards’ would find that the work, taken as a whole, appeals to the prurient interest; (b) whether the work depicts or describes, in a patently offensive way, sexual conduct specifically defined by the applicable state law; and (c) whether the work, taken as a whole, lacks serious literary, artistic, political or scientific value.”^[21]

The same can be used as a criterion by human moderators to identify copyright infringement, among other offences, but for algorithms the process will be different. Most algorithms used in content moderation are “black boxes”, i.e. their results can be evaluated, but the internal process by which the results are reached is incomprehensible to humans. The algorithm converts individual inputs (the content) into outputs (a decision on whether the content is prohibited or not). If the output is undesirable, the moderator or the person maintaining the algorithm tweaks it and the process is tried again. The focus for algorithm-correcting agencies and moderators should be to operate in a “quality control” mode, which means feeding pre-existing “problem sets” into algorithms. They can maintain stock algorithms for problems such as sorting threatening language. This is already being done; the change that could take place is central government oversight of content moderation. The harm this approach may bring is the arbitrary exercise of power; to combat this, the central governments of different countries should set up a tribunal^[22]. The tribunal should consist of a legal scholar from the field to which the issue at hand pertains, a judge, and a person representing the central government, so as to ensure a reasonable exercise of such power. To conclude, these proposals will go a long way towards moving efficient legislation forward.
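The “quality control” mode described above can be sketched as a small test harness that treats the algorithm as a black box and looks only at its outputs. Everything named here is hypothetical: `evaluate`, the stand-in `keyword_classifier` and the four-item `PROBLEM_SET` are illustrative assumptions, not a description of any platform’s or regulator’s actual tooling.

```python
def evaluate(classifier, problem_set):
    """Run a black-box classifier over labelled examples and record errors.

    classifier:  callable taking text, returning True if it would remove it.
    problem_set: list of (text, should_remove) pairs prepared by reviewers.
    """
    false_positives = []  # lawful content the algorithm would remove
    false_negatives = []  # prohibited content the algorithm would keep
    for text, should_remove in problem_set:
        removed = classifier(text)
        if removed and not should_remove:
            false_positives.append(text)
        elif not removed and should_remove:
            false_negatives.append(text)
    total = len(problem_set)
    correct = total - len(false_positives) - len(false_negatives)
    return {
        "accuracy": correct / total if total else 0.0,
        "false_positives": false_positives,
        "false_negatives": false_negatives,
    }


def keyword_classifier(text):
    """Stand-in black box: flags any text containing the string 'kill'."""
    return "kill" in text.lower()


# Hypothetical problem set for threatening language.
PROBLEM_SET = [
    ("I will kill you", True),
    ("That joke killed me", False),
    ("You will regret this, I know where you live", True),
    ("Lovely weather today", False),
]
```

Calling `evaluate(keyword_classifier, PROBLEM_SET)` shows both kinds of error without opening the box: the joke is wrongly flagged and the veiled threat is missed. An oversight body or tribunal could require platforms to report results of this kind against a shared problem set.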

^[1] James Grimmelmann, “The Virtues of Moderation,” 17 Yale Journal of Law & Technology 42 (2015)

^[2] Gorwa R, Binns R, and Katzenbach C, ‘Algorithmic Content Moderation: Technical and Political Challenges in The Automation of Platform Governance’ (2020) 7 Big Data & Society

^[3] https://searchsqlserver.techtarget.com/definition/hashing#:~:text=Hashing%20is%20used%20to%20index%20and%20retrieve%20items,It%20is%20also%20used%20in%20many%20encryption%20algorithms

^[4] Lampe C and Resnick P (2004) Slash (dot) and burn: Distributed moderation in a large online conversation space. In: Proceedings of the SIGCHI conference on Human factors in computing systems, 2004, pp. 543–550. New York, NY: ACM.

^[5] Langvardt K, ‘Regulating Online Content Moderation’ [2017] SSRN Electronic Journal

^[6] Gorwa R, Binns R, and Katzenbach C, ‘Algorithmic Content Moderation: Technical and Political Challenges in The Automation of Platform Governance’ (2020) 7 Big Data & Society <https://journals.sagepub.com/doi/full/10.1177/2053951719897945>

^[7] http://news.bbc.co.uk/2/hi/entertainment/1436796.stm

^[8] https://timesofindia.indiatimes.com/blogs/toi-edit-page/the-social-media-crisis-why-facebook-and-other-platforms-must-be-made-liable-for-their-content/

^[9] Gorwa R, Binns R, and Katzenbach C, ‘Algorithmic Content Moderation: Technical and Political Challenges in The Automation of Platform Governance’ (2020) 7 Big Data & Society

^[10] https://www.engadget.com/2014-07-23-youtube-star-lawsuit.html

^[11] https://link.springer.com/article/10.1007/s00146-014-0549-4

^[12] 815 F3d 1145 (9th Cir 2016)

^[13] Burk D, ‘Algorithmic Fair Use’ (Papers.ssrn.com, 2020) <https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3076139> accessed 26 May 2020

^[14] ibid

^[15] ibid

^[16] 17 U.S.C. § 512(c)(1)(C)

^[17] Id. § 512(c)(1)(B)

^[18] Id. § 512(m)(1)

^[19] https://support.google.com/youtube/answer/2797370?hl=en

^[20] 413 U.S. 15, 24 (1973)

^[21] ibid