The role of photo and video in exposing human rights abuses is undeniable. From evidence of Nazi war crimes at the Nuremberg trials, to footage of Rodney King’s beating by the Los Angeles Police Department in the US in 1991, to YouTube’s witnessing of the Arab Spring and, more recently, TikTok and other social media platforms becoming repositories of atrocities from China to Ukraine, photo and video captured by accidental witnesses, civil society actors or journalists have tremendous potential to surface the truth and effect change. At the same time, the growth of video in human rights advocacy has been accompanied by increasingly sophisticated image generation and manipulation tools. In the last year, leaps in artificial intelligence (AI) have brought us powerful software like DALL-E, Midjourney and Stable Diffusion. Generative AI allows anyone with an internet connection to create realistic images from text prompts, extend an image beyond its original borders to alter the context in which it is interpreted, or insert new content into a picture.
While new technological innovations make it easier to document war crimes, police brutality or violence in protests, they also make it easier to manipulate media. This is particularly significant given that, in times of upheaval, the first line of defense is often to discredit legitimate documentation by claiming uncertainty about the authenticity of visual media, putting the burden on human rights defenders to prove that their content is genuine. This phenomenon is directly related to the “liar’s dividend”, and it surfaces a powerful and urgent need in human rights documentation: if we cannot stop malicious actors from deflecting and obscuring reality, or from benefiting from the liar’s dividend, we should focus instead on fortifying the truth coming from the human rights defenders countering those lies.
The human rights sector began developing pioneering technologies like ProofMode and eyeWitness over ten years ago, offering options to track the provenance of a piece of media and help prove its integrity. While these tools were intended for specialized and niche uses, more ambitious projects are now underway in the private sector, further galvanized by the collective and urgent demands to address online mis- and disinformation. The most notable effort is the Coalition for Content Provenance and Authenticity (C2PA), led by the BBC, Microsoft, Adobe, Intel, Twitter, Truepic, Sony and Arm. In January 2022, the Coalition released a set of open technical standards that aim to make it easier to identify how, where and by whom a piece of media may have been created, and the modifications it may have undergone as it is disseminated, whether by media outlets or on social media feeds. Companies are now starting to create tools based on these underlying standards.
In light of some of these technical and social provocations that generative AI brings, how can technologies for authenticating photo and video cope with human rights needs?
Calls for initiatives that can empower communities to capture trustworthy content are gaining new momentum. With the increasing diversity of actors working on different specifications and tools have come different interpretations of what may be important for communities at the frontlines of critical situations like elections, cultural heritage protection, public health, land defense or armed conflicts. To bring some clarity to this complex field and the urgent questions these issues raise, this article identifies and debunks the most common misconceptions concerning the design, development and adoption of trust and provenance tools. It then builds on this analysis to provide guidance for developers, investors, funders and other professionals interested in building or implementing authenticity technologies. The article deliberately focuses on specific issues that dominate current narratives on provenance and authenticity technologies: trust, truth, and immutability and permanence. As such, it does not cover the full array of risks and harms that these tools may bring, as these have been covered extensively in the C2PA harms modeling and related works about their dilemmas.
One of the most common fallacies is that collecting as much metadata as possible serves to authenticate visual content. In other words, the more information acquired about a video, the more we can trust its authenticity. This is not necessarily true. Incorrect and inconsistent metadata may also undermine the trustworthiness of the footage. More importantly, it can bring risks to the person behind the lens.
For instance, knowing the exact date, time and location in which an event took place is one of the first steps to verify a piece of content. Consequently, these are also among the first metadata points that provenance technologies seek to collect. A tool can capture date, time and location data from many sources: hardware sensors like GPS or WiFi (or a combination of them), manual input by the user, an internet lookup, or the memory of the device. Any given software may have a different set of instructions for how to capture this metadata. For example, some software tools may prioritize GPS sensors over other collection means, or fetch the data from the device’s memory when other preferred capture methods are not available.
In many situations where social tensions are escalating, governments may put in place internet shutdowns to control the flow of information. While a device does not need connectivity to determine its position from GPS radio signals, if the software is instructed to additionally collect data from the device’s memory, the tool could also embed geolocation data from the last known location. Not only may this reveal the home address or safe location of an individual already at risk, but from an authentication perspective this inaccurate metadata can insert doubts about where the footage was really recorded.
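The fallback logic described above can be sketched in code. The function and types below are hypothetical, not drawn from any particular tool; they illustrate a design in which a capture app prefers live sensor readings and deliberately refuses to fall back to the device’s cached last-known location, recording nothing rather than a misleading value.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LocationReading:
    latitude: float
    longitude: float
    source: str  # e.g. "gps", "network", "cached"

def capture_location(gps_fix: Optional[LocationReading],
                     network_fix: Optional[LocationReading],
                     cached_fix: Optional[LocationReading]) -> Optional[LocationReading]:
    """Prefer a live GPS fix, then a network-derived fix.

    Deliberately does NOT fall back to the cached last-known location:
    during an internet shutdown that cache may point to the user's home
    or safe house, and stale coordinates can cast doubt on where the
    footage was actually recorded.
    """
    if gps_fix is not None:
        return gps_fix
    if network_fix is not None:
        return network_fix
    # No reliable reading: embed nothing rather than a misleading value.
    return None
```

Under this sketch, a capture made with GPS unavailable and no network fix simply carries no location metadata, consistent with the principle that no data is better than inaccurate data.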
Less fatal from a security point of view, but equally undermining to trustworthiness, is the over-collection of data points that may be inconsistent across device makes and models, such as altitude, atmospheric pressure, or acceleration information. While it can be attractive to capture this information to corroborate the circumstances in which a video has been recorded, this metadata often varies significantly across hardware. Research tells us that we tend to place more trust in machine-generated data or results, to the detriment of evidence provided by humans. In critical human rights situations where mis- and disinformation are rampant, authenticity technologies may unintentionally contribute to the “firehose of falsehood” by producing content with confusing metadata. For provenance and authenticity technologies, no data is better than inaccurate data.
A general misconception connected to the collection of provenance data is that fully disclosing this information can lead to deterministic assertions about the truthfulness of what is in the frame. However, “staged” content can still carry provenance information if it was recorded in a situation that mimics a real event. Actors could enact a protest turned violent in a certain neighborhood, and, as long as the footage is circulated claiming, for instance, that the event took place in the location, date and time it was recorded, the provenance data may not contradict those claims. Similarly, even if a piece of media depicts a real event, the content within the frame will always be a curation of reality. A video showing a soldier firing at an opposing military force may omit other crucial events that led to the confrontation, making it difficult to assess the offensive or defensive nature of the attack.
While provenance data can help verify a piece of media, ascertaining the truth still requires further analysis of the actual content. Metadata provides important information but does not tell the truth of a piece of media. Yet, some tools may build on this information in order to deliver binary results about the truthfulness of the content. This can have catastrophic consequences for communities who may have captured footage at high risk, and where authenticity and integrity may be on the line. For example, authentic content may be visibly stamped as “fake” because it is missing certain data points that a model needs in order to perform a technical evaluation.
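The failure mode described above can be made concrete with a sketch. The enum and checker below are illustrative, assuming hypothetical field names: rather than stamping content “real” or “fake”, the tool reports what the provenance data does and does not support, so footage captured with a field missing (for instance, location disabled for safety) is labeled incomplete rather than false.

```python
from enum import Enum

class Assessment(Enum):
    VERIFIED_PROVENANCE = "provenance data present and consistent"
    INCOMPLETE_PROVENANCE = "some provenance data is missing"
    INCONSISTENT_PROVENANCE = "provenance data contradicts itself"

# Hypothetical set of fields a tool might require for a full evaluation.
REQUIRED_FIELDS = {"capture_time", "location", "device_signature"}

def assess(metadata: dict) -> Assessment:
    """Report what the data supports instead of a binary true/fake verdict.

    A naive checker that returned "fake" whenever a required field was
    absent would mislabel authentic footage. Missing data is reported
    as missing, not as evidence of manipulation.
    """
    present = {key for key, value in metadata.items() if value is not None}
    if not REQUIRED_FIELDS <= present:
        return Assessment.INCOMPLETE_PROVENANCE
    # Illustrative consistency check: upload cannot precede capture.
    upload_time = metadata.get("upload_time")
    if upload_time is not None and upload_time < metadata["capture_time"]:
        return Assessment.INCONSISTENT_PROVENANCE
    return Assessment.VERIFIED_PROVENANCE
```

Graduated labels like these still need careful presentation to non-expert audiences, which is the point taken up next.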
Even if these tools publish the verification process–leaving it to the person receiving the information to arrive at their own final conclusion–these models can be complicated to interpret. Generative AI software like Midjourney v5 can create “staged” images that look photorealistic. With synthetic media becoming more widely circulated online, we should consider carefully how authenticity and provenance technologies visually label or make accessible the origin and edits of the content we see. This is why these standards and tools should be accompanied by non-technical solutions that also prioritize equal access to the knowledge and literacy needed to interpret confusing signals of truth.
It is not uncommon to see efforts being put into radical permanence and immutability, for instance by writing records into a blockchain or blocking changes to digital objects. An immutable record may not allow, for instance, blurring the face of someone whose identity needs to be protected. A permanent record cannot be deleted, whereas an immutable one can be deleted but not altered. These approaches may strip away the agency of both the documenter and the person portrayed in the footage. Hence, in contexts where the stakes are high and the arc of history is long, solutions built on immutability and permanence may not bring more trust to certain communities.
Civic journalists, activists and others documenting human rights abuses may record footage for different purposes. Among other uses, they may want to upload it to social media sites to raise global awareness, distribute it privately to policy makers or governments to influence their actions, submit it anonymously to a media outlet for further dissemination, or capture it as potential evidence for legal proceedings. The lifecycle of a piece of content is hence multifaceted, and it may evolve over time. Evidence disclosed in the context of civil or criminal litigation may make its way to the public consciousness when it is memorialized in a museum, and content buried by algorithmic moderation may get amplified when major news outlets surface it. Communities may also want to safeguard this content by creating their own archives, independent from centralized infrastructures that often fail to take into account their needs, priorities and requirements.
However, it remains difficult for both documenters and the people in front of the lens to meaningfully consent to the myriad possible lives that a piece of footage may have. In some instances, those doing the documentation and those holding the content and deciding its use may overlap. Yet, a look at the information ecosystem in the examples outlined above demonstrates how often multiple different actors become involved in the lifecycle of a piece of content. They may include the activist initiating the recording; the media outlet broadcasting the content; the social media platform publishing it on its wall; the archivists making decisions over selection, preservation and access; the policymakers receiving the material as proof of much needed change; the lawyers and judges tendering it as evidence in court; the people whose image is being circulated; and the heirs to their digital remains.
We see in the examples above that the lifespan of a piece of content is interwoven with the purposes and decisions behind pulling out a phone and pressing record, and with the successive uses that different actors may make of the footage. On many occasions, purpose and use will be aligned, but the further we move into the future uses of an image, the wider the gap may grow with the documenter’s original intentions, and the more difficult it becomes to predict the potential for misuse of such footage. This is why it is crucial that these technologies are developed in close consultation with the communities that will be most affected by the design and adoption of tools that have the potential to create permanent snapshots of history. Their input should come not only at the design phase, but also during conceptualization, coding, and testing prior to release, and it requires continued collaboration to address the impact of deployment and any subsequent iterations.
This co-creation process is the transparency that authenticity technologies need. Provenance tracking tools must always be opt-in, not only at the moment of their adoption but also with respect to uses of the information they capture. It is important to consider how immutable and permanent records of truth can become tools for coercion. That is why immutability and permanence may work against trust-building in human rights contexts. Instead, we should build technology design, development and production processes that are opt-in, and that can evolve with the demands that different communities may have over time. The best way to achieve this is by consulting with the anticipated data collectors, recipients and subjects–centering those communities that may be harmed the most by these technologies.
Guidance for investors, funders, developers and implementers
The need to combat visual mis- and disinformation in relation to human rights abuses has become a pressing priority among documenters and activists, and communities who may be at the receiving end of footage captured in the frontlines of armed conflict, land defense, police violence, military overreach or electoral chaos. From raising awareness of environmental abuse in communities in the Amazon, to creating a historical record of more than a decade of the war in Syria, people want their media to be trusted and believed. The following guidance can help designers and developers navigate the myths and realities of building provenance and authenticity technologies.
Develop use case scenarios that take into account the context, needs and requirements of the data collectors (e.g. activists, civic journalists, human rights defenders) and the recipients or consumers of the data (e.g. general public, social media users, media outlets, legal professionals), as well as the data subjects (i.e. who may be in the frame of the lens).
Consider the balance between anonymization, verifiability of the data, and user experience depending on the above set of requirements and use case scenarios.
In consultation with the data collectors, and to the extent possible with potential data subjects too, conduct a risk assessment that includes physical, psychological and digital threats, as well as the likely future uses and unintended consequences when the footage is used for different purposes to the envisioned ones.
Develop written specifications and wireframes in consultation with data collectors and consumers, digital security professionals, media forensics experts, archivists and lawyers (even if the information is being captured for purposes other than evidence in court). These specifications should address mitigation strategies for the threats identified in the risk assessment; the potential relevance of the information that will be captured by the tool in question; what metadata adds verification and authentication value and what may be detrimental; what information may be sensitive, on its own or when combined with other data or sources; and legal risks (e.g. subpoenaed actions) that may generate other risks, such as physical or reputational harm.
Consider operational policies that can impact the design process, such as the resilience of the systems to maintain different file formats over the long term, or sunsetting process and policies. If the tool seeks to be part of the broader provenance and authenticity ecosystem being promoted by the C2PA technical specifications, follow the harm assessment and other non-normative guidance closely.
Document the development process, or at least key decision points and reasoning, so it can be externally reviewed if needed.
Conduct a robust assessment process, with a specific set of actions the testers need to perform in a given number of scenarios, and user experience (UX) questions for the testers to answer, with target metrics. This should include testing the accuracy and consistency of the metadata being collected.
Define roles and responsibilities for the team conducting the user acceptance testing, including who approves deployment.
Review how the processes for collecting, analyzing and sharing data are compliant with GDPR and privacy best practices.
Create terms and conditions for the tool that include an overview of the policy for sharing information with external parties.
Create a user manual for the tool and other training documents that use inclusive, jargon-free language, tailored to the contexts in which they will be read and used.
Conduct a security audit with an external party (or more than one) that puts emphasis on testing how the reliability of the data can be undermined.
Consider what other internal policies and processes may be needed in order to minimize all types of risk and meet the needs of different actors over time, including thinking about systems for “customer service” and other support that users may need (for data collectors, subjects and recipients).
Create processes to gather feedback from the data collectors and the recipients or consumers of the data, and put aside resources for incorporating this feedback.
Allocate resources for security audits, further testing, patching or redesigning, in particular when bugs or vulnerabilities can undermine the reliability of the data.
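The recommendation above to test the accuracy and consistency of collected metadata could be sketched as follows. The helper is hypothetical, assuming invented field names: given metadata from repeated test captures on the same device at the same spot, fields that should be stable must agree, and timestamps must increase.

```python
def check_metadata_consistency(records: list[dict]) -> list[str]:
    """Flag metadata fields that vary unexpectedly across test captures.

    Intended for a testing phase where the same device records several
    clips in one session, so device model should not change and capture
    times should be strictly ordered.
    """
    issues = []
    models = {record.get("device_model") for record in records}
    if len(models) > 1:
        issues.append(f"device_model varies across captures: {sorted(models)}")
    times = [record.get("capture_time") for record in records]
    if any(t is None for t in times):
        issues.append("capture_time missing in some records")
    elif times != sorted(times):
        issues.append("capture_time values are not monotonically increasing")
    return issues
```

A harness like this can run as part of user acceptance testing, turning the target metrics mentioned above into concrete pass/fail signals before deployment.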
If civic journalists, activists and communities cannot capture and share video that is demonstrably authentic, there will not be justice for human rights abuses. To hold perpetrators to account, galvanize policy-makers into action, explore accountability for widespread violations, and advocate for social change, we need to fortify the truth coming from the frontlines of human rights.
Raquel Vazquez Llorente is a lawyer helping communities use video and technology to protect and defend their rights. At WITNESS, she leads a team that engages early on in shaping emerging technologies that will undermine or enhance audiovisual witnessing and impact human rights defenders.