Sustainable Authorship Models for a Discourse-Based Scholarly Communication Infrastructure
Today's scholarly communication infrastructure is not designed to support scholarly synthesis. When gathering sources for a literature review, researchers need to answer questions about theories, lines of evidence, and claims, and how they inform, support, or oppose each other. This information cannot be found simply in the titles of research papers, in groupings of papers by area, or even in citation or authorship networks (the sole focus of most scholarly communication infrastructure).
How might we build an alternative scholarly communication infrastructure that can overcome this core limitation?
II. Discourse graphs: the promise and the authorship bottleneck
For decades, researchers across a range of disciplines have been developing a vision of an alternative infrastructure centered on a more appropriate core information model: knowledge claims, linked to supporting evidence and their context through a network or graph model. For conciseness here, I call this model a "discourse graph", since the graph encodes discourse relations between statements, rather than ontological relationships between entities.
Much crucial conceptual and technical progress has been made at the level of formal standards, and several proof-of-concept implementations have demonstrated the promise of this concept. However, adoption, particularly in terms of authorship, remains a hard open problem. In general, coverage of the literature and breadth of sustained contributors remain far lower than we would like. As one data point, contributions to servers for the nanopublications standard for discourse graphs are almost all within bioinformatics and contributed by tens of authors. Tobias Kuhn, a lead on this standard, puts it well: we want an ocean of such "micropublications", but "[a]t the moment, this is no more than a puddle" (p. 492).
I believe the UX problems (broadly construed beyond just usability) that contribute to this bottleneck are both high leverage and relatively neglected. First, contributing to shared discourse graphs is currently disconnected from the intrinsic practices of scholarship, both in terms of toolsets (separate specialized tools and webapps/platforms), and practices (often more formal and unable to mix with the informal speculative notes that are the lifeblood of research work). This disconnect creates significant opportunity costs for authorship. It also leaves the work that scholars already do as a substantial untapped source of potential sustainable contributions. Consider that by some estimates, full-time faculty self-report reading about 200 articles per year; there were an estimated 700k full-time faculty in 2018. So we can estimate time spent reading ~100M articles per year as a lower bound on untapped resources, since students, part-time faculty, research scientists, and citizen scientists also spend significant time reading articles. This matches (and likely exceeds) the scale of the total number of published research papers. Further, the intended audience/beneficiaries of this authoring work are most often some unknown others. This is problematic because, all things being equal, scholars are likely to choose activities that directly contribute to their own work and their direct responsibilities (collaborators, trainees, students, etc.), even if they value benefits to society.
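The back-of-the-envelope estimate above can be written out explicitly. This is a minimal sketch using only the two figures quoted in the text; the constant and function names are my own, and the inputs are self-reported estimates rather than measurements. The product comes to 140 million article-reads per year, which is on the order of the ~100M figure used as a lower bound.

```python
# Figures quoted in the text (estimates, not measurements).
ARTICLES_PER_FACULTY_PER_YEAR = 200   # self-reported reading volume
FULL_TIME_FACULTY_2018 = 700_000      # estimated full-time faculty in 2018


def article_reads_per_year(readers: int, articles_each: int) -> int:
    """Total article-reads per year for a population of readers."""
    return readers * articles_each


total = article_reads_per_year(FULL_TIME_FACULTY_2018, ARTICLES_PER_FACULTY_PER_YEAR)
print(f"{total:,} article-reads per year")
```

Because students, part-time faculty, research scientists, and citizen scientists are left out of the reader count, the true figure would only be larger.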
III. Sustainable authorship of discourse graphs by integrating into scholarly practices
Based on this analysis, I believe a promising but underexplored solution path for this authorship bottleneck is to build tools that integrate authoring contributions to discourse graphs into the intrinsic tasks of effective scholarship practices. Here I describe one example point of integration: reading and sensemaking for literature reviews.
A user story
Consider Curie, a researcher, who is studying the role of analogies in cross-boundary innovation. She writes notes about the papers she reads in a digital outliner notebook, in which she is also drafting a literature review for her research project.
Let’s take a look at her notebook and how she might be able to integrate authoring and usage of a discourse graph.
Leaving aside the particularities of the software, the general content structure of her notes is similar to a Google Doc of reading notes: a mix of informal and formal observations and structure, including general notes about related ideas, key details about methods, and the core results of the paper.
But there is one crucial difference: while writing notes for a paper, Curie has marked out a key piece of evidence (EVD) from the paper that might inform her synthesis for her focal question about how domain distance modulates the effects of analogies on creative output. This marking creates a new document (or page) in the software with that evidence note as a title, and allows Curie to reference that specific piece of evidence elsewhere in her notebook (similar to Wikipedia), such as while drafting an outline.
As Curie begins to need more contextual details while comparing and making sense of multiple EVD notes, she can elaborate the EVD notes with more details over time, such as by migrating in screenshots of key tables and figures, or methodological details like participants and measures.
Let's take a closer look at an outline Curie is drafting for her literature review.
It is similar to a normal scholarly outline, with a mixture of formal and informal notes, and links to resources and references. Again, there is a small but crucial difference: Curie can reference specific results (evidence notes) while making sense of the case for and against a focal claim.
This puts the contextual details she needs for comparing and contrasting claims and evidence a hover or click away, without breaking the flow of writing, in contrast to a paper-level citation. In this way, Curie benefits directly from having marked out these claim (CLM) and evidence (EVD) notes.
Finally, consider what happens when Lovelace, a new student, joins the project. To onboard her, Curie runs a graph query to collect claim and evidence notes that inform the focal question, and exports and emails them to Lovelace. She can choose to share just the claim and evidence notes, or also the narrative context of their use in the body of a question note, or the discussions in the reading notes, as appropriate. Alternatively, she could also share hyperlinks if she has an extension to her notebook that auto-publishes only discourse-graph subsets of her notes to a shared repository.
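To make the onboarding step concrete, here is a minimal sketch of what "collect the notes that inform a question and export them" could look like. Everything in it is an assumption for illustration: the `Note` and `Edge` types, the relation names, the note ids, and the JSON layout are hypothetical and are not the notebook's actual data model or export format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Note:
    id: str
    kind: str   # "QUE", "CLM", or "EVD"
    title: str


@dataclass(frozen=True)
class Edge:
    source: str
    relation: str  # hypothetical relation names, e.g. "Informs", "Supports"
    target: str


def notes_informing(question_id, notes, edges):
    """Collect notes that point at the question, directly or via other notes."""
    by_id = {n.id: n for n in notes}
    found, frontier = set(), [question_id]
    while frontier:
        target = frontier.pop()
        for e in edges:
            if e.target == target and e.source not in found and e.source != question_id:
                found.add(e.source)
                frontier.append(e.source)
    return sorted((by_id[i] for i in found), key=lambda n: n.id)


def export_subset(selected, edges):
    """Serialize the selected notes plus the edges that start from them."""
    ids = {n.id for n in selected}
    payload = {
        "notes": [asdict(n) for n in selected],
        "edges": [asdict(e) for e in edges if e.source in ids],
    }
    return json.dumps(payload, indent=2)


# Placeholder content; titles are illustrative, not real findings.
notes = [
    Note("que-1", "QUE", "Focal question (placeholder)"),
    Note("clm-1", "CLM", "Claim A (placeholder)"),
    Note("evd-1", "EVD", "Evidence 1 (placeholder)"),
    Note("evd-2", "EVD", "Evidence 2 (placeholder)"),
    Note("clm-9", "CLM", "Unrelated claim (placeholder)"),
]
edges = [
    Edge("clm-1", "Informs", "que-1"),
    Edge("evd-1", "Supports", "clm-1"),
    Edge("evd-2", "Opposes", "clm-1"),
]

shared = notes_informing("que-1", notes, edges)
exported = export_subset(shared, edges)
print(exported)
```

The unrelated claim is left out of the export because nothing links it to the focal question, which is the property that lets Curie share a focused subset rather than her whole notebook.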
The graph query works because the notebook Curie is using has an underlying extension that recognizes the argument structure that she is using in the outline, through a mixture of indentation patterns and keywords. Here, for instance, Curie can query for opposing evidence for a claim because the system has formalized an "Opposed By" relation between the CLM and the EVD by recognizing a pattern of writing in her outline.
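As a rough illustration of the idea, the sketch below recognizes a claim, relation keyword, evidence pattern from indentation alone. The outline syntax (`[[CLM]] - ...`, `[[EVD]] - ...`), the keyword list, and all function names are assumptions made for this example; the actual extension may recognize different patterns and store relations differently.

```python
import re
from typing import NamedTuple

# Hypothetical conventions: relation keywords on their own line, and
# notes written as "[[CLM]] - title" or "[[EVD]] - title".
RELATION_KEYWORDS = {"supported by": "Supported By", "opposed by": "Opposed By"}
NOTE_PATTERN = re.compile(r"^\[\[(CLM|EVD)\]\]\s*-\s*(.+)$")


class Relation(NamedTuple):
    claim: str
    relation: str
    evidence: str


def extract_relations(outline: str) -> list:
    """Find CLM > relation keyword > EVD nesting in an indented outline."""
    relations = []
    stack = []  # (indent, kind, text) for each ancestor of the current line
    for raw in outline.splitlines():
        if not raw.strip():
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        text = raw.strip()
        # Drop siblings and deeper lines so the stack holds ancestors only.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        match = NOTE_PATTERN.match(text)
        if match:
            kind, title = match.group(1), match.group(2).strip()
        elif text.lower() in RELATION_KEYWORDS:
            kind, title = "REL", RELATION_KEYWORDS[text.lower()]
        else:
            kind, title = "TEXT", text  # informal notes stay unformalized
        if kind == "EVD" and len(stack) >= 2:
            (_, parent_kind, parent_text) = stack[-1]
            (_, grand_kind, grand_text) = stack[-2]
            if parent_kind == "REL" and grand_kind == "CLM":
                relations.append(Relation(grand_text, parent_text, title))
        stack.append((indent, kind, title))
    return relations


def opposing_evidence(relations, claim):
    """Query: which evidence notes oppose the given claim?"""
    return [r.evidence for r in relations if r.claim == claim and r.relation == "Opposed By"]


# Placeholder outline; titles are illustrative, not real findings.
OUTLINE = """\
[[CLM]] - Claim A (placeholder)
    Supported By
        [[EVD]] - Evidence 1 (placeholder)
    Opposed By
        [[EVD]] - Evidence 2 (placeholder)
        a speculative aside that is not formalized
"""

relations = extract_relations(OUTLINE)
print(opposing_evidence(relations, "Claim A (placeholder)"))
```

The design point worth noting is that informal lines are simply skipped rather than rejected, so formal structure and speculative notes can coexist in the same outline.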
Over the next few weeks, Lovelace spends her time modifying, elaborating, and integrating these notes into her own notebook (instead of laboriously extracting claims and evidence from a long list of papers!), and writes up some notes on new evidence from recently published work that Curie hasn't yet read. She shares these updates with Curie, and the resulting updates to the synthesis outline spark a novel hypothesis that the project team decides to test in their next set of experiments.
This user story illustrates how the work of authoring a discourse graph can be integrated into familiar, intrinsically useful scholarly practices of reading, note-taking, and writing, to the direct benefit of scholars and their colleagues.
But it also demonstrates the technical feasibility of this vision! These screenshots are not mockups: they are snapshots of my own notes, which I have written for my own work (for a literature review), and actually shared with students and collaborators. The digital notebook I am using did not require me to do a lot of other extra work like setting up an environment or deploying a personal server; the only thing I had to do was install an extension — an active research project — to the notebook with a single click.
I am excited to imagine a world where anyone who cares about understanding the frontiers of knowledge is equipped with tools that enable them to annotate and write notes that better benefit themselves, and to share discourse graph subsets of their notes to enrich scholarship practices with their immediate colleagues. I want to broaden the lens of scholars to include nonprofit research institutions compiling nonpartisan literature reviews to inform policymaking, and highly motivated communities of patients and their families who are seeking to understand and contribute to research on diseases that personally affect them.
Can this bottom-up, decentralized, peer-to-peer infrastructure help advance original visions around a single universal shared discourse graph? I believe the answer is not directly, but this may actually be a feature rather than a bug. Distributed knowledge graphs are notoriously hard to achieve consensus on, especially as they scale, and there is emergent evidence that local contextualization, ambiguity and contestation may be crucial for scholarly progress.
Therefore, I am excited about institutional structures that can steward local federations of discourse graphs (e.g., at the level of labs, centers, or institutions), enabled by technical mechanisms for dynamic interoperability, such as Project Cambria. If institutions and local collaborations adopt methods of consensus, error-checking, and editing for integrating contributions (analogous to, say, pull requests in open-source projects), there could also be a natural check and balance, appropriately scaled, against bad actors peddling misinformation. As these local federations gather critical mass, we can direct existing technical and institutional structures (repositories, collections, and search databases) or emergent distributed infrastructures (such as distributed knowledge graphs) to curate and index subsets of them for sharing beyond lab groups, for conversations with policymakers and practitioners, for facilitating larger centers and research consortia, and so on.
I believe a future with this shape would be marked by sustained, growing contributions to shareable discourse graphs. By substantially lowering the overhead to synthesis, such infrastructures could in turn power more sustainable, accelerated scientific progress across disciplines.
Your point here reminds me of how important it is for our tools to be usable for the audiences we call on to use them.
would the thinking be to expand this to a research group level? are there problems with scaling this process? maybe also is similar to the question of standardizing links between QUE/CLM/EVD
yes! hope is that integrating discourse graphs can help with some of the common problems with collaborative/distributed synthesis (losing/needing context from others’ notes, shared conventions, sorting signal/noise)
from a UX perspective, would also need big banners highlighting these informal speculations.
beyond trying to make a platform for this, is there a space within scholarship for these? (while i think that “showing the work” behind research is important, some steps are not worth sharing, would be “junk,” and complicate discoverability.)
Big +1 on the UX perspective on marking speculative claims: this is a frequent pain point amongst hypertext notebook users who try to do something like this.
Great point about the subtleties around what is useful to share! I think it varies by context and person: for myself, having intermediate products and speculations stick around can be incredibly generative, and probably similarly for a close collaborator with a lot of shared context: maybe not so much for someone who doesn’t have as much shared interest/context with me!