One of the major challenges that open-source data collection seeks to solve is that of resolution: the relationship between granularity and relevance. Data is abundant, but time and resources are scarce, and those are the factors that turn a column of numbers into a way to make change. As James Gleick puts it, “When information is cheap, attention becomes expensive.”1 It’s easy to convince ourselves that with enough information, we’ll be able to predict and determine outcomes. This is only true if our scales of measurement match our scales of management, or if we can seamlessly translate between the two.
Our-Sci is an organization conceived to help any community collect, build, and organize user data, making sophisticated research tools accessible and open-access so that communities can answer their own questions. Our-Sci’s lab and research operations are organized around extracting the most information with the smallest investment of time and resources. The tension between information and investment was made particularly visible in a recent research investigation by Our-Sci’s Dan TerAvest and a team at Michigan State University. In their paper “Accessible, affordable, fine-scale estimates of soil carbon for sustainable management in sub-Saharan Africa,”2 they sought an improved system for evaluating soil carbon stocks in order to make effective management recommendations to restore degraded soil.
They faced the following challenges. First, soil carbon levels varied widely across fields, and the best maps available from the Africa Soil Information Service (AfSIS) database did not reflect this variation: even the finest resolution available was still three times the size of a midsize farm, let alone smallholder farms and the variations within them. Detailed knowledge of real soil carbon (soil C) levels is crucial for managing soil fertility at both the individual and the policy level, but the records at hand did not accurately reflect reality at ground level. This reflects, the authors note, “a broader problem with world soil data bases, the smoothed, downward-biased nature of the predicted soil properties.”
Second, many of the currently popular solutions (machine learning, intensive lab analysis) are not appropriate here due to expense, scale, and relevance. Enter the Our-Sci reflectometer, a low-cost, portable, hand-held spectrometer that measures spectral reflectance and correlates those measurements with data models to make functional predictions.3 Reflectometer readings proved useful for predicting fertilizer responsiveness, even in unsampled fields: “Despite its simplicity and low cost, we found that the reflectometer predicted soil C levels precisely and with sufficient accuracy to inform management.”
The reflectometer solves these problems of cost and scale, but with a critical caveat: users aren’t taking direct measurements, but are comparing to a model, leading to correlations “no higher than 70%.” This tradeoff is not unique to this particular tool. It is a constant complication at the heart of data collection, and even of communication itself. Directly measuring something, while precise, is often very expensive. In contrast, predicting a property, in this case the soil C level, may be much less precise, but it can provide actionable information at a much lower cost.
In a way, reflectometer measurements function as a form of shorthand, a distilling cipher like Morse code. Our role as responsible users of this data is to continually confirm the measurements we receive from multiple overlapping perspectives: user validation, ground truthing, and the same method that telegraph operators used to solve this problem decades ago, redundancy. “To guard against mistakes or delays, the sender of a message should order it REPEATED; this is telegraphed back to the originating office for comparison.”4
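As a rough illustration of what redundancy can look like in practice, the sketch below averages repeated predictions for a single soil sample and flags the sample for a re-check when the readings disagree. It is only a sketch under stated assumptions: the function name `summarize_repeats`, the 10% threshold, and the example values are hypothetical and are not taken from the paper or from Our-Sci’s software.

```python
from statistics import mean, stdev

def summarize_repeats(readings, max_cv=0.10):
    """Average repeated soil C predictions for one sample and flag disagreement.

    readings: repeated predicted values for the same sample (hypothetical units).
    max_cv:   assumed tolerance on the coefficient of variation (std / mean).
    """
    if len(readings) < 2:
        raise ValueError("need at least two repeated readings to compare")
    avg = mean(readings)
    # Coefficient of variation: spread of the repeats relative to their mean.
    cv = stdev(readings) / avg if avg else float("inf")
    return {"mean": avg, "cv": cv, "needs_recheck": cv > max_cv}

# Hypothetical repeats of the same sample.
print(summarize_repeats([1.20, 1.25, 1.18]))  # readings agree closely
print(summarize_repeats([1.20, 2.10, 0.70]))  # readings disagree; re-measure
```

A flagged sample is not necessarily wrong; like a telegram ordered REPEATED, it is simply a reading worth sending back for comparison before anyone acts on it.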
This tool, and the data it collects, were designed to be easy to use. Beyond soil research, more research instruments and data collection devices ought to be made accessible in the same way. We believe that our focus on validation and redundancy is an important one, both for user training and for ensuring that the data collected is reliable. Because important decisions are being made based on these soil data from around the world, it’s critical that we can trust the numbers.