Holding onto extra data we don’t really need seems innocuous enough. The dollar cost of storage for an extra field or file may be negligible, and we tell ourselves “this might be useful in the future” regardless of whether or not it still sparks joy.
The problem is that humans (by nature) tend to overvalue what we have. That historic ‘nice to have’ data is unlikely to generate dramatic new insight. We’re better off thinking critically about what exactly we’d do differently in the future if we had more information, and then figuring out what data we need to support that decision.
In addition, there’s a hidden tax associated with the collector’s mentality. Unlike the box of miscellaneous fasteners in the basement that—someday!—might be the exact thing we need to save ourselves a trip to the hardware store, extraneous information in our technology systems creates a slow drain on our ability to be agile in the future.
There are a couple of reasons why collecting without a plan tends to create problems.
The more data we have to wade through, the more difficult it becomes to zero in on the key factors we should be paying attention to. One very simple but surprisingly common issue we encounter is storing the same thing in multiple places. This can happen due to staff turnover, different technical approaches to solving the same issue, or different people tracking nominal variations of the same concept.
Three different versions of an “Interests” field isn’t helping the organization, regardless of how or why they were created. Clean, consolidate, and simplify so nobody’s wondering what to use.
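As a starting point for that cleanup, a short script can flag field names that look like variations of one another. The sketch below is illustrative only: the field names are made up, the 0.8 threshold is an arbitrary assumption, and the matching relies on Python's standard `difflib` module rather than anything specific to a particular CRM or database.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name):
    """Lowercase a field name and drop spaces, underscores, and punctuation."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def similar_fields(field_names, threshold=0.8):
    """Return pairs of field names whose normalized forms look alike.

    The threshold is an assumption for illustration; tune it to your data.
    """
    pairs = []
    for a, b in combinations(field_names, 2):
        score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs


# Hypothetical field list exported from a system's schema.
fields = ["Interests", "Interest_s", "Member Interests", "Email", "Interests__c"]

for a, b, score in similar_fields(fields):
    print(f"{a!r} and {b!r} look similar (score {score})")
```

A name match only surfaces candidates. Someone who knows the data still has to decide which field survives and how the others get merged into it.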
Even if we aren’t confused by multiple fields, ad hoc data collection can present other reporting challenges. Do we clearly understand what that “Interests” field means, how it was populated, and whether or not the data is comprehensive and accurate? If the answer to any of these questions is ‘no,’ can we really trust what the report tells us?
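One way to start answering those questions is to profile the field before reporting on it. The following is a minimal sketch, assuming records are available as a list of Python dictionaries; the sample records and the `field_profile` helper are hypothetical, not part of any particular system.

```python
from collections import Counter


def is_blank(value):
    """Treat missing values and whitespace-only strings as empty."""
    return value is None or str(value).strip() == ""


def field_profile(records, field):
    """Summarize how complete and how consistent one field is."""
    total = len(records)
    values = [record.get(field) for record in records]
    filled = [str(v).strip() for v in values if not is_blank(v)]
    return {
        # Share of records where the field has any value at all.
        "fill_rate": len(filled) / total if total else 0.0,
        # Case-insensitive tally of the values people actually entered.
        "distinct_values": Counter(v.lower() for v in filled),
    }


# Hypothetical sample records.
records = [
    {"name": "A", "Interests": "Hiking"},
    {"name": "B", "Interests": ""},
    {"name": "C"},
    {"name": "D", "Interests": "hiking "},
]

profile = field_profile(records, "Interests")
print(profile["fill_rate"])
print(profile["distinct_values"].most_common(5))
```

A low fill rate or a long tail of one-off values suggests the field was populated inconsistently. Numbers like these cannot tell us what the field was meant to capture, which is still a question for the people who use it.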
Systems are made of interrelated components, and the more sophisticated they become the more careful we need to be when making changes. We should:
Each place we store data is another chance for questions to come up during a change. “Is XYZ impacted? Is that a concern? What can we do about it?” The better prepared we are to answer these questions, the less effort they will take to address.
There are certainly technical tools and techniques to help manage technical changes, but the simplest and most powerful starting point is to be confident that what’s in place is worth having. That means removing items that no longer meet that criterion as well as being picky about what’s added.
In short, probably not.
The ability to analyze unstructured data is not the same as knowing what data are accurate, relevant, and used consistently. Unless that information is stored in the system in a way that AI can understand, we’ll still rely on humans to provide context:
AI will get better at asking these types of questions, but probably won’t have the answers that are critical to getting reliable information and insight.
There’s no one-size-fits-all approach to data collection, but there are a few guidelines we can follow:
Our systems will never be perfect, but the better we understand and plan how we collect data, the better the results will be.