The cost of clutter

Written by Duncan McGovern | Feb 14, 2024 5:00:00 AM

Holding onto extra data we don’t really need seems innocuous enough. The dollar cost of storage for an extra field or file may be negligible, and we tell ourselves “this might be useful in the future” regardless of whether or not it still sparks joy.

The problem is that humans (by nature) tend to overvalue what we have. That historic ‘nice to have’ data is unlikely to generate dramatic new insight. We’re better off thinking critically about what exactly we would do differently in the future if we had more information, and then figuring out what data we need to support that decision.

In addition, there’s a hidden tax associated with the collector’s mentality. Unlike the box of miscellaneous fasteners in the basement that—someday!—might be the exact thing we need to save ourselves a trip to the hardware store, extraneous information in our technology systems creates a slow drain on our ability to be agile in the future.

There are a couple of reasons why collecting without a plan tends to create problems.

Reporting challenges

The more data we have to wade through, the more difficult it becomes to zero in on the key factors we should be paying attention to. One very simple but surprisingly common issue we encounter is storing the same thing in multiple places. This can happen due to staff turnover, different technical approaches to solving the same issue, or different people tracking nominal variations of the same concept.

Three different versions of an “Interests” field isn’t helping the organization, regardless of how or why they were created. Clean, consolidate, and simplify so nobody’s wondering what to use.
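For teams comfortable with a little scripting, the consolidation step might look something like the Python sketch below. It is illustrative only. It assumes records have already been exported as dictionaries, and that the three overlapping fields are named `interests`, `interest_areas`, and `interests_old` and hold semicolon-separated values. Those names, the format, and the `consolidate_interests` helper are all hypothetical.

```python
# Hypothetical names for three overlapping "Interests" fields (assumption,
# not taken from any particular CRM).
LEGACY_FIELDS = ["interests", "interest_areas", "interests_old"]


def consolidate_interests(record: dict) -> dict:
    """Merge the legacy fields into a single 'interests' list.

    Values are assumed to be semicolon-separated strings. Entries are
    trimmed, de-duplicated case-insensitively, and kept in first-seen order.
    """
    seen: set[str] = set()
    merged: list[str] = []
    for field in LEGACY_FIELDS:
        raw = record.get(field) or ""  # treat missing or None as empty
        for item in raw.split(";"):
            value = item.strip()
            key = value.lower()
            if value and key not in seen:
                seen.add(key)
                merged.append(value)
    # Drop every legacy field, then keep one consolidated field.
    cleaned = {k: v for k, v in record.items() if k not in LEGACY_FIELDS}
    cleaned["interests"] = merged
    return cleaned


if __name__ == "__main__":
    example = {
        "id": 101,
        "interests": "Gardening; Volunteering",
        "interest_areas": "volunteering;Youth programs",
        "interests_old": None,
    }
    print(consolidate_interests(example))
    # → {'id': 101, 'interests': ['Gardening', 'Volunteering', 'Youth programs']}
```

The script is the easy part. Agreeing on which values are still meaningful, and what the single field should mean going forward, is the real work.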

Even if we aren’t confused by multiple fields, ad hoc data collection can present other reporting challenges. Do we clearly understand what that “Interests” field means, how it was populated, and whether or not the data is comprehensive and accurate? If the answer to any of these questions is ‘no,’ can we really trust what the report tells us?

Change becomes more expensive

Systems are made of interrelated components, and the more sophisticated they become, the more careful we need to be when making changes. We should:

Know the purpose, function, and dependencies of each part
Ensure this contextual information is documented and accessible

Each place we store data is another chance for questions to come up during a change. “Is XYZ impacted? Is that a concern? What can we do about it?” The more prepared we are to answer these questions, the less effort they will take to address.

There are certainly tools and techniques to help manage technical changes, but the simplest and most powerful starting point is to be confident that what’s in place is worth having. That means removing items that no longer meet that criterion as well as being picky about what’s added.

Won't AI make lemonade out of our messy pile of incomplete data lemons?

In short, probably not.

The ability to analyze unstructured data is not the same as knowing what data are accurate, relevant, and used consistently. Unless that information is stored in the system in a way that AI can understand, we’ll still rely on humans to provide context:

Where did the data come from?
Is it relevant to the current need or question?
Are missing values gaps in the data, or were they intentionally left blank?
Have different staff members tracked the same thing in different ways over time?

AI will get better at asking these types of questions, but probably won’t have the answers that are critical to getting reliable information and insight.

Striking a balance

There’s not a one-size-fits-all approach to data collection, but there are a few guidelines we can follow:

Before collecting anything in the first place, be thoughtful about the business objective, the value, and the level of effort required to get from data collection to the use you envision
Incorporate cleanup into the workflow of adding anything new: what does this replace, and can we remove it as part of the new work? This is also a chance to remove or reduce security vulnerabilities if you are able to eliminate sensitive data.
If you must preserve the “maybe we’ll use this someday” data, keep it out of the way. A simple example is exporting data from a CRM before removing fields or records, so it is available if needed in the future but no longer part of centralized business systems.
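The export-then-remove idea can be sketched in Python as follows. This is a minimal illustration that assumes the records have already been pulled out of the CRM as a list of dictionaries, each with an `id` key. How you get that export depends on your system and isn't shown. The `archive_and_remove` helper and the `legacy_notes` field are hypothetical.

```python
import csv
import io
from typing import TextIO


def archive_and_remove(
    records: list[dict],
    retired_fields: list[str],
    archive: TextIO,
    id_field: str = "id",
) -> list[dict]:
    """Write each record's ID plus its retired fields to a CSV archive,
    then return copies of the records with those fields removed."""
    columns = [id_field, *retired_fields]
    writer = csv.DictWriter(archive, fieldnames=columns)
    writer.writeheader()
    for record in records:
        # Missing keys (and None values) are written as empty cells.
        writer.writerow({col: record.get(col, "") for col in columns})
    return [
        {key: value for key, value in record.items() if key not in retired_fields}
        for record in records
    ]


if __name__ == "__main__":
    contacts = [
        {"id": 101, "name": "Ada", "legacy_notes": "Met at 2019 gala"},
        {"id": 102, "name": "Grace"},
    ]
    # In practice this would be a real file opened with newline="".
    archive_file = io.StringIO()
    remaining = archive_and_remove(contacts, ["legacy_notes"], archive_file)
    print(archive_file.getvalue())
    print(remaining)
```

The sketch keeps the record ID alongside the archived values so the data can be matched back to a contact later if it is ever needed. The retired fields then leave the day-to-day system.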

Our systems will never be perfect, but the better we understand and plan how we collect data, the better the results will be.
