Fixing Bad Data


In our post on the cost of bad data, we explored how inconsistencies, errors, and outdated information can lead to wasted time, lost revenue, and poor decision-making. But what exactly causes these data issues in the first place?

Data inconsistencies don’t appear out of thin air. They often stem from manual entry errors, outdated records, siloed teams, and a lack of clear standards.

Without addressing these root causes, any attempt to clean up data will be temporary at best.

Fixing bad data involves knowing common data inaccuracies, auditing your data, defining quality, cleaning your data, and prioritizing data cleanliness.

In this post, we’ll break down typical sources of data inconsistencies and explore practical ways to identify, clean, and prevent data cruft from accumulating.

Causes of Data Inconsistencies

Manual Entry and No Standards

Manual entry is probably the cause of most inconsistencies in your data. When data entry is tedious or leaves the formatting up to the user, users are less likely to enter data consistently, or at all.

A lack of conventions or standards contributes to this as well. Users still have to follow them, but clear expectations act as guard rails that define what kind of data is expected and when it should be collected or entered.

It’s worth examining processes and working with teams to determine what data is needed versus what is assumed as necessary. Team feedback is also invaluable in uncovering friction while collecting data.

Outdated Data

If you aren’t regularly getting the most up-to-date data, you risk outdated reports, dashboards, and analytics. Business decisions depend on context, and leaving out data means losing the insights that data would have provided.

Data requirements can change over time, so it’s prudent to go back and fill in data if needed. For example, suppose a company decides that every opportunity in a particular stage must include certain company information.

This is easy to implement for new opportunities, but what about old ones? It may be helpful to go back, even within a specific timeframe such as a fiscal quarter or year, and ensure existing records meet those criteria.

Lack of Sharing

Teams that need the same data but can’t access the same data will inevitably contribute to poor data hygiene. Information collected by one team about a particular set of customers could be completely different from the same information about the same customers collected by another team.

Integrating data across teams and implementing a process to resolve inconsistencies encountered by different teams reduces duplicate data and contributes to accurate data.

On the other hand, integrations can also contribute to data inconsistencies. An integration may be a one-way sync when it should be two-way. A mismatch between the data format and validation could lead to missing, incorrect, or duplicate data. Essential data may not be synced at all.

How to Identify and Clean Bad Data

1. Know Common Pain Points

Data can take multiple formats. An address could be written as “123 E. Main St.,” “123 East Main,” “123 East Main Street,” or “123 Main St E,” or any combination of these. Phone numbers can have dashes, parentheses, or no delimiter at all.
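As a rough illustration of collapsing format variations into one canonical form, here is a minimal Python sketch for phone numbers. The function name `normalize_us_phone` and the `NNN-NNN-NNNN` target format are assumptions made for this example, and it only handles 10-digit US numbers with an optional leading country code.

```python
import re
from typing import Optional


def normalize_us_phone(raw: str) -> Optional[str]:
    """Return a US phone number as NNN-NNN-NNNN, or None if it can't be parsed."""
    digits = re.sub(r"\D", "", raw)  # drop everything that is not a digit
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # strip the leading US country code
    if len(digits) != 10:
        return None  # leave unparseable values for manual review
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


for value in ["(555) 123-4567", "555.123.4567", "+1 555 123 4567", "12345"]:
    print(value, "->", normalize_us_phone(value))
```

Returning `None` rather than guessing keeps bad values visible so they can be reviewed instead of silently stored.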

Data can be inaccurate, incomplete, or have slightly different variations. Decide whether to delete or update incorrect or obsolete data.

Examine business processes to see where to implement and automate data validation. Good candidates include customer support interactions, address formatting according to USPS standards, and verification of new addresses against the National Change of Address (NCOA) database.

2. Audit Data Quality

Run tools that scan for missing values, outliers, or atypical or inconsistent data. Asking for feedback from employees or users also helps identify discrepancies.
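A scan like this does not require a dedicated tool to get started. The sketch below, using only the Python standard library, counts empty required fields and flags numeric outliers with a modified z-score. The function name `audit_records`, the field names, and the 3.5 threshold are illustrative assumptions, not a prescribed method.

```python
from statistics import median


def audit_records(records, required_fields, numeric_field, threshold=3.5):
    """Count empty required fields and flag outliers in one numeric field.

    Outliers use the modified z-score, based on the median and the median
    absolute deviation (MAD), which is less distorted by extreme values
    than a mean-based score.
    """
    missing = {field: 0 for field in required_fields}
    for record in records:
        for field in required_fields:
            if record.get(field) in (None, ""):
                missing[field] += 1

    values = [r[numeric_field] for r in records
              if isinstance(r.get(numeric_field), (int, float))]
    outliers = []
    if values:
        mid = median(values)
        mad = median(abs(v - mid) for v in values)
        # A MAD of 0 means more than half the values are identical;
        # skip scoring to avoid dividing by zero.
        if mad > 0:
            outliers = [v for v in values
                        if 0.6745 * abs(v - mid) / mad > threshold]
    return missing, outliers


sample = [
    {"email": "a@example.com", "amount": 100},
    {"email": "", "amount": 110},
    {"email": None, "amount": 95},
    {"email": "d@example.com", "amount": 105},
    {"email": "e@example.com", "amount": 5000},
]
print(audit_records(sample, ["email"], "amount"))
```

Flagged values are candidates for review, not automatic deletions; an unusually large order may be perfectly valid.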

Compare data across systems to check for discrepancies, such as a product catalog in the sales CRM that doesn’t match the catalog used in the warehouse.
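A cross-system comparison can be sketched in a few lines of Python, assuming each catalog has been exported as a mapping from SKU to product name. The function name `compare_catalogs` and the sample data are hypothetical.

```python
def compare_catalogs(crm, warehouse):
    """Compare two {sku: product_name} mappings and report discrepancies."""
    shared = crm.keys() & warehouse.keys()
    return {
        "only_in_crm": sorted(crm.keys() - warehouse.keys()),
        "only_in_warehouse": sorted(warehouse.keys() - crm.keys()),
        "name_mismatch": sorted(s for s in shared if crm[s] != warehouse[s]),
    }


crm_catalog = {"A100": "Widget", "A200": "Gadget", "A300": "Gizmo"}
warehouse_catalog = {"A100": "Widget", "A200": "Gadget (v2)", "A400": "Doohickey"}
print(compare_catalogs(crm_catalog, warehouse_catalog))
```

Each bucket in the report points to a different fix: a missing sync, a stale record, or a naming convention that differs between teams.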

Auditing the data is essential because it defines the scope and timeline of the cleanup. Good planning is critical to cleaning up the data while not losing sight of other business priorities.

3. Define Data Quality

Once you know what data shouldn’t look like, define what it should look like. All goals should be measurable, and good data is no different.

Clearly outline standards for good data. For example:

  • How often should data be updated?
  • Does the data comply with predefined standards? To what degree?
  • How many records are correct?
  • How many fields are missing in a database?
  • Should record formatting be uniform?
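To make standards like these measurable, each one can be turned into a number. The sketch below computes two such numbers in Python: completeness (the share of required fields that are filled) and freshness (the share of records updated recently). The function names, the `updated_on` field, and the 90-day window are assumptions chosen for illustration.

```python
from datetime import date, timedelta


def completeness(records, required_fields):
    """Share of required field slots that are filled, from 0.0 to 1.0."""
    total = len(records) * len(required_fields)
    if total == 0:
        return 1.0  # nothing is required, so nothing is missing
    filled = sum(1 for r in records for f in required_fields
                 if r.get(f) not in (None, ""))
    return filled / total


def freshness(records, today, max_age_days, date_field="updated_on"):
    """Share of records updated within the last max_age_days."""
    if not records:
        return 1.0
    cutoff = today - timedelta(days=max_age_days)
    fresh = sum(1 for r in records
                if r.get(date_field) is not None and r[date_field] >= cutoff)
    return fresh / len(records)


accounts = [
    {"name": "Acme", "phone": "555-123-4567", "updated_on": date(2025, 3, 1)},
    {"name": "Globex", "phone": "", "updated_on": date(2024, 1, 1)},
]
print(completeness(accounts, ["name", "phone"]))
print(freshness(accounts, today=date(2025, 3, 12), max_age_days=90))
```

Tracking these values over time shows whether cleanup efforts are holding or whether quality is drifting again.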

4. Clean Bad Data

  • Deduplicate and merge data

    Automated tools can identify duplicate entries and alert users to them, and some can merge them automatically.

    Unique IDs, such as an account number or customer ID, can prevent data duplication and ensure data accuracy.

  • Standardize entry

    Use predefined dropdown fields where possible. This reduces the manual effort to input data and keeps data consistent.

    Where predefined choices aren’t possible, use uniform naming conventions and data validation rules to enforce consistency. Again, think in terms of preexisting and future data.

  • Fill in data

    Use data from other systems to fill in missing fields, or use a third-party system if possible. Regularly ask users to update their data at specific intervals or times.

  • Remove obsolete and inaccurate data

    Data removal should be governed first by external benchmarks or requirements. Many industries require data to be kept for a certain time. To meet those requirements, records may be automatically deleted or scheduled for a regular review.

    Archiving data may be a more appropriate solution than deleting it, as may surfacing older data only to those who consistently need it.

  • Implement data governance

    Create clear guidelines for data storage, entry, and validation. These should be company-wide, specific to each department, and tailored to how those departments store and interact with the data to achieve their goals.

    Create metrics that measure data quality informed by company goals. Data goals should correlate to company goals; otherwise, you’re collecting data just to have data.
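As one illustration of the deduplicate-and-merge step in this list, the Python sketch below groups records by a unique ID and keeps the first non-empty value seen for each field. The function name `merge_duplicates`, the `customer_id` key, and the "first non-empty value wins" rule are assumptions; a real merge policy may prefer the most recent or most trusted source instead.

```python
def merge_duplicates(records, key="customer_id"):
    """Merge records sharing the same unique ID, one output record per ID.

    For each field, the first non-empty value encountered is kept.
    """
    merged = {}
    for record in records:
        target = merged.setdefault(record[key], {})
        for field, value in record.items():
            # Only fill fields that are still absent or empty.
            if target.get(field) in (None, ""):
                target[field] = value
    return list(merged.values())


customers = [
    {"customer_id": "C1", "email": "", "phone": "555-123-4567"},
    {"customer_id": "C1", "email": "a@example.com", "phone": ""},
    {"customer_id": "C2", "email": "b@example.com", "phone": None},
]
for row in merge_duplicates(customers):
    print(row)
```

Because the merge is keyed on the ID, it depends on that ID being assigned consistently in the first place, which is why unique identifiers matter.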

5. Prevent Bad Data

  • Automated Data Quality Checks

    Quality checks should run in real time as much as possible. The goal is to reduce friction in formatting data and finding duplicate records.

    Using automated features or AI to alert users to duplicate records saves time and manual effort. Surfacing potential duplicate records makes it easy for users to update existing records if applicable.

    Automated tools and workflows can also format and clean data. Validation rules or workflows can check for prerequisite data and enforce consistency in the data that follows. For example, to require certain documents when creating a record, implement a custom workflow that creates the record and requires the document upload at the same time.

  • Employee Training

    Teach employees the effects of bad data, and encourage them to value and rely on data as much as the leadership team does. Data exists to help them do their jobs better. Ongoing training should cover data entry and management.

  • Integrate and Sync Systems, or Fix Existing Integrations

    Reduce manual effort and ensure all business systems (CRM, ERP, finance, marketing) share the same data. Regularly reconcile any discrepancies and consider what information should be shared.

  • Establish a Schedule

    Consider monthly or quarterly mini-audits. Catching issues in frequent, small audits takes less time and is easier than cleaning up data at longer intervals.
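The automated checks described in this list can start small. The Python sketch below shows a check that could run before a record is saved: it reports missing prerequisite fields and surfaces possible duplicates by matching on a normalized email. The function name `check_new_record`, the field names, and the choice of email as the match field are assumptions, and values are assumed to be strings.

```python
def check_new_record(new_record, existing_records, required_fields,
                     match_field="email"):
    """Return (problems, possible_duplicates) for a record before saving."""
    problems = [f"missing {field}" for field in required_fields
                if new_record.get(field) in (None, "")]

    def normalize(value):
        # Ignore case and surrounding whitespace when comparing.
        return (value or "").strip().lower()

    wanted = normalize(new_record.get(match_field))
    duplicates = [r for r in existing_records
                  if wanted and normalize(r.get(match_field)) == wanted]
    return problems, duplicates


existing = [{"name": "Acme", "email": "Sales@Acme.example"}]
candidate = {"name": "", "email": " sales@acme.example "}
print(check_new_record(candidate, existing, ["name", "email"]))
```

Returning the possible duplicates, rather than blocking the save outright, lets the user decide whether to update the existing record instead.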

How to Prioritize Good Data

  • Establish expectations and procedures - implement clear expectations and procedures for data entry, validation, and maintenance.
  • Invest in automation and/or AI - reduce human errors and reallocate finite resources by leveraging technology.
  • Steward data from the top down - empower teams to own data quality. Data is everyone’s responsibility, from the CEO to the bottom of the organizational chart.
  • Audit and maintain data regularly - rather than relying on occasional cleanup efforts, make data cleanliness a regular priority.

Data that Works

Bad data is fundamentally a business problem. If data exists but isn’t reliable, it’s not truly working for your company.

By identifying the causes of data inconsistencies and implementing proactive strategies—such as standardizing entry, automating quality checks, and fostering a company-wide culture of data stewardship—you can transform your data from a liability into an asset.

Cleaning up bad data is an ongoing process, not a one-time fix. With the correct governance, automation, and team buy-in, your business can ensure that data remains accurate, consistent, and valuable.

Originally published on 2025-03-12 by Rachel Gruber
