<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Typo</title>
	<atom:link href="https://www.typo.ai/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.typo.ai</link>
	<description></description>
	<lastBuildDate>Fri, 13 Dec 2019 17:41:33 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.1</generator>

<image>
	<url>https://i0.wp.com/www.typo.ai/wp-content/uploads/cropped-typo-512x512.png?fit=32%2C32&#038;ssl=1</url>
	<title>Typo</title>
	<link>https://www.typo.ai</link>
	<width>32</width>
	<height>32</height>
</image> 
<site xmlns="com-wordpress:feed-additions:1">214981454</site>
	<item>
		<title>How to Select the Best Technology for Data Quality Management</title>
		<link>https://www.typo.ai/blog/how-to-select-the-best-technology-for-data-quality-management/</link>
		
		<dc:creator><![CDATA[marketing]]></dc:creator>
		<pubDate>Mon, 04 Nov 2019 02:00:16 +0000</pubDate>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[DQ-5]]></category>
		<guid isPermaLink="false">https://www.typo.ai/?p=222</guid>

					<description><![CDATA[<p>The post <a href="https://www.typo.ai/blog/how-to-select-the-best-technology-for-data-quality-management/">How to Select the Best Technology for Data Quality Management</a> appeared first on <a href="https://www.typo.ai">Typo</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="et_pb_section et_pb_section_0 et_section_regular" >
				<div class="et_pb_row et_pb_row_0">
				<div class="et_pb_column et_pb_column_4_4 et_pb_column_0  et_pb_css_mix_blend_mode_passthrough et-last-child">
				<div class="et_pb_module et_pb_text et_pb_text_0  et_pb_text_align_left et_pb_bg_layout_light">
				<div class="et_pb_text_inner"><h2><strong>Outdated data quality software can put you at a competitive disadvantage and drive up organizational costs.</strong></h2>
<p><strong>Modern technology is proactive, avoids more costs, and mitigates more risk.</strong></p>
<p>In the following post, we will outline what to look for when selecting data quality technology as well as how to leverage artificial intelligence.</p>
<h4>Proactive vs. Reactive</h4>
<p>Reactive data quality tools attempt to address errors after they are persisted to a data store. During transfer from an initial data store to a data lake or warehouse, data quality tools identify errors and attempt to resolve them to maintain a cleansed destination data store. This transfer may occur days or months after the data was originally created. Due to this lead time, the user is unlikely to recall details of a single record out of the thousands entered that month.</p>
<p>As a result, these errors may be handled with an elaborate remediation process that is part of a larger data governance program and council. The remediation workflow for a single error can involve technical support representatives, subject matter experts, data stewards, and data engineers. In a typical scenario, a support rep will document a problem, then data stewards and engineers will investigate the cause. When the cause is identified, the data steward will discuss the preferred solution with subject matter experts of the data. The fix must be documented by the steward, presented to the data governance council for approval, and then implemented in a data quality rule by a data engineer. The estimated cost of remediating a single new error is $10,000. After this investment, the rule will provide automated quality enforcement for each recurrence of the same error.</p>
<p>Due to the costliness of reactively remediating errors and the risk of accidentally using bad data that has already been saved, a proactive solution is preferred. Proactive solutions prompt the creator of the data to fix the error at the time of entry. The cost to resolve an error at the time of entry, known as the prevention cost, is estimated to be $1.<a href="#_ftn1" name="_ftnref1"><span>[1]</span></a> When the error is resolved by the creator at the time of entry, the best resolution can be provided at the lowest cost: the user entering the data has no chance to forget the context of the entry. Poor data introduced by IoT devices is immediately identified and quarantined. A real-time approach at all points of data entry can avoid first-time error exposure.</p>
<p><span style="font-size: small;"><em><a href="#_ftnref1" name="_ftn1">[1]</a> Labovitz, G., Chang, Y.S., and Rosansky, V., 1992. Making Quality Work: A Leadership Guide for the Results-Driven Manager. John Wiley &amp; Sons, Hoboken, NJ.</em></span></p>
<table>
<tbody>
<tr>
<td><strong>REACTIVE</strong></td>
<td><strong>PROACTIVE</strong></td>
</tr>
<tr>
<td>Incur risks and costs of first-time error exposure</td>
<td>Avoid first-time error exposure</td>
</tr>
<tr>
<td>$10,000 remediation cost</td>
<td>$1 remediation cost</td>
</tr>
<tr>
<td>Lengthy resolution</td>
<td>Immediate resolution</td>
</tr>
<tr>
<td>Delayed identification and remediation produce subpar solutions because little information remains available. The best case may be deleting an entire row of data</td>
<td>Best resolution possible because the data creator is providing the fix at the time of entry</td>
</tr>
</tbody>
</table></div>
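<p>As a back-of-the-envelope illustration of the table above (using the cost estimates cited in this post and a made-up count of new error types), the difference compounds quickly:</p>
<pre><code># Rough sketch of the remediation economics: roughly $10,000 per new error
# type remediated reactively versus roughly $1 prevented at the time of entry.
REACTIVE_COST = 10_000   # remediation workflow for one new error type
PROACTIVE_COST = 1       # prevention cost at the point of entry

def yearly_cost(new_error_types, proactive):
    per_error = PROACTIVE_COST if proactive else REACTIVE_COST
    return new_error_types * per_error

print(yearly_cost(25, proactive=False))  # 250000
print(yearly_cost(25, proactive=True))   # 25
</code></pre>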
			</div><div class="et_pb_module et_pb_text et_pb_text_1  et_pb_text_align_left et_pb_bg_layout_light">
				<div class="et_pb_text_inner">
<h4>Putting Artificial Intelligence to work</h4>
<p>Traditional data quality tools require rules to be created for each error that your enterprise has experienced or anticipates. Leveraging artificial intelligence and deep learning enables protection against errors you cannot predict. Preventing first-time exposure to errors can save $10,000 or more per instance in remediation costs and avoids the risk of much larger costs from decisions based on poor data. Unlike traditional data quality tools, which require rule updates when data requirements and validation change, AI technologies can adapt by learning from data and from user responses. This avoids the cost of maintaining a large set of data quality rules.</p>
<h4>Protecting data assets with TYPO</h4>
<p>A small amount of poor data originating from as few as one point of entry can spread throughout your enterprise and wreak havoc. Bad decisions, missed opportunities, tarnished brands, diminished credibility, financial audits, wasted time, and unnecessary expenses may result.</p>
<h4>Because poor data can cause great companies to make bad decisions, we developed Typo, an intelligent data quality service.</h4>
<p>Typo inspects data at multiple points of entry and in real-time as data is entered by a device or user of an application. When an error is identified, the user is notified and given the opportunity to correct the error. Typo uses artificial intelligence and becomes smarter as it learns from user responses to error notifications.</p>
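<p>Conceptually, an entry-time check works like the hypothetical sketch below (purely illustrative; this is not Typo&#8217;s actual API): a learned detector inspects each record as it is entered, the user confirms or corrects suspect values, and the responses are kept as feedback for the model:</p>
<pre><code># Hypothetical sketch of proactive, entry-time validation.
class EntryValidator:
    def __init__(self):
        self.feedback = []  # (record, user_response) pairs for retraining

    def suspect(self, record):
        # Placeholder for a learned anomaly detector.
        return record.get("state") not in {"TX", "CA", "NY"}

    def on_entry(self, record, ask_user):
        if self.suspect(record):
            corrected = ask_user(record)  # notify the user at the time of entry
            self.feedback.append((record, corrected))  # model learns from this
            return corrected
        return record

validator = EntryValidator()
fixed = validator.on_entry({"state": "T X"}, ask_user=lambda r: {"state": "TX"})
print(fixed, len(validator.feedback))  # {'state': 'TX'} 1
</code></pre>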
<h4>How Typo can help your organization</h4>
<ul>
<li>Supports trustworthy and accurate business intelligence, enabling executives to <strong>make decisions with confidence</strong>.</li>
<li>Mitigates costly errors, lost opportunities, and damaged reputations by <strong>preventing first-time exposure to new errors</strong>.</li>
<li>Saves time by providing <strong>clean data from inception</strong>, lessening arduous data cleansing processes and allowing your data management team to develop proactive insights instead of doing reactive data management.</li>
<li><strong>Improves ROI</strong> on transactions across business channels and integrated value chains by preventing data errors as soon as possible in the value chain lifecycle. This isolates the error and prevents negative consequences deeper in the value chain when the errors are more costly to identify and correct.</li>
</ul>
<h4>Why Typo is the best tool for your Data Quality Management</h4>
<ul>
<li>Typo provides <strong>proactive data quality at origin.</strong> Traditional data quality tools are reactive: they attempt to address errors after the errors are saved to the data store, often late in the data lifecycle. Conversely, Typo corrects errors at the time of entry and prior to storage, even on applications hosted by a third party.</li>
<li><strong>Simple to Deploy and Use. </strong>Typo is available as a cloud service that is set up through a responsive web interface. Unlike traditional data quality tools, Typo does not require a team of experts for installation, configuration, and maintenance.</li>
<li><strong>Fast Time to Value. </strong>Implementation, testing, and deployment of custom rules are not necessary because Typo uses AI and automated algorithms.</li>
<li><strong>Low Maintenance</strong>. Typo uses AI that automatically adapts to change by learning from data and responses from users. Since there are few or no custom rules to update when your data quality requirements change, you can spend more time using your data than maintaining it.</li>
<li><strong>Faster &amp; Higher ROI. </strong>Upfront intelligence that resolves errors in real time at the first point of entry alleviates the need for complex remediation workflows and the skilled technical teams that manage errors later in the data lifecycle.</li>
</ul></div>
			</div>
			</div>
			</div>
			</div>
<p>The post <a href="https://www.typo.ai/blog/how-to-select-the-best-technology-for-data-quality-management/">How to Select the Best Technology for Data Quality Management</a> appeared first on <a href="https://www.typo.ai">Typo</a>.</p>
]]></content:encoded>
		<post-id xmlns="com-wordpress:feed-additions:1">222</post-id>	</item>
		<item>
		<title>How to Develop a Strategy for Data Quality Management</title>
		<link>https://www.typo.ai/blog/how-to-develop-a-strategy-for-data-quality-management/</link>
		
		<dc:creator><![CDATA[marketing]]></dc:creator>
		<pubDate>Mon, 21 Oct 2019 01:00:31 +0000</pubDate>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[DQ-4]]></category>
		<guid isPermaLink="false">https://www.typo.ai/?p=217</guid>

					<description><![CDATA[<p>The post <a href="https://www.typo.ai/blog/how-to-develop-a-strategy-for-data-quality-management/">How to Develop a Strategy for Data Quality Management</a> appeared first on <a href="https://www.typo.ai">Typo</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="et_pb_section et_pb_section_1 et_section_regular" >
				<div class="et_pb_row et_pb_row_1">
				<div class="et_pb_column et_pb_column_4_4 et_pb_column_1  et_pb_css_mix_blend_mode_passthrough et-last-child">
				<div class="et_pb_module et_pb_text et_pb_text_2  et_pb_text_align_left et_pb_bg_layout_light">
				<div class="et_pb_text_inner"><h2><strong>Does your organization have a data quality control strategy in place?</strong></h2>
<p><strong>If you want to improve data quality and reduce operational costs across your organization, having a strategy in place to manage quality and set organizational standards is critical.</strong></p>
<p>In order to develop and execute a data quality strategy, you will need to have a team and process in place. The following section provides guidelines on how to structure a team and develop processes.</p>
<h3 style="text-align: left;"><span style="text-decoration: underline;">Team</span></h3>
<p>The first step in implementing a data quality strategy is to recruit a data governance team that will establish clear data definitions, develop comprehensive policies, and oversee the documentation by which internal business units collect, steward, disseminate, and integrate data on behalf of the organization.</p>
<p>This team must include experts from different functions across the organization. Having a cross-functional governance team in place will ensure the success of the long-term vision and build a sustainable, trusted data ecosystem. Furthermore, establishing a cross-functional team fosters a data-driven culture and creates data champions across the organization.</p>
<p>The governance team starts with a team leader, who is often the Chief Data Officer. The team leader is responsible for overseeing the entire team, which includes managing team focus, communicating procedures, and monitoring success. The team leader will assemble an executive committee that includes executive leaders from across different functions of the organization, such as finance and marketing. Once the executive committee is established, it will be responsible for developing and overseeing the data governance processes and policies.</p>
<p>The team will also recruit mid-level managers across the different functions of the organization. These managers will be responsible for championing the data governance strategy and collaborating across the various functions of the organization. The managers will define processes, data quality metrics, and best practices.</p>
<p>Lastly, the managers will assign data owners, stewards, and users. The data owner is responsible for compliance, administration, and access control of data. Data stewards serve as the conduit to business users, interpreting data sets and developing usable reports. Users are the organization members directly responsible for entering and using data as part of their daily tasks; they are responsible for reporting data irregularities to the owners and stewards.</p>
<h3 style="text-align: left;"><span style="text-decoration: underline;">Process</span></h3>
<p><strong>Define Scope</strong></p>
<p>When implementing a data quality strategy, determine business processes that can be readily affected through improved data quality. The project should have definable and recognizable issues. Initial projects should be smaller in scope, allowing shorter iterations and faster results, which will provide immediate value. This will improve future executive support of larger projects.</p>
<p>All projects should include associated implementation costs and a timeline of how long the project will take and when results can be expected.</p>
<p><strong>Map Data to Key Business Processes</strong></p>
<p>Once it has been determined which key business process will be affected by the scope of the initial data quality project, the flow of data within the process will need to be mapped.</p>
<p>Mapping the data flow provides the big picture of how the data is being used downstream, what other business processes it affects, and ultimately what business initiative metrics it is being used for.</p>
<p style="text-align: left;"><strong>Plot Financial Implications on the Business</strong></p>
<p>After the data flow of the process is defined, a deeper understanding of the financial implications can be achieved. It may be determined that the poor data is affecting more areas of the business than originally hypothesized, showing a greater cost savings opportunity.</p>
<p>Whatever the case may be, it will be important to plot out all of the financial implications the project will have on the business. Work with business management, accounting, and the finance department to ensure the accuracy of the implications and to gain trust and support for future projects.</p>
<p><strong>Select the Right Technology to Help Facilitate the Process</strong></p>
<p>When you’re ready to begin moving forward with implementation, you will need to determine what technology you will leverage to facilitate the data quality evaluation process.</p>
<p>The organization will need to utilize a diagnostics tool for data discovery and profiling. This tool will be key throughout the entire data cleansing process. It will be used to evaluate data set differences over time, quantify probable outcomes from cleansing and estimate the ROI of your project.</p>
<p><strong>Determine Data Quality Metrics</strong></p>
<p>Next, metrics will need to be selected that will capture the business impact. Metrics can range from simple to complex (e.g., measuring across several different data elements).</p>
<p>Once the metrics are established, you will need to determine what indicators will be measured, detailed in this previous post. The indicators are relevance, completeness, timeliness, accuracy, precision, consistency, uniqueness, accessibility, understandability, community, and interoperability.</p>
<p>Link these metrics to key business initiatives to communicate the organizational value of data quality projects.</p>
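<p>As a concrete starting point, the sketch below (assuming pandas as the tooling and made-up column names) combines two indicator measurements, completeness and uniqueness, into a single trackable metric for a customer account setup process:</p>
<pre><code># Minimal sketch: score one business process on two data quality indicators.
import pandas as pd

def account_setup_dq(df):
    required = ["customer_id", "email", "postal_code"]
    completeness = df[required].notna().all(axis=1).mean()  # rows fully filled in
    uniqueness = 1 - df["customer_id"].duplicated().mean()  # rows not duplicated
    return round((completeness + uniqueness) / 2, 3)        # equal weights assumed

records = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "email": ["a@x.com", None, "c@x.com", "d@x.com"],
    "postal_code": ["78701", "10001", "94105", None],
})
print(account_setup_dq(records))  # 0.625, reported against the business initiative
</code></pre>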
<p><strong>Establish Data Quality Project Best Practices</strong></p>
<p>Lastly, as you continue to learn from each iteration of your data quality projects, you will want to begin establishing best practices. Best practices should be used consistently as the importance of data quality is communicated throughout the organization. By advocating best practices and communicating the importance of data quality through measurable results you will be able to influence a cultural shift in how the organization views data quality.</p></div>
			</div>
			</div>
			</div>
			</div>
<p>The post <a href="https://www.typo.ai/blog/how-to-develop-a-strategy-for-data-quality-management/">How to Develop a Strategy for Data Quality Management</a> appeared first on <a href="https://www.typo.ai">Typo</a>.</p>
]]></content:encoded>
		<post-id xmlns="com-wordpress:feed-additions:1">217</post-id>	</item>
		<item>
		<title>4 Steps to Take Now to Determine the Quality of Your Data</title>
		<link>https://www.typo.ai/blog/4-steps-to-take-now-to-determine-the-quality-of-your-data/</link>
		
		<dc:creator><![CDATA[marketing]]></dc:creator>
		<pubDate>Mon, 30 Sep 2019 01:00:23 +0000</pubDate>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[DQ-3]]></category>
		<guid isPermaLink="false">https://www.typo.ai/?p=214</guid>

					<description><![CDATA[<p>The post <a href="https://www.typo.ai/blog/4-steps-to-take-now-to-determine-the-quality-of-your-data/">4 Steps to Take Now to Determine the Quality of Your Data</a> appeared first on <a href="https://www.typo.ai">Typo</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div id="contact" class="et_pb_section et_pb_section_2 et_section_regular" >
				<div class="et_pb_row et_pb_row_2">
				<div class="et_pb_column et_pb_column_4_4 et_pb_column_2  et_pb_css_mix_blend_mode_passthrough et-last-child">
				<div class="et_pb_module et_pb_text et_pb_text_3  et_pb_text_align_left et_pb_bg_layout_light">
				<div class="et_pb_text_inner"><h2><strong>Given the lack of executive confidence and the high cost of poor data, the question becomes: how bad is your data quality and, even more importantly, how can it be quantified?</strong></h2>
<p><strong>The estimated annual cost of data quality problems for US businesses is $611 billion. Less than 33% of companies are confident in their data quality.</strong></p>
<p>Therefore, understanding the quality of organizational data is extremely important. <em>See our previous <a href="/blog/how-to-determine-the-cost-of-bad-data-and-gain-organizational-trust/" target="_blank" rel="noopener noreferrer" title="How to Determine the Cost of Bad Data and Gain Organizational Trust">post on determining the cost of bad data</a>.</em></p>
<p>In this post, we share a quick way to help you measure the quality of your organizational data.</p>
<p>To measure data quality, we recommend using the Friday Afternoon Measurement (FAM) method. This method allows managers to measure data quality (DQ) with a score.</p>
<p>The FAM method has provided some astonishing insights into data quality across organizations. According to insights published by the Harvard Business Review, 47% of newly created data records have at least one critical (work-impacting) error. Even more staggering, only 3% of the DQ scores HBR reported were rated “acceptable” under the loosest possible standards. These poor data quality scores were consistent across all business sectors, private and public.</p>
<h4><strong>How to use the FAM Method</strong></h4>
<p>There are four simple steps an organization can take to apply the FAM method in order to obtain a DQ score.<a href="#_ftn1" name="_ftnref1"><span>[1]</span></a></p>
<p><strong>Step 1 </strong></p>
<p>Assemble the last 100 data records that your group used in its work, such as setting up a customer account or delivering a product.</p>
<p><strong>Step 2 </strong></p>
<p>Ask two or three people with knowledge of the data to join you for a two-hour meeting.</p>
<p><strong>Step 3 </strong></p>
<p>Working record by record, instruct your colleagues to mark obvious errors. For most records, this step will go very quickly. Your team members will either spot errors — the misspelled customer name or information that’s been placed in the wrong column — or they won’t. In some cases, you will engage in detailed discussions about whether an item is truly incorrect, but usually, you will spend no more than 30 seconds on a record.</p>
<p><strong>Step 4 </strong></p>
<p>Summarize the results in a spreadsheet. First, add a “record perfect” column to your spreadsheet. Mark it “yes” if there aren’t any errors; otherwise, enter “no”.</p>
<p>To interpret the results, simply extrapolate from the share of records marked perfect. For example, if only 40 of the 100 records analyzed were without error, then you can infer that you have a 40% DQ score and a 60% error rate.</p>
<p>The cost of this error rate can be quantified using the rule of 10, based on the observation that it costs 10 times as much to complete a unit of work when the input data is defective as it does when the input data is perfect.</p>
<p>As a simple example, suppose your work team must complete 100 units per day and each unit costs $1.00 when the data is perfect. If everything is perfect, a day’s work costs $100 (100 units at $1.00 each). But with only 40 perfect:</p>
<p>Total cost = (40 x $1.00) + (60 x $1.00 x 10) = $40 + $600 = $640</p>
<p>As you can see, the total cost for the day is over six times higher than the cost when the DQ score is 100%. Consider how much your organization can save by eliminating errors. For example, if 50% of the errors are eliminated in this scenario then the organization will benefit from a 42% reduction in daily cost.</p>
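<p>To make the arithmetic above easy to reuse, here is a minimal Python sketch (illustrative only; the function name and defaults are ours, not part of the FAM method) that computes the daily cost from a FAM sample and the rule of 10:</p>
<pre><code># Friday Afternoon Measurement: extrapolate a DQ score from 100 sampled
# records and price the resulting error rate with the rule of 10.
def fam_daily_cost(perfect_records, units_per_day=100, unit_cost=1.00):
    """perfect_records: how many of the 100 sampled records had no errors."""
    dq_score = perfect_records / 100        # e.g. 40 perfect records = 40% DQ score
    error_rate = 1 - dq_score
    clean_cost = units_per_day * dq_score * unit_cost
    defect_cost = units_per_day * error_rate * unit_cost * 10   # rule of 10
    return clean_cost + defect_cost

baseline = fam_daily_cost(100)  # $100: every record perfect
actual = fam_daily_cost(40)     # $640: 40% DQ score, as in the example
improved = fam_daily_cost(70)   # $370: half of the errors eliminated
print(f"{actual:.0f} vs {baseline:.0f}; saving {1 - improved / actual:.0%}")
</code></pre>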
<p><a href="#_ftnref1" name="_ftn1"><span>[1]</span></a> Thomas Redman, Harvard Business Review, <em>Assess Whether You Have a Data Quality Problem</em></p>
</div>
			</div>
			</div>
			</div>
			</div>
<p>The post <a href="https://www.typo.ai/blog/4-steps-to-take-now-to-determine-the-quality-of-your-data/">4 Steps to Take Now to Determine the Quality of Your Data</a> appeared first on <a href="https://www.typo.ai">Typo</a>.</p>
]]></content:encoded>
		<post-id xmlns="com-wordpress:feed-additions:1">214</post-id>	</item>
		<item>
		<title>How to Determine the Cost of Bad Data and Gain Organizational Trust</title>
		<link>https://www.typo.ai/blog/how-to-determine-the-cost-of-bad-data-and-gain-organizational-trust/</link>
		
		<dc:creator><![CDATA[marketing]]></dc:creator>
		<pubDate>Mon, 16 Sep 2019 01:00:48 +0000</pubDate>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[DQ-2]]></category>
		<guid isPermaLink="false">https://www.typo.ai/?p=206</guid>

					<description><![CDATA[<p>The post <a href="https://www.typo.ai/blog/how-to-determine-the-cost-of-bad-data-and-gain-organizational-trust/">How to Determine the Cost of Bad Data and Gain Organizational Trust</a> appeared first on <a href="https://www.typo.ai">Typo</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div id="contact" class="et_pb_section et_pb_section_3 et_section_regular" >
				<div class="et_pb_row et_pb_row_3">
				<div class="et_pb_column et_pb_column_4_4 et_pb_column_3  et_pb_css_mix_blend_mode_passthrough et-last-child">
				<div class="et_pb_module et_pb_text et_pb_text_4  et_pb_text_align_left et_pb_bg_layout_light">
				<div class="et_pb_text_inner"><h2><span style="font-size: 18px;"><strong><span style="color: #333333;">Executives tend to distrust organizational data.  Determining the cost of bad data to your organization is the first step in gaining executive trust.</span></strong></span></h2>
<h4>Why Executives Distrust their Data</h4>
<p><span>A broad set of indicators determines the value of your enterprise data. With the breadth of this set ranging from accuracy to understandability and community contributions, deficiencies in one or more of the <a href="/blog/11-key-indicators-to-determine-if-your-data-is-an-asset-or-liability/" title="11 Key Indicators to determine if your Data is an Asset or Liability">data value indicators</a></span> are common. One problem in any of these indicators can cause your data to transition from an asset that provides strategic competitive advantages to an unknown liability that results in poor decisions, missed opportunities, damaged reputations, customer dissatisfaction, and exposure to additional risk and expenses. For example, inaccurate information can skew summary data or bias data science models.</p>
<p><span>Errors like these can heavily impact</span> business decisions that ultimately affect the bottom line. As data volumes and sources increase, so does the importance of managing quality. Unfortunately, errors like the examples given are not uncommon, leading to mistrust of data. In fact, according to a study by Harvard Business Review, only 16% of managers fully trust their data.</p>
<p>A recent study by New Vantage Partners uncovers more reasons for executive concern, especially for executives leading transformations to data-driven cultures. The study cites cultural resistance and a lack of organizational alignment and agility as barriers to the adoption of new data management technologies. What stood out the most is that 95% of the executives surveyed said the biggest challenges with data management changes are cultural, stemming from people and process. There is a clear lack of (and need for) tools that can be easily adopted and that improve processes for data management.</p>
<h4>The cost of poor data</h4>
<p>While trust in data quality is low, executives recognize the importance of data quality, and organizations are beginning to understand the high cost associated with poor data. According to a recent study by Experian plc, bad data costs companies 23% of revenue worldwide. Even more eye-opening, IBM estimates the total cost of poor data quality to the U.S. economy at $3.1 trillion per year.</p>
<p>The cost mainly comes from initial errors that have downstream effects, inciting an expensive reactionary response. For example, in a survey by 451 Research, 44.5% of respondents said their approach to data quality management was to find data errors in reports and then take subsequent (after-the-fact) corrective action, while 37.5% employed a manual data cleansing process.</p>
<p>It doesn’t stop there: highly skilled data analysts in IT groups lose valuable time manually analyzing and fixing errors. According to a study by Syncsort, 38% of those in data-driven analyst roles spend more than 30% of their time manually remediating data.</p>
<p>Likewise, a recent MIT study found that knowledge workers waste up to 50% of their time dealing with mundane quality issues; for data scientists, time spent on quality issues can be as high as 80%. This is time that could be better spent uncovering business-transforming insights, developing solutions to complex business challenges, or creating revenue drivers instead of revenue drains.</p>
			</div>
			</div>
			</div>
			</div>
<p>The post <a href="https://www.typo.ai/blog/how-to-determine-the-cost-of-bad-data-and-gain-organizational-trust/">How to Determine the Cost of Bad Data and Gain Organizational Trust</a> appeared first on <a href="https://www.typo.ai">Typo</a>.</p>
]]></content:encoded>
		<post-id xmlns="com-wordpress:feed-additions:1">206</post-id>	</item>
		<item>
		<title>11 Key Indicators to Determine if your Data is an Asset or Liability</title>
		<link>https://www.typo.ai/blog/11-key-indicators-to-determine-if-your-data-is-an-asset-or-liability/</link>
		
		<dc:creator><![CDATA[marketing]]></dc:creator>
		<pubDate>Tue, 10 Sep 2019 00:00:07 +0000</pubDate>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[DQ-1]]></category>
		<guid isPermaLink="false">https://www.typo.ai/?p=202</guid>

					<description><![CDATA[<p>The post <a href="https://www.typo.ai/blog/11-key-indicators-to-determine-if-your-data-is-an-asset-or-liability/">11 Key Indicators to Determine if your Data is an Asset or Liability</a> appeared first on <a href="https://www.typo.ai">Typo</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div id="blog" class="et_pb_section et_pb_section_4 et_pb_section_parallax et_pb_with_background et_section_regular" >
				<span class="et_parallax_bg_wrap"><span
						class="et_parallax_bg"
						style="background-image: url(https://www.typo.ai/wp-content/uploads/web-dev-05.png);"
					></span></span>
				<div class="et_pb_row et_pb_row_4">
				<div class="et_pb_column et_pb_column_4_4 et_pb_column_4  et_pb_css_mix_blend_mode_passthrough et-last-child">
				<div class="et_pb_module et_pb_text et_pb_text_5  et_pb_text_align_left et_pb_bg_layout_light">
				<div class="et_pb_text_inner"><h2><strong>Is your data a liability?</strong></h2>
<p><strong>Data should support knowledge workers and empower executives to make the best decisions. In this post, we will cover the key characteristics you can use to decide if your data is an asset or a liability.</strong></p>
<h4>Data Asset versus Liability: The Key Indicators of Data Value</h4>
<p>The value of a data set is measured by much more than the information itself. Value is determined by the data’s ability to address a need. Data quality is the foundation of information’s ability to fulfill needs and serve knowledge workers. The table below outlines the characteristics, or indicators, of data value. A substantial deficiency in any of the indicators can render the data useless. Certain deficiencies might go unidentified, causing unknown expenses and problems. In this worst-case, and common, scenario the data is perceived as an asset when it is actually a liability. Understanding the indicators that drive value will allow you to assess whether your data is an asset, a liability, or a risk.</p>
<table>
<tbody>
<tr>
<td>Relevance</td>
<td>In demand and related to the desired subject matter.</td>
</tr>
<tr>
<td>Completeness</td>
<td>Provides all desired pieces of information.</td>
</tr>
<tr>
<td>Timeliness</td>
<td>Provided by the requested time.</td>
</tr>
<tr>
<td>Accuracy</td>
<td>How close a measurement is to the actual value.</td>
</tr>
<tr>
<td>Precision</td>
<td>Measure of detail in which a value is expressed. Granularity or unit of measure.</td>
</tr>
<tr>
<td>Consistency</td>
<td>All values computed or captured with the same method, granularity, or unit of measure. Typically applies to summarizations and aggregations.</td>
</tr>
<tr>
<td>Uniqueness</td>
<td>Free of duplicate information or multiple instances of the same unique piece of information.</td>
</tr>
<tr>
<td>Accessibility</td>
<td>Both discoverable and easy to access.</td>
</tr>
<tr>
<td>Understandability</td>
<td>Degree of available metadata, documentation, and lineage.</td>
</tr>
<tr>
<td>Interoperability</td>
<td>Degree to which the data distribution format is easy to consume, query and integrate with different data management systems.</td>
</tr>
<tr>
<td>Community</td>
<td>Number of contributors and frequency of contributions (an indicator of maintenance vs. obsolescence).</td>
</tr>
</tbody>
</table>
<h4>Relevance</h4>
<p>Data must be relevant to the need of the data consumer. Relevance is the degree to which the information is related to the desired subject matter. Data that does not cover the desired subject matter but is correlated with it can still provide value. Consider a custom tailor who produces clothing and recently decided to add shoes to his line. He would like to choose the size and quantity of shoes to keep in inventory. He does not know the shoe sizes of his customers, but he has height measurements. Since height is correlated with shoe size, he is able to leverage this information to make a better inventory decision.</p>
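<p>A worked version of the tailor example (with made-up numbers) might fit a simple linear model from known height/size pairs and use it to estimate sizes from the heights he already has:</p>
<pre><code># Estimate shoe sizes from correlated height data with a linear fit.
import numpy as np

heights = np.array([64, 66, 68, 70, 72, 74])          # customers with known sizes
sizes = np.array([7.5, 8.5, 9.0, 10.0, 11.0, 12.0])
slope, intercept = np.polyfit(heights, sizes, 1)

unknown = np.array([65, 69, 73])                      # customers with heights only
estimates = slope * unknown + intercept
print(np.round(estimates * 2) / 2)                    # round to half sizes
</code></pre>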
<h4>Completeness</h4>
<p>Completeness is a factor of providing all needed details. For example, tabular data may lack columns for some or all rows of data. One or more rows of data might be missing. A picture might contain a portion of the required object. A video recording of an event might be cut short.</p>
<h4>Timeliness</h4>
<p>Timeliness is a factor of response time or delivering data by a required date. Many enterprise information systems use batch processes with lengthy execution times or data preparation occurring each month or quarter which delays the delivery of critical information. Other systems provide real-time results from live data streams.</p>
<h4>Accuracy</h4>
<p>Either your data is accurate, or it is not. Inaccuracies can reside as dormant risk, which can spread to other data sets and downstream systems. One inaccuracy might be used in numerous calculations, summarizations or data science models. The output from these calculations and models can be used as input to other models. As a result, inaccuracies can reach unsuspecting consumers through a variety of ways.</p>
<table style="border-color: #0154ff;">
<tbody>
<tr>
<td><span style="color: #0154ff;">Unknown inaccuracies create an illusion that your data is an asset. As a result, knowledge workers and executives make misinformed decisions that create problems and expenses.  What you thought was an asset is actually a liability. Due to this common and costly misperception, accuracy is considered the most important indicator.</span></td>
</tr>
</tbody>
</table>
<h4>Precision</h4>
<p>Measurements provided in inches are more precise than those provided in feet. Data that is more precise can be used in a wider variety of applications when compared to less precise counterparts. Images and video with higher resolution are required for zooming and large format printing with little distortion.</p>
<h4>Consistency</h4>
<p>Users expect a consistent or better-than-before data experience. Data types and precision should not vary for the same field. The formula used to compute a field value should not change from one row to the next. Angles and orientation of photography might need to be the same throughout all pictures. Backwards compatibility between releases will ease the impact of change as data evolves. Changes to data types, distribution formats, precision, or computation formulas cause the consumer to incur the expense of supporting these changes in dependent reports and systems.</p>
<h4>Uniqueness</h4>
<p>Uniqueness is the degree to which a data set is free of duplicate representations of the same information. For example, multiple entries for the same person are confusing and costly for consumers to resolve. Creating data that is differentiated or unique compared to other sources is considered part of the relevance indicator.</p>
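<p>A minimal sketch of a uniqueness check (the column names and normalization rules are illustrative assumptions) flags duplicate representations of the same person after light normalization:</p>
<pre><code># Flag rows that represent the same person despite surface differences.
import pandas as pd

people = pd.DataFrame({
    "name":  ["Ada Lovelace", "ada lovelace ", "Alan Turing"],
    "email": ["ADA@example.com", "ada@example.com", "alan@example.com"],
})
normalized = people.assign(
    name=people["name"].str.strip().str.lower(),
    email=people["email"].str.lower(),
)
dupes = normalized.duplicated(subset=["name", "email"], keep=False)
print(people[dupes])  # both representations of Ada Lovelace
</code></pre>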
<h4>Accessibility</h4>
<p>Accessibility includes the ability of a user to discover and access the information. Providing a seamless access experience includes features such as semantic search for data sources, workflow for a user to attain authorization, and an interface to query or download data.</p>
<h4>Understandability</h4>
<p>Complex data is unusable if the consumer cannot understand what the data represents. Data sets are easier to use with metadata, detailed documentation, and lineage describing the sources of the data and the history of changes. Incorrect documentation can be as damaging as inaccurate data. Applying data inappropriately will lead to faulty business intelligence and bad decisions. These problems will recur until the misinformation is corrected or the credibility of the data or documentation is questioned.</p>
<h4>Interoperability</h4>
<p>The data distribution format affects the consumer’s ability to easily use the data with a preferred technology. Use can range from running simple queries to integrating the data into an enterprise knowledge graph with data ontologies relevant to the distributed data. Using industry-standard formats will ensure high interoperability.</p>
<h4>Community</h4>
<p>Collaboration between data contributors has been shown to create more valuable data. An active community of contributors providing support, and of users providing feedback, will result in data that is more reliable, comprehensive, and well understood. This factor is a key driver of value for open data.</p>
<table style="border-color: #0154ff;">
<tbody>
<tr>
<td><span style="color: #0154ff;">A synergistic relationship exists between <strong>accessibility</strong>, <strong>understandability, interoperability</strong> and <strong>community</strong>.  Modern data catalogs that utilize knowledge graphs and semantic search like <a href="https://data.world" title="data.world">data.world</a> leverage this relationship to empower data driven organizations. Knowledge workers are able to find, understand, use and share data assets.</span></td>
</tr>
</tbody>
</table></div>
			</div>
			</div>
			</div>
			</div>
<p>The post <a href="https://www.typo.ai/blog/11-key-indicators-to-determine-if-your-data-is-an-asset-or-liability/">11 Key Indicators to Determine if your Data is an Asset or Liability</a> appeared first on <a href="https://www.typo.ai">Typo</a>.</p>
]]></content:encoded>
		<post-id xmlns="com-wordpress:feed-additions:1">202</post-id>	</item>
	</channel>
</rss>
