Jun 20, 2014
Data quality is a major concern for anyone building a data warehousing solution. As organisations consume more data from more data sources, the need for data quality management has become increasingly common. Data quality is especially important in a Business Intelligence solution, because the reports and analysis generated from data in the data warehouse can form the basis of important business decisions. Business users must be able to trust the data they use to make these decisions. In today’s post, I’ll highlight some common data quality issues, introduce Data Quality Services in SQL Server 2014 and finish with a crash course on three key elements that are key to data quality. Common Data Quality Issues Below are 3 data quality issues that frequently occur:- Invalid data values – for example, an organisation might categorise its stores as “wholesale” or “retail.” However, a user might use an application that allows free-form data entry to create a store with a category of “reseller” instead of “retail,” or they might accidentally type “whalesale” instead of “wholesale.” Any analysis or reporting that aggregates data by store type will then produce inaccurate results because of the additional, invalid categories.
- Inconsistencies – for example, an organisation might have an application for managing customer accounts in which US states are stored using two-letter state codes (such as“WA” for Washington), and a second application that stores supplier addresses with a full state name (such as “California”). When data from both of these systems is loaded into the data warehouse, your data warehouse will contain inconsistent values for states.
- Duplicate business entities – for example, a customer relationship management system might contain customer records for Jim Corbin, Jimmy Corbin, James Corbin, and J Corbin. If the address and telephone number for these customers are all the same, then it might be reasonable to assume that all of these records relate to the same customer. Of course, it’s also possible that Jim Corbin has a wife named Jennifer, and a son named James, so you must be confident that you have matched the records appropriately before deduplicating the data.
- Data Cleansing – identifying invalid or inconsistent data values and correcting them.
- Data Matching – finding duplicate data entities. Data Quality Services is installed from the SQL Server 2014 installation media, and consist of the following components:
- Data Quality Services Server– a service that uses a knowledge base to apply data quality rules to data. The server must be installed on the same instance as the data that you wish to analyse. Two SQL Server catalogues are installed, and you can monitor, maintain, backup, and perform other administrative tasks on these from within SQL Server Management Studio. DQS_MAIN includes stored procedures, the DQS engine, and published knowledge bases. DQS_ PROJECT includes data that is required for knowledge base management and data quality project activities.
- Data Quality Client– a wizard-based application that data stewards (typically business users) can use to create and manage data quality services knowledge bases and perform data quality services tasks. The client can either be installed on the same computer as the DQS server or used remotely.
- Data Cleansing SSIS Transformation– a data flow transformation for SQL Server Integration Services that you can use to cleanse data as it flows through a data flow pipeline.
- Domains that define valid values and correction rules for data fields, and
- Matching policies that define rules for identifying duplicate data entities.
- Valid– for example, valid values for a US State domain might include “California” or “CA”
- Invalid– for example, invalid values for a US State domain might include “8”
- Error– for example, a common error for a US State domain might be “Calfornia” (with a missing “i”)
- Create a free DataMarket account key at the Windows Azure Marketplace
- Subscribe to a free or paid for RDS provider’s service at the Marketplace
- Configure the reference data service details in DQS
- Map your domain to the RDS service
- Use the knowledge base that contains the domain that maps to the RDS service to cleanse the data
How do your Excel skills stack up?
Test NowNext up:
- Automatically reach your deadlines with scheduled tasks in Microsoft Project
- Manage your administration with ADAC and PowerShell
- Customise the weather forecast in Outlook 2013
- Data Quality in SQL Server 2014 for dummies (Part 2)
- Protecting content in Microsoft Word with ‘Restrict Editing’
- How to populate tables in Excel VBA
- Monitoring communication sessions in Lync Server 2013
- A truly botched presentation…but Samsung are happy!
- Combine MATCH and INDEX in Excel for a powerful tool
- 4 steps to establish a SQL Server connection
Previously
- Life has many, many stations. Having trouble getting to your next station?
- Taking the shortcuts in Windows 8
- Multi-level sorting made possible in Microsoft Word
- Spring clean your PC with Windows ‘Disk Cleanup’
- Pinpoint your 3D chart’s data points with drop lines in Excel
- The seven keys to develop your personal effectiveness
- 4 tips to make you a Windows 8.1 pro
- Sorting made easy with custom lists in Excel
- TechEd 2014 – Australia’s largest Microsoft IT exhibition is evolving!
- Communicating clearly to save time and money