Jun 26, 2014
In my last blog post, we had a mini crash course on data quality and the three elements essential to data quality – knowledge base, domain, and Reference Data Service (RFS). If you didn’t get a chance to read it, you can access it here. It’ll definitely help you better understand today’s blog post, where I’ll go through another crash course focusing on creating knowledge bases and data cleansing. Creating a knowledge base Building a Data Quality Services (DQS) knowledge base is an iterative process that involves the following:- Knowledge discovery – using existing data to identify domain values, and
- Domain management – categorising discovered values as valid, invalid, or errors; specifying synonyms and leading values; defining correction rules, and other domain configuration tasks.
- Select the knowledge base to use for the project and specify that the action to be performed is cleansing.
- Select the data source containing the data to be cleansed and map the columns in it to the domains in the knowledge base.
- Run the data cleansing process and then review the suggestions and corrections generated by DQS. The data steward can then approve or reject the suggestions and corrections.
- Export the cleansed data to a database table, comma-delimited file, or Excel workbook.
- Output – the values for all fields after data cleansing. All fields in the original data source generate output columns, even those not mapped to domains in the knowledge base (in which case they contain the original data values).
- Source – the original value for fields that were mapped to domains and cleansed.
- Reason – the reason the output value was selected by the cleansing operation. For example, a valid value might be corrected to a leading value defined for the domain, or DQS might have applied a cleansing algorithm and suggested a corrected value.
- Confidence – an indication of the confidence DQS estimates for corrected values. For values corrected to leading values defined in the knowledge base, this is usually 1 (or 100%). When Data Quality Services uses a cleansing algorithm to suggest a correction, the confidence is a value between 0 and 1.
- Status – the status of the output column. A value of correct indicates that the original value was already correct, and a value of corrected indicates that Data Quality Services changed the value.
- Add the data cleansing transformation to the data flow and drag a data flow connection from a source or transformation that contains the data you want to cleanse to the input of the data cleansing transformation.
- Edit the settings of the data cleansing transformation to connect to the DQS, specify the knowledge base you want to use, and map the input columns in the data flow to domains in the knowledge base.
- Drag the output from the data cleansing transformation to the next transformation or destination in the data flow, and map the output columns from the data cleansing transformation to the appropriate input columns in the transformation or destination. The output columns from the data cleansing transformation are the same as those generated by an interactive data cleansing project.
How do your Excel skills stack up?
Test NowNext up:
- Protecting content in Microsoft Word with ‘Restrict Editing’
- How to populate tables in Excel VBA
- Monitoring communication sessions in Lync Server 2013
- A truly botched presentation…but Samsung are happy!
- Combine MATCH and INDEX in Excel for a powerful tool
- 4 steps to establish a SQL Server connection
- 3 settings that will increase your efficiency in Microsoft Project
- The “New” Exchange 2013 Edge Transport Server
- Ace your next presentation with lessons from these tennis pros
- Sync document properties in SharePoint and Word
Previously
- Customise the weather forecast in Outlook 2013
- Manage your administration with ADAC and PowerShell
- Automatically reach your deadlines with scheduled tasks in Microsoft Project
- Data Quality in SQL Server 2014 for dummies (Part 1)
- Life has many, many stations. Having trouble getting to your next station?
- Taking the shortcuts in Windows 8
- Multi-level sorting made possible in Microsoft Word
- Spring clean your PC with Windows ‘Disk Cleanup’
- Pinpoint your 3D chart’s data points with drop lines in Excel
- The seven keys to develop your personal effectiveness