DataBauhaus Data Quality
DataBauhaus has distilled its broad experience from enterprise data quality projects into DataBauhaus Data Quality, a powerful and scalable data quality platform incorporating profiling, cleansing, standardization, and matching capabilities. The platform has been designed with high data volumes and open integration in mind, enabling the cleansing and integration of very large data sets.
Analysis
The data quality analysis components identify and categorize data quality issues present in data sources. Analysis results enable users to
- describe structure and completeness of elements in data sources
- understand integrity and correctness of data elements
- define appropriate corrective measures with the goal of improving the quality of data elements
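As an illustration of this kind of analysis (not the product's actual API), a minimal profiling sketch might measure completeness and abstract value patterns per field; field names and records here are purely hypothetical:

```python
from collections import Counter

def profile(records, fields):
    """Compute per-field completeness and value-pattern frequencies."""
    stats = {f: {"filled": 0, "patterns": Counter()} for f in fields}
    for rec in records:
        for f in fields:
            value = (rec.get(f) or "").strip()
            if value:
                stats[f]["filled"] += 1
                # Abstract each character into a structural class:
                # letters -> 'A', digits -> '9', everything else kept.
                pattern = "".join(
                    "A" if c.isalpha() else "9" if c.isdigit() else c
                    for c in value
                )
                stats[f]["patterns"][pattern] += 1
    total = len(records)
    return {
        f: {
            "completeness": stats[f]["filled"] / total if total else 0.0,
            "top_patterns": stats[f]["patterns"].most_common(3),
        }
        for f in fields
    }

records = [
    {"name": "Anna Meier", "zip": "80331"},
    {"name": "B. Schulz", "zip": "8O331"},   # letter O instead of zero
    {"name": "", "zip": "10115"},
]
report = profile(records, ["name", "zip"])
```

A pattern report like this immediately surfaces structural outliers, such as the `9A999` zip code above, and pinpoints incomplete fields as candidates for corrective measures.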
Cleansing and Standardization
To address and rectify data quality issues, the platform provides powerful capabilities to cleanse, standardize, validate, and enhance the content of data elements. Various techniques allow
- removal of unwanted content from data elements
- content decomposition by applying parsing definitions
- standardization and validation of content against sophisticated dictionaries, including typographical error correction
Data standardization techniques can be applied to data types of various domains, such as customer data, postal and electronic addresses, and product data.
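The three techniques above can be sketched in a single pipeline. This is a hypothetical illustration, not the product's parsing definitions or dictionaries; the street-type dictionary and the use of fuzzy matching for typo correction are assumptions for the example:

```python
import re
import difflib

# Illustrative dictionary of canonical street designators (not product content).
STREET_TYPES = {"street": "St", "avenue": "Ave", "road": "Rd", "boulevard": "Blvd"}

def cleanse_address(raw):
    """Cleanse, parse, and standardize a street-address line."""
    # 1. Cleansing: remove unwanted characters, collapse whitespace.
    text = re.sub(r"[^\w\s]", " ", raw)
    text = re.sub(r"\s+", " ", text).strip()
    # 2. Parsing: decompose into house number and street tokens.
    m = re.match(r"(?P<number>\d+)\s+(?P<street>.+)", text)
    if not m:
        return {"number": None, "street": text}
    tokens = m.group("street").split()
    # 3. Standardization with typo correction: fuzzy-match the final token
    #    against the dictionary, so "avenoo" still resolves to "Ave".
    match = difflib.get_close_matches(tokens[-1].lower(), STREET_TYPES, n=1, cutoff=0.6)
    if match:
        tokens[-1] = STREET_TYPES[match[0]]
    street = " ".join(t if t in STREET_TYPES.values() else t.title() for t in tokens)
    return {"number": m.group("number"), "street": street}

result = cleanse_address("123  main avenoo!!")
```

Each stage maps to one of the bullets: noise removal, decomposition by a parsing definition, and dictionary-driven standardization tolerant of typographical errors.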
Record Association and Matching
Identifying data redundancy requires record matching techniques that quantify the similarity between records. For this task, commonly referred to as data matching, DataBauhaus Data Quality provides a flexible and highly scalable duplicate detection process together with unique features such as
- High-performance data association, suitable for large to very large data volumes
- Operable on computationally constrained platforms
- Anonymous use of data elements for data association (anonymous matching)
- Simplified incremental association due to cluster persistence
The DataBauhaus Data Quality record association process has been designed with both large data volumes and data privacy in mind; it can, for example, match very large customer data sources while preserving individuals' data privacy. Freely configurable logical association definitions allow building hierarchical data views, such as households of various kinds.
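The anonymous-matching idea can be sketched as follows. The actual association algorithms are not described here; this hypothetical example only illustrates the principle that records can be compared via salted hashes of normalized tokens, so no raw values need to be exposed to the matching process:

```python
import hashlib

def anonymize(record, salt=b"shared-secret"):
    """Hash normalized tokens so matching can run without raw data."""
    tokens = " ".join(str(v).lower() for v in record.values()).split()
    return {hashlib.sha256(salt + t.encode()).hexdigest() for t in tokens}

def similarity(a, b):
    """Jaccard similarity over two hashed token sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

r1 = {"name": "Anna Meier", "city": "Berlin"}
r2 = {"name": "Anna  Meier", "city": "berlin"}
r3 = {"name": "Paul Kurz", "city": "Hamburg"}

h1, h2, h3 = anonymize(r1), anonymize(r2), anonymize(r3)
is_duplicate = similarity(h1, h2) >= 0.8
```

Because only digests are compared, the matching step itself never sees names or addresses; identical normalized content still yields identical hashes and therefore a high similarity score.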
Localization for Domain and Regional Functionality
DataBauhaus Data Quality implements locale-sensitive, Unicode-compliant content capture and management, applicable to all rules, definitions, and dictionaries. This allows the provisioning of regional content, such as
- Language and country specific dictionaries
- Country specific parsing definitions
- Language specific phonetic encoding
- Locale-specific cleansing rules
Localized content is packaged separately but can be combined within the same environment. This provides the flexibility to process anything from data of a very specific domain to full sets of international data.
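Language-specific phonetic encoding, one of the localized assets listed above, can be illustrated with the classic American Soundex algorithm for English; the product's own encodings are not specified here, so this is only a well-known stand-in:

```python
def soundex(name):
    """American Soundex: a classic English phonetic encoding."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    name = name.lower()
    first = name[0].upper()
    encoded = []
    prev = codes.get(name[0], "")
    for ch in name[1:]:
        if ch in "hw":
            continue                 # h and w do not reset the previous code
        code = codes.get(ch, "")
        if code and code != prev:    # collapse adjacent identical codes
            encoded.append(code)
        prev = code
    return (first + "".join(encoded) + "000")[:4]
```

Names that sound alike receive the same code ("Robert" and "Rupert" both encode to R163), which lets matching rules associate records despite spelling variation; other languages require their own, language-specific encodings.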
Open Integration, Scalability, and High Performance
Ready for enterprise data processing, DataBauhaus Data Quality integrates openly with
- Relational databases
- Data Transformation Engines
- Web Services
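A typical relational-database integration follows a batch read-cleanse-write loop. The sketch below uses an in-memory SQLite database purely for illustration; the table, column names, and cleansing rule are assumptions, and real deployments would target an enterprise RDBMS:

```python
import sqlite3

# Illustrative schema and sample data (hypothetical, not product-defined).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO customers (email) VALUES (?)",
    [("  Anna@Example.COM ",), ("paul@example.com",)],
)

def cleanse_email(value):
    """A trivial cleansing rule: trim whitespace and lowercase."""
    return value.strip().lower()

# Batch read, cleanse, and write back: the typical integration loop.
rows = conn.execute("SELECT id, email FROM customers").fetchall()
conn.executemany(
    "UPDATE customers SET email = ? WHERE id = ?",
    [(cleanse_email(email), rid) for rid, email in rows],
)
cleaned = [e for (e,) in conn.execute("SELECT email FROM customers ORDER BY id")]
```

The same loop structure applies when the source is a transformation engine stage or a web-service request: records stream in, quality functions are applied, and results flow back out.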
Following the guiding principle of processing large data volumes as efficiently as possible, DataBauhaus Data Quality delivers highly scalable data quality functionality for enterprise-level, high-performance data integration.