Data Quality – The Way to Better Business Intelligence
Relatively little attention in BI (Business Intelligence) projects is paid to the quality of data coming from production systems, yet this is the data on which business decisions will be made. Source (production) systems are the basis of information: they feed BI applications, which aggregate and present that data in a particular way. If the input data does not meet a certain quality level, it is unrealistic to expect that projects and applications built on such a shaky foundation will be useful, even if the projects and applications themselves are technically perfect. Whether it is a data warehouse project, a planning project, or a project that provides a unified view of customers, the quality of the existing data will most directly affect the result. Poor data quality at the source will surely lead to poor business decisions.
Once it becomes apparent that the available data is of low quality, users often abandon the application instead of working on improving that quality and recognizing it as the key problem in the functioning and success of BI and similar projects. Integration projects aside, poor data quality also affects the production systems themselves. The consequences usually show up as lower productivity and more errors in routine tasks that use operational data (the simplest example: billing sent to the wrong address). Production systems then prove unable to provide information for monitoring business activities in general, and/or those tasks require and consume a great deal of IT resources in terms of human labor.
The fact is that few companies are aware of how poor their data quality is, and far fewer are aware that something needs to be done about it and that data quality must be treated as a business problem in its own right. Most businesses could name at least one project that is ineffective in the sense that it is not used, because the data it operates on is not entirely accurate, i.e. cannot be relied upon. The first step toward improving this situation is recognizing that the problem lies in the quality of the data, not in the Business Intelligence project.
Incomplete and low-quality data
The most common problems appear as incomplete or inaccurate data: either the data simply does not exist, or it is wrong, which is worse. Incorrect data is dangerous precisely because everything seems fine while it is actually serving as the basis for a wrong decision. BI applications mostly deal with aggregated results and their presentation, and at a high degree of aggregation an incorrect value will go unnoticed far more often than in the operational system.
For example, if customer X is billed 10,220 USD instead of 1,022 USD, the error will probably be noticed in the production system: an invoice of that size 'sticks out' among the others, and the customer will ultimately complain to the administrator. But in a BI application that aggregates amounts in the millions, a difference of ten thousand is simply not visible.
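A small sketch, with made-up numbers, of why this happens: at record level the mistyped invoice is an obvious outlier, but once a million invoices are summed, the relative error all but vanishes.

```python
# One million typical invoices of 1,022 USD each (hypothetical data).
invoices = [1022.0] * 1_000_000
total_correct = sum(invoices)

# One mistyped amount: 10,220 instead of 1,022.
invoices[0] = 10220.0
total_with_error = sum(invoices)

# At record level the outlier is roughly 10x the average and easy to spot:
outlier_ratio = max(invoices) / (sum(invoices) / len(invoices))
print(f"outlier vs. average invoice: {outlier_ratio:.1f}x")

# After aggregation the relative error is negligible:
relative_error = (total_with_error - total_correct) / total_correct
print(f"error in the aggregate total: {relative_error:.6%}")
```

The same absolute error that dominates a single account contributes well under a hundredth of a percent to the aggregate, which is exactly why BI-level reporting cannot be relied on to surface record-level mistakes.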
Incomplete data creates the illusion that we have information. This is a dangerous illusion in applications whose design assumed that certain information would be available, only for it to turn out at analysis and presentation time (which always comes at a later stage of BI) that it is not. For example, a company's management believes it has the e-mail addresses of its users, when in fact it has them for only about 3% of users, because the application does not require the field to always be filled in (which, for this attribute, is the only correct approach). With such data, nothing new can be learned about the structure of the user base, nor can it be used in further analysis for sales and marketing.
A data warehouse collects data from production sources in the manner already described in other articles by the author. The analogy with an ordinary warehouse applies to both incomplete filling of the data warehouse and inaccurate data. A data warehouse filled with incomplete data is very easy to top up: it is only necessary to add to "the shelf" what is missing, assuming, of course, that it can be obtained from somewhere. A data warehouse filled with wrong data, on the other hand, is like a warehouse with mixed-up items, where a worker arriving at a shelf finds items that do not belong there and have nothing to do with the shelf's label. Such a warehouse is far less usable than an empty one, and more dangerous: it takes considerably more time to find what is needed and to decide whether it is actually what we want, because there is no order in such a repository.
Steps to improve data quality in business intelligence
Once the responsible people in a company recognize that the effort to improve data quality is not money foolishly spent, but the road to better-quality information in the enterprise in general, there are several steps that need attention and that should appear in every data quality improvement project.
1. Setting up the team and resources
To cope effectively with data quality problems, the company needs to form a team of IT staff, staff who work with the source data, and the people expecting the reports, i.e. lower management. Technology can help, but manual work is unavoidable, and it typically consumes valuable resources: people who must review the data and, based on their experience and judgment, say that this or that data is certainly not good, or that historical data should be supplemented with certain elements. IT staff can in principle assist, for instance by automating manual entry, but no one should expect a magic program that fixes incorrectly entered data.
2. Establishment of Quality Metrics
Focusing on the overall quality of all data in the enterprise is, of course, ineffective. Some areas are more important in decision-making and some less so. IT segments dealing with a company's core activity usually contain fewer errors; billing, for example, is subject to customer complaints, so errors there are easy to spot. The effort should focus primarily on the key areas and assign them priorities according to what will contribute most to quality in the short term. That may mean filling in half-empty fields that were skipped for various reasons, recasting data that is known to often differ from what it should be, or introducing entirely new attributes that the changed character of the business requires. Focus requires quality metrics, i.e. a system by which data quality is measured. This can range from a very simple test that calculates how often attribute X is filled in the customers table, through simple checks such as verifying that a cancellation date is not earlier than the entry date, to much more complex tools that help detect entry errors. Identifying the major problems and then quantifying them (e.g. "30% of the stores have no industry entered" or "for more than 70% of members there is no information on whether they use the Internet and/or e-mail") gives a good starting point and a clear goal, as opposed to general statements that all missing member/customer information should be filled in.
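The two kinds of simple checks described above can be sketched in a few lines. This is a minimal illustration on a hypothetical customers table; the field names and sample rows are invented for the example.

```python
from datetime import date

# Hypothetical customers table: three records, with gaps and one bad date.
customers = [
    {"name": "A", "email": "a@example.com", "entered": date(2005, 1, 10), "cancelled": None},
    {"name": "B", "email": None,            "entered": date(2005, 3, 5),  "cancelled": date(2005, 2, 1)},
    {"name": "C", "email": "",              "entered": date(2006, 7, 1),  "cancelled": None},
]

def completeness(rows, field):
    """Share of rows where `field` is present and non-empty."""
    filled = sum(1 for row in rows if row.get(field))
    return filled / len(rows)

def date_violations(rows):
    """Rows where the cancellation date precedes the entry date."""
    return [r for r in rows if r["cancelled"] and r["cancelled"] < r["entered"]]

print(f"e-mail completeness: {completeness(customers, 'email'):.0%}")
print(f"cancellation-before-entry violations: {len(date_violations(customers))}")
```

Even checks this trivial turn a vague complaint ("the customer data is bad") into a number that can be assigned a priority and tracked.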
3. Divide complex tasks into elementary ones
It is easier to focus on small elements of the problem and solve them one by one. The examples described above show that it pays to focus on a few attributes of a table that are known to matter in the decision-making process and for which the information we have can be amended or corrected. It is quite another problem if the information is not available at all (or is available but must be purchased), or can no longer be obtained, e.g. the customer is gone and the company no longer knows anything about him. A company can develop good ways to gather information at the point of sale, where customers may voluntarily provide much useful information (for example in the software industry, where product registration and its benefits give the end user a good reason for some feedback about themselves and their view of the product). Training and "cultivating" staff to enter information, and teaching them that a field being optional does not mean it should go unfilled, will over time yield better data. Staff engaged in data entry often neglect optional fields not because the program lets them skip the entry, but primarily because they do not see the field as having any value in the real world.
4. Measurement of results
To be able to tell how well the job was done, the previously defined metrics now serve as a tool describing both the quality of the entry clerks' work and the degree to which the project has been realized. Most of the project time, of course, is consumed by the time clerks and other staff spend retrieving the data.
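The degree of realization can be read straight off the metric itself: run the same measurement before and after the cleanup effort and express how much of the gap was closed. A small sketch, with hypothetical numbers:

```python
def improvement(before: float, after: float) -> float:
    """Fraction of the originally missing data that the cleanup filled in.

    `before` and `after` are completeness ratios in [0, 1) measured by
    the same metric at the start and end of the effort.
    """
    return (after - before) / (1.0 - before)

# E.g. e-mail completeness rose from 3% to 40% of customer records
# (illustrative figures, echoing the 3% example earlier in the article):
print(f"{improvement(0.03, 0.40):.0%} of the gap closed")
```

Reporting progress as "38% of the gap closed" rather than a raw percentage keeps the goal (complete data in the prioritized fields) in view.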
Conclusion – data quality and business intelligence
It is usual that, in the eyes of senior management, IT gets the blame when the reports are of poor quality and an expensive BI investment fails to prove itself. Management then forces IT to clean the dirty data, a task that goes beyond the role of IT, which should provide the flow of information, its storage, and access to it. According to Gartner, about 80% of data quality improvement processes led by IT alone will be ineffective in achieving their goals. IT does not invent or design information, nor does it trade information within the company, update it, or amend it. Senior management must be aware of the strategic importance of input data quality: it is not just an IT problem but a problem of overall operations and the entire company. It is also a process that must be continuous, not a project limited to a specific period of time.