Summary: To build an effective learning model, you must understand the quality issues that exist in the data and how to detect and deal with them. In general, data quality issues fall into four major categories.
Noise
Many say that if data had no noise, data mining would be too easy. Noise in data is a modification of the original values. Prof. Jeff M. Phillips of the University of Utah identifies the main causes of noise in data as listed below.
......
Outliers
As the name implies, outliers are data objects that differ considerably from most of the other data objects. The object marked in the image below has (X, Y) attributes different from those of all the other data objects and therefore qualifies as an outlier.
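One common way to flag such an object programmatically is the 1.5 x IQR rule, applied to each attribute separately. The sketch below is only an illustration and not necessarily the method used in the original article; the sample points, the function names and the factor k = 1.5 are all assumptions.

```python
from statistics import quantiles

# Hypothetical (X, Y) data: five points cluster near (1, 1), one lies far away.
points = [(1.0, 1.2), (1.1, 0.9), (0.9, 1.0), (1.2, 1.1), (1.0, 1.0), (8.0, 9.0)]

def iqr_bounds(values, k=1.5):
    """Return (low, high) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the sample
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(pts, k=1.5):
    """Flag points whose X or Y value falls outside that attribute's fences."""
    x_low, x_high = iqr_bounds([x for x, _ in pts], k)
    y_low, y_high = iqr_bounds([y for _, y in pts], k)
    return [
        (x, y)
        for x, y in pts
        if not (x_low <= x <= x_high and y_low <= y <= y_high)
    ]

outliers = find_outliers(points)
print(outliers)
```

With this sample, only the point far from the cluster is flagged. Checking each attribute on its own is a simplification: a point can be unusual only in the combination of X and Y, in which case a distance-based or density-based check would be more appropriate.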
.....
Missing values
It is quite possible for data objects to be missing one or more attribute values.
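A first step in dealing with missing values is simply to measure how many there are. The sketch below assumes that missing attributes are stored as `None`; the records, field names and helper functions are hypothetical and serve only as an illustration.

```python
# Hypothetical records; None marks a missing attribute value.
records = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 29, "income": None},
    {"age": None, "income": None},
]

def missing_counts(rows, fields):
    """Count, for each field, how many rows have no value for it."""
    return {f: sum(1 for r in rows if r.get(f) is None) for f in fields}

def complete_rows(rows, fields):
    """Keep only the rows in which every listed field has a value."""
    return [r for r in rows if all(r.get(f) is not None for f in fields)]

fields = ["age", "income"]
counts = missing_counts(records, fields)
complete = complete_rows(records, fields)
print(counts)
print(len(complete), "of", len(records), "rows are complete")
```

Dropping incomplete rows is the simplest option but can discard a lot of data, as it does here. Estimating the missing values from the remaining data is a common alternative.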
.....
Duplicate data
.....
Full Text: kdnuggets