Data cleaning is the
‘proofreading’ of data to eliminate errors and inconsistencies in coding (the
techniques used to organize raw data), according to Frankfort-Nachmias & Nachmias (2008).
Data cleaning, also known as data cleansing, is an integral part of data processing
that should take place before the collected data are analyzed. With the
development of efficient software, computers now perform most data
cleaning (Frankfort-Nachmias & Nachmias, 2008). Missing or unreliable data can
take many forms, such as response bias, careless responses, or nonresponse.
According to Meade & Craig
(2012), internet surveys, especially in cases of ‘obligatory participation’, can
produce data whose quality is a concern. They report that ‘careless
responses’ can be reduced by collecting
identified rather than anonymous responses.
Missing data is a
common concern in multivariate studies, as reported by Little (1988), and
raises the question of whether the data are ‘missing completely at random’ (MCAR) or
whether the missingness is related to some of the variables. Little (1988) suggests
that comparing the means of each variable between groups of cases with different
patterns of missing data can help assess whether the data are MCAR. Schafer
& Olsen (1998) also highlight the possibility of missing data in a multivariate
study. They add that ‘new computational algorithms and software’ have
enabled researchers to create proper imputations for multivariate
studies. Their study describes the ‘multiple imputation’ technique, which replaces
each missing value with m > 1 plausible values and combines the estimates from
the resulting data sets. Bourque & Clark (1992) make the interesting point
that data preparation is more of an ‘art’ when compared to the ‘science’ of
hypothesis testing.
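The two ideas above, comparing means between groups defined by missingness and combining estimates from several imputed data sets, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the field names (`age`, `income`) and the sample rows are hypothetical and do not come from any of the cited studies. The first function is an informal check and not Little's formal MCAR test. The second applies the combining rules commonly known as Rubin's rules, which is one standard way to pool results after multiple imputation.

```python
from math import sqrt
from statistics import mean, variance


def compare_means_by_missingness(rows, target, other):
    """Mean of `other` for rows where `target` is observed vs. missing.

    A large gap between the two means hints that the missingness in
    `target` may be related to `other`, i.e. the data may not be MCAR.
    """
    observed = [r[other] for r in rows
                if r[target] is not None and r[other] is not None]
    missing = [r[other] for r in rows
               if r[target] is None and r[other] is not None]
    return mean(observed), mean(missing)


def pool_estimates(estimates, variances):
    """Pool one estimate per imputed data set (Rubin's rules).

    `estimates` holds the m point estimates and `variances` their
    squared standard errors. Returns (pooled estimate, pooled std. error).
    """
    m = len(estimates)
    if m < 2:
        raise ValueError("multiple imputation needs m > 1 imputed data sets")
    pooled = mean(estimates)
    within = mean(variances)       # average sampling variance
    between = variance(estimates)  # variance of the estimates across imputations
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


# Hypothetical survey rows: income is missing for two respondents.
rows = [
    {"age": 25, "income": 40},
    {"age": 30, "income": None},
    {"age": 35, "income": 55},
    {"age": 60, "income": None},
    {"age": 45, "income": 70},
]
print(compare_means_by_missingness(rows, "income", "age"))  # (35, 45)
```

In this made-up sample, respondents with missing income are older on average, which would be a reason to doubt MCAR and to look more closely before analysis.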
One example of data cleaning is dealing with outliers,
values or data points that lie far outside the norm for a
variable. Some researchers define them as values that deviate so
much that they arouse suspicion (Osborne & Overbay, 2012). Outliers can have
harmful effects on data analysis. They can be handled either by removing them,
if they are judged to be errors in the data, or by examining the original
responses (Osborne & Overbay, 2012).
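As a rough illustration of how suspicious values might be flagged for review, the sketch below marks any value whose standardized distance from the mean exceeds a cutoff. The function name, the cutoff of 3 standard deviations and the sample data are assumptions made for this example and are not taken from Osborne & Overbay (2012). The function returns each flagged value together with its position so that the original response can be looked up before deciding whether to remove it.

```python
from statistics import mean, stdev


def flag_outliers(values, threshold=3.0):
    """Return (index, value) pairs lying more than `threshold`
    sample standard deviations away from the mean."""
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        return []  # all values identical, nothing to flag
    return [(i, v) for i, v in enumerate(values)
            if abs(v - mu) / sd > threshold]


# Hypothetical ages from a survey; 95 is far from the rest.
ages = [21, 22, 23, 24, 25] * 4 + [95]
print(flag_outliers(ages))  # [(20, 95)]
```

In very small samples a single extreme value inflates the standard deviation so much that it may never cross the cutoff, so the threshold should be treated as a judgment call and not as a fixed rule.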
Hypothesis testing is directly connected
to data analysis (Bourque & Clark, 1992). In the rush to test a
hypothesis, researchers often do an incomplete job of data analysis and then
have to process the data repeatedly to bring it into usable form. In our research
study, our second hypothesis states that individuals who grow up in a cross-cultural
home with immigrant parents experience less success in life than their
peers. To illustrate this relationship
References
Bourque, L. B., & Clark, V. (1992). Processing data: The survey example (No. 85). Sage.
Frankfort-Nachmias, C., & Nachmias, D. (2008). Research methods in the social sciences (7th ed.). New York: Worth.
Little, R. J. (1988). A test of missing completely at random for multivariate data with missing values. Journal of the American Statistical Association, 83(404), 1198-1202.
Meade, A. W., & Craig, S. B. (2012). Identifying careless responses in survey data. Psychological Methods, 17(3), 437.
Osborne, J. W., & Overbay, A. (2012). Best practices in data cleaning. Sage.
Schafer, J. L., & Olsen, M. K. (1998). Multiple imputation for multivariate missing-data problems: A data analyst's perspective. Multivariate Behavioral Research, 33(4), 545-571.