DSC 410/510 - Multivariate Statistical Methods
Chapter 2
Suggested Solutions
LIST POTENTIAL UNDERLYING CAUSES OF OUTLIERS. BE SURE TO
INCLUDE ATTRIBUTIONS TO BOTH THE RESPONDENT AND THE RESEARCHER.
Respondent:
Misunderstanding of the question
Response bias, such as yea-saying
Extraordinary experience
Researcher:
Data entry errors
Data coding mistakes
An extraordinary observation with no explanation.
An ordinary value that is unique when combined with other
variables.
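The last cause above (an ordinary value that is unique in combination)
can be illustrated with a small sketch. The data below are hypothetical
and chosen only for illustration, and the squared Mahalanobis distance
is one common way to flag such cases, not necessarily the method
intended by the question. The last case has unremarkable values on each
variable but does not follow the pattern linking the two variables.

```python
from statistics import mean, stdev

# Hypothetical data: eight cases where y tracks x closely, plus one
# case (the last) whose x and y are each ordinary on their own.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 2]
ys = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8, 7.0, 8.1, 7.0]


def z_scores(values):
    """Univariate standardized scores using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def mahalanobis_sq(xs, ys):
    """Squared Mahalanobis distance of each case from the bivariate centroid."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    # Sample variances and covariance (divisor n - 1).
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        # Quadratic form d' S^-1 d, written out for the 2 x 2 case.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


zx, zy = z_scores(xs), z_scores(ys)
d2 = mahalanobis_sq(xs, ys)
for i, (a, b, d) in enumerate(zip(zx, zy, d2)):
    print(f"case {i}: z_x={a:+.2f}  z_y={b:+.2f}  D2={d:.2f}")

# Index of the case farthest from the centroid in the multivariate sense.
most_unusual = max(range(len(d2)), key=d2.__getitem__)
print("largest D2 belongs to case", most_unusual)
```

In this sketch the last case would pass a univariate screen on either
variable, yet it has the largest multivariate distance, which is why
outlier detection should not stop at one variable at a time.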
DESCRIBE THE CONDITIONS UNDER WHICH A RESEARCHER
WOULD DELETE A CASE WITH MISSING DATA VERSUS THE CONDITIONS UNDER
WHICH A RESEARCHER WOULD USE AN IMPUTATION METHOD.
The researcher must first evaluate the randomness of the missing data
process. If the missing data process is non-ignorable, consult an
expert! If the data are missing at random, deleting a case is the
only acceptable alternative of the two. Imputation should not be
applied to data that are missing at random, as it would introduce bias
into the results. Imputation is appropriate only for data that are
missing completely at random.
If the data are missing completely at random, the choice of case
deletion versus imputation method should be based on theoretical and
empirical considerations.
If the sample size is sufficiently large, the analyst may wish to
consider deletion of cases with a great degree of missing data. Cases
with missing data are good candidates for deletion if they represent a
small subset of the sample and if their absence does not otherwise
distort the data set. For instance, cases with missing dependent
variable values are often deleted.
If the sample size is small, the analyst may wish to use an imputation
method to fill in missing data. The analyst should, however, consider
the amount of missing data when selecting this option. The degree of
missing data will influence the researcher's choice of information
used in the imputation (i.e., complete case vs. all-available
approaches) and the researcher's choice of imputation method (i.e.,
case substitution, mean substitution, cold deck imputation, regression
imputation, or multiple imputation).
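The trade-off between the two remedies can be seen in a minimal sketch.
The cases, the function names, and the use of None as the missing-value
marker are all assumptions made for illustration. Only two of the
options are shown: deletion of incomplete cases and mean substitution.

```python
from statistics import mean, variance

# Hypothetical cases: (x, y) pairs, with None marking a missing value.
cases = [(4.0, 10.0), (None, 12.0), (5.5, 11.0), (6.0, None),
         (7.5, 15.0), (5.0, 9.0), (None, 13.0)]


def listwise_delete(rows):
    """Keep only the cases that have no missing values."""
    return [row for row in rows if None not in row]


def mean_substitute(rows):
    """Replace each missing value with the mean of the observed values
    on the same variable."""
    columns = list(zip(*rows))
    col_means = [mean(v for v in col if v is not None) for col in columns]
    return [tuple(m if v is None else v for v, m in zip(row, col_means))
            for row in rows]


complete = listwise_delete(cases)
imputed = mean_substitute(cases)

x_observed = [x for x, _ in cases if x is not None]
x_imputed = [x for x, _ in imputed]

print("cases kept after deletion:", len(complete), "of", len(cases))
print("cases kept after imputation:", len(imputed), "of", len(cases))
print("variance of x, observed values only:", round(variance(x_observed), 3))
print("variance of x, after mean substitution:", round(variance(x_imputed), 3))
```

Deletion shrinks the sample, which matters most when the sample is
small. Mean substitution keeps every case and leaves the variable's
mean unchanged, but it understates the variance because every imputed
value sits exactly at the mean.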
EVALUATE THE FOLLOWING STATEMENT, "IN ORDER TO RUN MOST
MULTIVARIATE ANALYSES, IT IS NOT NECESSARY TO MEET ALL OF THE
ASSUMPTIONS OF NORMALITY, LINEARITY, HOMOSCEDASTICITY, AND
INDEPENDENCE."
Each multivariate technique has a set of underlying assumptions that
must be met. The degree to which a violation of any of the four
assumptions above distorts the analysis depends on the specific
multivariate technique. For example, multiple regression analysis is
sensitive to violations of all four of the assumptions, whereas
multiple discriminant analysis is primarily sensitive to violations of
multivariate normality.
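As one illustration of checking an assumption before choosing a
technique, the sketch below screens a single variable for departures
from normality using the large-sample z statistics for skewness and
kurtosis (skewness divided by the square root of 6/N, and excess
kurtosis divided by the square root of 24/N). The data and the function
name are hypothetical, and this univariate screen is only one of
several possible checks. It does not establish multivariate normality.

```python
from math import sqrt


def skew_kurtosis_z(values):
    """Return (z_skewness, z_kurtosis) using moment-based estimates."""
    n = len(values)
    m = sum(values) / n
    # Central moments with divisor n.
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    # Approximate standard errors: sqrt(6/N) and sqrt(24/N).
    return skewness / sqrt(6.0 / n), excess_kurtosis / sqrt(24.0 / n)


# Hypothetical variables: one symmetric, one with a long right tail.
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
right_skewed = [1, 1, 1, 2, 2, 3, 4, 8, 15, 30]

z_skew_sym, z_kurt_sym = skew_kurtosis_z(symmetric)
z_skew_rs, z_kurt_rs = skew_kurtosis_z(right_skewed)

print(f"symmetric:    z_skew={z_skew_sym:+.2f}  z_kurt={z_kurt_sym:+.2f}")
print(f"right-skewed: z_skew={z_skew_rs:+.2f}  z_kurt={z_kurt_rs:+.2f}")
```

A z value beyond a chosen critical value (plus or minus 1.96 is a
common choice at the .05 level) suggests a departure from normality on
that characteristic. With samples this small the approximation is
rough, so the numbers should be read as a screening aid only.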
DISCUSS THE FOLLOWING STATEMENT, "MULTIVARIATE ANALYSES CAN
BE RUN ON ANY DATA SET, AS LONG AS THE SAMPLE SIZE IS
ADEQUATE."
False. Although sample size is an important consideration in
multivariate analyses, it is not the only consideration. Analysts must
also consider the degree of missing data present in the data set and
examine the variables for violations of the assumptions of the
intended techniques.
© 2003, Iain Pardoe, Lundquist College of Business,
University of Oregon
Last updated January 2, 2003