DSC 410/510 - Multivariate Statistical Methods

Chapter 2

Suggested Solutions

  1. LIST POTENTIAL UNDERLYING CAUSES OF OUTLIERS. BE SURE TO INCLUDE ATTRIBUTIONS TO BOTH THE RESPONDENT AND THE RESEARCHER.

    1. Respondent:
      1. Misunderstanding of the question
      2. Response bias, such as yea-saying
      3. Extraordinary experience
    2. Researcher:
      1. Data entry errors
      2. Data coding mistakes
    3. An extraordinary observation with no explanation.
    4. An ordinary value that is unique when combined with other variables.
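
    The last cause above (a value that is ordinary on each variable but unique in combination) can be illustrated with a small sketch. The data, the two hypothetical cases, and the helper names below are invented for illustration and are not part of the original solution. The sketch assumes two strongly correlated variables and compares univariate z-scores with the squared Mahalanobis distance, one common multivariate measure of how far a case lies from the centroid.

```python
from math import sqrt
from statistics import mean


def covariance_terms(points):
    """Return the means and sample (co)variances of two-variable data."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    n = len(points)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return mx, my, sxx, syy, sxy


def z_scores(points, case):
    """Univariate z-scores of `case` on each of the two variables."""
    mx, my, sxx, syy, _ = covariance_terms(points)
    return (case[0] - mx) / sqrt(sxx), (case[1] - my) / sqrt(syy)


def mahalanobis_sq(points, case):
    """Squared Mahalanobis distance of `case` from the centroid of `points`."""
    mx, my, sxx, syy, sxy = covariance_terms(points)
    det = sxx * syy - sxy ** 2  # determinant of the 2x2 covariance matrix
    dx, dy = case[0] - mx, case[1] - my
    # d' S^-1 d written out for the 2x2 case
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


# Hypothetical sample in which the two variables rise together.
sample = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.8),
          (5, 5.1), (6, 5.9), (7, 7.2), (8, 7.8)]

typical_case = (4, 4.1)  # follows the joint pattern
unusual_case = (2, 7)    # each value is within range, but the pair is not

for label, case in [("typical", typical_case), ("unusual", unusual_case)]:
    zx, zy = z_scores(sample, case)
    d2 = mahalanobis_sq(sample, case)
    print(f"{label}: z_x={zx:.2f}, z_y={zy:.2f}, D^2={d2:.1f}")
```

    In this sketch the second case would pass a univariate screen, since neither z-score is extreme, yet its distance from the centroid is far larger than that of the first case because it contradicts the relationship between the two variables.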

  2. DESCRIBE THE CONDITIONS UNDER WHICH A RESEARCHER WOULD DELETE A CASE WITH MISSING DATA VERSUS THE CONDITIONS UNDER WHICH A RESEARCHER WOULD USE AN IMPUTATION METHOD.

    The researcher must first evaluate the randomness of the missing data process. If the missing data process is non-ignorable, consult an expert! If the data are missing at random (MAR), deleting the case is the only acceptable alternative of the two, because applying an imputation method to MAR data would introduce bias into the results. Imputation should be considered only when the data are missing completely at random (MCAR).
    If the data are missing completely at random, the choice of case deletion versus imputation method should be based on theoretical and empirical considerations.
    If the sample size is sufficiently large, the analyst may wish to consider deleting cases with a large amount of missing data. Cases with missing data are good candidates for deletion if they represent a small subset of the sample and if their absence does not otherwise distort the data set. For instance, cases with missing dependent variable values are often deleted.
    If the sample size is small, the analyst may wish to use an imputation method to fill in missing data. The analyst should, however, consider the amount of missing data when selecting this option. The degree of missing data will influence the researcher's choice of information used in the imputation (i.e., complete case vs. all-available approaches) and the researcher's choice of imputation method (i.e., case substitution, mean substitution, cold deck imputation, regression imputation, or multiple imputation).
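
    The trade-off between the two remedies can be sketched as follows. The data set and the function names are hypothetical, missing values are marked with None, and the missing data are assumed to be missing completely at random. Mean substitution is used only because it is the simplest of the imputation methods listed above, and each column mean is computed from all available values in that column.

```python
from statistics import mean, variance


def listwise_delete(rows):
    """Keep only the cases that have no missing values."""
    return [row for row in rows if None not in row]


def mean_substitute(rows):
    """Replace each missing value with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    col_means = [
        mean(row[j] for row in rows if row[j] is not None)
        for j in range(n_cols)
    ]
    return [
        [col_means[j] if row[j] is None else row[j] for j in range(n_cols)]
        for row in rows
    ]


# Hypothetical data: six cases, two variables, two cases with one missing value each.
data = [
    [5.0, 3.0],
    [6.0, None],
    [7.0, 5.0],
    [None, 4.0],
    [9.0, 8.0],
    [4.0, 2.0],
]

complete_cases = listwise_delete(data)
imputed = mean_substitute(data)

# Variance of the second variable before and after mean substitution.
observed_var = variance(row[1] for row in data if row[1] is not None)
imputed_var = variance(row[1] for row in imputed)

print("cases kept by deletion:", len(complete_cases))
print("cases kept by imputation:", len(imputed))
print("variance (observed values only):", observed_var)
print("variance (after mean substitution):", imputed_var)
```

    Deletion shrinks the sample, which matters most when the sample is small. Mean substitution keeps every case but understates the variance, because the filled-in values sit exactly at the mean. This is one reason the amount of missing data should be weighed before choosing either remedy.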

  3. EVALUATE THE FOLLOWING STATEMENT, "IN ORDER TO RUN MOST MULTIVARIATE ANALYSES, IT IS NOT NECESSARY TO MEET ALL OF THE ASSUMPTIONS OF NORMALITY, LINEARITY, HOMOSCEDASTICITY, AND INDEPENDENCE."

    Each multivariate technique has its own set of underlying assumptions that must be met. The degree to which a violation of any of the four assumptions above will distort the analysis depends on the specific multivariate technique. For example, multiple regression analysis is sensitive to violations of all four assumptions, whereas multiple discriminant analysis is primarily sensitive to violations of multivariate normality.
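
    As one illustration of how a single assumption might be screened before choosing a technique, the sketch below computes a z statistic for skewness as a rough check on univariate normality. It assumes the common rule of thumb that the standard error of skewness is approximately the square root of 6/N. The data and the function name are hypothetical, and this check is not part of the original solution.

```python
from math import sqrt
from statistics import mean


def skewness_z(values):
    """Skewness divided by its approximate standard error, sqrt(6 / N)."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - m) ** 3 for v in values) / n  # third central moment
    skew = m3 / m2 ** 1.5
    return skew / sqrt(6 / n)


# Hypothetical variables: one symmetric, one with a long right tail.
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
right_skewed = [1, 1, 1, 2, 2, 2, 3, 3, 4, 20]

for label, values in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    z = skewness_z(values)
    flag = "possible departure from normality" if abs(z) > 1.96 else "no evidence of skew"
    print(f"{label}: z = {z:.2f} ({flag})")
```

    A large absolute z value (for example, beyond 1.96 at the .05 level) suggests skewness worth investigating. How much it matters still depends on the technique, as the answer above notes.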

  4. DISCUSS THE FOLLOWING STATEMENT, "MULTIVARIATE ANALYSES CAN BE RUN ON ANY DATA SET, AS LONG AS THE SAMPLE SIZE IS ADEQUATE."

    False. Although sample size is an important consideration in multivariate analyses, it is not the only consideration. Analysts must also consider the degree of missing data present in the data set and examine the variables for violations of the assumptions of the intended techniques.

© 2003, Iain Pardoe, Lundquist College of Business, University of Oregon
Last updated January 2, 2003