DSC 410/510 - Multivariate Statistical Methods

Chapter 5

Suggested Solutions

  1. WHEN WOULD YOU EMPLOY LOGISTIC REGRESSION RATHER THAN DISCRIMINANT ANALYSIS? WHAT ARE THE ADVANTAGES AND DISADVANTAGES OF THE DECISION?

    Both discriminant analysis and logistic regression are appropriate when the dependent variable is categorical and the independent variables are metric. With a two-group dependent variable, either technique may be applied. When the basic assumptions of both methods are met, the two give comparable predictive and classification results and employ similar diagnostic measures. Logistic regression has the advantage of being less affected than discriminant analysis when the basic assumptions of normality and equal variance are not met. It can also accommodate non-metric, dummy-coded variables as independent measures. Its disadvantage is that, in its basic (binary) form, logistic regression is limited to predicting a two-group dependent measure, so when more than two groups are involved discriminant analysis is required. (Note, however, that a form of logistic regression called multinomial logistic regression can handle more than two groups.)

  2. HOW WOULD YOU DETERMINE WHETHER OR NOT THE CLASSIFICATION ACCURACY OF THE DISCRIMINANT FUNCTION IS SUFFICIENTLY HIGH RELATIVE TO CHANCE CLASSIFICATION?

    Some chance criterion must be established. This is usually a fairly direct function of the number of groups and of the group sizes in the sample. The authors then suggest the following criterion: the classification accuracy (hit ratio) should be at least 25 percent greater than that achieved by chance.
    Another approach would be to use a test of proportions to check whether the obtained hit-ratio proportion differs significantly from the chance criterion proportion.
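
    The two checks above can be sketched in Python. This is only an illustration under stated assumptions: the proportional and maximum chance criteria are two common ways of setting the chance proportion, "25 percent greater" is read as a relative margin (1.25 times chance), the test of proportions is taken to be a one-sample z test, and the group sizes and hit ratio are hypothetical numbers, not course data.

```python
import math

def proportional_chance(group_sizes):
    """Proportional chance criterion: sum of squared group proportions."""
    total = sum(group_sizes)
    return sum((n / total) ** 2 for n in group_sizes)

def maximum_chance(group_sizes):
    """Maximum chance criterion: proportion of cases in the largest group."""
    return max(group_sizes) / sum(group_sizes)

def hit_ratio_threshold(chance, margin=0.25):
    """Hit ratio needed to beat chance by a relative margin (assumed reading)."""
    return chance * (1.0 + margin)

def proportion_z(hit_ratio, chance, n):
    """One-sample z statistic for H0: true hit ratio equals the chance proportion."""
    standard_error = math.sqrt(chance * (1.0 - chance) / n)
    return (hit_ratio - chance) / standard_error

# Hypothetical example: groups of 60 and 40 cases, 75 of 100 classified correctly.
sizes = [60, 40]
hit_ratio = 0.75
c_pro = proportional_chance(sizes)  # 0.6^2 + 0.4^2 = 0.52
c_max = maximum_chance(sizes)       # 60 / 100 = 0.60

print("proportional chance:", c_pro, "threshold:", hit_ratio_threshold(c_pro))
print("maximum chance:", c_max, "threshold:", hit_ratio_threshold(c_max))
print("z statistic vs. proportional chance:", proportion_z(hit_ratio, c_pro, n=sum(sizes)))
```

    A z statistic well above the usual critical value (for example 1.96 at the 5 percent level, two-sided) would suggest the hit ratio is better than chance.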

  3. HOW DO LOGISTIC REGRESSION AND DISCRIMINANT ANALYSIS EACH HANDLE THE RELATIONSHIP OF THE DEPENDENT AND INDEPENDENT VARIABLES?

    Discriminant analysis derives a variate, the linear combination of two or more independent variables that will discriminate best between the dependent variable groups. Discrimination is achieved by setting the variate weights for each variable to maximize between-group variance relative to within-group variance. A discriminant (z) score is then calculated for each observation. Group means (centroids) of these scores are calculated, and the distance between group centroids provides a test of discrimination.
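
    As a rough illustration of the discriminant variate, the sketch below computes two-group Fisher discriminant weights with NumPy. It assumes the classical two-group solution (weights proportional to the inverse within-group scatter times the difference in group means); the simulated data and the function name fisher_weights are hypothetical.

```python
import numpy as np

def fisher_weights(x_a, x_b):
    """Two-group Fisher discriminant weights: solve S_w w = (mean_a - mean_b)."""
    mean_a, mean_b = x_a.mean(axis=0), x_b.mean(axis=0)
    # Within-group scatter matrix, summed over both groups
    s_w = (x_a - mean_a).T @ (x_a - mean_a) + (x_b - mean_b).T @ (x_b - mean_b)
    return np.linalg.solve(s_w, mean_a - mean_b)

# Simulated data: two groups measured on two metric independent variables
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[2.0, 0.0], scale=1.0, size=(50, 2))
group_b = rng.normal(loc=[0.0, 1.0], scale=1.0, size=(50, 2))

weights = fisher_weights(group_a, group_b)

# Discriminant (z) scores for each observation, then group centroids
z_a, z_b = group_a @ weights, group_b @ weights
centroid_a, centroid_b = z_a.mean(), z_b.mean()

print("variate weights:", weights)
print("distance between centroids:", centroid_a - centroid_b)
```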
    Logistic regression forms a single variate in a manner more similar to multiple regression. It differs from multiple regression in that it directly predicts the probability of an event occurring. To define the probability, logistic regression assumes that the relationship between the independent and dependent variables resembles an S-shaped curve. At very low levels of the variate, the probability approaches zero. As the variate increases, the probability increases, approaching one at very high levels. Logistic regression uses a maximum likelihood procedure to fit the observed data to the curve.
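
    The S-shaped curve and the maximum likelihood fit can be sketched as follows. This is a minimal illustration, not how statistical packages implement it: it uses plain gradient ascent on the log-likelihood, and the simulated data, step count, learning rate, and function names are assumptions.

```python
import numpy as np

def logistic(z):
    """S-shaped curve mapping the variate z to a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(x, y, steps=5000, learning_rate=0.1):
    """Maximum likelihood estimates via gradient ascent on the mean log-likelihood."""
    design = np.column_stack([np.ones(len(x)), x])  # intercept plus predictor
    beta = np.zeros(design.shape[1])
    for _ in range(steps):
        predicted = logistic(design @ beta)
        # Gradient of the mean log-likelihood with respect to beta
        beta += learning_rate * design.T @ (y - predicted) / len(y)
    return beta

# Simulated two-group data with one metric independent variable
rng = np.random.default_rng(1)
x = rng.normal(size=200)
true_probability = logistic(-0.5 + 1.5 * x)
y = (rng.random(200) < true_probability).astype(float)

beta = fit_logistic(x, y)
print("estimated intercept and slope:", beta)
print("probability at low, middle, high variate:", logistic(np.array([-10.0, 0.0, 10.0])))
```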

  4. WHAT ARE THE DIFFERENCES IN ESTIMATION AND INTERPRETATION BETWEEN LOGISTIC REGRESSION AND DISCRIMINANT ANALYSIS?

    Estimation of the discriminant variate is based on maximizing between-group variance. Logistic regression is estimated using a maximum likelihood technique to fit the data to a logistic curve. Both techniques produce a variate that gives information about which variables explain the dependent variable or group membership. Many may find logistic regression more comfortable to interpret because it resembles the more commonly seen regression analysis.

© 2003, Iain Pardoe, Lundquist College of Business, University of Oregon
Last updated September 26, 2003