DSC 410/510 - Multivariate Statistical Methods
Chapter 5
Suggested Solutions
WHEN WOULD YOU EMPLOY LOGISTIC REGRESSION RATHER
THAN DISCRIMINANT ANALYSIS? WHAT ARE THE ADVANTAGES AND DISADVANTAGES
OF THE DECISION?
Both discriminant analysis and logistic regression are appropriate
when the dependent variable is categorical and the independent
variables are metric. With a two-group dependent variable either
technique may be applied. With more than two groups, discriminant
analysis is the usual choice, because logistic regression in its
basic (binary) form is limited to predicting a two-group dependent
measure. (Note, though, that there is a form of logistic regression
called multinomial logistic that can handle more than two groups.)
When the basic assumptions of both methods are met, the two give
comparable predictive and classification results and employ similar
diagnostic measures. Logistic regression has the advantage of being
less affected than discriminant analysis when the basic assumptions of
normality and equal variance are not met. It can also accommodate
non-metric, dummy-coded variables as independent measures.
HOW WOULD YOU DETERMINE WHETHER OR NOT THE
CLASSIFICATION ACCURACY OF THE DISCRIMINANT FUNCTION IS SUFFICIENTLY
HIGH RELATIVE TO CHANCE CLASSIFICATION?
Some chance criterion must be established. This is usually a fairly
direct function of the number of groups in the model and of the group
sample sizes. The authors then suggest the following criterion: the
classification accuracy (hit ratio) should be at least 25 percent
greater than that achieved by chance (for example, at least 62.5
percent when chance accuracy is 50 percent).
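
As a rough illustration of the arithmetic, the sketch below computes
two commonly used chance criteria (the proportional chance criterion
and the maximum chance criterion) and the hit ratio needed to exceed
chance by one-fourth. The function names and the group sizes are
hypothetical, chosen only for the example.

```python
def proportional_chance(group_sizes):
    """Proportional chance criterion: sum of squared group proportions."""
    total = sum(group_sizes)
    return sum((n / total) ** 2 for n in group_sizes)


def maximum_chance(group_sizes):
    """Maximum chance criterion: proportion in the largest group."""
    return max(group_sizes) / sum(group_sizes)


def required_hit_ratio(chance, margin=0.25):
    """Hit ratio that exceeds the chance criterion by the given margin."""
    return chance * (1 + margin)


# Hypothetical sample: 60 observations in one group, 40 in the other.
sizes = [60, 40]
c_pro = proportional_chance(sizes)
c_max = maximum_chance(sizes)
threshold = required_hit_ratio(c_pro)

print(f"proportional chance criterion: {c_pro:.3f}")
print(f"maximum chance criterion:      {c_max:.3f}")
print(f"required hit ratio:            {threshold:.3f}")
```

With equal group sizes the proportional chance criterion reduces to
one divided by the number of groups, which is why chance is 50 percent
in the balanced two-group case.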
Another approach would be to use a test of proportions to determine
whether the obtained hit-ratio proportion differs significantly from
the chance criterion proportion.
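
One way such a test of proportions might look is a one-sample z-test
of the hit ratio against the chance proportion, sketched below. The
function name and the counts (70 correct classifications out of 100)
are hypothetical, and the normal approximation is an assumption that
is reasonable only for moderately large samples.

```python
import math


def hit_ratio_z_test(hits, n, chance):
    """One-sample z-test of the hit ratio against a chance proportion.

    Returns the z statistic and the one-sided (upper-tail) p-value
    under the normal approximation.
    """
    p_hat = hits / n
    standard_error = math.sqrt(chance * (1 - chance) / n)
    z = (p_hat - chance) / standard_error
    # Upper-tail area of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Hypothetical result: 70 of 100 cases classified correctly, chance = 0.5.
z, p = hit_ratio_z_test(hits=70, n=100, chance=0.5)
print(f"z = {z:.2f}, one-sided p = {p:.5f}")
```

A small p-value suggests the hit ratio is higher than would be
expected from chance classification alone.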
HOW DO LOGISTIC REGRESSION AND DISCRIMINANT ANALYSIS
EACH HANDLE THE RELATIONSHIP OF THE DEPENDENT AND INDEPENDENT
VARIABLES?
Discriminant analysis derives a variate, the linear combination of two
or more independent variables that will discriminate best between the
dependent variable groups. Discrimination is achieved by setting
variate weights for each variable to maximize between-group variance
relative to within-group variance. A discriminant (z) score is then
calculated for each observation. Group means (centroids) of these
scores are calculated, and the distance between group centroids serves
as a test of discrimination.
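
The idea can be sketched for the simplest case: two groups and two
independent variables, using Fisher's linear discriminant, in which
the weights are the inverse pooled within-group covariance matrix
times the difference in group mean vectors. This is a minimal
illustration under that assumption, not necessarily the procedure a
statistical package follows, and the data and function names are
hypothetical.

```python
def mean_vector(rows):
    """Mean of each of the two variables."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter(rows, mean):
    """2x2 within-group sums of squares and cross-products."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def discriminant_weights(group_a, group_b):
    """Fisher weights: inverse pooled covariance times mean difference."""
    m_a, m_b = mean_vector(group_a), mean_vector(group_b)
    s_a, s_b = scatter(group_a, m_a), scatter(group_b, m_b)
    dof = len(group_a) + len(group_b) - 2
    pooled = [[(s_a[i][j] + s_b[i][j]) / dof for j in range(2)]
              for i in range(2)]
    det = pooled[0][0] * pooled[1][1] - pooled[0][1] * pooled[1][0]
    inverse = [[pooled[1][1] / det, -pooled[0][1] / det],
               [-pooled[1][0] / det, pooled[0][0] / det]]
    diff = [m_a[0] - m_b[0], m_a[1] - m_b[1]]
    return [inverse[i][0] * diff[0] + inverse[i][1] * diff[1]
            for i in range(2)]


def z_score(weights, row):
    """Discriminant score: weighted sum of the two variables."""
    return weights[0] * row[0] + weights[1] * row[1]


# Hypothetical observations on two metric variables for each group.
group_a = [(4.0, 2.0), (5.0, 3.0), (6.0, 2.5), (5.5, 3.5)]
group_b = [(1.0, 1.0), (2.0, 2.0), (1.5, 0.5), (2.5, 1.5)]

weights = discriminant_weights(group_a, group_b)
scores_a = [z_score(weights, r) for r in group_a]
scores_b = [z_score(weights, r) for r in group_b]
centroid_a = sum(scores_a) / len(scores_a)
centroid_b = sum(scores_b) / len(scores_b)

print("weights:", weights)
print("centroids:", centroid_a, centroid_b)
```

The further apart the two centroids are relative to the spread of the
scores within each group, the better the variate discriminates.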
Logistic regression forms a single variate similar to that of multiple
regression. It differs from multiple regression in that it directly
predicts the probability of an event occurring. To define the
probability, logistic regression assumes the relationship between the
independent and dependent variables resembles an S-shaped curve. At
very low levels of the variate, the probability approaches zero. As
the variate increases, the probability increases, approaching one at
very high levels of the variate. Logistic regression uses a maximum
likelihood procedure to fit the observed data to the curve.
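
The S-shaped curve and the maximum likelihood idea can be sketched for
a single independent variable as follows. The fitting routine uses
plain gradient ascent on the log-likelihood purely for illustration
(statistical packages typically use faster Newton-type iterations),
and the data, step size, and function names are hypothetical.

```python
import math


def logistic(z):
    """S-shaped curve mapping the variate z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b0, b1, xs, ys):
    """Log-likelihood of the observed 0/1 outcomes given the coefficients."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = logistic(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def fit(xs, ys, step=0.01, iterations=5000):
    """Gradient ascent on the log-likelihood, starting from zero."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            error = y - logistic(b0 + b1 * x)
            g0 += error
            g1 += error * x
        b0 += step * g0
        b1 += step * g1
    return b0, b1


# Hypothetical data: the event becomes more likely as x increases.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 1, 0, 1, 1]

b0, b1 = fit(xs, ys)
ll_start = log_likelihood(0.0, 0.0, xs, ys)
ll_fit = log_likelihood(b0, b1, xs, ys)

print("coefficients:", b0, b1)
print("log-likelihood before and after fitting:", ll_start, ll_fit)
```

The fitted coefficients are those that make the observed outcomes most
probable under the logistic curve, which is what "maximum likelihood"
means here.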
WHAT ARE THE DIFFERENCES IN ESTIMATION AND INTERPRETATION
BETWEEN LOGISTIC REGRESSION AND DISCRIMINANT ANALYSIS?
Estimation of the discriminant variate is based on maximizing
between-group variance relative to within-group variance. Logistic
regression is estimated using a maximum likelihood technique to fit
the data to a logistic curve. Both techniques produce a variate that
gives information about which variables explain the dependent variable
or group membership. Logistic regression may be more comfortable for
many to interpret because it resembles the more commonly seen
regression analysis.
© 2003, Iain Pardoe, Lundquist College of Business,
University of Oregon
Last updated September 26, 2003