Statistica instructions
These instructions accompany Applied Regression Modeling by Iain Pardoe, 2nd edition,
published by Wiley in 2012. The numbered items cross-reference with the "computer help" references
in the book. These instructions are based on the "Classic Menus" interface of Statistica 10 for Windows, but they (or something similar)
should also work for other versions. Find instructions for other statistical software packages
here.
Getting started and summarizing univariate data
 If desired, change Statistica's default options by selecting
Tools > Options.
 To open a Statistica data file, select File > Open. You can also open Excel, SPSS, JMP, and Minitab files.
 To resume an analysis, which allows you to rerun an analysis with different options selected, right-click on the appropriate analysis in the tree structure in the left-hand pane of the active Workbook and select Resume Analysis.
 Output can be viewed in the active Workbook. Click Add to Report to add individual pieces of
output (including tables and graphs) to a report, from where they can also be added to a Microsoft Word document.
 You can access help by selecting Help > Statistica Help. For example, to
find out about "boxplots" click the Index tab, type boxplot in the keyword box, click Display, and
select the index entry you want in the main window.
 To transform data or compute a new variable, select the Data window and then select
Data > Variables > Add. Type a name (with no spaces) for the new variable in the
Name box, and type a mathematical expression for the variable in the
Long name box. You can click the Functions button to help you to create the
expression. Examples are =Log(x) for the natural logarithm of x and =x**2 for
x^{2}. Click OK to create the new variable, which will be added to the dataset (check that it looks correct in the Data window); it can now be used just like any other
variable. If the new variable has blank values, this probably means there is a syntax
error in your expression; a common mistake is to forget the multiplication
symbol (*) between a number and a variable (e.g., 2*x represents 2x).
 To create indicator (dummy) variables from a qualitative variable, select the Data window and then select
Data > Variables > Add. Type a name (with no spaces) for the indicator variable in the
Name box, and type an expression like the following in the
Long name box: =iif(x="level", 1, 0), where x is the qualitative
variable and level is the name of one of the categories in x. Click OK and check that the
correct indicator variable has been added to your spreadsheet in the Data window.
Repeat for other indicator variables (if necessary).
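
As a cross-check outside Statistica, the logic of the =iif(x="level", 1, 0) expression can be sketched in a few lines of Python. This is only an illustration: the function name make_indicator and the example data are hypothetical and do not come from Statistica or the book.

```python
def make_indicator(values, level):
    """Return 1 where the category equals `level` and 0 otherwise."""
    return [1 if v == level else 0 for v in values]

# Hypothetical qualitative variable with three categories.
region = ["north", "south", "east", "south", "north"]
d_south = make_indicator(region, "south")
print(d_south)  # [0, 1, 0, 1, 0]
```

A qualitative variable with m categories usually needs m - 1 such indicator variables, one for each category other than the chosen reference level.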

 To find a percentile (critical value) for a t-distribution, select
Statistics > Probability Calculator > Distributions. Select t (Student) for the Distribution, check Inverse and 1-Cumulative p, type the upper-tail area (i.e., the one-tail significance level) into the box labeled p, and the degrees of freedom into the box labeled df. Click Compute to see the result in the box labeled t. For example,
typing .05 for p and 29 for df returns the 95th percentile of the t-distribution with 29 degrees of
freedom (1.699), which is the critical value for an upper-tail test with a 5% significance level. By
contrast, also checking Two-tailed returns the 97.5th percentile of the t-distribution with 29
degrees of freedom (2.045), which is the critical value for a two-tail test with a 5% significance
level.
 To find a percentile (critical value) for an F-distribution, select
Statistics > Probability Calculator > Distributions. Select F (Fisher) for the Distribution, check Inverse and 1-Cumulative p, type the upper-tail area (i.e., the significance level) into the box labeled p, the numerator degrees of freedom into the box labeled df1, and the denominator degrees of freedom into the box labeled df2. Click Compute to see the result in the box labeled F. For example,
typing .05 for p, 2 for df1, and 3 for df2 returns the 95th percentile of the F-distribution with 2 numerator
degrees of freedom and 3 denominator degrees of freedom (9.552).
 To find a percentile (critical value) for a chi-squared distribution, select
Statistics > Probability Calculator > Distributions. Select Chi^{2} for the Distribution, check Inverse and 1-Cumulative p, type the upper-tail area (i.e., the significance level) into the box labeled p and the degrees of freedom into the box labeled df. Click Compute to see the result in the box labeled Chi^{2}. For example,
typing .05 for p and 2 for df returns the 95th percentile of the chi-squared distribution with 2
degrees of freedom (5.991).
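
The t-distribution percentiles quoted above can be reproduced independently with a short Python sketch that uses only the standard library. It integrates the t density numerically (Simpson's rule) and then searches for the percentile by bisection. This is an illustrative check, not how Statistica computes its values; the function names are hypothetical, and the search assumes the percentile lies between 0 and 100.

```python
import math

def t_density(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, n=4000):
    """Upper-tail area P(T > t) for t >= 0, via Simpson's rule on [0, t]."""
    if t == 0:
        return 0.5
    h = t / n  # n must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 0.5 - total * h / 3

def t_percentile(upper_tail_area, df):
    """Value t with P(T > t) = upper_tail_area (area below 0.5), by bisection.

    Assumption: the answer lies between 0 and 100.
    """
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > upper_tail_area:
            lo = mid  # too much area to the right, so move right
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_percentile(0.05, 29), 3))   # 1.699 (upper-tail test, 5% level)
print(round(t_percentile(0.025, 29), 3))  # 2.045 (two-tail test, 5% level)
```

The two-tail critical value is obtained by halving the significance level before the search, which mirrors what checking Two-tailed does in the Probability Calculator.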

 To find an upper-tail area (one-tail p-value) for a t-distribution, select
Statistics > Probability Calculator > Distributions. Select t (Student) for the Distribution, check 1-Cumulative p, and type the value of the t-statistic into the box labeled t and the degrees of freedom into the box labeled df. Click Compute to see the result in the box labeled p. For example,
typing 2.40 for t and 29 for df returns the upper-tail area for a t-statistic of 2.40 from the
t-distribution with 29 degrees of freedom (0.012), which is the p-value for an upper-tail test. By
contrast, also checking Two-tailed returns the two-tail area for a t-statistic of
2.40 from the t-distribution with 29 degrees of freedom (0.023), which is the p-value for a two-tail test.
 To find an upper-tail area (p-value) for an F-distribution, select
Statistics > Probability Calculator > Distributions. Select F (Fisher) for the Distribution, check 1-Cumulative p, and type the value of the F-statistic into the box labeled F, the numerator degrees of freedom into the box labeled df1, and the denominator degrees of freedom into the box labeled df2. Click Compute to see the result in the box labeled p. For example,
typing 51.4 for F, 2 for df1, and 3 for df2 returns the upper-tail area (p-value) for an F-statistic of 51.4 for the F-distribution with 2
numerator degrees of freedom and 3 denominator degrees of freedom (0.005).
 To find an upper-tail area (p-value) for a chi-squared distribution, select Statistics > Probability Calculator > Distributions. Select Chi^{2} for the Distribution, check 1-Cumulative p, and type the value of the chi-squared statistic into the box labeled Chi^{2} and the degrees of freedom into the box labeled df. Click Compute to see the result in the box labeled p. For example,
typing 0.38 for Chi^{2} and 2 for df returns the upper-tail area (p-value) for a
chi-squared statistic of 0.38 for the chi-squared distribution with 2 degrees of freedom
(0.827).
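
The chi-squared and F examples above happen to fall in special cases where the upper-tail area has a simple closed form, so they can be checked by hand or with the Python sketch below. Note the restrictions: the first formula holds only for a chi-squared distribution with exactly 2 degrees of freedom, and the second only for an F-distribution with exactly 2 numerator degrees of freedom. For other degrees of freedom, use the Probability Calculator. The function names are hypothetical.

```python
import math

def chi2_upper_tail_df2(x):
    """P(X > x) for a chi-squared distribution with exactly 2 degrees of freedom."""
    return math.exp(-x / 2)

def f_upper_tail_df1_2(f, df2):
    """P(F > f) for an F-distribution with exactly 2 numerator degrees of freedom."""
    return (1 + 2 * f / df2) ** (-df2 / 2)

print(round(chi2_upper_tail_df2(0.38), 3))    # 0.827
print(round(f_upper_tail_df1_2(51.4, 3), 3))  # 0.005
```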
 Calculate descriptive statistics for quantitative variables by selecting
Statistics > Basic Statistics and Tables. Leave Descriptive statistics selected under the Quick tab and click OK. Click Variables to select the variable(s) for analysis and click OK. Click the Advanced tab to select the summaries, such as the
Mean, that you would like and click Summary: Statistics to view the results.
 Create contingency tables or crosstabulations for qualitative
variables by selecting Statistics > Basic Statistics and Tables. Select Tables and banners and click OK. Click Specify tables (select variables), move one qualitative
variable into List 1 and another into List 2, and click OK twice. Cell
percentages (within rows, columns, or the whole table) can be calculated by clicking
Options before clicking Summary to display the results.
 If you have a quantitative variable and a qualitative variable, you can calculate
descriptive statistics for cases grouped in different categories by following the instructions in Help #10, but clicking By Group to select the qualitative grouping variable before clicking Summary to display the results. Check Accumulate tabular results in a single spreadsheet in the By Group dialog box to display the results for all the groups in a single table.
 To make a stem-and-leaf plot for a quantitative variable, select
Statistics > Basic Statistics and Tables. Leave Descriptive statistics selected under the Quick tab and click OK. Click Variables to select the variable(s) for analysis and click OK. Click the Normality tab and click Stem & leaf plot.
 To make a histogram for a quantitative variable, select
Graphs > Histograms. Click Variables to select the variable(s) for analysis and click OK.
 To make a scatterplot with two quantitative variables, select
Graphs > Scatterplots. Click Variables and move the
horizontal axis variable into the
X: box and the vertical axis variable into the Y: box, then click OK twice.
 All possible scatterplots for more than two variables can be drawn simultaneously
(called a scatterplot matrix) by selecting Graphs > Matrix Plots. Click Variables to select the variable(s) for analysis and click OK twice.
 You can mark or label cases in a scatterplot with different colors/symbols
according to categories in a qualitative variable by following Help #15, then clicking the Categorized tab in the 2D Scatterplots dialog box, checking On under X-Categories, and selecting Overlaid under Layout. Click Change Variable to select the qualitative categorization variable, then click OK twice to display the plot.
 You can identify individual cases in a scatterplot by hovering over individual points.
 To remove one or more observations from a dataset, right-click the appropriate observation(s) in the Data window and select Selection Conditions > Remove Selected Cases. Removed cases will have a different background color in the spreadsheet. To add removed cases back, right-click the appropriate observation(s) in the Data window and select Selection Conditions > Add Selected Cases.

 To make a frequency bar chart of one qualitative variable, select Statistics > Basic Statistics and Tables. Select Frequency tables and click OK. Click Variables to select the qualitative
variable and click OK. Then click Histograms to display the bar chart.
 For frequency bar charts of two qualitative variables, select Statistics > Basic Statistics and Tables. Select Tables and banners and click OK. Click Specify tables (select variables) to select the two qualitative
variables and click OK. Then click Categorized histograms to display the bar charts.
 To produce a bar chart of means of a quantitative variable for cases in different categories, select Graphs > 2D Graphs > Means w/Errors Plots. Select Columns for Graph type, then click Variables and move the
quantitative variable into the
Dependent variable: box and the qualitative variable(s) representing the categories into the Grouping variable: box, then click OK twice.
 To make boxplots of a quantitative variable for cases in different categories, select
Graphs > 2D Graphs > Boxplots.
Click Variables and move the
quantitative variable into the
Dependent variable: box and the qualitative variable representing the categories into the Grouping variable: box, then click OK twice.
 To make a QQ-plot (also known as a normal probability plot) for a
quantitative variable, select Statistics > Basic Statistics and Tables. Leave Descriptive statistics selected under the Quick tab and click OK. Click Variables to select the variable(s) for analysis and click OK. Click the Prob. & Scatterplots tab and click Normal probability plot.
 To compute a confidence interval for a univariate population mean, select
Statistics > Basic Statistics and Tables. Select t-test, single sample and click OK. Click Variables to select the quantitative
variable for analysis and click OK. Click the
Options tab and check Compute conf. limits before clicking Summary to display the results.
 To do a hypothesis test for a univariate population mean, select
Statistics > Basic Statistics and Tables. Select t-test, single sample and click OK. Click Variables to select the quantitative
variable for analysis and click OK. Type the (null) hypothesised value into the Test all means against: box before clicking Summary to display the results. The p-value calculated is a two-tailed p-value; to obtain a one-tailed p-value, either divide this value by two or divide it by two and then subtract the result from one (draw a picture to figure out which).
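
The conversion from a two-tailed to a one-tailed p-value depends on whether the t-statistic falls on the side favored by the alternative hypothesis. A minimal Python sketch of that rule follows; the function name and the "greater"/"less" labels are hypothetical conveniences, not Statistica options.

```python
def one_tailed_p(two_tailed_p, t_statistic, alternative):
    """Convert a two-tailed p-value to a one-tailed p-value.

    alternative: "greater" for an upper-tail test, "less" for a lower-tail test.
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    half = two_tailed_p / 2
    if alternative == "greater":
        agrees = t_statistic > 0
    else:
        agrees = t_statistic < 0
    # If the statistic points in the direction of the alternative, the
    # one-tailed p-value is half the two-tailed value; otherwise it is the
    # remaining area, one minus that half.
    return half if agrees else 1 - half

# Using the earlier example: t = 2.40 with 29 df has two-tailed p = 0.023.
print(one_tailed_p(0.023, 2.40, "greater"))  # 0.0115
print(one_tailed_p(0.023, 2.40, "less"))     # 0.9885
```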
Simple linear regression
 To fit a simple linear regression model (i.e., find a least squares line),
select Statistics > Multiple Regression. Click the Variable button and move the response variable into the
Dependent var. box and the predictor variable into the Independent variable list box. Click OK twice to see the basic results, from where you can select further results to display. In the rare circumstance that you
wish to fit a model without an intercept term (regression through the origin), select the Advanced tab in the Multiple Linear Regression dialog box, check Advanced options (stepwise or ridge regression), and click OK. Then select the Advanced tab in the Model Definition dialog box and change the Intercept setting to Set to zero before clicking OK.
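
To see what the least squares line represents, the intercept and slope of a simple linear regression can be computed directly from the data. The sketch below uses the standard textbook formulas with made-up data; it is meant as a check on the idea, not as a description of Statistica's internal computations, and the function name is hypothetical.

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the least squares line for y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Made-up data lying exactly on the line y = 1 + 2x.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
b0, b1 = least_squares_line(x, y)
print(b0, b1)  # 1.0 2.0
```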
 To add a regression line or least squares line to a scatterplot,
follow Help #15, which includes a regression line in the plot by default.
 Statistica does not appear to offer an automatic way to find 95% confidence intervals for the regression parameters in a simple
linear regression model using the Multiple Regression routine (although it is possible to do this using Statistica's General Linear Models routine). It is possible to calculate these intervals by hand using Statistica regression output and appropriate percentiles from a t-distribution. This applies more generally
to multiple linear regression also.
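
The hand calculation is: estimate plus or minus (t-percentile times standard error), where the estimate and its standard error come from the regression output and the t-percentile comes from the Probability Calculator with the residual degrees of freedom. A small Python sketch, with a hypothetical function name and made-up input values:

```python
def parameter_interval(estimate, std_error, t_percentile):
    """Confidence interval: estimate +/- t_percentile * std_error."""
    margin = t_percentile * std_error
    return estimate - margin, estimate + margin

# Made-up regression output: estimate 2.5 with standard error 0.4.
# 2.045 is the 97.5th percentile of the t-distribution with 29 degrees
# of freedom, which gives a 95% interval.
lower, upper = parameter_interval(2.5, 0.4, 2.045)
print(round(lower, 3), round(upper, 3))  # 1.682 3.318
```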

To find a fitted value or predicted value of Y (the response
variable) at a particular value of X (the predictor variable), follow Help #25 (or #31) to fit a linear regression model, then click the Residuals/assumptions/prediction tab. Click Predict dependent variable to specify the value(s) for the predictor term(s) and click OK. The fitted or predicted
value of Y at the X-value(s) that you specified is displayed in the row of the results labeled "Predicted." This applies more generally to multiple linear regression also.
 To find a confidence interval for the mean of Y at a particular value of
X, follow Help #25 (or #31) to fit a linear regression model, then click the Residuals/assumptions/prediction tab. Select Compute confidence limits, specify the significance level, Alpha (the default is 5% for a 95% interval), click Predict dependent variable to specify the value(s) for the predictor term(s), and click OK. The confidence interval for the mean of Y at the X-value(s) that you specified is displayed in the rows of the results labeled, for example, "-95.0%CL" and "+95.0%CL." This applies more generally to multiple linear regression also.
 To find a prediction interval for an individual value of
Y at a particular value of X, follow Help #25 (or #31) to fit a linear
regression model, then click the Residuals/assumptions/prediction tab.
Select Compute prediction limits, specify the significance level,
Alpha (the default is 5% for a 95% interval), click
Predict dependent variable to specify the value(s) for the predictor
term(s), and click OK. The prediction interval for an individual
Y-value at the X-value(s) that you specified is displayed in the rows of the
results labeled, for example, "-95.0%PL" and "+95.0%PL." This applies
more generally to multiple linear regression also.
Multiple linear regression
 To fit a multiple linear regression model,
select Statistics > Multiple Regression. Click the Variable button and move the response variable into the
Dependent var. box and the predictor variables into the Independent variable list box. Click OK twice to see the basic results, from where you can select further results to display. In the rare circumstance that you
wish to fit a model without an intercept term (regression through the origin), select the Advanced tab in the Multiple Linear Regression dialog box, check Advanced options (stepwise or ridge regression), and click OK. Then select the Advanced tab in the Model Definition dialog box and change the Intercept setting to Set to zero before clicking OK.
 To add a quadratic regression line to a scatterplot, follow Help #15, but before clicking OK select the Advanced tab in the 2D Scatterplots dialog box and select Polynomial for Fit.
 Categories of a qualitative variable can be thought of as defining subsets
of the sample. If there is also a quantitative response and a quantitative predictor variable in
the dataset, a regression model can be fit to the data to represent separate regression lines for
each subset. To do this follow Help #17, which includes separate regression lines in the plot by default.
 Statistica does not appear to offer an automatic way to find the F-statistic and associated p-value for a nested model F-test in multiple linear regression. It is possible to calculate these quantities by hand using Statistica regression output and appropriate percentiles from an F-distribution.
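
The hand calculation uses the residual sums of squares (RSS) and residual degrees of freedom from two fitted models: the complete model and the reduced (nested) model. The Python sketch below shows the standard nested F-statistic formula; the function name and the input numbers are made up for illustration.

```python
def nested_f_statistic(rss_reduced, rss_complete, df_reduced, df_complete):
    """F-statistic for comparing a reduced model with a complete model.

    rss_* are residual sums of squares and df_* are residual degrees of
    freedom, both taken from the regression output for each model.
    """
    numerator = (rss_reduced - rss_complete) / (df_reduced - df_complete)
    denominator = rss_complete / df_complete
    return numerator / denominator

# Made-up output: dropping 2 predictors raises the RSS from 100 to 140.
f_stat = nested_f_statistic(rss_reduced=140, rss_complete=100,
                            df_reduced=27, df_complete=25)
print(f_stat)  # 5.0
```

The p-value is the upper-tail area beyond this F-statistic for an F-distribution with df_reduced - df_complete numerator degrees of freedom and df_complete denominator degrees of freedom, which can be found with the Probability Calculator.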
 To save residuals in a multiple linear regression model, follow Help #31 to fit a multiple linear regression model, then click the Residuals/assumptions/prediction tab. Click Perform residual analysis, click the Save tab in the Residual Analysis dialog box, and click Save residuals & predicted. The residuals will be saved as a
variable called Residuals in a new spreadsheet; they can now be used just like any other variable, for example, to construct residual plots. Note that Statistica will also save what it calls StandardResidual, but these are different to what Pardoe (2012) calls standardized residuals. Similarly, Statistica's DeletedResidual is different to what Pardoe (2012) calls
studentized residuals.
 To add a loess fitted line to a scatterplot (useful for checking the zero
mean regression assumption in a residual plot), follow Help #15, but before clicking OK select the Advanced tab in the 2D Scatterplots dialog box and select Lowess for Fit.
 Statistica does not appear to offer an automatic way to save leverages in a multiple linear regression model.
 To save Cook's distances in a multiple linear regression model, follow Help #31 to fit a multiple linear regression model, then click the Residuals/assumptions/prediction tab. Click Perform residual analysis, click the Save tab in the Residual Analysis dialog box, and click Save residuals & predicted. The Cook's distances will be saved as a
variable called CookDistance in a new spreadsheet; they can now be used just like
any other variable, for example, to construct scatterplots.
 To create some residual plots automatically in a multiple linear regression
model, follow Help #31 to fit a multiple linear regression model, click the Residuals/assumptions/prediction tab, and click Perform residual analysis. Try any of the following:
 Select the Residuals tab and select Residuals vs. independent var..
 Select the Scatterplots tab and select Predicted vs. residuals.
 Select the Residuals tab and select Histogram of residuals.
 Select the Probability plots tab and select Normal plot of residuals.
To create residual plots manually, first create studentized residuals (see help #35),
and then construct scatterplots with these studentized residuals on the vertical axis.
 To create a correlation matrix of quantitative variables (useful for
checking potential multicollinearity problems), select
Statistics > Basic Statistics and Tables. Select Correlation matrices and click OK. Click One variable list to select the variable(s) for analysis and click OK. Click Summary to view the results.
 To find variance inflation factors in multiple linear regression, follow Help #31 to fit a multiple linear regression model, click the Advanced tab, and click Current sweep matrix. The variance
inflation factors are the negatives of the diagonal elements for the predictor terms in this matrix.
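
As a sanity check on values read from the sweep matrix, recall that the variance inflation factor for a predictor is 1 / (1 - R^2), where R^2 comes from regressing that predictor on the other predictors. With exactly two predictors, R^2 is the squared correlation between them, so both predictors share the same value. The Python sketch below covers only this two-predictor case, with made-up data and hypothetical function names.

```python
import math

def correlation(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """Variance inflation factor shared by both predictors in a
    two-predictor model: 1 / (1 - r^2)."""
    r = correlation(x1, x2)
    return 1 / (1 - r * r)

# Made-up predictor values with correlation 0.8.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
print(round(vif_two_predictors(x1, x2), 3))  # 2.778
```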
 To draw a predictor effect plot for graphically displaying the effects of
transformed quantitative predictors and/or interactions between quantitative and qualitative
predictors in multiple linear regression, first create a variable representing the effect, say,
"X1effect" (see computer help #6). Then select
Graphs > Scatterplots. Click Variables and move X1 into the
X: box and the "X1effect" variable into the Y: box.
 If the "X1effect" variable just involves X1 (e.g., 1 + 3X1 + 4X1^{2}),
you can click OK twice at this point.
 If the "X1effect" variable also involves a qualitative variable (e.g.,
1 − 2X1 + 3D2X1, where D2 is an indicator variable), you should click the Categorized tab in the 2D Scatterplots dialog box, check On under X-Categories, and select Overlaid under Layout. Click Change Variable to select the qualitative categorization variable, then click OK twice to display the plot.
See Section 5.5 in Pardoe (2012) for an example. The instructions here create scatterplots rather than line plots, but lines can be added to the plots with an appropriate choice of the Fit type on the Advanced tab in the 2D Scatterplots dialog box.
Last updated: June, 2012
© 2012, Iain Pardoe