Statistica instructions

These instructions accompany Applied Regression Modeling by Iain Pardoe, 2nd edition, published by Wiley in 2012. The numbered items cross-reference with the "computer help" references in the book. These instructions are based on the "Classic Menus" interface of Statistica 10 for Windows, but they (or something similar) should also work for other versions. Instructions for other statistical software packages are available separately.

Getting started and summarizing univariate data

  1. If desired, change Statistica's default options by selecting Tools > Options.
  2. To open a Statistica data file, select File > Open. You can also open Excel, SPSS, JMP, and Minitab files.
  3. To resume an analysis, which allows you to re-run an analysis with different options selected, right-click on the appropriate analysis in the tree structure in the left-hand pane of the active Workbook and select Resume Analysis.
  4. Output can be viewed in the active Workbook. Click Add to Report to add individual pieces of output (including tables and graphs) to a report, from where they can also be added to a Microsoft Word document.
  5. You can access help by selecting Help > Statistica Help. For example, to find out about "boxplots" click the Index tab, type boxplot in the keyword box, click Display, and select the index entry you want in the main window.
  6. To transform data or compute a new variable, select the Data window and then select Data > Variables > Add. Type a name (with no spaces) for the new variable in the Name box, and type a mathematical expression for the variable in the Long name box. You can click the Functions button to help you create the expression. Examples are =Log(x) for the natural logarithm of x and =x**2 for x squared. Click OK to create the new variable, which will be added to the dataset (check that it looks correct in the Data window); it can now be used just like any other variable. If the new variable has blank values, this probably means there is a syntax error in your expression; a common mistake is to forget the multiplication symbol (*) between a number and a variable (e.g., 2*x represents 2x).
  7. To create indicator (dummy) variables from a qualitative variable, select the Data window and then select Data > Variables > Add. Type a name (with no spaces) for the indicator variable in the Name box, and type an expression like the following in the Long name box: =iif(x="level", 1, 0), where x is the qualitative variable and level is the name of one of the categories in x. Click OK and check that the correct indicator variable has been added to your spreadsheet in the Data window. Repeat for other indicator variables (if necessary).
  8. Calculate descriptive statistics for quantitative variables by selecting Statistics > Basic Statistics and Tables. Leave Descriptive statistics selected under the Quick tab and click OK. Click Variables to select the variable(s) for analysis and click OK. Click the Advanced tab to select the summaries, such as the Mean, that you would like and click Summary: Statistics to view the results.
  9. Create contingency tables or cross-tabulations for qualitative variables by selecting Statistics > Basic Statistics and Tables. Select Tables and banners and click OK. Click Specify tables (select variables), move one qualitative variable into List 1 and another into List 2, and click OK twice. Cell percentages (within rows, columns, or the whole table) can be calculated by clicking Options before clicking Summary to display the results.
  10. If you have a quantitative variable and a qualitative variable, you can calculate descriptive statistics for cases grouped in different categories by following the instructions in Help #10, but clicking By Group to select the qualitative grouping variable before clicking Summary to display the results. Check Accumulate tabular results in a single spreadsheet in the By Group dialog box to display the results for all the groups in a single table.
  11. To make a stem-and-leaf plot for a quantitative variable, select Statistics > Basic Statistics and Tables. Leave Descriptive statistics selected under the Quick tab and click OK. Click Variables to select the variable(s) for analysis and click OK. Click the Normality tab and click Stem & leaf plot.
  12. To make a histogram for a quantitative variable, select Graphs > Histograms. Click Variables to select the variable(s) for analysis and click OK.
  13. To make a scatterplot with two quantitative variables, select Graphs > Scatterplots. Click Variables and move the horizontal axis variable into the X: box and the vertical axis variable into the Y: box, then click OK twice.
  14. All possible scatterplots for more than two variables can be drawn simultaneously (called a scatterplot matrix) by selecting Graphs > Matrix Plots. Click Variables to select the variable(s) for analysis and click OK twice.
  15. You can mark or label cases in a scatterplot with different colors/symbols according to categories in a qualitative variable by following Help #15, then clicking the Categorized tab in the 2D Scatterplots dialog box, checking On under X-Categories, and selecting Overlaid under Layout. Click Change Variable to select the qualitative categorization variable, then click OK twice to display the plot.
  16. You can identify individual cases in a scatterplot by hovering over individual points.
  17. To remove one or more observations from a dataset, right-click the appropriate observation(s) in the Data window and select Selection Conditions > Remove Selected Cases. Removed cases will have a different background color in the spreadsheet. To add removed cases back, right-click the appropriate observation(s) in the Data window and select Selection Conditions > Add Selected Cases.
  18. To make boxplots of a quantitative variable for cases in different categories, select Graphs > 2D Graphs > Boxplots. Click Variables and move the quantitative variable into the Dependent variable: box and the qualitative variable representing the categories into the Grouping variable: box, then click OK twice.
  19. To make a QQ-plot (also known as a normal probability plot) for a quantitative variable, select Statistics > Basic Statistics and Tables. Leave Descriptive statistics selected under the Quick tab and click OK. Click Variables to select the variable(s) for analysis and click OK. Click the Prob. & Scatterplots tab and click Normal probability plot.
  20. To compute a confidence interval for a univariate population mean, select Statistics > Basic Statistics and Tables. Select t-test, single sample and click OK. Click Variables to select the quantitative variable for analysis and click OK. Click the Options tab and check Compute conf. limits before clicking Summary to display the results.
  21. To do a hypothesis test for a univariate population mean, select Statistics > Basic Statistics and Tables. Select t-test, single sample and click OK. Click Variables to select the quantitative variable for analysis and click OK. Type the (null) hypothesized value into the Test all means against: box before clicking Summary to display the results. The p-value calculated is a two-tailed p-value; to obtain a one-tailed p-value you will need either to divide this value by two, or to divide it by two and subtract the result from one (draw a picture to figure out which).
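
The one-tailed conversion in the hypothesis test item above can be sketched in a few lines of Python. This is only an illustration and not part of Statistica: the function name one_tailed_p and its arguments are hypothetical, and it assumes you have read the two-tailed p-value and the t-statistic off the Statistica output yourself.

```python
def one_tailed_p(two_tailed_p, t_statistic, alternative):
    """Convert a two-tailed p-value into a one-tailed p-value.

    alternative: "greater" for an upper-tail alternative (mean > null value),
                 "less" for a lower-tail alternative (mean < null value).
    All names here are illustrative assumptions, not Statistica functions.
    """
    if not 0.0 <= two_tailed_p <= 1.0:
        raise ValueError("two_tailed_p must be between 0 and 1")
    if alternative not in ("greater", "less"):
        raise ValueError('alternative must be "greater" or "less"')

    half = two_tailed_p / 2.0
    # The t-statistic supports the alternative when its sign points the same way.
    if alternative == "greater":
        supports_alternative = t_statistic > 0
    else:
        supports_alternative = t_statistic < 0
    # Same direction: half the two-tailed value; opposite direction: one minus half.
    return half if supports_alternative else 1.0 - half
```

For example, with a two-tailed p-value of 0.04 and a positive t-statistic, an upper-tail test gives 0.04/2, while a lower-tail test gives 1 minus 0.04/2.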

Simple linear regression

  1. To fit a simple linear regression model (i.e., find a least squares line), select Statistics > Multiple Regression. Click the Variable button and move the response variable into the Dependent var. box and the predictor variable into the Independent variable list box. Click OK twice to see the basic results, from where you can select further results to display. In the rare circumstance that you wish to fit a model without an intercept term (regression through the origin), select the Advanced tab in the Multiple Linear Regression dialog box, check Advanced options (stepwise or ridge regression), and click OK. Then select the Advanced tab in the Model Definition dialog box and change the Intercept setting to Set to zero before clicking OK.
  2. To add a regression line or least squares line to a scatterplot, follow Help #15, which includes a regression line in the plot by default.
  3. Statistica does not appear to offer an automatic way to find 95% confidence intervals for the regression parameters in a simple linear regression model using the Multiple Regression routine (although it is possible to do this using Statistica's General Linear Models routine). It is possible to calculate these intervals by hand using Statistica regression output and appropriate percentiles from a t-distribution. This applies more generally to multiple linear regression also.
  4. To find a fitted value or predicted value of Y (the response variable) at a particular value of X (the predictor variable), follow Help #25 (or #31) to fit a linear regression model, then click the Residuals/assumptions/prediction tab. Click Predict dependent variable to specify the value(s) for the predictor term(s) and click OK. The fitted or predicted value of Y at the X-value(s) that you specified is displayed in the row of the results labeled "Predicted." This applies more generally to multiple linear regression also.
  5. To find a confidence interval for the mean of Y at a particular value of X, follow Help #25 (or #31) to fit a linear regression model, then click the Residuals/assumptions/prediction tab. Select Compute confidence limits, specify the significance level, Alpha (the default is 5% for a 95% interval), click Predict dependent variable to specify the value(s) for the predictor term(s), and click OK. The confidence interval for the mean of Y at the X-value(s) that you specified is displayed in the rows of the results labeled, for example, "-95.0%CL" and "+95.0%CL." This applies more generally to multiple linear regression also.
  6. To find a prediction interval for an individual value of Y at a particular value of X, follow Help #25 (or #31) to fit a linear regression model, then click the Residuals/assumptions/prediction tab. Select Compute prediction limits, specify the significance level, Alpha (the default is 5% for a 95% interval), click Predict dependent variable to specify the value(s) for the predictor term(s), and click OK. The prediction interval for an individual Y-value at the X-value(s) that you specified is displayed in the rows of the results labeled, for example, "-95.0%PL" and "+95.0%PL." This applies more generally to multiple linear regression also.
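
The hand calculation of a confidence interval for a regression parameter mentioned above amounts to: estimate plus or minus a t-percentile times the standard error. The Python sketch below illustrates this under stated assumptions: the function name and argument names are hypothetical, the estimate and standard error are read from the Statistica regression output, and the t-percentile is looked up separately (for a 95% interval, the 97.5th percentile of a t-distribution with n-2 degrees of freedom in simple linear regression).

```python
def parameter_confidence_interval(estimate, standard_error, t_percentile):
    """Return (lower, upper) limits: estimate -/+ t_percentile * standard_error.

    estimate       -- regression parameter estimate from the regression output
    standard_error -- its standard error from the same output
    t_percentile   -- e.g. the 97.5th t-percentile for a 95% interval,
                      looked up by the user (not computed here)
    """
    if standard_error < 0:
        raise ValueError("standard_error must be non-negative")
    if t_percentile < 0:
        raise ValueError("t_percentile must be non-negative")
    margin = t_percentile * standard_error
    return estimate - margin, estimate + margin


# Made-up illustration: slope estimate 2.5, standard error 0.4, n = 20,
# so 18 degrees of freedom and a 97.5th t-percentile of about 2.101.
lower, upper = parameter_confidence_interval(2.5, 0.4, 2.101)
print(f"95% interval: ({lower:.4f}, {upper:.4f})")
```

The numbers in the usage lines are invented for illustration only; substitute the values from your own output. For multiple linear regression the same formula applies with n-k-1 degrees of freedom, where k is the number of predictor terms.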

Multiple linear regression

  1. To fit a multiple linear regression model, select Statistics > Multiple Regression. Click the Variable button and move the response variable into the Dependent var. box and the predictor variables into the Independent variable list box. Click OK twice to see the basic results, from where you can select further results to display. In the rare circumstance that you wish to fit a model without an intercept term (regression through the origin), select the Advanced tab in the Multiple Linear Regression dialog box, check Advanced options (stepwise or ridge regression), and click OK. Then select the Advanced tab in the Model Definition dialog box and change the Intercept setting to Set to zero before clicking OK.
  2. To add a quadratic regression line to a scatterplot, follow Help #15, but before clicking OK select the Advanced tab in the 2D Scatterplots dialog box and select Polynomial for Fit.
  3. Categories of a qualitative variable can be thought of as defining subsets of the sample. If there is also a quantitative response and a quantitative predictor variable in the dataset, a regression model can be fit to the data to represent separate regression lines for each subset. To do this, follow Help #17, which includes separate regression lines in the plot by default.
  4. Statistica does not appear to offer an automatic way to find the F-statistic and associated p-value for a nested model F-test in multiple linear regression. It is possible to calculate these quantities by hand using Statistica regression output and appropriate percentiles from an F-distribution.
  5. To save residuals in a multiple linear regression model, follow Help #31 to fit a multiple linear regression model, then click the Residuals/assumptions/prediction tab. Click Perform residual analysis, click the Save tab in the Residual Analysis dialog box, and click Save residuals & predicted. The residuals will be saved as a variable called Residuals in a new spreadsheet; they can now be used just like any other variable, for example, to construct residual plots. Note that Statistica will also save what it calls StandardResidual, but these are different to what Pardoe (2012) calls standardized residuals. Similarly, Statistica's DeletedResidual is different to what Pardoe (2012) calls studentized residuals.
  6. To add a loess fitted line to a scatterplot (useful for checking the zero mean regression assumption in a residual plot), follow Help #15, but before clicking OK select the Advanced tab in the 2D Scatterplots dialog box and select Lowess for Fit.
  7. Statistica does not appear to offer an automatic way to save leverages in a multiple linear regression model.
  8. To save Cook's distances in a multiple linear regression model, follow Help #31 to fit a multiple linear regression model, then click the Residuals/assumptions/prediction tab. Click Perform residual analysis, click the Save tab in the Residual Analysis dialog box, and click Save residuals & predicted. The Cook's distances will be saved as a variable called CookDistance in a new spreadsheet; they can now be used just like any other variable, for example, to construct scatterplots.
  9. To create some residual plots automatically in a multiple linear regression model, follow Help #31 to fit a multiple linear regression model, click the Residuals/assumptions/prediction tab, and click Perform residual analysis; the Residual Analysis dialog box then offers a choice of plots. To create residual plots manually, first create studentized residuals (see Help #35), and then construct scatterplots with these studentized residuals on the vertical axis.
  10. To create a correlation matrix of quantitative variables (useful for checking potential multicollinearity problems), select Statistics > Basic Statistics and Tables. Select Correlation matrices and click OK. Click One variable list to select the variable(s) for analysis and click OK. Click Summary to view the results.
  11. To find variance inflation factors in multiple linear regression, follow Help #31 to fit a multiple linear regression model, click the Advanced tab, and click Current sweep matrix. The variance inflation factors are the negatives of the diagonal elements for the predictor terms in this matrix.
  12. To draw a predictor effect plot for graphically displaying the effects of transformed quantitative predictors and/or interactions between quantitative and qualitative predictors in multiple linear regression, first create a variable representing the effect, say, "X1effect" (see computer help #6). Then select Graphs > Scatterplots. Click Variables and move X1 into the X: box and the "X1effect" variable into the Y: box. See Section 5.5 in Pardoe (2012) for an example. The instructions here create scatterplots rather than line plots, but lines can be added to the plots with an appropriate choice of the Fit type on the Advanced tab in the 2D Scatterplots dialog box.
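
The hand calculation of the nested model F-statistic mentioned above can be sketched as follows. This Python snippet is an illustration, not a Statistica feature: the function name and argument names are hypothetical. It assumes you have fit both the reduced and the complete model in Statistica and read off each model's residual sum of squares and residual degrees of freedom; the p-value or critical value still has to be looked up from an F-distribution with the returned degrees of freedom.

```python
def nested_f_statistic(rss_reduced, rss_complete, df_reduced, df_complete):
    """Nested model F-statistic from two fitted models.

    F = ((RSS_reduced - RSS_complete) / (df_reduced - df_complete))
        / (RSS_complete / df_complete)

    rss_* are residual sums of squares and df_* are residual degrees of
    freedom, both taken from the regression output of each model.
    Returns (F, numerator df, denominator df).
    """
    if df_complete <= 0 or df_reduced <= df_complete:
        raise ValueError("need df_reduced > df_complete > 0")
    if rss_complete <= 0 or rss_reduced < rss_complete:
        raise ValueError("need rss_reduced >= rss_complete > 0")

    numerator_df = df_reduced - df_complete
    numerator = (rss_reduced - rss_complete) / numerator_df
    denominator = rss_complete / df_complete
    return numerator / denominator, numerator_df, df_complete


# Made-up illustration: dropping 2 predictor terms raises the residual
# sum of squares from 100 (25 df) to 120 (27 df).
f_stat, df1, df2 = nested_f_statistic(120.0, 100.0, 27, 25)
print(f"F = {f_stat:.3f} on {df1} and {df2} degrees of freedom")
```

The residual sums of squares in the usage lines are invented for illustration. Compare the resulting F-statistic with the appropriate percentile of an F-distribution with the numerator and denominator degrees of freedom shown.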

Last updated: June, 2012

© 2012, Iain Pardoe