JMP instructions
These instructions accompany Applied Regression Modeling by Iain Pardoe, 2nd edition,
published by Wiley in 2012. The numbered items cross-reference the "computer help" references
in the book. These instructions are based on SAS JMP 10 for Mac OS, but they (or something
similar) should also work for other versions. Instructions for other statistical software
packages are available on the book's website.
Getting started and summarizing univariate data
 If desired, change JMP's default options by selecting
JMP > Preferences (Mac) or File > Preferences (Windows).
 To open a JMP data file, select File > Open. You can also use
File > Open to open text data files or Excel spreadsheets. For Excel spreadsheets, check
the box labeled Always enforce Excel Row 1 as labels if the spreadsheet has the variable
labels in the first row.
 To relaunch an analysis or recall its dialog after running it,
click the red triangle next to the analysis name at the top of the output window, and select
Script > Relaunch Analysis or Script > Model Dialog.
 Output appears in a separate window each time you run an analysis. If you
click the "selection tool" (the third button from the left at the top of the window, which looks
like a "+"), you can select the output by clicking on it and then right-click to Copy so
that you can paste it into a word processor such as OpenOffice Writer or Microsoft Word.
 You can access help by selecting Help > Statistics Index, then
selecting the topic that you would like help with. There is also a Help button in each
analysis dialog box.
 To transform data or compute a new variable, select
Cols > New Column, type the new variable name in the Column Name box, and select
Formula under Column Properties. In the resulting dialog box, select the variable
to be transformed under Table Columns and build the formula using the various operations
and functions. Examples are Transcendental > Log for the natural logarithm and
x^y for powers such as x^2 ("squared"). The new variable should appear in the data
spreadsheet (check that it looks correct) and can now be used just like any other variable.
 To create indicator (dummy) variables from a qualitative variable, select
the qualitative variable and select Cols > Recode. Type the values 0 and 1 under
New Value for the appropriate categories and change In Place to
New Column. Check that the correct indicator variable has been created in the spreadsheet.
Change the name and data/modeling type of the created variable by double-clicking the column
heading (Data Type should be Numeric rather than Character and
Modeling Type should be Continuous rather than Nominal). Repeat for
other indicator variables (if necessary).
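As a cross-check on the recoding, the same indicator variables can be built in a few lines of code. Here is a minimal Python sketch; the variable name region and its categories are invented for illustration.

```python
# Build one 0/1 indicator (dummy) variable per category of a
# qualitative variable; data values are invented for illustration.
region = ["North", "South", "South", "North", "West"]
indicators = {
    "D_" + level: [1 if value == level else 0 for value in region]
    for level in sorted(set(region))
}
print(indicators["D_South"])  # [0, 1, 1, 0, 0]
```

As in JMP, one indicator column is usually dropped (the reference category) before fitting a regression model.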

 To find a percentile (critical value) for a t-distribution, select
View > Log (Windows) or Window > Log (Mac), type and highlight
t Quantile(p, df), then click Run Script. Here p is the lower-tail area
(i.e., one minus the one-tail significance level) and df is the degrees of freedom. For
example, t Quantile(0.95, 29) returns the 95th percentile of the t-distribution with 29
degrees of freedom (1.699), which is the critical value for an upper-tail test with a 5%
significance level. By contrast, t Quantile(0.975, 29) returns the 97.5th percentile of the
t-distribution with 29 degrees of freedom (2.045), which is the critical value for a two-tail test
with a 5% significance level.
 To find a percentile (critical value) for an F-distribution, select
View > Log (Windows) or Window > Log (Mac), type and highlight
F Quantile(p, df1, df2), then click Run Script. Here p is the lower-tail
area (i.e., one minus the significance level), df1 is the numerator degrees of freedom, and
df2 is the denominator degrees of freedom. For example, F Quantile(0.95, 2, 3)
returns the 95th percentile of the F-distribution with 2 numerator degrees of freedom and 3
denominator degrees of freedom (9.552).
 To find a percentile (critical value) for a chi-squared distribution,
select View > Log (Windows) or Window > Log (Mac), type and highlight
ChiSquare Quantile(p, df), then click Run Script. Here p is the
lower-tail area (i.e., one minus the significance level) and df is the degrees of freedom.
For example, ChiSquare Quantile(0.95, 2) returns the 95th percentile of the chi-squared
distribution with 2 degrees of freedom (5.991).

 To find an upper-tail area (one-tail p-value) for a t-distribution, select
View > Log (Windows) or Window > Log (Mac), type and highlight
1 - t Distribution(t, df), then click Run Script. Here t is the absolute
value of the t-statistic and df is the degrees of freedom. For example,
1 - t Distribution(2.40, 29) returns the upper-tail area for a t-statistic of 2.40 from the
t-distribution with 29 degrees of freedom (0.012), which is the p-value for an upper-tail test. By
contrast, 2*(1 - t Distribution(2.40, 29)) returns the two-tail area for a t-statistic of
2.40 from the t-distribution with 29 degrees of freedom (0.023), which is the p-value for a
two-tail test.
 To find an upper-tail area (p-value) for an F-distribution, select
View > Log (Windows) or Window > Log (Mac), type and highlight
1 - F Distribution(f, df1, df2), then click Run Script. Here f is the
value of the F-statistic, df1 is the numerator degrees of freedom, and df2 is the
denominator degrees of freedom. For example, 1 - F Distribution(51.4, 2, 3) returns the
upper-tail area (p-value) for an F-statistic of 51.4 for the F-distribution with 2 numerator
degrees of freedom and 3 denominator degrees of freedom (0.005).
 To find an upper-tail area (p-value) for a chi-squared distribution, select
View > Log (Windows) or Window > Log (Mac), type and highlight
1 - ChiSquare Distribution(chisq, df), then click Run Script. Here chisq
is the value of the chi-squared statistic and df is the degrees of freedom. For example,
1 - ChiSquare Distribution(0.38, 2) returns the upper-tail area (p-value) for a chi-squared
statistic of 0.38 for the chi-squared distribution with 2 degrees of freedom (0.827).
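The six JSL calls above can be cross-checked outside JMP. The sketch below uses only Python's standard library: the t and F CDFs come from the regularized incomplete beta function (the classic continued-fraction form), the chi-squared CDF from the incomplete gamma series, and percentiles are obtained by bisection. All function names are mine, and the routines aim only for the few decimal places quoted in the text, not production accuracy.

```python
import math

def _betacf(a, b, x, max_iter=200, eps=1e-12):
    # Continued fraction for the regularized incomplete beta function
    # (modified Lentz's method, as in standard numerical references).
    tiny = 1e-30
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = tiny if abs(d) < tiny else d
    d = 1.0 / d
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = tiny if abs(d) < tiny else d
            c = 1.0 + aa / c
            c = tiny if abs(c) < tiny else c
            d = 1.0 / d
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h

def betainc(a, b, x):
    # Regularized incomplete beta function I_x(a, b).
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def t_cdf(t, df):
    # Lower-tail area of the t-distribution, like t Distribution(t, df).
    p = 0.5 * betainc(df / 2.0, 0.5, df / (df + t * t))
    return 1.0 - p if t > 0 else p

def f_cdf(f, df1, df2):
    # Lower-tail area of the F-distribution, like F Distribution(f, df1, df2).
    return betainc(df1 / 2.0, df2 / 2.0, df1 * f / (df1 * f + df2))

def chisq_cdf(x, df):
    # Regularized lower incomplete gamma via its series expansion,
    # like ChiSquare Distribution(x, df).
    a, s = df / 2.0, x / 2.0
    if s <= 0.0:
        return 0.0
    if s > 600.0:   # series would overflow; CDF is effectively 1 here
        return 1.0
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= s / (a + n)
        total += term
    return total * math.exp(a * math.log(s) - s - math.lgamma(a))

def quantile(cdf, p, lo, hi):
    # Invert a continuous CDF by bisection, like the JSL Quantile functions.
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Reproduce the examples from the text (to the precision quoted there):
t_95 = quantile(lambda v: t_cdf(v, 29), 0.95, -50.0, 50.0)     # ~ 1.699
f_95 = quantile(lambda v: f_cdf(v, 2, 3), 0.95, 0.0, 1000.0)   # ~ 9.552
c_95 = quantile(lambda v: chisq_cdf(v, 2), 0.95, 0.0, 1000.0)  # ~ 5.991
p_t = 1.0 - t_cdf(2.40, 29)                                    # ~ 0.012
```

The JSL one-liners remain the quickest route inside JMP; this sketch is only a way to verify the numbers independently.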
 Calculate descriptive statistics for quantitative variables by selecting
Analyze > Distribution. Move the variable(s) into the Y, Columns list and click
OK. In the resulting output window, you can select additional output by clicking on the
red triangle next to each variable name.
 Create contingency tables or cross-tabulations for qualitative
variables by selecting Analyze > Fit Y by X. Move one qualitative variable into the
Y, Response list and another into the X, Factor list. Cell percentages
(within rows, columns, or the whole table) are displayed automatically in the resulting table.
 If you have quantitative variables and qualitative variables, you can calculate
descriptive statistics for cases grouped in different categories by selecting
Tables > Summary. Select the quantitative variable(s) and then select the summaries that
you would like from the Statistics menu. Move the qualitative variable(s) into the
Group list.
 To make a stem-and-leaf plot for a quantitative variable, select
Analyze > Distribution. Move the variable(s) into the Y, Columns list and click
OK. In the resulting output window, you can select Stem and Leaf by clicking on
the red triangle next to each variable name.
 To make a histogram for a quantitative variable, select
Analyze > Distribution. Move the variable(s) into the Y, Columns list and click
OK. In the resulting output window, you can select various Histogram Options by
clicking on the red triangle next to each variable name.
 To make a scatterplot with two quantitative variables, select
Analyze > Fit Y by X. Move the vertical axis variable into the Y, Response box
and the horizontal axis variable into the X, Factor box.
 All possible scatterplots for more than two variables can be drawn simultaneously
(called a scatterplot matrix) by selecting Graph > Scatterplot Matrix. Move all the
variables into the Y, Columns box.
 You can mark or label cases in a scatterplot with different colors/symbols
according to categories in a qualitative variable by selecting
Rows > Color or Mark by Column... before drawing the plot. Select the column containing
the variable you wish to mark by.
 You can identify individual cases in a scatterplot by hovering over
individual points. If you double-click a point, the corresponding row in the
spreadsheet will be highlighted.
 To remove one or more observations from a dataset, right-click on the row
number(s) in the data spreadsheet and select Exclude/Unexclude.
 To make a bar chart for cases in different categories, select
Graph > Chart.
 For frequency bar charts of one or two qualitative variables, move the
variable(s) into the Categories, X, Levels box.
 The bars can also represent various summary functions for a quantitative variable.
For example, to represent group means, select the quantitative variable and then select
Mean from the Statistics menu.
 To make boxplots for cases in different categories, select
Analyze > Fit Y by X.
 Move the quantitative variable into the Y, Response box and the
qualitative variable into the X, Factor box. In the resulting Oneway Analysis
output window, click on the red triangle and select Quantiles.
 To create clustered boxplots for two qualitative variables, first create a new
qualitative variable consisting of all category combinations (using computer help #6 and the
Character > Concat function). Then use this new variable as the X, Factor
variable.
 To make a QQ-plot (also known as a normal probability plot) for a
quantitative variable, select Analyze > Distribution. Move the variable into the
Y, Columns list and click OK. In the resulting output window, you can select
Normal Quantile Plot by clicking on the red triangle next to the variable name.
 To compute a confidence interval for a univariate population mean, select
Analyze > Distribution. Move the variable into the Y, Columns list and click
OK. In the resulting output window, you can select Confidence Interval by
clicking on the red triangle next to the variable name. Enter the confidence level in the resulting
Confidence Intervals dialog box and click OK.
 To do a hypothesis test for a univariate population mean, select
Analyze > Distribution. Move the variable into the Y, Columns list and click
OK. In the resulting output window, you can select Test Mean by clicking on the
red triangle next to the variable name. Enter the (null) hypothesized mean in the resulting
Test Mean dialog box and click OK.
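The interval and test that these two menu items produce follow the usual one-sample t formulas. Here is a minimal Python sketch with an invented sample; the critical value t(0.975, 4) = 2.776 is hardcoded rather than computed.

```python
import math
from statistics import mean, stdev

sample = [10.2, 9.8, 10.5, 10.1, 9.9]    # invented data
n = len(sample)
xbar, s = mean(sample), stdev(sample)
se = s / math.sqrt(n)                    # standard error of the mean
t_crit = 2.776                           # 97.5th percentile of t with n - 1 = 4 df
ci = (xbar - t_crit * se, xbar + t_crit * se)   # 95% CI for the mean
t_stat = (xbar - 10.0) / se              # test statistic for H0: mu = 10
```

Compare t_stat with the critical value (or convert it to a p-value as in the tail-area items above) to complete the test; here the interval contains 10, so H0 would not be rejected at the 5% level.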
Simple linear regression
 To fit a simple linear regression model (i.e., find a least squares line),
select Analyze > Fit Model. Move the response variable into the Y box, select
the predictor variable and Add it to the Construct Model Effects box, and click
Run. In the rare circumstance that you wish to fit a model without an intercept term
(regression through the origin), click No Intercept before clicking Run.
 To add a regression line or least squares line to a scatterplot,
select Analyze > Fit Y by X. Move the response variable into the Y, Response box,
move the predictor variable into the X, Factor box, and click OK. Click on the red
triangle in the resulting Fit Y by X output window, and select Fit Line.
 To find 95% confidence intervals for the regression parameters in a simple
or multiple linear regression model, fit the model using computer help #25 or #31, right-click in
the body of the Parameter Estimates table in the resulting Fit Least Squares
output window, and select Columns > Lower 95% and Columns > Upper 95%.

 To find a fitted value or predicted value of Y (the response
variable) at a particular value of X (the predictor variable) in a linear regression model, fit the
model using computer help #25 or #31, click on the red triangle next to Response in the
resulting Fit Least Squares output window, and select
Save Columns > Predicted Values. This will produce fitted or predicted values of Y for
each of the X-values in the dataset by default (in a column labeled Predicted *, where the
star represents the response variable name). Each time you ask JMP to calculate fitted or predicted
values of Y like this, it will add a new column to the dataset and append a number to the column
header (e.g., "2" for the second time).
 You can also obtain a fitted or predicted value of Y at an
X-value that is not in the dataset by doing the following. Before fitting the regression model, add
the X-value to the dataset (go down to the bottom of the spreadsheet and type the X-value in the
appropriate cell of the next blank row). Then fit the regression model and follow the steps above.
JMP will ignore the X-value you typed when fitting the model (since there is no corresponding
Y-value), so all the regression output (such as the estimated regression parameters) will be the
same. But JMP will calculate a fitted or predicted value of Y at this new X-value based on the
results of the regression. Again, look for it in the dataset in the column labeled
Predicted *.
 This applies more generally to multiple linear regression also.

 To find a confidence interval for the mean of Y at a particular value of X
in a linear regression model, fit the model using computer help #25 or #31, click on the red
triangle next to Response in the resulting Fit Least Squares output window, and
select Save Columns > Mean Confidence Interval. This will produce 95% intervals for each
of the X-values in the dataset by default (in columns labeled Lower 95% Mean * and
Upper 95% Mean *). Each time you ask JMP to calculate confidence intervals like this, it
will add new columns to the dataset and append a number to the column headers (e.g., "2" for the
second time). If you hold down the Shift key and then select
Save Columns > Mean Confidence Interval, you'll be prompted to enter a significance level
(e.g., enter 0.10 for 90% intervals).
 You can also obtain a confidence interval for the mean of Y
at an X-value that is not in the dataset by doing the following. Before fitting the regression
model, add the X-value to the dataset (go down to the bottom of the spreadsheet and type the
X-value in the appropriate cell of the next blank row). Then fit the regression model and follow
the steps above. JMP will ignore the X-value you typed when fitting the model (since there is no
corresponding Y-value), so all the regression output (such as the estimated regression parameters)
will be the same. But JMP will calculate a confidence interval for the mean of Y at this new
X-value based on the results of the regression. Again, look for it in the dataset in the columns
labeled Lower 95% Mean * and Upper 95% Mean *.
 This applies more generally to multiple linear regression also.

 To find a prediction interval for an individual value of Y at a particular
value of X in a linear regression model, fit the model using computer help #25 or #31, click on the
red triangle next to Response in the resulting Fit Least Squares output window,
and select Save Columns > Indiv Confidence Interval. This will produce 95% intervals for
each of the X-values in the dataset by default (in columns labeled Lower 95% Indiv * and
Upper 95% Indiv *). Each time you ask JMP to calculate prediction intervals like this, it
will add new columns to the dataset and append a number to the column headers (e.g., "2" for the
second time). If you hold down the Shift key and then select
Save Columns > Indiv Confidence Interval, you'll be prompted to enter a significance level
(e.g., enter 0.10 for 90% intervals).
 You can also obtain a prediction interval for an individual
Y-value at an X-value that is not in the dataset by doing the following. Before fitting the
regression model, add the X-value to the dataset (go down to the bottom of the spreadsheet and type
the X-value in the appropriate cell of the next blank row). Then fit the regression model and
follow the steps above. JMP will ignore the X-value you typed when fitting the model (since there
is no corresponding Y-value), so all the regression output (such as the estimated regression
parameters) will be the same. But JMP will calculate a prediction interval for an individual Y at
this new X-value based on the results of the regression. Again, look for it in the dataset in the
columns labeled Lower 95% Indiv * and Upper 95% Indiv *.
 This applies more generally to multiple linear regression also.
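The two kinds of interval saved above follow the standard simple linear regression formulas, differing only by the extra "1 +" under the square root. Here is a Python sketch with invented data; the critical value t(0.975, 3) = 3.182 is hardcoded rather than computed.

```python
import math

x = [1, 2, 3, 4, 5]              # invented data
y = [2.1, 4.2, 5.9, 8.1, 9.8]
n = len(x)
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
slope = sum((xi - xbar) * yi for xi, yi in zip(x, y)) / sxx
intercept = sum(y) / n - slope * xbar
rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(rss / (n - 2))     # residual standard error
t_crit = 3.182                   # 97.5th percentile of t with n - 2 = 3 df

x_new = 3.5                      # an X-value not in the dataset
y_hat = intercept + slope * x_new
se_mean = s * math.sqrt(1 / n + (x_new - xbar) ** 2 / sxx)
se_indiv = s * math.sqrt(1 + 1 / n + (x_new - xbar) ** 2 / sxx)
ci_mean = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)     # mean of Y
pi_indiv = (y_hat - t_crit * se_indiv, y_hat + t_crit * se_indiv)  # individual Y
```

The prediction interval is always wider than the confidence interval at the same X-value, matching the Lower/Upper 95% Indiv columns versus the Mean columns in JMP.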
Multiple linear regression
 To fit a multiple linear regression model, select
Analyze > Fit Model. Move the response variable into the Y box, select the
predictor variables and Add them to the Construct Model Effects box, and click
Run. In the rare circumstance that you wish to fit a model without an intercept term
(regression through the origin), click No Intercept before clicking Run.
 To add a quadratic regression line to a scatterplot, select
Analyze > Fit Y by X. Move the response variable into the Y, Response box, move
the predictor variable into the X, Factor box, and click OK. Click on the red
triangle in the resulting Fit Y by X output window, and select
Fit Polynomial > 2, quadratic.
 Categories of a qualitative variable can be thought of as defining subsets
of the sample. If there are also a quantitative response and a quantitative predictor variable in
the dataset, a regression model can be fit to the data to represent separate regression lines for
each subset. First use computer help #15 and #17 to make a scatterplot with the response variable on
the vertical axis, the quantitative predictor variable on the horizontal axis, and the cases marked
with different colors according to the categories in the qualitative predictor variable. To add a
regression line for each subset to this scatterplot, first click on the red triangle in the
resulting Fit Y by X output window, select Group By ..., select the qualitative
predictor variable, and click OK. Then click on the red triangle again and select
Fit Line.
 To find the F-statistic and associated p-value for a nested model F-test in
multiple linear regression, fit the model using computer help #31, click on the red triangle next to
Response in the resulting Fit Least Squares output window, and select
Custom Test.... The resulting Custom Test output will have a list of regression
parameters that has a column of zeroes next to it; click the zero next to the first parameter in the
nested F-test null hypothesis and change the value to "1." Then click Add Column and repeat
for the second parameter in the null hypothesis. Repeat for each of the parameters in the null
hypothesis, then click Done.
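Behind Custom Test, the nested model F-statistic can also be computed by hand from the residual sums of squares of the full and reduced models. A small Python sketch; all the numbers are invented for illustration.

```python
# F = ((RSS_reduced - RSS_full) / q) / (RSS_full / (n - k - 1)), where q is
# the number of parameters tested and k the number of predictors in the
# full model. All numbers here are invented for illustration.
rss_reduced, rss_full = 50.0, 30.0
n, k, q = 30, 4, 2
f_stat = ((rss_reduced - rss_full) / q) / (rss_full / (n - k - 1))
print(round(f_stat, 3))  # 8.333
```

The corresponding p-value can be found in JMP with 1 - F Distribution(f_stat, q, n - k - 1), as in the tail-area items earlier.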
 To save residuals in a multiple linear regression model, fit the model
using computer help #31, click on the red triangle next to Response in the resulting
Fit Least Squares output window, and select Save Columns > Residuals. The
residuals are saved as a variable called Residual *, where the star represents the response
variable name; they can now be used just like any other variable, for example, to construct
residual plots. To save what Pardoe (2012) calls standardized residuals, select
Save Columns > Studentized Residuals; they will be saved as a variable called
Studentized Resid *. JMP does not appear to offer a way to save what Pardoe (2012) calls
studentized residuals.
 JMP does not appear to offer a way to add a loess fitted line to a
scatterplot, but it can add a similar smoothing spline fitted line (useful for checking the
zero mean regression assumption in a residual plot). To do so, select Analyze > Fit Y by X.
Move the vertical axis variable (e.g., the studentized residuals) into the Y, Response box,
move the horizontal axis variable into the X, Factor box, and click OK. Click on
the red triangle in the resulting Fit Y by X output window, and select Fit Spline.
You can experiment to find a value for the smoothing parameter "lambda" that captures the major
trends in the scatterplot without being overly "wiggly"; typically a value of 1 or 10 works well.
 To save leverages in a multiple linear regression model, fit the model
using computer help #31, click on the red triangle next to Response in the resulting
Fit Least Squares output window, and select Save Columns > Hats. The leverages
are saved as a variable called h *, where the star represents the response variable name;
they can now be used just like any other variable, for example, to construct scatterplots.
 To save Cook's distances in a multiple linear regression model, fit the
model using computer help #31, click on the red triangle next to Response in the resulting
Fit Least Squares output window, and select Save Columns > Cook's D Influence.
The Cook's distances are saved as a variable called Cook's D Influence *, where the star
represents the response variable name; they can now be used just like any other variable, for example, to construct scatterplots.
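For simple linear regression (one predictor) the leverages and Cook's distances saved above have closed forms, which makes a useful sanity check on the JMP columns. A Python sketch with invented data:

```python
import math

x = [1, 2, 3, 4, 10]             # invented data; x = 10 is a high-leverage point
y = [2.0, 4.1, 6.2, 7.9, 20.5]
n = len(x)
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
slope = sum((xi - xbar) * yi for xi, yi in zip(x, y)) / sxx
intercept = sum(y) / n - slope * xbar
s = math.sqrt(sum((yi - intercept - slope * xi) ** 2
                  for xi, yi in zip(x, y)) / (n - 2))

# Leverage: h_i = 1/n + (x_i - xbar)^2 / Sxx (these sum to 2 with 1 predictor)
leverage = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]

# Cook's distance: D_i = (r_i^2 / (k + 1)) * (h_i / (1 - h_i)), where r_i is
# the standardized residual and k = 1 predictor.
cooks_d = []
for xi, yi, h in zip(x, y, leverage):
    r = (yi - intercept - slope * xi) / (s * math.sqrt(1 - h))
    cooks_d.append((r ** 2 / 2) * (h / (1 - h)))
```

In this example the point at x = 10 has both the largest leverage and the largest Cook's distance, exactly the kind of case these diagnostics are designed to flag.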
 JMP will automatically create a residual plot in a multiple linear
regression model, specifically one with the (ordinary) residuals on the vertical axis versus the
predicted values on the horizontal axis. To create residual plots manually, first create
standardized residuals (see computer help #35), and then construct scatterplots with these
standardized residuals on the vertical axis.
 To create a correlation matrix of quantitative variables (useful for
checking potential multicollinearity problems), select
Analyze > Multivariate Methods > Multivariate. Move all the variables into the
Y, Columns box and click OK.
 To find variance inflation factors in multiple linear regression, fit the
model using computer help #31, right-click in the body of the Parameter Estimates table in
the resulting Fit Least Squares output window, and select Columns > VIF.
 To draw a predictor effect plot for graphically displaying the effects of
transformed quantitative predictors and/or interactions between quantitative and qualitative
predictors in multiple linear regression, first create a variable representing the effect, say,
"X1effect" (see computer help #6).
 If the "X1effect" variable just involves X1 (e.g., 1 + 3X1 + 4X1^2),
then use computer help #26 to create the line plot.
 If the "X1effect" variable also involves a qualitative variable (e.g.,
1 − 2X1 + 3D2X1, where D2 is an indicator variable), then use computer help #33 to create the
line plot.
See Section 5.5 in Pardoe (2012) for an example.
Last updated: June, 2012
© 2012, Iain Pardoe