# Guidelines: How to Make Figures and do Statistical Tests in Excel

(These guidelines are modified from materials developed by C. Caceres for IB 449 Limnology).

### Students Using Microsoft Office (2007 for Windows) or (2008 for Macs):

The instructions for Excel differ between Windows and Mac versions and between 2004 & 2008 Mac versions of Excel. All instructions below are for Excel in Microsoft Office (2004), which is installed on the clasroom Macs. When not using a Mac with 2004, you have two options:

1. Use a computer with an earlier version of Office;
2. Use instructions below to figure out roughly how to do assignment on your computer.
1. If you want someone (TA or other student) using an earlier version to be able to view your 2007 file, you must save your file in an Office 2003 compatible format.
2. You can save your 2007 file in any format, including 2007, if you want to print it.
3. The 2008 file extension for Mac Excel is .xlsx and it is readable in 2004 office.

### T-TEST: 1 categorical independent variable with 2 treatments; 1 dependent variable

1. Use Excel to create a Figure
1. Calculate average and SD for each treatment. Create this type of table:
 Independent Variable Average SD SE = SD / √N Treatment 1 Fill in values Treatment 2
2. Highlight the 6 cells in the first 2 columns of the table.
MAC: Click on charts (with bar chart) located in the menu bar. PC: From the menu bar, pull down Insert and click 'Chart'.
Click on the type of chart and subtype you want to make. Click next.
3. Do NOT give the Figure a title. Label x-axis (horizontal with independent variable); label y- axis (vertical with dependent variable).
4. Clean up the graph while on the sheet for labeling axes. (remove extra legends, gridlines, etc.). (Click on Gridlines in menu and un-do check next to gridline box, etc.)
1. Select 'next'; select 'as object in sheet 1'. 'Finish'.
2. To change the x axis so that it starts at 0, click on the axis. Select scale. Minimum = 0.
5. Add Standard Error bars to each average.
1. If using MAC: double click inside one bar; select 'Y error bars'.
If using PC: right click inside a bar; select Format Data Series; select 'Y error bars'
2. .
3. For both MAC and PC: For display, select 'both'. For Error Amount, select 'Custom'. Select triangle (or icon) to R.
4. For both MAC and PC: Highlight SD data from worksheet (for + bar).
Repeat for – bar. 'OK'.
6. In a cell BELOW the figure, create a figure caption (we also call this a figure legend). The 'rule of thumb' is that the reader should completely understand the figure without reading any of the text of the paper. Include what the bar and vertical lines indicate (mean +/- SE). Keep it to one incomplete 'sentence'.
2. Use Excel to do t-test.
1. Click a cell on worksheet.
Select 'fx' in the toolbar. Select 'Statistical', then 'TTEST'. OK.
2. Array 1 = original data from treatment 1.
Array 2 = original data from treatment 2.
Tails = 2 (2-tailed t-test)
Type = 3 (2-sample test unequal variance). OK
3. The probability value is printed on the worksheet. If it is P < 0.05, then there is less than a 1 in 20 probability that the difference between the means of the two populations is due to chance.
3. Interpretation of results.
2. Based on your analysis, what can you conclude? Do the data support your hypothesis?
3. To write up the "Results" section of a scientific manuscript, summarize the main pattern in your figure, state the statistical output, and refer to Figure 1 (in parentheses at end of sentence). Keep the results to 1 sentence.

RESULTS: The mean plant species richness was significantly (or non-significantly, depending on what your result is) greater in the X-year than the X-year field (t-test P <0.05) (Fig. 1).

### CHI-SQUARE TEST OF INDEPENDENCE: 2 categorical variables + count data

1. How to make a figure in Excel:
1. Create a table like this:
 OBSERVED edge interior row total snap-off 43 46 89 uproot 32 79 111 column total 75 125 200 Grand total EXPECTED 33.375 55.625 89 41.625 69.375 111 75 125 200
2. Highlight the 9 cells in the first 3 columns of the observed table (cells with labels and observed data; NOT row or column totals).
MAC: Click on the chart wizard (with bar chart) icon that is located in the toolbar. PC: From the menu bar, pull down Insert and click 'Chart'. Click on the type of chart and subtype you want to make. Click next.
3. Do NOT give the Figure a title. Label x-axis (horizontal) one of two categorical variables; the second variable is automatical printed ); label y- axis (vertical with "number of XXX).
4. Clean up the graph while on the sheet for labeling axes. (gridlines, etc.). (Click on Gridlines in menu and un-do check next to gridline box, etc.)
1. Select 'next'; select 'as object in sheet 1'. 'Finish'.
2. To change the x axis so that it starts at 0, click on the axis. Select scale.
Minimum = 0.
2. How to do a Chi-Square test with a calculator:
1. Use the table created to make the figure.
2. Calculate the marginal totals of observations (add up columns and rows). Grand total is the total of the two row totals (= total of the two column totals).
3. Complete the expected frequency table (E) above to one decimal place. This table will contain values expected if the type of tree death is independent of the location (This is the null hypothesis). Multiply each row total X each column total and divide each product by the grand total. (E.g. Expected # Snap-offs in the Edge is (75 X 89)/200).
4. Calculate Chi-square by comparing the observed and expected frequencies for each of the four cells in the two tables. Subtract the difference, square it, and divide by the expected frequency for each cell. Add the results for each cell. Show your math.
χ2 =
5. Use the Chi-Square Statistics Table at the front of the manual after GUIDELINES TO BASIC STATISTICS (or MSLS: Appendix 3) to determine whether type of tree death is significantly independent of location. If the χ2 is greater than the value in a chi-square table with P <= 0.05 and 1 degree of freedom (1DF) [# rows-1)(# columns-1) = 1], then the null hypothesis of independence is rejected. The two variables are associated. If P > 0.05, then you cannot reject the null hypothesis of independence.
3. Interpretation of results.
1. If P < 0.05, the variables are associated. E.g. the type of treefall is associated with the location in the forest. Look at the values of (o-e) for each cell and determine which type of treefall is associated with which location. E.g. snap-offs are more common at the edge and uproots in the interior than expected if the distributions of the two variables were independent.
2. If P > 0.05 the variables are independent. E.g. the type of treefall is not associated with the location in the forest. Look at the values of (o-e) for each cell. There is little difference between these values. E.g. a snap-off (or uproot) is as likely to be at the edge as in the interior.
3. RESULTS: The distribution of types of Z was independent (or dependent) of location (χ2 = 2.1, df-1, p>0.05) (Fig.2). Overall, X were more than Y, but the ratio of X to Y was similar (or not) in both locations.

### CORRELATION: 2 CONTINUOUS VARIABLES

1. How to create a figure in Excel:
1. Have the data for the 2 variables in the first 2 columns of the worksheet.
2. Make a (XY) scattergram figure.
1. Select Chart Wizard in toolbar.
2. Chart type: XY (Scatter); Chart sub-type: upper left.
3. Data range: select all cells on worksheet with data.
4. Don't give the figure a title. Label the x (include units) and y axes.
5. Save as object in Sheet 1. Move to lower left of page.
3. In a cell below the figure, create a figure caption. The 'rule of thumb' is that the reader should completely understand the figure without reading any of the text of the paper. Keep it to one 'sentence'.

Figure 1. Number of fruits as a function of total leaf area (cm2) of a pawpaw tree.

2. How to calculate CORRELATION statistic in Excel: (no cause-effect of x and y)
1. Calculate the correlation coefficient = r, which measures the strength of association between the two variables.
2. Type 'correlation' in a cell below column A data. Select cell to its right.
3. Select fx in the toolbar, then 'statistical', then 'correl'.
4. Array 1: Select data in column A
Array 2: Select data in column B
3. Interpretation of results
1. Whether r = significant depends on the degrees of freedom. The critical value that r must be greater than to be significant (P &l0.05) can be looked up in a correlation table in the GUIDELINES FOR BASIC STATISTICS.
2. If your r value is significant (P <0.05), you have evidence that the two variables are associated. Whether the association is positive or negative depends on the sign (+ or -) of the value of r. A positive slope will have a positive r; a negative slope will have a negative r. The closer the value of r is to 1 (or -1), the greater the strength of the association.
3. Based on your analysis, what can you conclude? Do the data support your hypothesis?
4. To write up the "Results" section of a manuscript, summarize the main pattern in your figure. State the statistical output and refer to Fig. 1 in parentheses at the end of the sentence. Keep the results to 1 sentence.

E.g. Number of fruits did not increase significantly as total leaf area of the tree increased (r = 0.523, P > 0.05) (Fig. 1).

### REGRESSION (assuming cause-effect between two continuous variables):

1. Add a regression line to the figure.
MAC:
1. Highlight chart area within the figure.
3. For Type tab, select 'linear'.
PC:
1. Right click on one point in the figure.
2. do as above for MAC
2. Calculate R2 (values range from 0-1).
1. Highlight chart area within the figure.
3. For Type tab, select 'Options'.
4. Check 'Display R2 value on chart.
5. What is the R2 value? __________________ How well does the line represent the data points? Interpret the R2 value.
6. Click on R2 value on figure and delete it.
3. Get an equation for the line.
1. Highlight chart area
3. For Type tab, select 'Options'.
4. Check 'Display equation on chart'.
5. What is the equation?
4. To write up the "Results" section of a manuscript, summarize the main pattern in your figure. State the statistical output and refer to Fig. 1 in parentheses at the end of the sentence. Keep the results to 1 sentence.

E.g. An increase in light intensity resulted in greater algal biomass (R2 = 0.75, P<0.01) (Fig. 1).

### ANOVA (Analysis of Variance): 1 categ. indep. var. with >2 treatments

1. How to create figure in Excel:
1. Follow directions above for t-test. The only difference is to create a table with average and SD for each treatment, (not just 2).
2. How to do an ANOVA test in Excel:
1. Data must be organized with each treatment in a separate column.
2. Go to the Tools menu in Excel. Select "Data Analysis" at the bottom of the menu. (If you don't see it, select "Add-Ins" and then select "Analysis Tool Pac" to install the data analysis tools; this has to be done only once. Go back and select "Data Analysis" from the Tools menu.)
3. Select "ANOVA: Single Factor", then "input range" (data you want to analyze, Grouped by "Column", "Labels in first row", alpha = 0.05, and "Output range " = an empty cell (e.g. A17).
3. Interpretation of statistical results:
Whether the ANOVA is significant depends on the degrees of freedom. The critical value that the F-statistic must be greater than to be significant (P <0.05) will have to be looked up in a table. (Ask TA).

Based on your analysis, what can you conclude? Do the data support your hypothesis?

Recall that the GUIDELINES ON BASIC STATISTICS states for ANOVA that if there is a significant difference among treatment means, ANOVA does not tell you anything about which treatments are significantly different from the others. Now you must do multiple comparison tests (or, in our case, t-tests) to determine which treatments are significantly different from the others (a separate t-test for each comparison of each pair of treatments).

To write the 'Results' section of a manuscript, summarize the following, each in one sentence:
1. The conclusion based on ANOVA. Look at the figure and ANOVA. At the end of the sentence (put statistical text and P value in parentheses) (put Fig. 1 in parentheses).

E.g. The % cover of tree trunks by moss differed significantly among directions (ANOVA P<0.025) (Fig. 1).

2. The conclusion based on multiple comparisons. Look at the t-tests. At the end of the sentence (put statistical test and P value in parentheses).

E.g. The north side had greater % cover than the south side (t-test, P<0.05), but had similar % cover to the east and west sides (t-test, n.s. P>0.05).