School of
Integrative Biology

# Glossary for the Scientific Process

## Components

### Data

Small amount of information about the subject of an investigation. (data = plural; datum = singular)

Discrete Data

Each point can be only a whole number. Cats would be discrete units because there is no possibility of a fraction of a cat. We count or tally these data. Chi-square analyses are based on discrete data.

Continuous Data

Points taken along a scale that can be infinitely subdivided. Time, weight, and temperature are examples. We measure these data.

Categorical Data

Each point falls into a non-numeric group or category, e.g., male or female.

### Distribution

The pattern of occurrence of a set of data.

Class

A group of data points whose limits are set by an upper and a lower value. Used in making a frequency distribution. We divide the overall range of the values in our data set into a number of classes and count the number of data points that fall into each of these classes.

Frequency Distribution

The pattern formed by the number of measurements that fall into each class.
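As a minimal sketch of how classes and a frequency distribution relate, the following Python snippet divides a range into equal-width classes and tallies the data points in each. The function name `frequency_distribution`, the class limits, and the leaf lengths are all hypothetical, chosen only for illustration.

```python
def frequency_distribution(values, lower, class_width, n_classes):
    """Count how many values fall into each class [low, high)."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - lower) // class_width)
        if 0 <= index < n_classes:  # ignore values outside the overall range
            counts[index] += 1
    classes = [
        (lower + i * class_width, lower + (i + 1) * class_width)
        for i in range(n_classes)
    ]
    return list(zip(classes, counts))


# Hypothetical leaf lengths in cm (made-up numbers, not real data).
leaf_lengths = [2.1, 3.4, 3.9, 4.2, 4.8, 5.0, 5.5, 6.1, 6.3, 7.7]
table = frequency_distribution(leaf_lengths, lower=2.0, class_width=2.0, n_classes=3)

for (low, high), count in table:
    print(f"{low:.1f}-{high:.1f} cm: {count}")
```

Plotting the classes on the x axis and the counts on the y axis would give a histogram of these data.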

Normal Distribution

Data in a frequency distribution spread out equally on either side of a central high point. Its appearance is that of a symmetrical bell-shaped curve with the tails of the curve extending infinitely far in both positive and negative directions. Central to probability in analytical statistics. (See below.)

### Experimental Design

The formal plan worked out to test the prediction of a hypothesis. It specifies which independent variables to manipulate so that the responses of the dependent variables to those manipulations can be analyzed.

Independent Variable

The factor to be varied by direct manipulation by the investigator or by natural categorization in the experiment. It is expected to cause an effect in the dependent variable. In a graph this variable occurs on the x (horizontal) axis.

Dependent Variable

The variable whose response we measure in the experiment. It is expected to result from variation in the independent variable. In a graph this variable occurs on the y (vertical) axis. It may be a continuous or discrete variable.

Treatment

One of the categories varied in a categorical independent variable.

Control

A special treatment in a manipulative experiment. It is the standard, the group left unmanipulated, and provides baseline comparative data to evaluate the effect of the manipulated treatment.

Item Measured/Counted

The item that is measured or counted in an experiment (e.g. a tree, a leaf, a quadrat, a population).

Sampling Unit

The grouping of items measured/counted that yields one data point (e.g., number per sampling unit; number of trees/quadrat; number of seeds/dish).

Replicates (N)

The number of sampling units in each treatment.

### Experimentation

Methods used to test predictions of hypotheses.

Manipulation

Alterations in the independent variable are created by the investigator.

Observation

Natural variation in the independent variable occurs, requiring no alteration, only direct observation of the dependent variable by the investigator.

Measurement

Performed on continuous variables.

Count or Tally

Performed on discrete and categorical variables.

### Graph (Figure/Chart)

A diagram that represents the variation of a variable in comparison with that of one or more other variables.

Axes

Horizontal (x axis, abscissa) for independent variable(s). Vertical (y axis, ordinate) for dependent variable(s).

Bar Graph

Used when the independent variable is categorical or divided into classes (otherwise known as Column Graph).

Box and Whisker Graph

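Used to display differences among treatments in central tendency and dispersion. The box typically shows the median and quartiles and the whiskers the range; some variants show means and standard deviations instead.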

Column Graph

(see Bar Graph).

Histogram

Used to display frequency distributions. Classes of measurement occur on the x axis and frequency in each class on the y axis.

Line Graph

Used when the x axis represents a continuous variable. Sometimes the x variable is the independent variable. Other times, as in showing a correlation, neither x nor y variables are designated as independent or dependent variables.

### Hypothesis

A formal statement of a possible explanation for an observed phenomenon. A "might-be" about the way the world works. It leads to predictions.

Null Hypothesis

A statistical hypothesis stating that there is no association between two variables or no difference among means. (e.g. H0: A=B)

Alternative Hypothesis

A statistical hypothesis stating the pattern in the data that is expected if the prediction holds true. (e.g. HA: A>B)

Speculation

A first informal attempt at explaining an observed phenomenon.

Prediction

A consequence expected by the logic of the hypothesis. An experiment arises out of the predictions.

If…,then logic

A formal conditional statement of the hypothesis and prediction that uses deductive logic. If the phenomenon I observed can be explained in this way, then these consequences should occur. The "if" clause contains the hypothesis and the "then" clause the prediction that is to be tested.

Cause…Effect

In a manipulation experiment, we test whether the independent variable causes an effect (response) in the dependent variable.

Testability

The hypothesis must be susceptible to testing through the scientific process, where science is limited to the study of the physical world.

Assumption

A fact that is taken for granted in the experiment. If the experiment fails to falsify the hypothesis, the assumption may not have been true and now itself would need to be tested.

### Population

Any set of individuals or objects having some common observable characteristic. The unit from which the data sample is taken.

Sample or Sample Set (N)

The sub-set of the population measured or counted in the experiment.

Random Sample

A sample taken with no bias.

### Others

Scientific Process or Method

The logical process by which scientific information is gathered by asking and answering questions about the physical world.

Statistics

A tool used 1) to describe trends and relationships and 2) to decide whether to accept or reject a hypothesis based on the probability that the results of an experiment could have occurred by chance.

### Descriptive Statistics

A summary of the data in a variable that provides information about its central tendency and dispersion.

Parameter

A measurable characteristic of a given distribution, e.g., mean, variance, standard deviation.

Central Tendency

A measurement that represents the center point of a data set.

Mean (average)

The numerical average of a data set. Calculated by adding up the values of a sample and dividing by the number of observations, N. A mean is always strongly affected by extreme readings. When reported alone, the mean may not be very meaningful. If the data are very skewed or bimodal, the mean might be deceptive. Always report a measure of dispersion as well.

Median

The middlemost value of a data set. If all data points are organized from smallest to largest, the median is the middle point. It is less susceptible to distortion by an extreme reading.
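The difference in sensitivity to extreme readings can be sketched with Python's standard `statistics` module. The seed counts below are made-up values for illustration; the second list differs from the first only in one extreme reading.

```python
import statistics

# Hypothetical seed counts per dish (illustrative numbers only).
seeds = [10, 11, 12, 13, 14]
seeds_with_outlier = [10, 11, 12, 13, 140]  # last dish is an extreme reading

mean_plain = statistics.mean(seeds)
median_plain = statistics.median(seeds)

mean_outlier = statistics.mean(seeds_with_outlier)
median_outlier = statistics.median(seeds_with_outlier)

print("without outlier: mean =", mean_plain, "median =", median_plain)
print("with outlier:    mean =", mean_outlier, "median =", median_outlier)
```

In this sketch the single extreme reading pulls the mean from 12 to 37.2, while the median stays at 12.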

### Dispersion

A measurement of the spread of the data around the mean.

Range

The distance between the lowest and the highest values.

Variance (s²)

The square of the standard deviation (see below).

Standard Deviation (s) or S.D.

A sort of average of the deviations of all observed values from the mean. If the S.D. is small, most of the sample values lie close to the sample mean; if the S.D. is large, many of the sample values lie far from it. In a data set that fits a normal distribution, the S.D. can be found graphically: drop one vertical line to the horizontal axis from the peak of the curve (the mean) and another from a point of inflection of the curve. The S.D. is the distance along the baseline between these two lines. Of the data from a normally distributed population, about 34% falls within 1 S.D. on one side of the mean, about 48% within 2 S.D., and 49.87% within 3 S.D. Counting both sides of the mean, 1 S.D. includes about 68% of the data, 2 S.D. about 95%, and 3 S.D. 99.73%.
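The dispersion measures above can be sketched in Python as follows. This example assumes the sample formulas, which divide by N - 1 (what `statistics.variance` and `statistics.stdev` compute); the glossary itself does not state which denominator it intends. The visit durations are made-up values, and `normal_coverage` is a hypothetical helper name.

```python
import math
import statistics

# Hypothetical bee visit durations in seconds (illustrative numbers only).
visits = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

data_range = max(visits) - min(visits)   # distance between lowest and highest
variance = statistics.variance(visits)   # sample variance, divides by N - 1
sd = statistics.stdev(visits)            # sample S.D., square root of variance


def normal_coverage(k):
    """Fraction of a normal distribution within k S.D. on either side of the mean."""
    return math.erf(k / math.sqrt(2))


print(f"range = {data_range}, variance = {variance:.3f}, S.D. = {sd:.3f}")
for k in (1, 2, 3):
    print(f"within {k} S.D.: {normal_coverage(k):.2%}")
```

The coverage values printed by the loop are approximately 68%, 95%, and 99.7%, the proportions of a normal distribution lying within 1, 2, and 3 S.D. of the mean.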

Standard Error (S.E.)

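The standard deviation of the sample mean. If the experiment were repeated many times, about 68% of the sample means would fall within 1 S.E. of the population mean. S.E. = standard deviation divided by the square root of N, the sample size.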
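A minimal sketch of the S.E. formula in Python, assuming the sample standard deviation (N - 1 denominator) is the one intended. The function name `standard_error` and the tree counts are hypothetical.

```python
import math
import statistics


def standard_error(sample):
    """S.E. = sample standard deviation / square root of N."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


# Hypothetical number of trees per quadrat (illustrative numbers only).
trees_per_quadrat = [10, 12, 14, 16]

se = standard_error(trees_per_quadrat)
print(f"mean = {statistics.mean(trees_per_quadrat)}, S.E. = {se:.3f}")
```

Because N appears under a square root, quadrupling the number of sampling units only halves the S.E., all else being equal.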

### Analytical (Comparative) Statistics

A series of tests used to examine different kinds of data to determine whether to accept or reject a hypothesis.

Null hypothesis (H0)

A statistical hypothesis stating that there is no difference between treatments (populations) in an experiment. Statistical tests are designed to see whether or not you can reject your null hypothesis.

Alternative Hypothesis (H1 or HA)

A statistical hypothesis stating that there is a difference between treatments (populations) in an experiment. The statistical test is designed to provide support for the alternative hypothesis only if the null hypothesis is rejected. There may be more than one alternative hypothesis to explain an observed phenomenon.

Probability

A mathematical theory that provides a basis for the evaluation of the reliability of the conclusions and inference based on the data.

Level of Significance (probability value) (alpha level)

The probability of making a Type I error. It furnishes the probability basis upon which we accept or reject a hypothesis. The size of the discrepancy between the value of the null hypothesis and the alternative hypothesis provides the basis for judging the probability of obtaining the discrepancy. Small discrepancies from a valid hypothesis due to sampling error (chance) are common; large discrepancies are rare. If we assign values to the level of discrepancy, we say that with a true null hypothesis large discrepancies occur only 5% of the time and small discrepancies the remaining 95% of the time. Using this arbitrary criterion, we can then propose to reject the null hypothesis if the discrepancy is so large that it occurs only 5% of the time by chance. The 5% frequency value that enables us to reject the null hypothesis is called the 5% level of significance. When alpha is set at 0.05, the chances are 1 out of 20 that a true null hypothesis will be accidentally rejected. A small value for alpha is used to provide protection against rejecting true null hypotheses.

Degrees of Freedom

The number of independent classes. The number of classes about which you need information in order to know the distribution of data points in all classes.

Statistically Significant

The discrepancy between null and alternative hypotheses is so large (occurs less than 5% of the time by chance) that it causes us to reject the null hypothesis.

Statistically Non-significant

The discrepancy between null and alternative hypotheses is so small (occurs more than 5% of the time by chance) that it causes us to accept the null hypothesis.

Type I Error

The rejection of a true null hypothesis. If you flip a fair coin and get ten heads in a row, your H0: 10:0 = 50:50 would be rejected and you would conclude that the coin was unfair.
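How rare ten heads in a row is for a fair coin can be sketched with an exact binomial calculation. The helper name `binomial_probability` is hypothetical, and the comparison below is one-tailed (heads only), which is an assumption; a two-tailed test would double the probability.

```python
from math import comb


def binomial_probability(heads, flips, p_heads=0.5):
    """Probability of exactly `heads` heads in `flips` independent flips."""
    return comb(flips, heads) * p_heads**heads * (1 - p_heads) ** (flips - heads)


alpha = 0.05  # level of significance
p_ten_heads = binomial_probability(10, 10)  # fair coin: 0.5 ** 10

reject_null = p_ten_heads < alpha
print(f"P(10 heads in 10 flips) = {p_ten_heads:.6f}, reject H0: {reject_null}")
```

The probability is 1/1024, well under 0.05, so H0 is rejected. If the coin really is fair, that rejection is a Type I error.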

Type II Error

The failure to reject a false H0. If you flip a coin and throw 4 heads and 6 tails, you would fail to reject the null hypothesis that 4:6 = 50:50. A much larger sample size, giving results of 400:600, would have caused you to reject the null. If the coin really was unfair but you had concluded that it was fair on the basis of your toss of 4:6, you would have made a Type II error by failing to reject a false null hypothesis.
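The effect of sample size in the coin example can be sketched with a chi-square goodness-of-fit calculation. Using a chi-square test here is an assumption, since the glossary does not name the test. The function name `chi_square_two_classes` is hypothetical, and its p-value shortcut is valid only for two classes (1 degree of freedom).

```python
import math


def chi_square_two_classes(observed, expected):
    """Chi-square statistic and p-value for two classes (1 degree of freedom)."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom the upper-tail chi-square probability
    # equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


alpha = 0.05
small = chi_square_two_classes([4, 6], [5, 5])              # 10 flips
large = chi_square_two_classes([400, 600], [500, 500])      # 1000 flips

for label, (chi2, p) in (("4:6", small), ("400:600", large)):
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"{label}: chi-square = {chi2:.1f}, p = {p:.3g}, {decision}")
```

The same 40:60 proportion gives a chi-square value of 0.4 with 10 flips and 40 with 1000 flips, so only the larger sample leads to rejecting H0.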

Proof

The scientific process and statistical analysis are designed to reject (falsify) hypotheses. There is no such thing as proof or proving a hypothesis. No matter how many times a hypothesis is confirmed, the next observation may prove it to be false. We "accept" a hypothesis as if it were true if it has been reasonably confirmed and there is no evidence to contradict it; but it may be shown to be false at any time.

Falsify

The scientific process and statistical analysis are designed to indicate whether the hypothesis is accepted (not proved) or falsified (shown to be incorrect).

Conclusion

The final step of the scientific process. Based upon statistical evidence, a decision is made as to whether the hypothesis is accepted or falsified by the data sets.

### Variable

Some factor that can have more than one value.

Categorical Variable

A discrete variable in which a value takes on a certain non-numeric state or category. Here, the mean of categories usually doesn’t make much sense, e.g., a flower is caged or non-caged. Used in contingency tables and chi-square (χ²) tests.

Continuous Variable

An analytical variable that can take any value limited only by our ability to differentiate values. Here, the mean of a sample has some meaning, e.g., length of time a bee visits a flower. Used in t-tests, ANOVA, regression, and correlation.

Discrete Variable

A variable that can take a limited number of values, those with a separate identity that cannot be subdivided, e.g., population size.