The study of statistics begins with potential confusion—the word "statistics" has two different meanings. Statistics is the name of the process used to summarize collected observations to describe essential properties of a sampled population, frequently leading to a better understanding of a specific topic or issue. The analytic tools used to create this description are also called statistics. This text is about both kinds of statistics but primarily it is about one specific statistical tool, the sample mean value. The principles underlying this fundamental and relatively simple way to explore data are at the center of this text.
The study of statistics is not always well received by students enrolled in an introductory and sometimes required course. Part of the explanation may be the kind of textbook used in introductory classes. Many modern introductory books are extensive (in the neighborhood of 800-1000 pages) and contain a large number of data sets (usually a computer disk is included). There are numerous problems/exercises for each of many sections (one popular text contains more than 1200 problems). These texts generally minimize the role of even simple mathematics to make the presented material accessible to a wide audience. Whether these texts are written for students in public health, business, biology, economics, or social science, they contain a large variety of topics aimed at creating an extensive toolbox of statistical techniques. In addition, these techniques are usually supported with one or more statistical computer packages (for example, Stata, SAS, Excel, Statistica, or Minitab).
This text has a different objective. The goal here is to provide a sophisticated introduction to how statistics works at a beginning level. As in all statistics texts, a number of useful and important statistical techniques are discussed, but this text focuses sharply on the sample mean as a way of understanding the statistical process in general. The book is short (less than 400 pages), contains only a limited number of problems, uses elementary mathematics, and makes no mention of computer applications.
The few problems at the end of each chapter (many adapted from research journal articles) are not a series of "practice" problems. These small hands-on data sets are intended to encourage the reader to work carefully through the details of the statistical process as part of the text's how-it-works philosophy. Most of the data used throughout the text also come from actual research projects and consist of only a few representative observations (usually fewer than 30) so that the reader can readily duplicate the results using a handheld scientific calculator, a spreadsheet program such as Excel, or any statistical system. The reader should work all of the included problem sets; although small in number, these problems focus squarely on key ideas.
The reason statistical analysis tools are useful is concisely and unambiguously expressed in the language of symbols. In addition, elementary mathematics frequently demonstrates clearly the logic of a specific approach. Simple algebraic explanations (enclosed in boxes) are one of the several ways the statistical concepts are presented. (This mathematical material is not necessary for a first reading and can be skipped without disrupting the logic or flow of the text.) Parallel numeric examples also concretely demonstrate the important features of each concept. Graphic displays, included wherever possible, provide visual interpretations. Elementary mathematics, worked examples, and graphic illustrations, brought together with detailed discussions, potentially provide a keen insight into the statistical process.
After introducing the sample mean and a few other descriptive statistics, the text turns to a bit of elementary probability theory. The introduction to probability is then extended to characterize samples selected from populations. The text next explores the accuracy and precision of the mean value calculated from samples (hypothesis testing and confidence intervals). Then, digressing slightly from the discussion of the properties of the mean value, the chi-square analytic technique is presented. A mastery of elementary chi-square techniques adds perspective to statistical testing and, in general, illustrates the process of evaluating the impact of random variation on specific kinds of data. The remainder of the text deals with summarizing and analyzing bivariate data (regression and correlation analysis).
The material in this text was developed for a large (over 300 students) one-semester course taken by a mixture of graduate and undergraduate students, primarily from the School of Public Health and the biological sciences at the University of California, Berkeley. These students are not required to have a previous statistics course or a mathematical background beyond usual high school algebra. The text is designed to appeal to two kinds of students: those who plan to continue on to more data analysis-oriented courses and others who will not be directly analyzing data but wish to understand a process that frequently makes modern and complex issues more comprehensible. An introductory text that traces the thread of statistical logic for the narrow but important case of a sample mean provides both kinds of students with a foundation for understanding how and why the statistical process works.
Many students and colleagues have contributed to the material and spirit of this text, particularly Ms. Carol Langhauser and Dr. Chin Long Chiang, who created some of the examples and problems, and Dr. Mark Nudes, who carefully read the text and made helpful suggestions. I also want to thank the reviewers whose suggestions improved the manuscript: Peter Mac Donald, McMaster University; William Briggs, Weill Cornell Medical College; and P. K. Pathak, Michigan State University.
Steve Selvin
Probability: Properties of Samples
Descriptive Statistics
Summary Measures
Graphic Representation
Probability
Eight Rules of Probability
Composite Events
Bayes' Rule
Four Probability Problems
Random Variables
Random Variables
Joint Probability Distribution
Probability Distributions
Binomial Probability Distribution
Normal Probability Distribution
Central Limit Theorem
Statistics: Properties of Sampled Populations
Statistical Inference I
Description of a Confidence Interval
Statistical Hypothesis Testing
Statistical Inference II
Student's t-Distribution
Computation of Sample Size
Chi-Square Analysis
Independence of Two Categorical Variables: The r by c Table
Linear Regression
Least Squares Estimation
Assessing an Estimated Regression Line
Assessing Regression Lines from Two Groups
Correlation
Testing a Correlation Coefficient
Confidence Interval for a Correlation Coefficient
References
Normal Distributions
t-Distribution
Chi-square Distribution
Values for Testing Correlations (conversion of t-values)
Values for Testing Rank Correlations
Chart: Confidence Intervals for Correlation Coefficients
Summation Notation
Derivation of the Normal Equations for Simple Linear Regression
Poisson Probability Distribution
Problem Sets: 1 to 15
Partial Solutions to Most Problems (Sets 1 to 15)