## Sampling Plans and Estimates

In the previous chapter we computed descriptive statistics for the dataset on faces. The results showed that the average rating was $58.37$ and that men rated the faces higher than women on average. If we are only interested in the participants in the study and we are willing to believe that the results are fully deterministic, ${ }^1$ we could claim that the group of men rates higher than the group of women on average. However, if we believe that the ratings are not constant for one person for the same set of faces ${ }^2$ or if we would like to know whether our statements would also hold for a larger group of people (who did not participate in our experiment), we must understand what other results could have been observed in our study if we had conducted the experiment at another time with the same group of participants or with another group of participants.

To extend your conclusions beyond the observed data, which is more technically called statistical inference, you should ask where the dataset came from, how participants were recruited, and how the results were obtained. For example, if the women who participated in the face-rating study all came from one small village in the Netherlands, while the men came from many different villages and cities in the Netherlands, you would probably agree that the comparison between the average ratings of men and women becomes less meaningful. In this situation the dataset is considered selective towards women in the small village. Selective means here that the women in the study do not represent all women from the villages and cities included in the study; only a specific subgroup of women has been included. To overcome these types of issues and do proper statistical inference, we need to understand the concepts of population, sample, sampling procedures, and estimation of population characteristics, and how these concepts relate to each other.

Figure 2.1 visualizes the relation between these concepts. On the left side we have a population of units (e.g., all men and women from the Netherlands) and on the right side we have a subset of units (the sample). Sampling procedures are formal probabilistic approaches for selecting units from the population for the sample. For the sample we use $x_1, x_2, \ldots, x_n$ to denote the observations of a certain variable (e.g., ratings of faces from pictures). The calculations on the sample data that we learned in Chap. 1 are ways of describing the sample. For the population the same notation $x_1, x_2, \ldots, x_N$ is used for all $N$ units. Here we have used the same indices for the sample and the population, but this does not mean that the sample $x_1, x_2, \ldots, x_n$ is just the first $n$ units from the population $x_1, x_2, \ldots, x_N$. Mathematically, we should have written $x_{i_1}, x_{i_2}, \ldots, x_{i_n}$ for the sample data, with $i_h \in \{1, 2, \ldots, N\}$ and $i_h \neq i_l$ when $h \neq l$, since any set of units $i_1, i_2, \ldots, i_n$ from the population could have ended up in the sample. The values in the sample are referred to as a realization from the population.
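The index notation $x_{i_1}, x_{i_2}, \ldots, x_{i_n}$ can be made concrete with a small sketch. The Python snippet below assumes simple random sampling without replacement, which is only one possible sampling procedure, and uses a made-up population of ten ratings. The names `population` and `draw_sample_indices` are illustrative and do not come from the text.

```python
import random


def draw_sample_indices(N, n, seed=None):
    """Draw n distinct unit indices i_1, ..., i_n from {1, ..., N}."""
    rng = random.Random(seed)
    # random.sample draws without replacement, so i_h != i_l whenever h != l
    return rng.sample(range(1, N + 1), n)


# Hypothetical population of N = 10 ratings (made-up values, not study data)
population = [52, 61, 47, 70, 58, 66, 49, 63, 55, 60]
N = len(population)
n = 4

indices = draw_sample_indices(N, n, seed=1)
# Index i refers to unit x_i, which is stored at list position i - 1
sample = [population[i - 1] for i in indices]

print("sampled indices:", indices)
print("sample values:  ", sample)
```

Running the snippet with a different `seed` gives a different set of indices, and therefore a different realization from the same population.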

## Definitions and Standard Terminology

In this section we briefly introduce some definitions and standard terminology. Frequently, we wish to say something about a group of units other than just the ones we have measured. A unit is usually a concrete or physical thing whose characteristics we would like to measure. In medical research and the social sciences units are mostly human beings, while in industry units are often products, but units can essentially be anything: text messages, financial transactions, sales, etc. The complete set of units that we would like to say something about is called the (target) population. The set of units for which we have obtained data is referred to as the sample. The sample is typically a subset of the population, although in theory the sample can be the whole population or can contain units that are not from the target population. For example, if we are interested in individuals in the age range of 25 to 65 years, it could happen that a person with an age outside this range is accidentally included in the sample.
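The age example suggests a simple check: compare each sampled unit against the definition of the target population. The following is a minimal sketch in Python, assuming the age range is inclusive at both ends. The function `in_target_population` and the unit records are hypothetical.

```python
def in_target_population(age, min_age=25, max_age=65):
    """Return True if the age falls inside the target range (assumed inclusive)."""
    return min_age <= age <= max_age


# Hypothetical sampled units as (unit id, age) pairs; made-up values
sampled_units = [("u1", 31), ("u2", 24), ("u3", 65), ("u4", 70), ("u5", 48)]

# Units that ended up in the sample but are not part of the target population
outside_target = [
    (unit_id, age)
    for unit_id, age in sampled_units
    if not in_target_population(age)
]

print("units outside the target population:", outside_target)
```

What to do with such units (exclude them, or redefine the target population) is a design decision that the check itself does not settle.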

Statistics is concerned with how we can say things, and what we can say, about a population given that we have only observed our sample data. As mentioned before, we call this statistical inference: “Statistical inference is the process of deducing properties of an underlying population by analysis of the sample data. Statistical inference includes testing hypotheses for the population and deriving population estimates.” See, e.g., Casella and Berger (2002).

In many situations it is unnecessary to specify the unit explicitly since it will be clear from the context, but it is not always easy to determine the unit. For instance, a circuit board contains many different components. Testing the quality of a circuit board after it has been produced requires testing all or a subset of the components. In this case it is not immediately clear whether the circuit board itself or the components are the units. In this setting the circuit board is sometimes referred to as the sample unit, since it is the unit that is physically taken from the production process. The components on the circuit board are referred to as observation units, since they are the units that are measured. If the components were to be tested before being placed on the circuit board, however, the component would represent both the sample unit and the observation unit. ${ }^4$
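The distinction between sample units and observation units can be sketched as a small data structure. The Python snippet below is illustrative only: the classes `CircuitBoard` and `Component`, the number of boards and components, and the simulated test outcomes are all assumptions made for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class Component:
    component_id: str
    passed: bool  # simulated test outcome, not real data


@dataclass
class CircuitBoard:
    board_id: str
    components: list


rng = random.Random(0)

# Hypothetical production run: 5 boards with 3 components each
boards = [
    CircuitBoard(
        board_id=f"B{b}",
        components=[
            Component(f"B{b}-C{c}", passed=rng.random() > 0.1)
            for c in range(1, 4)
        ],
    )
    for b in range(1, 6)
]

# Sample units: the boards physically taken from the production process
sampled_boards = rng.sample(boards, 2)

# Observation units: the components that are measured on the sampled boards
observed_components = [c for board in sampled_boards for c in board.components]

print("sample units:     ", [b.board_id for b in sampled_boards])
print("observation units:", [c.component_id for c in observed_components])
```

Here two sample units yield six observation units. If components were drawn directly from a list of loose components instead, each drawn component would be both the sample unit and the observation unit.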

