## Visualizing Data

Besides inspecting data by computing summaries, we often visualize our data. Visualization, when done well, can make large and even high-dimensional datasets (relatively) easy to interpret. As with many topics we introduce in this book, visualization is a topic in its own right. We refer the interested reader to Rahlf (2017) or Young and Wessnitzer (2016); here we merely provide some basics to get started making simple data visualizations with R.

Since we will be making plots, the easiest thing to do is to just call the function plot on the data object we have and see what happens:

    plot(face_data)

This code produces Fig. 1.4. Note that the variable id represents the order in which participants entered the study.

Admittedly, blindly calling plot is neither very aesthetically appealing nor extremely informative. For example, we can see that gender only has two levels: this is seen in the fifth row of panels, where gender is on the $y$-axis and each dot (each of which represents an observation in the dataset) has a value of either 1 or 2. However, the panels displaying gender do not really help us understand the differences between males and females. On the other hand, you can actually see a few meaningful things in the other panels. We can see that there is a pretty clear positive relationship between id and dim1: apparently the value of dim1 was increased slowly as the participants arrived in the study (see the panel that plots dim1 against id).
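The reason gender appears as 1 or 2 can be sketched with a small simulated data frame. The object `toy` below is a hypothetical stand-in for face_data: the column names follow the text, but every value is made up for illustration.

```r
# Hypothetical stand-in for face_data: column names follow the text,
# all values are simulated.
set.seed(1)
n <- 50
toy <- data.frame(
  id     = seq_len(n),
  dim1   = seq(10, 60, length.out = n) + rnorm(n, sd = 3),
  rating = round(runif(n, min = 1, max = 100)),
  gender = factor(sample(c("female", "male"), n, replace = TRUE))
)

# For a data frame with more than two columns, plot() draws a scatterplot
# matrix of the numeric version of the data.
numeric_version <- data.matrix(toy)

# Factors are replaced by their integer level codes, which is why a
# two-level factor shows up as the values 1 and 2 on the axes.
print(levels(toy$gender))
print(table(numeric_version[, "gender"]))

plot(toy)                # scatterplot matrix of all columns
pairs(numeric_version)   # should show the same panels
```

Seeing the integer codes makes it clear that the scatterplot panels treat gender as a number, which is one reason those panels are hard to read.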

Interestingly, R will change the "default" functionality of plot based on the objects that are passed$^{18}$ to it. For example, a call to

    plot(face_data$rating)

produces Fig. 1.5, which is quite different from the plot we saw when passing the full data.frame as an argument. Thus, based on the type of object passed to the plotting function (a data.frame, a numerical vector, or a factor), the behavior of the function plot will change. Admittedly, the default behavior of R is not always the best choice: you should learn how to make the plots you want yourself without relying on the R defaults. We will look at some ways of controlling R plots below.

## Describing Interval/ratio Variables

We can now look at ways of visualizing continuous variables. Figure 1.8 shows a so-called box and whiskers plot (or box plot); these are useful for getting a feel for the central tendency and the spread of continuous variables. Note that the middle bar denotes the median and the box denotes the middle $50\%$ of the data (with $Q_1$, the first quartile, at the bottom of the box and $Q_3$, the third quartile, at the top of the box). Next, the whiskers show the smallest value that is larger than or equal to $Q_1 - 1.5\,\mathrm{IQR}$ and the largest value that is smaller than or equal to $Q_3 + 1.5\,\mathrm{IQR}$. Finally, the dots denote the values that are outside the interval $\left[Q_1 - 1.5\,\mathrm{IQR},\; Q_3 + 1.5\,\mathrm{IQR}\right]$, which are often identified or viewed as outliers. Box and whiskers plots can be very useful when comparing a continuous variable across subgroups of participants (e.g. males and females); see Sect. 1.5.3. The figure was produced using the following code:

    boxplot(face_data$rating)
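As a rough check of the whisker rule described above, the following sketch computes the fences and whiskers by hand and compares them with `boxplot.stats`. The vector `rating` is simulated and only stands in for the rating column; note also that R's box plot works with the hinges returned by `fivenum`, which are very close to, but not always identical to, the quartiles $Q_1$ and $Q_3$.

```r
# Simulated ratings (hypothetical values), with a few extreme points added.
set.seed(2)
rating <- c(rnorm(200, mean = 65, sd = 15), 135, 140, 5)

# fivenum() returns: minimum, lower hinge, median, upper hinge, maximum.
five <- fivenum(rating)
q1 <- five[2]
q3 <- five[4]
iqr <- q3 - q1

lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Whiskers end at the most extreme observations that lie inside the fences.
lower_whisker <- min(rating[rating >= lower_fence])
upper_whisker <- max(rating[rating <= upper_fence])

# Observations outside the fences are drawn as individual dots.
flagged <- rating[rating < lower_fence | rating > upper_fence]

# Cross-check against the statistics that boxplot() itself uses.
bs <- boxplot.stats(rating)
print(c(lower_whisker, upper_whisker))
print(bs$stats[c(1, 5)])
print(sort(flagged))

boxplot(rating)
```

The hand-computed whiskers and flagged points should agree with `bs$stats` and `bs$out`, which is a convenient way to see exactly which observations a box plot marks as potential outliers.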
Besides box and whiskers plots, histograms (examples are shown in Fig. 1.9) are also often used to visualize continuous data. A histogram "bins" the data (discretizes it), and subsequently shows the frequency of occurrence in each bin. Therefore, it is the continuous variant of the bar chart. Note that the number of bins selected makes a big difference in the visualization: too few bins obscure the patterns in the data, but too many bins lead to counts of exactly one for each value. R "automagically" determines the number of bins for you if you pass it a continuous variable; however, you should always check what things look like with different settings. The following code produces three different histograms with different numbers of breaks (or bins).$^{19}$
    hist(face_data$rating)
    hist(face_data$rating, breaks=5)
    hist(face_data$rating, breaks=50)

Finally, a density plot (at least in this setting) can be considered a "continuous approximation" of a histogram. For each range of values of the continuous variable, it gives the probability of observing a value within that range. We will examine densities in more detail in Chap. 4. For now, the interpretation is relatively simple: the higher the line, the more likely the values are to fall in that range.$^{20}$ The density plot shown in Fig. 1.10 is produced by executing the following code:

    plot(density(face_data$rating))

It is quite clear that values between 40 and 100 occur quite often, while values higher than 100 are rare. This could have been observed from the histogram as well.
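To make the remarks on binning and densities more concrete, here is a minimal sketch on simulated data. The vector `rating` is hypothetical, so its numbers will not match the figures in the text. It inspects the bins without drawing them and approximates areas under the estimated density with a simple Riemann sum.

```r
# Simulated ratings (hypothetical values).
set.seed(3)
rating <- rnorm(300, mean = 65, sd = 15)

# plot = FALSE returns the binning without drawing anything.
for (k in c(5, 50)) {
  h <- hist(rating, breaks = k, plot = FALSE)
  # A single number for breaks is treated as a suggestion, so the number
  # of bins actually used can differ from k.
  cat("requested:", k, " bins used:", length(h$counts),
      " largest count:", max(h$counts), "\n")
}

# A density estimate is scaled so that the total area under the curve is
# approximately 1; probabilities correspond to areas under the curve.
d <- density(rating)
step <- d$x[2] - d$x[1]          # the evaluation grid is equally spaced
total_area <- sum(d$y) * step

# Approximate probability of a value between 40 and 100, compared with
# the observed proportion in the sample.
in_range <- d$x >= 40 & d$x <= 100
area_40_100 <- sum(d$y[in_range]) * step
empirical <- mean(rating >= 40 & rating <= 100)

print(c(total_area = total_area, area_40_100 = area_40_100,
        empirical = empirical))
```

Comparing `area_40_100` with `empirical` shows how the height of the density line translates into how often values in a range occur.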


