## Model the Data

The fourth step is the development of the analytical model itself. Some say that this is the most important part, or at least the part where data scientists have the most fun. Here they use their creativity and innovation skills to try out multiple analytical approaches to solve the business problem. As stated before, data science is a mix of science and art. In this step, data scientists apply both the science behind the algorithms and the art behind the analytical approaches.

Some questions to consider at this phase are:

• Which model has the highest predictive accuracy?
• Which model best generalizes to new data?
• Is it possible to validate the model? Is it possible to test the model? Is it possible to honestly test the models on new data?
• Which model is the most interpretable?
• Which model best explains the correlation between the input variables and the target? Which one best describes the effects of the predictors on the estimate?
• Which model best addresses the business goal?

This is the data scientists' playground, where they use different algorithms, techniques, and distinct analytical approaches. Much of the modeling process consists of simply trying new algorithms and evaluating the results.

Data science differs from exact sciences such as math and physics, where a robust equation and a set of inputs make it possible to predict the output. In data science, the set of inputs might be known, but the exact subset of predictors remains unknown until the end of model training. The equation is created during training according to the input data, and only then are the results revealed. Any change in the input data set implies a change in the output. Data science is therefore closely tied to statistical and mathematical algorithms; everything else is art. Furthermore, many models are not as robust as they should be. Some models or algorithms are very unstable, which means that every training data set might produce a different result.
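
The sensitivity of a trained model to its training data can be illustrated with a small experiment. The sketch below is not from the original text: it assumes a synthetic data set, a simple least-squares slope as the "model", and bootstrap resampling as the way of producing different training sets. All names and parameter values are hypothetical.

```python
import random
import statistics

def fit_slope(rows):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    return sxy / sxx

# Synthetic data (an assumption for illustration): y = 0.5 * x plus noise.
rng = random.Random(7)
data = [(x, 0.5 * x + rng.gauss(0, 3)) for x in range(30)]

# Refit the same model on 200 bootstrap resamples of the data.
slopes = []
for _ in range(200):
    sample = [rng.choice(data) for _ in data]  # sample rows with replacement
    slopes.append(fit_slope(sample))

# The spread of the fitted slopes shows how much the result depends
# on which training rows the model happened to see.
spread = statistics.stdev(slopes)
print(f"slope range: {min(slopes):.3f} to {max(slopes):.3f}, stdev {spread:.3f}")
```

A wide spread suggests an unstable model; a narrow one suggests that the fitted equation changes little from one training set to the next.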

Maybe this is the fun part. In this phase, data scientists fit the model on one portion of the data and evaluate its performance on another. The first portion is the training set. The second is the validation set. Sometimes there is a third portion, called the test set. Note that the best model is sometimes the simplest and most interpretable one rather than the one with the highest predictive accuracy. The choice depends on the business goal, the practical action to be taken, and any regulation that applies in the industry.
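
The split into training, validation, and test sets can be sketched as follows. This is a minimal illustration rather than a prescribed procedure: the 60/20/20 proportions, the synthetic data, and the two candidate models (a constant mean predictor and a one-variable linear fit) are all assumptions made for the example.

```python
import random

def split_data(rows, train_frac=0.6, valid_frac=0.2, seed=0):
    """Shuffle rows and split them into training, validation, and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])

def fit_mean(train):
    """Baseline model: always predict the mean target of the training set."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y

def fit_line(train):
    """One-variable least-squares linear model."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    return lambda x: mean_y + slope * (x - mean_x)

def mse(model, rows):
    """Mean squared error of a model on a list of (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)

# Synthetic data (hypothetical): y = 2x + 1 plus Gaussian noise.
rng = random.Random(42)
data = [(i / 10, 2.0 * (i / 10) + 1.0 + rng.gauss(0, 1)) for i in range(100)]

train, valid, test = split_data(data)

# Fit every candidate on the training set only.
candidates = {"mean": fit_mean(train), "line": fit_line(train)}

# Compare candidates on the validation set and keep the best one.
valid_errors = {name: mse(model, valid) for name, model in candidates.items()}
best_name = min(valid_errors, key=valid_errors.get)

# The test set is touched once, to estimate error on unseen data.
test_error = mse(candidates[best_name], test)
print(best_name, valid_errors, test_error)
```

Keeping the test set out of model selection is what makes the final error estimate an honest one. Here the selection criterion is validation error alone; a real project might also weigh interpretability or regulatory constraints, as discussed above.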

## Provide an Answer

The fifth and last step is to provide answers to the original questions, the ones raised and validated during the first step. Some pertinent questions are:

• What lessons were learned from the trained models?
• How do the trained models answer the original questions?
• How do the trained models tell a story and support business decisions?
• How can the trained models be used and deployed in production in the appropriate time frame to support the required business actions?

For example, can the trained model support a campaign that targets the customers with the highest probabilities of churn and offers them incentives to keep using and consuming the company's products and services? Can the trained model support a real-time fraud detection process that identifies possibly fraudulent business transactions? The model's results might be very accurate, but to benefit the organization, the model must be deployed in an appropriate time frame. In cybersecurity, for example, if the model does not generate real-time alerts that allow fraud analysts to act immediately, then the model might be useless, since digital attacks must be identified within seconds, not weeks or months.
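
As a rough illustration of the churn campaign case, the sketch below ranks customers by predicted churn probability and selects the top few for an incentive offer. The customer IDs, the scores, and the function name `select_campaign_targets` are hypothetical; in practice the scores would come from a trained model.

```python
def select_campaign_targets(churn_scores, top_n):
    """Return the customer IDs with the highest predicted churn probability."""
    ranked = sorted(churn_scores.items(), key=lambda item: item[1], reverse=True)
    return [customer_id for customer_id, _ in ranked[:top_n]]

# Hypothetical scores that a trained churn model might have produced.
churn_scores = {
    "C001": 0.12,
    "C002": 0.87,
    "C003": 0.45,
    "C004": 0.91,
    "C005": 0.30,
}

targets = select_campaign_targets(churn_scores, top_n=2)
print(targets)  # ['C004', 'C002']
```
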

Once an answer is provided, it might generate more questions regarding the business problem. The data science lifecycle is therefore cyclical: the process is repeated until the business problem is solved.

The entire analytical process and the data science approach can be viewed as a dynamically evolving flow, as shown in Figure 1.4. In data science, the more complex the analytical task, the more value it adds to the business. For example, a simple query report can add value to the business by illustrating the relationships in the data and showing what happened in the past. It is purely descriptive in the sense that nothing can be done to change that historical event. However, awareness is the first step toward understanding the business problem and aiming for an analytical solution.

Data exploration analyses can add further value to the business through more complex queries on the data. Multi-dimensional queries can help business analysts understand not only what happened but also why it happened that way. Analyzing the historical data along multiple dimensions at the same time can answer many questions about the business, the market, and the scenarios. Whether it is called data mining, analytics, or data science, the next step goes further in gaining knowledge about the business. Some analytical models explain what is going on right now. Unsupervised models such as clustering, segmentation, association analysis, path analysis, and link analysis help business analysts understand what is happening within a very short time frame and allow companies to deploy business actions that take advantage of this knowledge. Furthermore, supervised models can learn from past events to predict and estimate future occurrences. Data science in this phase is essentially trying to know what will happen in the future. This is very similar to econometric and forecasting models, which try to foresee what will happen soon with a business event.
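
To make the unsupervised case concrete, here is a minimal sketch of clustering, one of the techniques named above. It assumes a one-dimensional k-means (Lloyd's algorithm) over hypothetical monthly spending figures, with starting centers chosen by hand; none of these details come from the original text.

```python
def kmeans_1d(values, centers, iterations=10):
    """One-dimensional k-means: assign each value to its nearest center,
    then move each center to the mean of its assigned values."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # Keep the previous center when a cluster ends up empty.
        centers = [
            sum(members) / len(members) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical monthly spend per customer, with two visible groups.
monthly_spend = [10, 12, 11, 13, 95, 100, 105, 98]

centers, clusters = kmeans_1d(monthly_spend, centers=[0.0, 50.0])
print(centers)
print(clusters)
```

No target variable is involved: the algorithm only groups similar customers, and it is up to the analyst to interpret the segments and decide on a business action.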


