Why Relying on Data Alone Is Insufficient for Risk Assessment

The last decade has seen an explosion of interest in “big data” and sophisticated algorithms for analysing such data. The popular belief is that, with sufficiently “big” data and increasingly powerful “machine learning” algorithms, it should be possible, by applying purely automated methods to the data, to discover all of the properties and relationships of interest for both improved prediction and decision-making. For example, such methods have been applied to large databases of supermarket customers to understand and predict their buying patterns and to determine the optimal time to release new products. In areas such as healthcare the hope is that, given large patient databases, such methods can be used to understand both the causes of particular diseases and the optimum treatments. Unfortunately, in most areas of critical decision-making there is limited relevant data (e.g., in medicine, doctors do not always record what they do), while in other areas even very large databases will never provide the required answers. Nor does “big data” necessarily mean good-quality data.

For example, a popular and important area for such machine learning is the use of “credit scoring” by banks to determine the risk associated with making loans to customers. The kind of database used by banks for this purpose is shown in Table 2.16, where each record (i.e. row) corresponds to a customer who was previously granted a loan.

Since too many people “default” on loans, the bank wants to use machine learning techniques on this database to help decide whether or not to offer credit to new applicants. In other words, the bank expects to “learn” when to refuse a loan on the basis that the customer profile is too “risky.”

Uncertain Information

Consider the following assertions:

1. Oliver Cromwell spoke more than 3,000 words on 23 April 1654.
2. O.J. Simpson murdered his wife.
3. You (the reader) have an as-yet undiagnosed form of cancer.
4. England will win the next World Cup.

The events in assertions 1 and 2 either happened or did not. Nobody currently knows whether the event in assertion 1 happened. Only O.J. Simpson knows for certain whether the event in assertion 2 happened. Assertion 3 describes a fact that is either true or false. Assertion 4 is different because it describes the outcome of an event that has not yet happened.

While all four assertions are very different, what they all have in common is that our knowledge about them is uncertain (unless we happen to be O.J. Simpson). In this book the way we reason about such uncertainty is the same whether the events have happened or not and whether they are unknown or not. Unfortunately, many influential people do not accept the validity of this approach. We have an obligation to demonstrate why those influential people are wrong. To do this we will consider the simple scenario in Box 2.5 that captures the key differences between uncertain information and incomplete information.

