The workshop took place at the Centrum Wiskunde & Informatica, Amsterdam, on 27 and 28 November 2015.
While standard Bayesian theory uses a single distribution to model an agent’s beliefs, imprecise probability theory uses sets of probabilities, allowing for ‘Knightian’ uncertainty. This is not without its problems (such as dilation), so there have been numerous attempts to map sets of distributions to a single, somehow ‘best’ element (sometimes called pignistic transformation). We mention maximum entropy, numerous objective Bayes methods and Fisher’s fiducial probability, largely forgotten by now but once considered on a par with frequentism and Bayesian theory. All these methods, however, can give mutually inconsistent and sometimes counterintuitive results.
Here we offer a way out by representing a set \(\mathcal{P}\) by a single distribution \(\tilde{P}\) together with a specification of the random variables that it can safely predict; for other random variables its expectation under \(\tilde{P}\) will simply be undefined. This is somewhat reminiscent of the operation of making some events unmeasurable by restricting the σ-algebra, but it offers vastly more possibilities. We obtain a hierarchy of ‘types’ of probabilities.
At the bottom of the hierarchy are distributions that are ‘safe’ in the very weak sense that for all \(P^*\in\mathcal{P}\), the \(P^*\)-expected log-score achieved by predicting with \(\tilde{P}\) is equal to the \(\tilde{P}\)-expected log-score of predicting with \(\tilde{P}\), i.e. the Shannon entropy of \(\tilde{P}\) (so we are comparing a range of \(P^*\) with a single \(\tilde{P}\)). Such distributions often coincide with maximum entropy distributions; they are safe to use for data compression and sequential gambling with re-investment, with uniform pay-offs. At the top of the hierarchy are distributions that are correct in the sense that \(\mathcal{P}\) is a singleton, containing only \(\tilde{P}\). In between, there is a natural place for calibrated distributions (that have correct expectations for all random variables conditioned on the expectation having a certain value), fiducial distributions (correct expectations for all random variables that are a function of the cumulative distribution of \(X\) induced by \(\tilde{P}\)), the well-known marginal distributions, and many others, often—I will claim—unwittingly used by practitioners.
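In symbols, for a single discrete outcome \(X\) and in the notation of the abstract, the ‘safe’ condition at the bottom of the hierarchy reads

\[
\mathbb{E}_{P^*}\!\left[-\log \tilde{P}(X)\right] \;=\; \mathbb{E}_{\tilde{P}}\!\left[-\log \tilde{P}(X)\right] \;=\; H(\tilde{P}) \qquad \text{for all } P^* \in \mathcal{P}.
\]

For instance, if \(\tilde{P}\) is uniform on a finite outcome space, then \(-\log \tilde{P}(x)\) is constant in \(x\), so the condition holds whatever \(\mathcal{P}\) is; this is the simplest illustration of the link between safety and maximum entropy noted above.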
Overconfidence is a well-known feature of human decision making. It shows in the subjective probabilities quoted for future events typically being too extreme, much closer to the two extreme values 0 and 1 than is warranted by the available knowledge and data. A small amount of training makes the probabilistic predictions output by human forecasters almost well-calibrated and significantly improves their quality as measured by standard loss functions. Bayesian predicted probabilities are guaranteed to be well-calibrated, but only under the corresponding Bayesian assumptions. When those assumptions are violated, the Bayesian probabilities may become badly miscalibrated, again typically leading to overconfidence. In this talk I will show how to turn Bayesian prediction algorithms (or almost any other prediction algorithms used in machine learning and statistics) into prediction algorithms that are guaranteed to be well-calibrated under the assumption of randomness (the data are generated independently from the same distribution) common in machine learning and nonparametric statistics; this assumption is often much weaker and more realistic than Bayesian assumptions. The price to pay is that the resulting predictions are imprecise probabilities. It is interesting that even after these imprecise probabilities are artificially made precise by a minimax procedure, the resulting prediction algorithms outperform state-of-the-art machine-learning algorithms in empirical studies.
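The abstract does not spell out the construction, but it is close in spirit to Vovk’s Venn and conformal predictors. Purely as an illustrative sketch (not necessarily the algorithm the speaker has in mind, and with invented toy data), the Python fragment below implements a Venn-Abers-style calibrator: the scores of an underlying classifier are recalibrated by isotonic regression twice, once with the test object hypothetically labelled 0 and once labelled 1, giving an imprecise probability \([p_0, p_1]\) that can afterwards be merged into a single number.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def venn_abers(scores_cal, labels_cal, score_test):
    """Imprecise probability [p0, p1] that the test object has label 1,
    plus one way of merging it into a single precise probability."""
    ends = []
    for hypothetical_label in (0, 1):
        # Refit the isotonic calibrator with the test point appended under
        # each hypothetical label; labelling it 0 gives the lower end p0,
        # labelling it 1 gives the upper end p1.
        iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
        iso.fit(np.append(scores_cal, score_test),
                np.append(labels_cal, hypothetical_label))
        ends.append(float(iso.predict([score_test])[0]))
    p0, p1 = ends
    merged = p1 / (1.0 - p0 + p1)  # a standard log-loss-motivated merge
    return p0, p1, merged

# Toy usage with invented calibration scores from some underlying classifier.
scores = np.array([0.10, 0.30, 0.35, 0.60, 0.80, 0.90])
labels = np.array([0, 0, 1, 0, 1, 1])
print(venn_abers(scores, labels, score_test=0.70))
```

The interval \([p_0, p_1]\) is the imprecise forecast; the merge \(p_1/(1-p_0+p_1)\) is one standard way of turning it into a precise probability with small log-loss regret.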
In this talk, I will discuss three examples where letting go of the requirement of precision may lead to surprising conclusions and new ideas.
The first example is decision making—finding the best action or conclusion—when the available information takes the form of a probabilistic graphical network, where a certain number of (evidence) nodes are instantiated. Typical concrete applications are classification (pioneered by Marco Zaffalon and his IDSIA group) and state (sequence) estimation in hidden Markov models (pioneered by Jasper De Bock and myself). Here, imprecision leads to set-valued answers. Such ‘indeterminate’ answers seem to have the advantage of being more honest, but there may be more to them: in some cases they may even suffice, or be perfectly adequate. Being indeterminate, or undecided, may, in other words, in some cases be preferable to being precise and wrong.
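As a minimal illustration of how such set-valued answers can arise (using interval dominance over a finitely generated credal set, rather than the specific algorithms mentioned above, and with made-up numbers), the sketch below discards an action only if some other action’s lower expected utility exceeds its upper expected utility.

```python
import numpy as np

def undominated_actions(utility, credal_set):
    """utility[a][x] = utility of action a in state x;
    credal_set = list of probability mass functions over the states.
    Keep action a unless some b has a *lower* expected utility that
    exceeds a's *upper* expected utility (interval dominance)."""
    exp = np.array([[p @ u for p in credal_set] for u in utility])
    lower, upper = exp.min(axis=1), exp.max(axis=1)
    n = len(utility)
    return [a for a in range(n)
            if not any(lower[b] > upper[a] for b in range(n) if b != a)]

utility = np.array([[1.0, 0.0],    # action 0 pays off only in state 0
                    [0.0, 1.0],    # action 1 pays off only in state 1
                    [0.2, 0.2]])   # a mediocre 'safe' action
credal_set = [np.array([0.3, 0.7]), np.array([0.6, 0.4])]
print(undominated_actions(utility, credal_set))  # -> [0, 1]
```

In this toy example the credal set cannot decide between actions 0 and 1, yet it does rule out the mediocre third action: the indeterminate answer {0, 1} is still informative.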
The second example is calibration. Recent work by Vovk and Shen has shown that there is a game-theoretic way to define what it means for a sequence of (precise, point-valued) forecasts to be well-calibrated—or equivalently, for a sequence of forecasts and outcomes to be ‘random’—that is equivalent to the more classical, measure-theoretic definition of randomness. Loosely speaking, calibration means that there is no (lower semi-computable) strategy that exploits the commitments implicit in the forecasts and allows one to become infinitely rich. In a recent collaboration with Philip Dawid, we have started looking at expanding this idea to include imprecise, interval-valued forecasts. It is still too early to draw definite conclusions from this exploration, but a few interesting first results and conjectures seem to be emerging, which I’d be happy to discuss with the workshop participants.
A third example concerns stochastic processes. Some recent work (by the SYSTeMS group at UGent) on so-called imprecise Markov chains has shown that it is possible to significantly relax the Markov condition—effectively making sure that an imprecise Markov chain is a collection of very general stochastic processes whose transition probabilities don’t necessarily satisfy the Markov condition, but belong to sets that do—while still maintaining ease of inference and computation, and preserving powerful limit results. This idea has great potential in engineering applications.
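To make the computational point concrete (this is an illustrative sketch, not the UGent group’s actual code, and it assumes each state’s credal set is given by finitely many extreme transition rows), inference in an imprecise Markov chain can proceed by iterating a lower transition operator backwards, much as one would iterate a transition matrix in the precise case:

```python
import numpy as np

def lower_transition(credal_rows, f):
    """(Tf)(x) = minimum, over the candidate transition rows of state x,
    of the expectation of f under that row."""
    return np.array([min(row @ f for row in rows) for rows in credal_rows])

def lower_expectation(credal_rows, f, n_steps):
    """Lower expectation of f(X_n) given X_0 = x, for every state x,
    obtained by iterating the lower transition operator backwards."""
    g = np.asarray(f, dtype=float)
    for _ in range(n_steps):
        g = lower_transition(credal_rows, g)
    return g

# Toy example: two states, each with a credal set given by two extreme rows.
credal_rows = [
    [np.array([0.6, 0.4]), np.array([0.8, 0.2])],  # rows available in state 0
    [np.array([0.3, 0.7]), np.array([0.5, 0.5])],  # rows available in state 1
]
f = [1.0, 0.0]  # indicator of being in state 0
print(lower_expectation(credal_rows, f, n_steps=10))
```

The recursion touches each state’s credal set once per time step, which is the sense in which ease of inference can be maintained even though the individual processes in the collection need not be Markov.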
What are proper theories to model uncertainty? This question is central in many fields, including statistics, economics, artificial intelligence, and, for empirical purposes, psychology. Many have argued that the Bayesian approach is the proper one from a normative perspective. Empirically, however, there are many deviations. For uncertainty, this has led to ambiguity models, which are a popular topic in the economic literature today. In economics, the importance of studying ambiguity has been put forward since at least 1921 (Keynes and Knight). It was not until the end of the 1980s, however, that decision foundations for ambiguity were first found, by people clever enough to do so: Gilboa and Schmeidler. Then decision theory could start its counterpart to the imprecise probabilities from statistics, the belief functions from artificial intelligence, and the nonadditive subjective probabilities from psychology. According to a (loud) minority, these theories can even be used for normative purposes.
This lecture describes the current state of the art in modeling risk and ambiguity decision attitudes in economics as the result of interactions between empirically oriented psychologists and theoretically oriented economists, leading to the modern behavioral approach. The focus will be on how descriptive findings led to deviations from Bayesianism. At several stages in history, the next step forward could be made only by empirical inputs and intuitions from psychologists. At several other stages, the next step forward could be made only by theoretical inputs from economists with advanced technical skills. Modern views on the measurement of utility, beliefs, risk, and ambiguity attitudes could only arise from the merger of ideas from all the fields mentioned. The lecture ends with speculations on future directions of risk and ambiguity studies in decision theory.
The talk examines overprecision as an instance of overconfidence, frequently hampering the subject-matter impact of statistical analysis. We discuss the potential that approaches based on imprecise probabilities (IP) have for reliable statistical modelling. While traditional statistics is often caught in a fundamental dilemma—either a precise answer based on unjustified assumptions or no solution at all—IP-based models promise an appropriate modelling of the quality of the information at hand.
The talk is divided into three parts. First we recall some results on the powerful handling of prior-data conflict in generalized Bayesian inference. We then look at IP-based generalizations of sampling models: we briefly discuss several approaches and then focus on an IP model for unobserved heterogeneity. Our credal likelihood approach can be understood as an alternative to precise Bayes-like random-effects models. We discuss generalized maximum likelihood estimation and also briefly sketch generalized Bayesian inference in this situation. Finally, if time allows, we look at some first IP-based models for handling non-randomly coarse(ned) data in regression models.
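As a small, self-contained illustration of the first part (the abstract gives no model details, so the Beta-Bernoulli setting and the grids below are assumptions made for the sketch), generalized Bayesian inference can be run with a set of conjugate priors; the resulting interval of posterior means widens when prior assumptions and data are in conflict, which is the kind of behaviour referred to above.

```python
import numpy as np

def posterior_mean_interval(n0_grid, y0_grid, successes, trials):
    """Interval of posterior means of a Bernoulli parameter under the set of
    Beta priors with prior strength n0 and prior mean y0, with (n0, y0)
    ranging over the given grids; a single such prior yields posterior mean
    (n0*y0 + successes) / (n0 + trials)."""
    means = [(n0 * y0 + successes) / (n0 + trials)
             for n0 in n0_grid for y0 in y0_grid]
    return min(means), max(means)

n0_grid = np.linspace(1.0, 5.0, 41)   # assumed set of prior strengths
y0_grid = np.linspace(0.4, 0.6, 41)   # assumed set of prior means

print(posterior_mean_interval(n0_grid, y0_grid, successes=8, trials=10))
print(posterior_mean_interval(n0_grid, y0_grid, successes=10, trials=10))  # stronger conflict, wider interval
```

The second call returns a wider interval than the first: the observed frequency 10/10 lies further outside the assumed range of prior means, and the set-based posterior reflects this conflict by becoming more imprecise.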
How might a (radical) Subjectivist change her/his mind about a statistical model? Consider a setting where the Subjectivist’s statistical model is determined via personal judgments of exchangeability, through a de Finetti-style representation theorem. What if a surprising pattern is observed in the data and the Subjectivist considers, e.g., revising her/his opinions to a new, candidate statistical model based on personal judgments of mere partial exchangeability?
Here, I explore how such a Subjectivist might perform (Peircean) abductive reasoning. The Subjectivist identifies new, candidate statistical models based on judgments of partial exchangeability. The basis for the abductive step is the surprising added utility of a new model, weighted by its complexity. This added utility is formalized in terms of the increased informativeness about the observed data in the new model (based on partial exchangeability) compared with the informativeness about the same data in the old statistical model (based on exchangeability). Moreover, the utility assessment holds for each Subjectivist who shares the same statistical model, independent of her/his prior for the parameters of the first (exchangeable) model.
This abductive inference pattern leading to a new, candidate statistical model is distinguished from ordinary Bayesian updating, in which the Bayesian would instead need:
- to anticipate all the relevant partially exchangeable models, and
- to develop a Cromwellian “prior” that includes each of them in its support.