By Eric Winsberg, today at Rice:
background: history and philosophy of science, Chicago
author of Science in the Age of Computer Simulation
editing a collection of essays on time and chance
quantifying uncertainties
climate science relies on computer models, and in the last decade a lot of attention has gone to quantifying the degree of uncertainty in what they tell us
we know the models are imperfect
they don’t give perfect predictions of the future
technical difficulties: the models are expensive, take a long time to run, have lots of parameters, and carry structural uncertainties
statistical problems
and also conceptual difficulties
so let’s try to understand why UQ (uncertainty quantification) is important
and can UQ goals be met?
UQ: a way to communicate knowledge from experts to policymakers, so that experts play the kind of role we think they ought to play
suppose we live near a glacial lake; glaciers can melt, a glacial dam can burst: should we build a dam?
the only people well positioned to answer are climate experts
and the decision also involves our values: how much do we weigh the money against the costs (lives lost, architectural treasures lost; it’s hard to put a monetary value on some things)
so can climate scientists quantify the likelihood and the rest of us can decide what to do?
>> put expertise into its proper role
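(A toy illustration of that division of labor, in Python, with invented numbers and names of my own: the expert contributes only a probability for the outburst flood; the costs are value judgments supplied by the rest of us; the decision falls out of an expected-cost comparison.)

    # Toy sketch of the epistemic/normative split (all numbers invented).
    p_flood = 0.15         # the expert's probability of a glacial lake outburst flood
    cost_of_dam = 50e6     # what building the dam costs (a normative/economic input)
    loss_if_flood = 800e6  # how we value what a flood would destroy (a normative input)

    expected_loss_if_we_do_nothing = p_flood * loss_if_flood
    build_the_dam = expected_loss_if_we_do_nothing > cost_of_dam
    print(build_the_dam)   # True here: 120e6 > 50e6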
this kind of climate-science knowledge is what interests me (Winsberg): the scope, pace, tempo, and regional complexity of changes in climate
the question of whether climate change is happening is answered
the question of its cause is answered
so when I talk about uncertainties I’m not talking about the general question of whether it’s happening
but about the granularity
tool for communication to policy makers
separating the epistemic and the normative
normative questions are for all of us to decide
one central question is, can UQ do that job?
can this division of labor be maintained?
yes to the extent that UQ can do the job
only to the extent that the appraisal of scientific hypotheses can be free of “non-epistemic values”
ethical, social, and political values vs. truth-conducive ones such as simplicity, fruitfulness, scope, and predictive accuracy
Richard Rudner (1953)
no scientific hypothesis is ever completely verified << finite data
how strong is sufficiently strong? 80% likely? etc.
<< a function of the importance of making a mistake
>> a scientist makes value judgments
(1) A toxic ingredient of a drug is not present in lethal quantity
(2) A certain lot of machine-stamped belt buckles is not defective
You would think accepting (1) requires greater sureness than accepting (2)
Should we accept hypothesis that there will be a glacial lake outburst flood in a particular area, given future emissions?
How sure we need to be before we accept it depends on our values
Richard Jeffrey
premise 1 (that your job is to accept or reject hypotheses) is false
the job is to assign probabilities to hypotheses with respect to the currently available evidence
this assignment may involve subjective judgments, but it can eliminate the role of non-epistemic values
a subjective Bayesian, for instance (and these judgments don’t depend on non-epistemic values)
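(A minimal sketch of the Jeffrey picture, with invented numbers: the scientist’s deliverable is a posterior probability computed by Bayes’ rule; no acceptance threshold, and so no inductive-risk judgment, is built into it.)

    # Bayes' rule with a (possibly subjective) prior -- all numbers invented.
    prior_h = 0.3        # prior probability of the hypothesis H
    p_e_given_h = 0.8    # probability of the evidence if H is true
    p_e_given_not_h = 0.2

    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)
    posterior_h = p_e_given_h * prior_h / p_e
    # The scientist reports posterior_h (about 0.63); deciding whether to act on H is left to others.
    print(posterior_h)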
UQ in climate models
how successful can climate science be at UQ in a value-free way, i.e., at doing the Jeffrey-style job of science?
Structural model uncertainty
Parameter uncertainty
Data uncertainty
Data uncertainty: the models are calibrated against the past (ice ages etc.)
Ice cores and tree rings are not thermometers; they are not perfect records of the past
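(A hedged sketch, with synthetic data, of what “not a thermometer” amounts to: a proxy is calibrated against instrumental temperatures over an overlap period, and the scatter in that calibration is uncertainty that any reconstruction inherits.)

    import numpy as np
    rng = np.random.default_rng(0)

    # Synthetic overlap period: instrumental temperatures and a noisy proxy record.
    temp = rng.normal(15.0, 1.0, size=100)               # "observed" temperatures (degC)
    proxy = 2.0 * temp + rng.normal(0.0, 1.5, size=100)  # proxy tracks temperature, plus noise

    # Calibrate proxy -> temperature; the residual scatter is irreducible data uncertainty.
    slope, intercept = np.polyfit(proxy, temp, 1)
    residuals = temp - (slope * proxy + intercept)
    print(slope, intercept, residuals.std())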
Parameter uncertainty: even if we knew what the perfect model looked like, models have lots of parameters
We are uncertain as to their values; some are physical quantities (such as g for gravity), but others are there to fill the gap between our discrete simulations and continuous nature
e.g. cloud formation << humidity
even our best models have grid cells on the order of a kilometer wide, too coarse to capture that directly, so a parameter stands in for it; we don’t know the parameter value
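(A toy, hedged stand-in for such a sub-grid parameterization, not the scheme any real model uses: grid-cell cloud cover as a function of grid-mean relative humidity, with a tunable “critical humidity” rh_crit whose right value is exactly the kind of thing we don’t know.)

    # Toy sub-grid cloud-cover parameterization (illustrative only).
    def cloud_fraction(rel_humidity, rh_crit=0.8):
        """Cloud cover in a grid cell, inferred from the cell's mean relative humidity."""
        if rel_humidity <= rh_crit:
            return 0.0
        return min(1.0, ((rel_humidity - rh_crit) / (1.0 - rh_crit)) ** 2)

    # The same grid-cell humidity yields different cloud cover for different rh_crit values:
    print(cloud_fraction(0.9, rh_crit=0.7), cloud_fraction(0.9, rh_crit=0.85))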
And we don’t even know what the ideal model structure is
There are six major models
Given any particular emission scenario, they predict different things, because they have different structures
They model the climate in slightly different ways
There is some uncertainty due to model structure: how much is the question
So let’s just stick with model structure (we can make similar points about other uncertainties)
One way to estimate it is by using ensemble methods
take the mean projection across the models
use the standard deviation across the models
>> report the average as what you expect, with a band of e.g. the mean curve plus or minus two standard deviations
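(A minimal sketch of the ensemble-average move, with invented end-of-century warming numbers standing in for six model projections.)

    import numpy as np

    # Invented warming projections (degC by 2100) from six different models.
    projections = np.array([2.1, 2.6, 3.4, 2.9, 4.1, 3.0])

    mean = projections.mean()
    sd = projections.std(ddof=1)   # sample standard deviation across models
    print(f"expect about {mean:.2f} degC, band {mean - 2*sd:.2f} to {mean + 2*sd:.2f}")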
Tebaldi and Knutti (2007)
the resulting UQ depends on the ensemble << different ensembles give wildly different results
Problems with ensemble averages:
1 They assume all models are equally good. (Or you can weight them, but how? See the sketch after this list.)
2 They assume the models are independent
3 They ignore that models have shared histories that are hard to sort out
4 They ignore the herd mentality about success
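(The sketch promised above for problem 1, with invented numbers: equal weights and a made-up “performance” weighting of the same six projections give different central estimates, and nothing in the statistics itself settles which weighting to use.)

    import numpy as np

    projections = np.array([2.1, 2.6, 3.4, 2.9, 4.1, 3.0])          # same invented projections
    equal_weights = np.full(6, 1 / 6)
    skill_weights = np.array([0.25, 0.10, 0.20, 0.15, 0.05, 0.25])  # invented performance weights

    print(np.average(projections, weights=equal_weights),   # about 3.02
          np.average(projections, weights=skill_weights))   # about 2.86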
(humorous example about measuring a barn)
history of the measurement of the speed of light: the value that seemed right would stick for a while (because of herd mentality)
Knutti et al 2010: “Defining performance metrics that demonstrably relate to prediction skill remains a largely unresolved problem.”
Gleckler et al. 2008: “Forming a single index of model performance, however, can be misleading...”
Allen 2008: “tuning their flagship models to fit the same observations...eventually they will also converge”
Multi-model averages << contingent history of model development, which involves value judgments
Models have been optimized to particular purposes and particular metrics of success
Model choices have reflected balances of inductive risk
Model choices have been made “consciously or by natural selection” to follow the herd
These averages do not meet the Rudnerian standard for objectivity
Heather Douglas
scientists often have to make methodological choices that do not lie on a continuum
e.g. choosing between two possible staining techniques
methodological choices usually involve a complex mixture of value judgment and epistemic expertise
>> can experts make the right judgment after the fact?
>> compare classical T-tests to Bayesian approaches
It’s not that scientists always make totally correct judgments
Rather: could it be that scientists can pick one stain or the other, get a good sense of the data it yields, and then make a judgment about the probabilities in a value-neutral way?
Making methodological choices doesn’t necessarily imply that the results are value-laden
Unless they are doing rote testing after the fact
Bayesian response to the Douglas challenge
It is no threat to value neutrality for scientists to make value-motivated discrete choices
As long as probabilities can be evaluated later by experts in a value neutral way
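(To make the earlier t-test vs. Bayesian contrast concrete, a hedged sketch with synthetic data: the classical route ends in an accept/reject verdict at a chosen significance level, which is where inductive risk enters; the Bayesian route ends in a probability that, on the Jeffrey picture, can be handed over as-is.)

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    data = rng.normal(0.4, 1.0, size=30)   # synthetic measurements

    # Classical: reject "mean = 0" at a chosen alpha; the threshold is a value choice.
    t_stat, p_value = stats.ttest_1samp(data, 0.0)
    reject_null = p_value < 0.05

    # Bayesian (normal prior on the mean, sampling variance taken as known = 1):
    # report a posterior probability instead of a verdict.
    prior_mean, prior_var = 0.0, 1.0
    n, xbar = len(data), data.mean()
    post_var = 1.0 / (1.0 / prior_var + n)
    post_mean = post_var * (prior_mean / prior_var + n * xbar)
    prob_mean_positive = 1.0 - stats.norm.cdf(0.0, loc=post_mean, scale=np.sqrt(post_var))

    print(reject_null, prob_mean_positive)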
Three features of climate models could make this Bayesian response unsuccessful:
1) Size and complexity
2) Distributed epistemic agency
3) Inscrutability due to generative entrenchment
for instance, consider NOAA’s GFDL CM2.x model:
contains over a million lines of code
a thousand-plus parameter options
hundreds of initialization files
incomplete documentation
tens of terabytes of data per model year run
weeks to produce one model run to 2100 on the fastest clusters
complex and interactive modularity
(an ocean-atmosphere core, then modules for precipitation, carbon sinks, and so on)
climate models reflect the work of hundreds of researchers working in different physical locations and at different times
Distributed over time as well as discipline
Legacy code may have been eliminated, but the science that was credentialed in building the modules has been preserved, along with procedural decisions; we can’t make them fresh every time we build a model
>> Melinda Fagan on group knowledge (is there group justification? and so on)
>> group authorship: can such a group be epistemically accountable for what they do?
No one has a good clear view of the entirety of what they are doing
generative entrenchment (from Winsberg’s paper): the historical path dependency of models; as you make choices you lock in successful features in a way you can’t later analytically penetrate
analytic impenetrability:
attributing the various sources of success and failure of different models, in reproducing known data, to particular model assumptions is extremely difficult or even impossible
analytic impenetrability makes the effects of past methodological choices epistemically inscrutable
Path dependency of the models: late in their history no one has a clear view of how the choices have affected the success and failure of the model
if the models are huge and complex, developed by distributed groups, and at the end of their history you can’t trace their successes and failures to the choices that were made,
>> then it’s implausible that experts can dispassionately assess the impact of those judgments on the uncertainty
Standard Rudnerian values (choose between options on basis of inductive risk)
Predictive preference (on basis of trade offs between importance of different prediction tasks)
it is difficult to disentangle how these methodological choices have impacted the results
Two different visions
An old example: climate change versus global warming
<< do you care more about heating, or about crop failure (<< precipitation)? well, you can’t decide
In fact: methodological choices are buried in the historical past under the complexity, epistemic distributiveness, and generative entrenchment of climate models. I am:
not in the business of making historical, sociological, or psychological claims
not attributing to the relevant actors any psychological motives, nor any particular specifiable or recoverable set of interests
for many of the same reasons that these methodological choices are immune to the Bayesian response to the Douglas challenge (BRDC), they are also relatively opaque to us from a historical, philosophical, and sociological point of view
The main claim: any attempt to rationally reconstruct past methodological choices could only justify them against some set of predictive preferences and some balance of inductive risks!
the claim is not about specific values having played a role
it holds as a matter of logic
values in the nooks and crannies: buried in vastly complex models
commonly thought that scientists ought to
* be more self-conscious in their value choices
* ensure that their values reflect those of the people they serve
* implement some system for determining public opinions
But none of these will work. Precisely when the values have to be in the nooks and crannies and none of these strategies are available...is when you have to worry!