## Model flexibility and number of parameters

This post is some thoughts I had after reading ‘Real numbers, data science and chaos: How to fit any dataset with a single parameter’ by Laurent Boué (arXiv:1904.12320).

The paper above shows that any dataset can be approximated by the following single-parameter function:

$f_{\alpha}(x) = \sin^2\big(2^{x\tau} \arcsin \sqrt{\alpha}\big)$

where $x$ is an integer index into the dataset, $\tau$ is a constant which controls the level of accuracy, and $\alpha$ is a real-valued parameter which is fit to the dataset in question.

This might seem impossible at first, but this is an incredibly flexible function, and you can put an awful lot of information into $\alpha$ if you include enough significant figures. Have a look at the paper to see some examples of line drawings of animals, images and audio recordings, all of which are represented just by a different choice of $\alpha$.
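To get a feel for the mechanics, here is a minimal sketch of the encode/decode cycle (my own reconstruction in ordinary double precision, not the paper’s code, which uses arbitrary-precision arithmetic; the variable names and the choice of $\tau = 8$ are mine):

```python
import math

tau = 8  # bits of precision per data point (the accuracy constant)
data = [0.12, 0.45, 0.91, 0.70, 0.33]  # example values, scaled to [0, 1]

# Encode: map each value to z = arcsin(sqrt(y)) / pi, which lies in [0, 1/2],
# quantise z to tau bits, and concatenate all the bits into one number q
bits = 0
for y in data:
    z = math.asin(math.sqrt(y)) / math.pi
    bits = (bits << tau) | int(z * 2 ** tau)
q = bits / 2 ** (tau * len(data))

alpha = math.sin(math.pi * q) ** 2  # the single parameter

def f(x, alpha):
    # the paper's one-parameter function
    return math.sin(2 ** (x * tau) * math.asin(math.sqrt(alpha))) ** 2

# Multiplying by 2**(x*tau) shifts q's binary expansion, so each x
# reads back a different tau-bit window, i.e. a different data point
decoded = [f(x, alpha) for x in range(len(data))]
```

The trick is that multiplying by $2^{x\tau}$ shifts the binary expansion of the encoded number, so each evaluation reads off a different $\tau$-bit window; with standard floats this only works for a handful of data points, which is why the paper needs $\alpha$ to very many significant figures.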

## Too many parameters? It depends on the model

> With four parameters I can fit an elephant, and with five I can make him wiggle his trunk

This quote is attributed to John von Neumann, and appears often enough that it had (wrongly) formed a good part of the basis of my understanding of model flexibility (perhaps coupled with the idea of model comparison through AIC and likelihood ratio tests). My misunderstanding was roughly that ‘lots of parameters = bad’: you have overfitted to your data, are overestimating the model’s accuracy, and will see worse accuracy outside of the fitted data.

Indeed, for some models such as linear systems of equations, if the number of parameters ≥ the number of data points, the data can be fit perfectly (prediction error = 0). AIC model selection works nicely here: models can be nested as they share the same structure, and you get a measure of whether you are likely to be overfitting.
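As a concrete instance of the parameters-versus-data-points case, a degree-four polynomial (five coefficients) fit to five points interpolates them exactly (a toy example with made-up data):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(5.0)
y = rng.normal(size=5)  # five arbitrary data points

# A degree-4 polynomial has five coefficients: as many parameters as
# data points, so the least-squares fit passes through every point
coeffs = np.polyfit(x, y, deg=4)
residuals = y - np.polyval(coeffs, x)  # effectively zero everywhere
```

Zero residuals on the training data say nothing about how the fit behaves between or beyond the points, which is exactly the overfitting worry.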

But in the above example, one parameter with enough precision can draw a line through the data: because the model is incredibly flexible.

On the other hand, some models with many parameters are very inflexible: this SIR model has almost 50 parameters, but you certainly can’t use it to draw an elephant.

In summary: the number of parameters alone does not determine whether you are likely to be overfitting a model, it also depends on the flexibility of the model itself. Be careful when comparing between models with different structures.

## Quantify everything, all of the time

I recently read the article by Wu et al in Nature Biotechnology (you can also find similar articles in pretty much all of the Nature journals) which analysed data on participants at some virtual meetings over the past couple of years, and came to the conclusion that ‘Virtual meetings promise to eliminate geographical and administrative barriers and increase accessibility, diversity and inclusivity’. Which sounds great!

Of course there are certainly some good things to come out of virtual meetings, and many unresolved issues with in-person conferences. When the issues include lack of equality, contributions to global warming, and giving more funding to dubious publishers, it’s certainly a lot easier to write in opposition to these events than in support. Though pre-COVID a lot of articles criticising conferences talked about methods for reform, newer articles seem to err more on the side of totally abolishing them.

Personally, I’ve pretty much given up on virtual meetings. I’ve ‘been’ to a number of fully virtual conferences and while I certainly got something out of all of these meetings, the amount has been declining. The issues I have are:

• They’re no fun, and just the same stuff as day-to-day work/zoom meetings (which I’m pretty sure we’re all sick of).
• There’s no real way to meet people and talk about their research.
• Question and answer sessions are often chaired in a more controlling way – rarely are the critical or difficult questions asked.
• The demands on speakers seem to have become higher, with organisers both demanding a pre-recording well ahead of the deadline, as well as a live talk.
• They’re still really expensive!

## Quality not quantity

The main analysis of the paper compares attendance at the RECOMB conference, which became fully virtual (and also free) in 2020, between 2019 and 2020. Attendance grew about 10-fold. The authors then look at ethnicity (not home country) and gender by analysing participant names. Numbers increased from all regions, and the proportions changed a little. The number of countries with at least one registration was also greater.

However, none of the more qualitative evidence that would tell us about the quality of people’s conference experiences over the past couple of years makes it into these analyses, which are all relentlessly quantitative. All registrations are treated as equal, though doubtless they range from watching a single talk to being a conference organiser.

I have some other issues with this analysis and to what extent it supports the conclusions, but really what I want to comment on is: why only do this (somewhat complex) quantitative analysis and ignore participant experiences? Why not interview some people across geographies and career levels; those who had been at conferences for a while, those new to conferences; and ask them a few questions about the virtual conference experience?

Just treating all registrations as a positive experience is bound to make free, virtual conferences look more accessible.

I did, however, think that the discussion section of the article was really quite reasonable:

> Although we strongly believe that in-person conferences have their own benefits, and that no online communication tool can mimic the in-person experience completely, we cannot neglect the multiple advantages that online conferences offer: in addition to providing opportunities to previously under-represented groups to attend global conferences, use of a hybrid format will contribute toward decarbonizing conference travel after the pandemic.

Wu et al., https://doi.org/10.1038/s41587-021-01176-z

I feel that we as scientists can sometimes stick a little too closely to numbers and counts and shy away from sources of information which are harder to enter into R/python, and by doing so we can make our analysis less complete, but retain a veneer of respectability.

## Did 1.27M people die from AMR in 2019?

I would answer ‘I don’t know’. If I was being less trite, I would add that I’m more confident saying that it was between 100k and 10M – whichever way you look at it, vast numbers that are growing larger, and which require action on multiple fronts.

The authors of the study ‘Global burden of bacterial antimicrobial resistance in 2019: a systematic analysis’ (also called ‘the GRAM study’) have actually attempted to estimate this. Their paper was published a couple of weeks ago, and you can read it here: https://doi.org/10.1016/S0140-6736(21)02724-0.

The authors of this study combine a lot of data from many sources (which I am sure was very difficult to collect); run a complex series of regression models, chaining their outputs together; predict DALYs lost; and run counterfactuals vs no-AMR to make predictions of the global burden of AMR. Ultimately they estimated that in 2019 there were ‘1·27 million (95% UI 0·911–1·71) deaths attributable to bacterial AMR’, which is unfortunately at the upper end of extrapolations from the 2016 O’Neill report.

I really want to make one criticism here: there is no way it was possible to estimate this to three significant figures, and the uncertainty interval is almost certainly too narrow. These UIs are generated from the model used, and only reflect uncertainty from the data given that the model is true (which probably explains the tight and smooth UIs in figure 2).

The authors themselves offer reflection on uncertainties, assumptions and approximations used in the discussion, and have a whole table in the appendix listing known modelling limitations. It’s easy to identify more than are explicitly addressed (e.g. not modelling vaccine replacement in S. pneumoniae) – some which would suggest an overestimate, some which would suggest an underestimate. This analysis is incredibly complex and tries to account for so many factors that I’m at a loss to guess whether this is more likely over- or under-estimated – but I am confident in saying the result is more uncertain than reported.

A related gripe is that reporting this level of precision makes the study’s accuracy look higher than it actually is. Much of the credulous press coverage then reports only the central estimate with no UIs at all, which really does make it look like someone counted all of these deaths, and got exactly 1.27M (which was then rounded to 3 s.f.).

Perhaps just saying about 1-2 million deaths would have been clearer?

## Ok, maybe more than one criticism

I’m an outsider to global burden of disease estimates, and economic modelling in general, so realise that I might be missing the point with some of these questions, but I also wondered:

• Won’t the errors from each source/regression, and especially by chaining results together, compound? Was this taken into account?
• Will this result and the model’s accuracy ever be checked retrospectively (e.g. when we have better data)? Hopefully at least for some regions, and then we could see what the major sources of error in the model were. But could we do this now for a region with lots of data, by adding errors and sparsity into it? Or for one species/disease?
• The sensitivity analysis (section 4.7 of the appendix, ‘model validation’) reports 0.7-1 AUC, which isn’t exactly stunning, but it also looked like there wasn’t any real validation set used, just a random 20% subsample of all of the data.
• One press release I saw mentioned ‘celebrating the global collaboration and 100’s of data partnerships that made this study possible’, which I’m sure is a great thing to come out of this. But where is the data? Can anyone else use it? Has any effort been made to provide it to researchers through an access committee?
• Relatedly, where is the model code? After a bit of searching I found this repo, but I’m not sure it’s a) the right one or b) that it’s able to be reused at all.
• This is very minor, but what’s the ‘grand total’ in table 1? How does it make sense to sum sample sizes and study years from totally different sources? It seems to be there to impress us with the figure 471,300,319 (which is only useful for estimating how much memory you’re likely to need to load the data).
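On the first bullet, the compounding worry can be illustrated with a toy simulation (entirely made-up numbers, and a deliberately crude model of a chained analysis):

```python
import numpy as np

rng = np.random.default_rng(1)
n_sims, n_stages, sigma = 100_000, 5, 0.05

# Each stage multiplies the running estimate by an unbiased but noisy
# correction factor (5% relative error), the way one model's output
# feeds into the next
estimates = np.ones(n_sims)
spread = []
for _ in range(n_stages):
    estimates *= rng.normal(1.0, sigma, size=n_sims)
    spread.append(estimates.std())

# The spread after k stages grows roughly like sigma * sqrt(k):
# independent stage errors accumulate rather than cancel
```

If the reported UIs only propagate the uncertainty from some of these stages, they will come out too narrow.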

## Delete your tweets

Each year I delete all my old tweets, because:

• They’re difficult to search through.
• They’ve probably lost most of their context.
• You might have changed your mind, or they’ve become outdated.
• Shitposting is usually a lot less funny in retrospect.

About the only thing you lose is the dubious ability to quote tweet an old take/prediction that turned out to be true. (How many times have you seen someone do this with a prediction that turned out wrong?)

It’s pretty easy to automate. You’ll first need to sign up for the Twitter developer API, but once you’ve got a key you can use something like python-twitter to delete everything with a timestamp in a certain range. You can also archive them easily as part of this process, so you’re not really losing anything.
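A sketch of what that automation might look like (hedged: `Api`, `GetUserTimeline`, `DestroyStatus`, `created_at_in_seconds` and `AsDict` are python-twitter’s names; a real run would also need to page back through the timeline with `max_id`, since one call returns at most 200 tweets):

```python
import datetime
import json

def in_range(status, start, end):
    """True if a tweet's UTC timestamp falls inside [start, end)."""
    ts = datetime.datetime.utcfromtimestamp(status.created_at_in_seconds)
    return start <= ts < end

def archive_and_delete(api, start, end, archive_path="tweets.jsonl"):
    """Archive, then delete, every tweet in the given window.

    `api` is an authenticated python-twitter `twitter.Api` instance.
    """
    with open(archive_path, "a") as archive:
        for status in api.GetUserTimeline(count=200):
            if in_range(status, start, end):
                archive.write(json.dumps(status.AsDict()) + "\n")  # archive first
                api.DestroyStatus(status.id)
```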

Blog posts are better than writing long threads out on twitter, because:

• Once you get past 2-3 tweets, threads are difficult to read (and not really what twitter was designed for). I’m sorry but I’m very unlikely to click on something that starts 🧵 1/25
• There’s no good history, partly because the search is difficult to use, partly because users delete things. Many times I’ve wanted to find someone’s useful tweet thread, but have been unable to. Not a problem with blogs, which are usually indexed by Google.
• You can put pretty much whatever you want in your blog’s html, and it can be as long as you like.
• No algorithm to contend with (though to be fair to twitter, its algorithm is one of the easier ones to turn off: just switch to the ‘most recent first’ view).

The two main issues are that 1) it takes more effort to maintain your own blog (and if you don’t, it’s equally likely to disappear) and 2) no-one reads your blog, but they do read twitter.

I use twitter to post links to this blog (partly helping with #2?), but in 2022 I’m going to try and read/subscribe to blogs more, and read twitter less. Here are some blogs I’ve enjoyed, and that you might enjoy too:

## You know, I’m something of a modeller myself

I recently read a blog post written by Joel Hellewell, who worked on the COVID-19 response team at LSHTM. The post is here: https://jhellewell14.github.io/2021/11/16/forecasting-projecting.html. I found it particularly interesting to hear the perspective of someone who had worked on (mathematical) modelling of infectious diseases for a longer time, and how the pandemic response compared to these activities.

I wanted to write a reply which turned out to be a bit too long for a tweet, so here it is.
(background: I spent Apr-Dec 2020 working part-time with one of the COVID-19 modelling groups at Imperial’s department of infectious disease epidemiology; my contribution was pretty much entirely to the software/programming side, not the modelling.)

## Have forecasts been useful?

While scenarios were used for policy planning in England (e.g. to guide the scheduled easing of NPIs in 2021), and inference from models was used to understand the transmission process, I’m not clear whether short- or medium-term forecasts were ever used or useful. I would be interested to know how useful predictions of case numbers, hospital beds and deaths have been in England. It appears to me that policy decisions happen on much slower timescales than ~weekly, but maybe the NHS is able to respond more quickly to forecast demand?

I’d also agree with the post that longer-term forecasting is very difficult for most diseases, but especially for one where unexpected far-reaching policy decisions and significant adaptive mutations come into play during the forecast period.

## Time pressures

Joel’s post argues that there is little motivation to assess the accuracy of forecasting results, as this kind of activity doesn’t directly feed into career success (broadly equal to publishing ‘high-impact’ papers). At least in the team I was in, my impression was that the modellers spent a huge amount of time scrutinising their own work, comparing it to others’, and generally being very thorough in their process.

[Joel responds that he specifically meant retrospectively checking predictive accuracy of forecasts against what actually happened; all SPI-M groups follow good practices when creating models, but none do these checks routinely.]

For us (me?), the pressure to publish wasn’t a dark spectre looming behind every decision, which was good, but I think it’s fair to say there was a collective sense of frustration that by late 2020 we had yet to publish our work, whereas other groups had produced multiple career-defining papers. Instead, there was a constant demand for production of further scenarios and forecasts (not generally formulated by the group themselves) which to me felt more and more like it got in the way of the science-driven side of the work.

## Capture by philanthrocapitalists and friends

Just an observation that this seems to have really ramped up in genetics too. I feel like I now read a big-tech-type post at least once a week touting the achievements of their funding and (often internal) fundees, but with very little actual scientific substance.

## Why be a scientist?

I want to be able to have at least some time to pursue things I enjoy out of interest rather than following a rigidly-defined project plan to a set time (which for me usually means doing something technical, or making a visualisation). So, something I think I personally learnt from COVID-19 was that I don’t really want to be in a public health role, or one that feeds directly into policy making.

But will this even be possible post-pandemic if you want to have a career in infectious disease research? I hope so.

Generally, I’m more optimistic that science can be what we (scientists) make of it, and we do have the ability to change this trajectory towards something kinder and more enjoyable. Particularly, younger scientists I work with almost always have positive attitudes towards issues such work-life balance, inclusivity, for-profit publishers and open science, and the means by which we quantify research output and impact.