EMPTY STATISTICS

Narratives without statistics are blind, but statistics without narratives are empty.

(Steven Pinker, The Better Angels of Our Nature)

Reasonable people have said that one should pay attention when Nassim Taleb speaks. I don’t understand why. Some have even put him on a par with the venerated scientist and science writer Steven Pinker, calling both ‘intellectual titans’. The designation is well-deserved for Pinker, but much less so for Nassim Taleb, as his recent tactics in his war against the former prove.

A couple of years ago, Steven Pinker wrote a thoroughly researched and carefully argued book on the decline of violence, backing his theses with a mountain of data, virtually all of it pointing strongly to the same conclusion: yes, violence has declined throughout the centuries and keeps declining.

The conclusion may surprise us, given the daily reports of murder, terrorist attacks and other indiscriminate killing on TV, on the internet and in the newspapers, providing us with unmistakable proof of the dark side of humanity. What’s usually left out of these reports is the violence committed by our ancestors. People seem to forget that ‘violence has declined’ doesn’t mean ‘violence has gone’. It means that, whatever the current level is, it used to be much worse. And that’s what Steven Pinker, in line with historians and other scholars before him, amply demonstrates.

Keep in mind Pinker isn’t just talking about war. Violence comes in many forms and on many scales. In addition to military conflict, Pinker discusses human sacrifice, homicide, torture, capital punishment, slavery, dueling, the treatment of homosexuals and ethnic minorities, wife-beating and child abuse, cruelty to animals, and more. Yes, bullfights are still cruel and pointless (and increasingly controversial), but did you know that cats were burned alive for popular entertainment in the Middle Ages? You get the idea…

It’s also important to note that Pinker discusses a number of parallel trends that may help explain the decline in violence. Our forefathers’ culture of honor and glory has gone out of fashion in the developed world. Democracy has gained a strong foothold. The global economy has turned international peace into a positive-sum game. The Enlightenment, humanism and the success of civil rights movements have made a lasting impact on contemporary law and morality. Technological innovation – not just the internet and TV, but much earlier the mass production of books after the invention of the printing press in the 15th century – has enabled humans to gradually widen their circle of empathy from next of kin or a small neighborhood to just about any being capable of suffering.

In sum, the decline of violence is not a statistical fluke captured by a small number of catchy but otherwise insignificant charts. It’s a coherent theory explaining not only the charts but also our contemporary mindset and moral sensitivities; and it’s based on political realities like international law and organizations. Is the [more than 70-year] absence of war between France and Germany a freakish streak of luck, Pinker asks in a rebuttal to Taleb, or could the Long Peace be more than a statistical illusion? Even if you can’t put a number to it, the decreased probability of war inside the European Union should be obvious to everyone with an elementary knowledge of politics and history. Is it that reckless an extrapolation, then, to propose that strong international ties, be they political, economic, cultural or other, are likely to have a similar effect on a global scale? What about the freezing of national borders by international law? The abolishment of military conscription? Article 9 in the Japanese constitution, banning the use of war as a means to settle international disputes? And similar constitutional restrictions on military intervention in Germany? Is all of that, and much more, irrelevant?? To Nassim Taleb it is, apparently.

Let me first give an overview of the hostilities between Taleb and Pinker, then I’ll elaborate on Taleb’s recent statistical smokescreen.

Act one: After he – sort of – read Pinker’s The Better Angels of Our Nature, Nassim Taleb (in The “Long Peace” is a Statistical Illusion) accuses Pinker of ‘not having a clear idea of the difference between science and journalism, or the one between rigorous empiricism and anecdotal statements’, of ‘using mechanistic techniques of statistics without understanding the properties of the statistical claims’. Taleb gives a confused explanation of ‘the difference between M* and M’ (sample mean and true mean, respectively), and uses ‘journalism’ as a term of abuse for science writing that purportedly ignores the difference. He says ‘Working with M* cannot be called “empiricism”’. Hmm, should we “work” with the unobserved M instead? And use a philosophical argument to estimate it? But Taleb is convinced that ‘what I just wrote is at the foundation of statistics’.

His comments about M and M* are quite revealing; they expose his dilettantism as well as his muddled mind. But his characterization of bland factual statements as ‘statistical fallacies’ borders on the insane. For example, in one of his Facebook posts on the “Pinker fallacy”:

‘This afternoon, to kill time on a long flight I decided to look for scientistojournalistic fallacies so I went to Steven Pinker’s twitter account. I immediately found one there. (Heuristic: go to Pinker). He promotes there a WSJ article to the effect that “Terrorism kills far fewer people than falls from ladders”; the article was written by a war correspondant [sic] […] Now let’s try a bullshit-detecting probabilistic reasoning.

A- Falls from ladder are thin-tailed, and the estimate based on past observations should hold for the next year with an astonishing accuracy.[…]

B- Terrorism is fat tailed. Your estimation from past data has monstrous errors. A record of the people who died last few years has very very little predictive powers of how many will die the next year, and is biased downward. One biological event can decimate the population.

May be “reasonable” to claim that terrorism is overhyped, that our liberty is more valuable, etc. I believe so. But the comparison here is a fallacy and sloppy thinking is dangerous.’ (emphasis added)

To a normal person, the statement ‘terrorism kills far fewer people than falls from ladders’ is a purely factual claim about the number of casualties over a specified period in the past. But Taleb immediately detects a prediction about the future and calls the “inference” a fallacy. Sloppy thinking is dangerous, I agree with him on that point. His penchant for reading things that aren’t said can lead to hilarious lecturing such as this (emphasis added):

‘[I]n the thesis by Steven Pinker that the world is becoming less violent, we note a fallacious inference about the concentration of damage from wars from K̂_q (quantile contribution estimator, JV) with minutely small population in relation to the fat-tailedness.’ (The Long Peace is a Statistical Illusion, p. 8)

Wow, did you note that inference too?? If not, you’re obviously not a member of Taleb’s cult, so consider yourself lucky.

Act two: Pinker defends himself in a rebuttal on his website. If you seriously think that Taleb has, or may have, a point against Pinker, then you should really read that paper. In it, Pinker exposes Taleb’s confusions and denies his false attributions. Regarding the purported fallacious extrapolations to the future, Pinker reminds us what he literally wrote:

As mentioned, from reading Taleb one might think that Better Angels misused complex statistics to extrapolate confident predictions about an Age of Aquarius in which major wars are impossible. […] The book explicitly, adamantly, and repeatedly denies that major violent shocks cannot happen in the future; this reticence is stated in the book’s opening paragraph and echoed in every summation.

Let me give his defense more force by quoting directly from the book (The Better Angels, p. 361-362) (emphasis in original):

I am sometimes asked, “How do you know there won’t be a war tomorrow (or a genocide, or an act of terrorism) that will refute your whole thesis?” The question misses the point of this book. The point is not that we have entered an Age of Aquarius in which every last earthling has been pacified forever. It is that substantial reductions in violence have taken place, and it is important to understand them. […] If the conditions reverse, violence could go right back up. […]

The statistics of power-law distributions and the events of the past two centuries agree in telling us that a small number of perpetrators can cause a great deal of damage. If somewhere among the world’s six billion people there is a zealot who gets his hands on a stray nuclear bomb, he could single-handedly send the statistics through the roof. But even if he did, we would still need an explanation of why homicide rates fell a hundredfold, why slave markets and debtors’ prisons have vanished, […]

The goal of this book is to explain the facts of the past and present, not to augur the hypotheticals of the future. […] My point […] is that the concept of scientific prediction is meaningless when it comes to a single event […] The truth is, I don’t know what will happen across the entire world in the coming decades, and neither does anyone else.

Either Taleb didn’t read these (and many other) pages, or his deliberate dyslexia is a symptom of his bellicosity.

Act three: Taleb adds a note to his ‘Long Peace’ paper about Pinker’s reply, calling it ‘ad hominem blather’. Why ad hominem? Because it was too long!! These are his exact words:

Pinker has written a rebuttal (ad hominem blather, if he had a point he would have written something 1/3 of this, not 3 x the words)

Act four: Taleb teams up with Pasquale Cirillo (Who is he? Presumably one of the many bright and hard-working fellows who are happy to put aside scientific rigor and integrity in exchange for the opportunity to ride on Taleb’s coattails). Their joint paper ‘On the statistical properties and tail risk of violent conflicts’ is supposed to give mathematical support to Taleb’s accusations.

I can understand that Taleb’s admirers are impressed by the obscure mathematical formulas; it’s not easy to assess the paper on its merits without a solid background in statistics. But they’d better not celebrate the triumph of their idol too soon, all the more so because there are a number of strong clues that Taleb & Cirillo (TC) cut a lot of corners in arriving at their conclusion – clues that require no mathematical expertise to spot. Merely reading Pinker’s book should make one suspicious of the claims attributed to him in TC’s paper.

But if you do have a background in statistics, it’s easy to understand why Taleb shuns the formal peer-review process (the ‘research mafia’, as he calls it). His work contains so many flaws that it wouldn’t stand a chance of being accepted by any self-respecting peer-reviewed journal. Let’s turn to the TC paper now.

First of all, it suffers from ambiguous, flexible and inconsistent definitions. Most notably: how do you define violence? Without justification, TC narrow the debate down to an argument about armed conflicts. They don’t say a word about any of the other manifestations of violence discussed by Pinker (see above). At the beginning of their paper, they narrow the topic further down to ‘events with more than 3000 casualties’. Why? Because of the availability and reliability of historical records, they say, but also because:

‘the object of our concern is tail risk. The extreme value techniques we use to study the right tail of the distribution of war casualties imposition [sic] of thresholds, actually even larger than 3000 casualties.’

That seems to be the main reason: TC love riding their fat-tailed hobbyhorse so much that more mundane phenomena, like homicide, aren’t worthy of their attention. Not only that, further on in the paper they even raise the bar to 25,000 (or 145,000) casualties, for technical reasons (briefly, it allows them to introduce a ‘novel approach to apply extreme value theory’, the dual distribution). All of that is fine, but then they shouldn’t present their conclusions as a refutation of a broad thesis covering violence on many scales.

Another ambiguity involves the distinction between the likelihood of wars and their magnitudes (number of casualties). TC are vague about what exact claims are made by the cited books or articles. They devote several pages to showing that the distribution of casualties forms a power law (there’s a large number of small wars and a small number of big wars; as the number of casualties N increases, the probability of wars with more than N casualties decreases much more slowly than N increases, so the probability of a highly destructive war never vanishes). They could have spared themselves the trouble, because Pinker already demonstrated and explained exactly the same pattern in great detail (‘Statistics of deadly quarrels, part 2’). TC fail to mention that, making it seem as if everything they demonstrate contradicts Pinker.
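To get a feel for what that slow decay means in numbers, here’s a minimal sketch in Python. The tail exponent and the casualty threshold are purely hypothetical – not values fitted by Richardson, Pinker or TC – the point is only the contrast between a power-law tail and a thin-tailed alternative calibrated to the same median:

```python
import numpy as np

# A hypothetical power-law (Pareto) tail with exponent alpha, scaled so that
# P(X > 10,000 casualties) = 0.5, next to a thin-tailed exponential matched
# at the same point. The numbers are illustrative only, not fitted to any
# war dataset.
alpha = 1.5                            # hypothetical tail exponent
x0 = 10_000 * 0.5 ** (1 / alpha)       # Pareto scale giving P(X > 10,000) = 0.5
lam = np.log(2) / 10_000               # exponential rate giving P(X > 10,000) = 0.5

for n in (10_000, 100_000, 1_000_000, 10_000_000):
    p_power = (x0 / n) ** alpha        # Pareto survival function
    p_thin = np.exp(-lam * n)          # exponential survival function
    print(f"N = {n:>10,}:  power law P(X>N) = {p_power:.1e},  thin tail P(X>N) = {p_thin:.1e}")
```

Under the power law the tail probability shrinks by only a few orders of magnitude as N grows a thousandfold; under the thin-tailed alternative it effectively vanishes. That is all ‘the probability of a highly destructive war never vanishes’ means – and, once more, it’s a pattern Pinker describes himself.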

The same goes for the statistics involving the timing of wars (onset and duration). Pinker shows that wars start and stop at random – they follow a Poisson process, i.e. a random process characterized by independence between successive events: the probability that a war starts today doesn’t depend on, for example, how long it has been since the last war (this independence refutes the naïve ‘hydraulic’ view of war, according to which a long bout of peace increases the likelihood of a new war). A Poisson process is therefore said to be ‘memoryless’. A further distinction can be made between a stationary (homogeneous) and a non-stationary Poisson process. When the probability (of a war starting or stopping) remains constant over time, the process is said to be stationary.
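To make ‘memoryless’ concrete, here is a minimal sketch with made-up numbers (an average of one onset every five years, nothing to do with the real datasets): under a Poisson process, having already sat through ten peaceful years tells you nothing about how much longer the peace will last.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic inter-arrival times under a stationary Poisson process with,
# say, one war onset every 5 years on average (illustrative numbers only).
waits = rng.exponential(scale=5.0, size=1_000_000)

# Memorylessness: P(wait > 15 | already waited 10) equals P(wait > 5).
p_uncond = np.mean(waits > 5.0)
p_cond = np.mean(waits[waits > 10.0] > 15.0)
print(f"P(wait > 5)              = {p_uncond:.3f}")
print(f"P(wait > 15 | wait > 10) = {p_cond:.3f}")   # both are about exp(-1) = 0.368
```

Both probabilities come out around 0.37, which is exactly the point: a long stretch of peace doesn’t make the next war ‘due’.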

So, what are TC’s conclusions?

‘One cannot observe any specific trend in the number of conflicts, as large wars appear to follow a homogeneous Poisson process. This memorylessness in the data conflicts with the idea that war violence has declined over time, as proposed by [Pinker].’

OK, no disagreement about the Poisson process then – I’m not sure that’s clear to the reader. In the second sentence they confuse stationarity (= homogeneity) with memorylessness. As said, all Poisson processes share the property of memorylessness, whether they are stationary or not. Thus, the fact that the process is shown to be memoryless doesn’t contradict anything Pinker said; on the contrary. They make a similar mistake near the end of their paper: the autocorrelogram shows no time dependence among interarrival times, which they see as evidence for a homogeneous Poisson process. A more or less flat autocorrelogram indeed suggests a Poisson process (because it confirms the memorylessness property), but it says very little about stationarity. Pinker ‘the journalist’ makes the distinction very clear in his book, but our Distinguished Professor of Risk Engineering is confused.
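If you want to see for yourself that a flat autocorrelogram says next to nothing about stationarity, here’s a minimal sketch. The declining rate is invented for the illustration; the point is that the inter-arrival times of a clearly non-stationary Poisson process can still be essentially uncorrelated:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_nhpp(rate_fn, rate_max, t_end, rng):
    """Non-homogeneous Poisson process on [0, t_end], simulated by thinning."""
    t, events = 0.0, []
    while True:
        t += rng.exponential(1.0 / rate_max)
        if t > t_end:
            break
        if rng.random() < rate_fn(t) / rate_max:
            events.append(t)
    return np.array(events)

# Hypothetical declining onset rate: 2 events/year at t = 0, halving by t = 2000.
rate = lambda t: 2.0 * 0.5 ** (t / 2000.0)
onsets = simulate_nhpp(rate, 2.0, 2000.0, rng)
gaps = np.diff(onsets)

# The autocorrelogram of the inter-arrival times shows little structure ...
g = gaps - gaps.mean()
for lag in (1, 2, 3):
    acf = (g[:-lag] * g[lag:]).mean() / g.var()
    print(f"lag-{lag} autocorrelation of inter-arrival times: {acf:+.3f}")

# ... yet the process is clearly non-stationary: the second half has fewer events.
print("events in first / second half:", int((onsets < 1000).sum()), int((onsets >= 1000).sum()))
```

The autocorrelations come out small, yet the second half of the simulated period contains clearly fewer events than the first – ‘no time dependence among interarrival times’ is simply not evidence of a constant rate.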

Let’s put aside this technical comment of minor importance, and turn to the main point of disagreement: is the process stationary or not? By saying violence has declined, Pinker implies it’s not. Before I discuss the evidence (from Pinker) and counterevidence (from TC) on that point, I’d like to give one further indication of how sloppy TC are in their citations. In section VII, Frequency of Armed Conflicts, they say (emphasis added):

‘our data support the findings of [Brian Hayes, not Hayek as TC cite him] and [Lewis Richardson], contra [John Mueller] and [Pinker], that armed conflict are [sic] likely to follow a homogeneous Poisson process, especially if we focus on events with large casualties.’

For ease of reading, I’ve replaced the numbers in the square brackets with the names of the cited authors. I’m aware that citation by numbered references isn’t unusual in the academic literature, but I can’t help noticing that here it adds to the obfuscation of who said what about what. If you’ve read Pinker’s book, it looks very strange to see him juxtaposed with Hayes and Richardson, because Pinker cites both (but especially Richardson) in support of his theses. It was Richardson, after all, who first described the ‘statistics of deadly quarrels’ in his book of the same name. The Poisson distribution for the timing of war, and the power law for the distribution of the number of casualties: both are due to Richardson. Again, no “contra” here. What about the stationarity of the Poisson process? Now it gets a little bit complicated, but also very interesting.

Richardson did test the hypothesis of nonstationarity, and indeed, the result was negative: the data didn’t show a trend in the probability of war. It seems TC are right then, doesn’t it? Not so fast. Did Pinker overlook this? Not at all! Pinker even praises Richardson as a ‘great scientist’, for his ‘willingness to let facts and reason override casual impressions and conventional wisdom’. It’s important to know that Richardson’s dataset covered a rather short time period: from about 1820 until 1950. In contrast, TC’s dataset covers 2000 years up to the present, another difference (besides the restrictions on the number of casualties) that makes comparison troublesome. As Pinker notes, Richardson’s dataset was biased toward showing an increase in the probability of war, given that it started just after the bloody Napoleonic Wars and finished just after World War II. The conventional wisdom right after World War II was that the chances of war had increased. But Richardson wrote this:

‘There is a suggestion, but not a conclusive proof, that mankind has become less warlike since AD 1820. The best available observations show a slight decrease in the number of wars with time. But the distinction is not great enough to show plainly enough among chance variations.’ (emphasis added)

And Pinker agrees with that conclusion. So again, where’s the “contra”? Only in Taleb’s quarrelsome mind, as usual.

However, Pinker continues, Richardson later found a statistically significant contrast between small wars and larger wars. Small wars were becoming less frequent, but larger wars, while fewer in number, were (!) becoming somewhat more frequent. Still not a shred of disagreement between Richardson and Pinker, in spite of TC’s juxtaposition. The least you can say is that TC’s rendering of the cited books and articles, whether those works support or contradict them, is quite disingenuous. If they refuted anything, it’s their own professional integrity.

Wrapping up, the only issue left is this: what happened after 1950, i.e. outside Richardson’s dataset? As Pinker shows in the chapters ‘The Long Peace’ and ‘The New Peace’, the fall in wars between great powers and developed states after 1945 is spectacular and unquestionable (Richardson never lived to see the Long Peace). Pinker lists many other aspects of war that show a statistically significant downward trend in the post-war era.

There is only one way to ignore them all, and TC have found it: throw away the majority of the variables that show a significant trend, and drown the others in a sea of data so that recent trends become difficult to detect. TC perform that trick by taking a 2000-year history. On a naïve view, using a longer period of time may seem to allow more powerful and more general conclusions. But when the whole discussion revolves around what happened in a small subset of history, namely the last 70 years, then adding 1800 years of earlier data serves little purpose other than washing away recent declines with earlier increases – yes, Pinker also draws attention to the increases in earlier periods – and drowning statistical significance in the small subset in overall insignificance.

When I say ‘recent trends become difficult to detect’, I don’t mean impossible to detect. That brings me to yet another TC trick.

As it turns out, TC bend the rules of statistical inference in their favor in yet another respect. They assume a general form for the probability distribution, estimate the parameters, verify that the model fits the data acceptably well, and then… simply rule out all competing models. Someone who’s never done any statistical modelling might ask: What’s wrong with that? Someone who learnt the basics of statistical modelling will reply: Everything!

If your model fits the data, it means just that. It doesn’t mean competing models won’t fit. What do you do if you have more than one candidate model? Simply comparing goodness-of-fit is not the answer, because the more complicated model will almost always show a better fit. There is a much better way: test the two hypotheses directly against each other. For example, split the dataset into two parts, before and after 1945, and test the hypothesis that p2 (the probability of war after 1945) is less than p1 (the probability before 1945). One way to do it was suggested by David Roodman: introduce a post-1945 dummy variable (a variable that has value 0 for events before 1945, and 1 after – JV) in the model and test whether its coefficient is statistically different from zero (I’ll sketch such a test right after the next quote). Roodman also points out (emphasis added):

[TC’s] applications of EVT models (Pareto/power law, Lomax, Generalized Pareto) dominate the paper. But these assume rather than show that the aggregate casualty rate of warfare has held steady over the centuries. EVT models can allow for time trends but the ones used here don’t.
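To make Roodman’s suggestion concrete, here is a minimal sketch of such a direct test. The yearly counts are synthetic – I simply made up a rate that drops after 1945; this is emphatically not the Cirillo-Taleb dataset – and the test is a likelihood-ratio comparison of a single constant Poisson rate against separate pre- and post-1945 rates, which amounts to testing the coefficient of a post-1945 dummy in a simple Poisson model:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)

# Synthetic yearly counts of new war onsets, 1800-2015 (made-up rates, NOT
# the Cirillo-Taleb data): 0.6 onsets/year before 1945, 0.3 after.
years = np.arange(1800, 2016)
post = years >= 1945
counts = rng.poisson(np.where(post, 0.3, 0.6))

def loglik(rates):
    # Poisson log-likelihood; the log(k!) term cancels in the ratio, so it's omitted.
    return np.sum(counts * np.log(rates) - rates)

# H0: one constant rate.  H1: separate rates before/after 1945 (the "dummy" model).
rate_all = counts.mean()
rate_pre, rate_post = counts[~post].mean(), counts[post].mean()
ll0 = loglik(np.full(len(years), rate_all))
ll1 = loglik(np.where(post, rate_post, rate_pre))
lr = 2 * (ll1 - ll0)
p_value = chi2.sf(lr, df=1)            # one extra parameter under H1

print(f"rate before 1945: {rate_pre:.2f}/yr, after 1945: {rate_post:.2f}/yr")
print(f"likelihood-ratio statistic: {lr:.1f}, p-value: {p_value:.4f}")
```

On data that actually contain a break, the break shows up; whether it shows up in the real data is exactly the question TC never ask.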

TC can’t be bothered; they present goodness-of-fit as evidence against competing theories. You could argue that the multinomial distribution test reported at the end of the paper is, in a way, such a test. However, it’s a very roundabout way of testing nonstationarity. Because it tests so many things at once (p1 = p2 = p3 = … = pn, where pi is the probability corresponding to the i-th subinterval), it’s probably the weakest test you can think of if all you’re interested in is whether the probability of war dropped after 1945.
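A small simulation makes that weakness tangible. The numbers are again invented for the illustration (two thousand years of yearly onset probabilities, with a drop confined to the last seventy years), but the contrast is instructive: a test aimed squarely at the recent period detects the change in most runs, while an omnibus ‘are all twenty subintervals equal?’ test usually doesn’t:

```python
import numpy as np
from scipy.stats import binom, chisquare

rng = np.random.default_rng(3)

# Synthetic history, purely for illustration (NOT the Cirillo-Taleb data):
# 2000 years of yearly "huge war" onsets, probability 0.10/yr that drops to
# 0.02/yr in the final 70 years (a post-1945-style break).
n_years, n_recent = 2000, 70
p_year = np.full(n_years, 0.10)
p_year[-n_recent:] = 0.02

alpha, n_sims = 0.05, 2000
reject_targeted = reject_omnibus = 0

for _ in range(n_sims):
    onsets = rng.random(n_years) < p_year

    # (a) targeted test: given the total count, the recent count is Binomial
    #     with success probability 70/2000 under the constant-rate null
    n_tot, n_rec = onsets.sum(), onsets[-n_recent:].sum()
    p_targeted = binom.cdf(n_rec, n_tot, n_recent / n_years)   # one-sided
    reject_targeted += p_targeted < alpha

    # (b) omnibus test in the spirit of TC's multinomial check: are onsets
    #     spread evenly over 20 bins of 100 years each?
    bin_counts = onsets.reshape(20, 100).sum(axis=1)
    p_omnibus = chisquare(bin_counts).pvalue
    reject_omnibus += p_omnibus < alpha

print(f"power of the targeted pre/post test: {reject_targeted / n_sims:.2f}")
print(f"power of the 20-bin omnibus test:    {reject_omnibus / n_sims:.2f}")
```

With these made-up numbers, the targeted test picks up the change in the large majority of runs, while the omnibus test misses it most of the time. Testing everything at once is a good way to find nothing.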

Picture this:

It’s midnight. Steven Pinker and Nassim Taleb are resting in the desert. Pinker suddenly prods Taleb:

– There’s a wild beast over there, at less than 100 feet.

– I don’t see it.

– Take your pistol, aim it at 12 o’clock, and shoot. It’s really big, you can’t miss it.

Taleb takes a handful of sand, throws it in the opposite direction (6 o’clock), and says:

– You see? I didn’t hit anything! You must be deluded, you – (expletives removed).

Anyway, if you reduce your scope to increasingly large wars, you decrease the size of your dataset, and statistical tests become weaker and weaker. Pinker himself points out the difficulties in estimating the frequencies of very rare events, and he emphatically denies that his assessment of a lower probability of large wars in the period after World War II is exclusively based on statistics. That’s why he switches to a different type of argument, namely narratives, which TC completely ignore. The political realities mentioned earlier are an example of a narrative: when someone says war between France and Germany is much less likely now than 100 years ago, they’re usually not leaning on some sort of naïve statistical inference from the number of years without war.

It’s quite ironic that Taleb, while being the first to denounce experts of any stripe for their purported ivory-tower mentality, is also the first to take the debate into a rarefied atmosphere devoid of earthly reality. To him, the question whether violence has declined is a purely abstract issue about probability distributions. It doesn’t occur to him that probabilities can also be modeled by relating them to explanatory variables (e.g. size of armies, level of foreign trade, political union,…). As a nice illustration of their stubborn refusal to accept any other argument, TC draw this remarkable conclusion in their paper (emphasis added):

The consequence of this analysis is that the absence of a conflict generating more than, say, 5 million casualties in the last sixty years [is] highly insufficient to state that their probability has decreased over time, given that the average inter-arrival time is 93.03 years, with a mean absolute deviation of 113.57 years! Unfortunately, we need to wait for more time to assert whether we are really living in a more peaceful era: present data are not in favor (nor against) a structural change in violence, when we deal with war casualties.

Their tunnel vision hasn’t escaped the attention of other bloggers. As Michael Spagat noted:

[T]he only channel that Cirillo and Taleb implicitly empower to potentially knock their war-generating mechanism off its pedestal is the accumulation of historical data on war sizes and timings. Since they focus on extreme wars, however, it will take a very long time before it is even possible for enough evidence to accumulate to seriously challenge their assumption of an unchanging war-generating mechanism.

In short, the authors declare that the risk of huge wars hasn’t really changed over two millennia of war and that they will stick with this belief until enough of one particular type of slowly accumulating evidence appears to refute it. This stance may be fine for them but other people will wish to incorporate other evidence into their judgments of the risks we face.

TC’s criteria of evidence are very similar to the gaps-in-the-fossil-record argument of anti-evolutionists, who happily brush aside an abundance of data from different scientific disciplines supporting Darwinian evolution, and yell, ‘You see, no beast!’, when a paleontologist isn’t able to produce a fossil of every ‘transitional species’ in the evolutionary tree.

David Roodman wrote an excellent summary of the fundamental flaws in TC’s paper, in a follow-up to his earlier-mentioned blog post:

if you are going [to] use statistics to show that someone else is wrong, you should 1) state precisely what view you question, 2) provide examples of your opponent espousing this view, and 3) run statistical tests specified to test this view. Cirillo and Taleb skip the first two and hardly do the third. The “long peace” hypothesis is never precisely defined; Pinker’s work appears only in some orphan footnotes; the clear meaning of the “long peace”—a break with the past in 1945—is never directly tested for.

[…]

It looks as though Cirillo and Taleb have never checked for a trend break at 1945, even though that is the clear meaning of the “long peace” assertion they claim to challenge. Perhaps they see themselves as testing for some other pattern in the time domain. But that gets back to my first point: I can’t really tell what assertion they are challenging and who, if anyone, espouses it.

And he adds:

Taleb is comically arrogant. I am a “BSer” on whom Cirillo is “wasting your precious dinner time.” Apparently Taleb often behaves this way, so I shouldn’t feel special about the attention.

                                                       ***************

The logic of academic warfare

  1. Scholar says A.
  2. Taleb hears B, takes offense and declares war.
  3. After firing a first salvo of noisy blanks and stray bullets, Taleb decides it’s time to hire a mercenary.
  4. Looking forward to fighting at the side of a famous warlord, mercenary accepts the mission.
  5. Taleb & mercenary pull out an impressive arsenal of missiles, and launch them in an all-out attack against their enemy.
  6. When the dust settles, it turns out they’ve been bombing an undisputed patch of land in the desert. None of their missiles ever came close to the target.
  7. Impressed by the huge clouds of smoke, Taleb’s followers cheer at their leader’s glorious victory.

******************

That said, my best wishes to everyone who reads this: Peace on Earth, and let the Age of Aquarius begin!!

For the twitter addicts I have one last message: be careful these days, alcohol on twitter may lead to serious injury.

Edit 01/02/2016: a reader pointed out an error in my comment about how Bayesians interpret ‘M’ (the ‘true’ mean; in the passage about the difference between M and M*). I removed the sentences containing the mistake. See the comment section for more details.