Mexico had a major earthquake, again, on September 19th. This is the third major earthquake on this specific date (Earthquake Shakes Mexico on Anniversary of 2 Major Temblors).
Many people in the country are in awe of this, asking themselves how something like this could happen.
In fact, the probability that three major earthquakes (magnitude over 7.0 on the Richter scale) happen on the same calendar date is not as low as you would think. It is actually about 25%, the same probability as getting two "heads" in a row when tossing a coin twice.
I estimated this probability with a simple Python script: I simulated, 100,000 times, picking random dates for the 64 major earthquakes Mexico has had (according to Wikipedia).
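The original script is not shown, so what follows is only a minimal sketch of such a Monte Carlo simulation. It assumes each earthquake's date is independent and uniformly distributed over a 365-day year (leap days ignored); the function name `prob_same_date` and its parameters are hypothetical.

```python
import random
from collections import Counter


def prob_same_date(n_quakes=64, n_days=365, min_same=3, trials=100_000, seed=None):
    """Monte Carlo estimate of P(at least `min_same` earthquakes share a date).

    Assumption: every earthquake gets an independent, uniformly random day
    of the year; leap days are ignored.
    """
    rng = random.Random(seed)
    days = range(n_days)
    hits = 0
    for _ in range(trials):
        # Assign a random day to each earthquake in this simulated history.
        dates = rng.choices(days, k=n_quakes)
        # Did any single day collect `min_same` or more earthquakes?
        if max(Counter(dates).values()) >= min_same:
            hits += 1
    return hits / trials
```

Calling `prob_same_date(seed=42)` runs 100,000 trials; the estimate should come out at roughly one in four, in line with the figure above. Fixing the seed only makes runs reproducible; it does not change the expected value.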
But why do many people think such an event is less likely than it really is?
The birthday paradox has a similar effect: with only 23 randomly chosen people, there is about a 50% chance that at least two of them share the exact same birthday, yet most people would not guess it is that likely.
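Unlike the earthquake case, the birthday probability is easy to compute exactly instead of simulating it. A short sketch, assuming 365 equally likely, independent birthdays (leap days ignored); `prob_shared_birthday` is a hypothetical helper name.

```python
from math import prod


def prob_shared_birthday(n_people, n_days=365):
    """Exact P(at least two of n_people share a birthday).

    Assumption: birthdays are independent and uniform over n_days.
    """
    if n_people > n_days:
        return 1.0  # pigeonhole: a shared birthday is guaranteed
    # P(all distinct) = (365/365) * (364/365) * ... * ((365 - n + 1)/365)
    p_distinct = prod((n_days - k) / n_days for k in range(n_people))
    return 1.0 - p_distinct
```

`prob_shared_birthday(23)` returns about 0.507, while `prob_shared_birthday(22)` is about 0.476, which is why 23 is the usual threshold quoted for the paradox.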
A friend of mine pointed out that two of the three earthquakes had substantial casualties. According to Wikipedia, about 30% of major earthquakes in Mexico have had substantial casualties (defined as 25 or more casualties). I updated my script and ran a second simulation to compute the probability of having at least 3 major earthquakes on the same date, at least 2 of which had substantial casualties. To do this, I picked random dates and then, for each earthquake, a weighted random flag with a 30% chance of being true.
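The extended simulation might look roughly like this. It is a hedged sketch, not the author's actual script: each earthquake gets a uniformly random date plus an independent casualty flag that is true with probability 0.30, and the event is interpreted as "some date has at least 3 earthquakes, at least 2 of them flagged". The function and parameter names are hypothetical, and because the original script's details are not given, the estimate it produces may differ somewhat from the author's figure.

```python
import random
from collections import defaultdict


def prob_same_date_with_casualties(n_quakes=64, n_days=365, min_same=3,
                                   min_casualty=2, p_casualty=0.30,
                                   trials=100_000, seed=None):
    """Monte Carlo estimate of P(some date has >= min_same earthquakes,
    of which >= min_casualty had substantial casualties).

    Assumptions: dates are independent and uniform (no leap days), and each
    earthquake's casualty flag is an independent weighted coin flip.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # day -> [number of earthquakes, number with substantial casualties]
        per_day = defaultdict(lambda: [0, 0])
        for _ in range(n_quakes):
            day = rng.randrange(n_days)
            per_day[day][0] += 1
            # Weighted random flag: true with probability p_casualty.
            if rng.random() < p_casualty:
                per_day[day][1] += 1
        if any(total >= min_same and deadly >= min_casualty
               for total, deadly in per_day.values()):
            hits += 1
    return hits / trials
```

Setting `p_casualty=0.0` is a quick sanity check: no date can then qualify, so the function returns 0.0.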
The probability of that happening (at least 3 major earthquakes on the exact same date, with at least 2 of them having substantial casualties) was still higher than 7%. That is several thousand times more likely than my wife winning a raffle for a fully paid trip to Maui at the Four Seasons (which she won), or winning a raffle for a free safari trip to South Africa (which she also won!).
Sometimes people, including data scientists, need to challenge their own preconceptions and biases.
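Can’t this calculation be done formally, without a script? If you randomly pick a number from 1 to 365 three times, the probability that all three picks land on the same day is 365*(1/365)^3 = (1/365)^2, so the probability that they don't is 1-(1/365)^2. Since there are C(N,3) possible triplets among N earthquakes, and treating the triplets as roughly independent, the probability that no triplet shares a date is about (1-(1/365)^2)^C(N,3), so the number we’re looking for is 1-(1-(1/365)^2)^C(N,3), right?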
For N=64, I get about 27%, consistent with your results. For at least 2 of the 3 to have substantial casualties, given a 30% chance for any single earthquake, the probability is 30%^3+3*30%^2*70% ~ 22%. The end result would be about 27%*22% ~ 6%, the same order of magnitude as your simulation.
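The closed-form approximation from this comment can be checked in a few lines. This sketch uses `math.comb` (Python 3.8+), treats the C(N,3) triplets as independent exactly as the comment does, and uses the standard binomial expression p^3 + 3*p^2*(1 - p) for "at least 2 of 3"; the function names are hypothetical.

```python
from math import comb


def triple_same_date_approx(n_quakes=64, n_days=365):
    """Approximate P(at least one triplet of earthquakes shares a date),
    treating the C(n, 3) triplets as if they were independent."""
    p_triplet_same = (1 / n_days) ** 2  # all three land on one common day
    return 1 - (1 - p_triplet_same) ** comb(n_quakes, 3)


def at_least_two_of_three(p=0.30):
    """P(at least 2 of 3 independent events occur), each with probability p."""
    return p**3 + 3 * p**2 * (1 - p)


base = triple_same_date_approx()
print(f"{base:.3f} {at_least_two_of_three():.3f} {base * at_least_two_of_three():.3f}")
```

With the defaults these evaluate to roughly 0.27, 0.22 and 0.058. The independence assumption is what makes this an approximation: triplets that share an earthquake are not actually independent.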
I did not know Carla won so many raffles 😊