Mark Jarvis

The Weatherman's Dilemma: Accuracy, Timeframes, and Predictability in Scouting

Time Travel and the Meteorologist


You turn on the weather channel and there’s a meteorologist talking about today’s forecast, but instead of a fancy green screen projecting the weather radar, he’s standing in front of a window. He peers out of it and observes for a second or two.


“As you can see, it is currently raining outside,” he says.


Okay, that was pretty easy. You lean and look out your window. Sure enough it is raining.


Let’s step back a little in time. You turn on this same channel ten minutes ago, but there’s no rain outside of the window. The meteorologist grumbles and switches to the green screen instead, and the weather radar shows that rain is coming your way.


“As you can see, while it isn’t currently raining, it should be raining here in about ten minutes.”


Not quite as easy as just looking out the window, but the forecast is still accurate. Rain is on the way.


Let’s step back 24 hours. The meteorologist is studying the weather radar. While there is some potential that rain won’t hit, he says that it’s incredibly likely, and his track record is around 96-98% for predicting rain a day in advance.


“I’d advise that you bring an umbrella with you tomorrow,” he says.


We’re rapidly moving backwards in time now. It’s seven days prior to the predicted day of rain, but the meteorologist is already looking ahead. His accuracy is taking a serious dive now. He’s only around 70-80% accurate at this point. The models he is using are incredibly advanced, he’s an expert in his field, and yet the number of variables that could affect this potential rain are growing every step we take further back in time.


“It looks like this dry spell is going to end in about a week, but we’re not quite sure.”


Now it’s time to get in our jumbo-sized time machine instead of our portable one that we’ve been using so far. This one will let us go a year back.


We step out of the time machine and turn on our TV for one last time. The meteorologist looks dumbfounded at what his producer has just asked him.


“You want me to predict if it will rain WHEN?”


He thinks about it for a moment as his producer insists that he anticipate the weather 365 days in advance.


His guess is borderline useless as far as specifics go. He can’t even predict whether he’ll be at this station a year from now. But he’s fairly clever, so he guesses based on how many days it usually rains each year.


“Um. I guess there's a 20% chance it rains.”


We step back in our time machine, go home, and open the door to a light drizzle.


Cloudlike, Clocklike, and Domain Difficulty


Was the meteorologist making good or bad predictions? Were any of them better or worse than the others? At first glance, the answer seems obvious. He was right about the rain in the near future and wrong when he guessed there was only a 20% chance a year in advance. But the gap in time and available information makes these near-future and far-future predictions fundamentally different. Both can be quality predictions even if one resolves incorrectly.


When we think about forecast accuracy of any kind, we have to account for exactly what type of problem we are dealing with. In his book Superforecasting, Phil Tetlock lays out a spectrum with two types of problems at its ends, then looks inward toward the “Goldilocks Zone” of what can usefully be predicted. Here are the ends of the spectrum.


Cloudlike: This is what our meteorologist faces when he is asked to deliver an accurate prediction about the weather on a particular day a year in advance. The amount of information and processing power we’d need to predict the formation of the specific clouds, fronts, and other weather conditions that produce rain at a given place and time a year out is essentially impossible to attain.


Here is an example from Tetlock’s book of the kind of fine-grained details that can throw off the predictability of any forecast - you may have heard of this “butterfly effect”.

“In 1972 the American meteorologist Edward Lorenz wrote a paper with an arresting title: ‘Predictability: Does the Flap of a Butterfly’s Wings in Brazil Set Off a Tornado in Texas?’ A decade earlier, Lorenz had discovered by accident that tiny data entry variations in computer simulations of weather patterns—like replacing 0.506127 with 0.506—could produce dramatically different long-term forecasts. It was an insight that would inspire ‘chaos theory’: in nonlinear systems like the atmosphere, even small changes in initial conditions can mushroom to enormous proportions. So, in principle, a lone butterfly in Brazil could flap its wings and set off a tornado in Texas—even though swarms of other Brazilian butterflies could flap frantically their whole lives and never cause a noticeable gust a few miles away. Of course Lorenz didn’t mean that the butterfly ‘causes’ the tornado in the same sense that I cause a wineglass to break when I hit it with a hammer. He meant that if that particular butterfly hadn’t flapped its wings at that moment, the unfathomably complex network of atmospheric actions and reactions would have behaved differently, and the tornado might never have formed…”
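Lorenz’s accident is easy to recreate. The sketch below is a rough Euler-method integration of his famous three-equation system with made-up step counts (it is not from Tetlock’s book); it starts two simulations that differ only by that 0.506127-versus-0.506 rounding and measures how far apart they drift:

```python
import math

def lorenz_step(x, y, z, dt=0.001, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One Euler integration step of the Lorenz system."""
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return x + dx * dt, y + dy * dt, z + dz * dt

def divergence(x0_a, x0_b, steps=40000):
    """Run two trajectories that differ only in their starting x value,
    and return the largest gap that opens up between them."""
    a = (x0_a, 1.0, 1.0)
    b = (x0_b, 1.0, 1.0)
    max_gap = 0.0
    for _ in range(steps):
        a = lorenz_step(*a)
        b = lorenz_step(*b)
        max_gap = max(max_gap, math.dist(a, b))
    return max_gap

# Lorenz's accidental experiment: full precision vs. the rounded value.
print(divergence(0.506127, 0.506))
```

A difference in the fourth decimal place balloons until the two forecasts bear no resemblance to each other, which is the whole butterfly-effect point.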

Here are some cloudlike problems.


“What state will the president elected in 2060 hail from?”

“Who will be the #1 overall pick in the 2034 NFL Draft?”


Here’s the other end of the spectrum.


Clocklike: This is what you would face if I were to throw a ball at you, for the most part. Gravity, wind, and the velocity of the throw over the course of a second or two make it a relatively predictable affair, in the same way the hands of a clock tick steadily forward. Other examples of a clocklike domain would be predicting Halley’s Comet to return right on schedule in 2061 and 2134, or the precision of a bishop crossing the chessboard diagonally. Few interacting variables and near-perfect information can make these sorts of things near perfectly predictable.


This clocklike view was widely held by thought leaders in past centuries, but the development of chaos theory and more advanced prediction practices has sharpened our understanding of how predictable things truly are. The gap between these ideas dips into the edges of philosophy and religion as well, but it’s worth mentioning.


In 1814, French polymath Pierre-Simon Laplace described what came to be known as “Laplace’s demon”: an intellect which, if it knew the location and momentum of every atom in the universe, would be able to predict any future outcome indefinitely far forward. This inherently deterministic view of the future differs from much of what scientists today believe, but it is the root of a “clocklike” view of the workings of the universe. Here’s the passage from Laplace.

“We may regard the present state of the universe as the effect of its past and the cause of its future. An intellect which at a certain moment would know all forces that set nature in motion, and all positions of all items of which nature is composed, if this intellect were also vast enough to submit these data to analysis, it would embrace in a single formula the movements of the greatest bodies of the universe and those of the tiniest atom; for such an intellect nothing would be uncertain and the future just like the past would be present before its eyes.” - Pierre Simon Laplace, A Philosophical Essay on Probabilities

Here are some clocklike problems of the kind Tetlock discusses in his book - ones so simple that a rule of thumb or “common sense” could probably get you through it.


“Will the sun rise tomorrow morning?”

“Will Tom Brady be inducted into the NFL Hall of Fame?”


Lastly, we have what Tetlock calls the “Goldilocks Zone” of predictability. These are questions that aren’t impossible to predict accurately, but they also can’t simply be calculated outright given enough knowledge and a big enough calculator. These questions strike the right balance - they use the information available to us to forecast things that we care about.


A computer couldn’t answer them the way it stomps the competition in chess or predicts astronomical events, and you couldn’t pull a regular Joe off the street to nail them either. They sit at the intersection of value, time, and solvability. This is the spot where insightful, knowledgeable experts can make meaningful predictions about future outcomes, assuming they have the right toolkit and the know-how to use it.


Not All 50/50s Are Equal


We’ve discussed the meteorologist inaccurately predicting 20% and accurately predicting 100% at different points in time, so let’s zoom in on what it means when problems land in different domains.


Suppose we were to take an individual and ask them to compete in two different domains. In the first, we ask them to predict whether a fair coin flip will land heads or tails. Over 10 flips, we may be slightly surprised when they guess right 7 of 10 or only 3 of 10 times, but we know the inherent probability is 50% for either side. As the flips continue, their hit rate will gradually converge toward 50% regardless of whether they call heads or tails.


Once this individual has exhaustively flipped the coin 50,000 times, whether they guessed only heads, only tails, or some complicated variation of the two, they will inevitably land right around the 50% mark.
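A few lines of Python make that pull toward 50% visible. This is a toy simulation with a fixed seed for repeatability, not data from anywhere:

```python
import random

random.seed(7)  # fixed seed so the run is repeatable

def guess_accuracy(flips):
    """Call 'heads' every time against a fair coin; return the hit rate."""
    hits = sum(1 for _ in range(flips) if random.random() < 0.5)
    return hits / flips

# Small samples wobble; large samples hug 50% no matter the strategy,
# since every guessing scheme faces the same 50/50 coin.
for n in (10, 1000, 50000):
    print(f"{n:>6} flips: {guess_accuracy(n):.3f}")
```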


Let’s assume we take this individual and have them play chess instead. They win 50% of their games against a particular AI opponent, and the AI opponent’s skill level does not fluctuate throughout the competition. Let’s assume they play fewer than 50,000 games (since that much practice would raise their skill level and their win percentage), but enough that we can reasonably say they will continue to win 50% of their games in the short-term future.


Are these two 50% rates of success comparable?


I’d argue that they aren’t. Michael Mauboussin’s luck-skill continuum does an excellent job of showing the influence that luck can have on our success rates, and it applies perfectly to this comparison between 50% in chess and 50% in coin flipping.


Mauboussin’s explanation for how we can determine whether or not an activity has skill is an excellent one, so I’ll use his words here instead of mine.

“There’s a simple and elegant test of whether there is skill in an activity: ask whether you can lose on purpose. If you can’t lose on purpose, or if it’s really hard, luck likely dominates that activity. If it’s easy to lose on purpose, skill is more important.”

This individual, regardless of their efforts, will never be able to predict the outcome of a fair coin over a large sample size. They could consult coin flip gurus, study the coin carefully, chart the trends of prior flips, sample the metal and run a chemical analysis of its properties, or try any number of complicated schemes to call which side the coin will land on. It would do nothing. Their outcome range will narrow closer and closer to 50% for all of eternity.


On the other hand, this individual could go many ways in chess. If they decided to simply sacrifice pieces to the opponent, make deliberate errors, and sabotage their win percentage they could do it. They could win 0% of their games if they really wanted to. They could also, through a variety of methods, improve their win percentage above the 50% mark. Through practice and experience to develop their intuition, studying of experts who play at a high level, and other methods that I’m not savvy to, they could surpass 50% and eventually beat that same AI opponent every time.


Why does this matter?


Because every time someone says the draft is random or a crapshoot, it misrepresents the underlying difficulty of prediction within the domain.


Predicting the outcomes of where players should be drafted is different from predicting where they will be drafted. Predicting where players will be drafted is different from predicting how the first year of their career will unfold. Predicting how the first year of their career will unfold is different from predicting how their first contract will unfold. So on and so forth.


As you change the time constraints, the availability of information, and the specific question you are asking, you will see significantly different “hit rates” - but a prediction with a lower hit rate is not always a lower-quality prediction. It would be like saying a 50% win rate for a chess grandmaster competing against other grandmasters is equivalent to our amateur beating the AI at the same rate.


Targets and Time


Accurately labeling what we aim to predict is the first step towards being able to find something resembling a hit rate. When we say we aim to predict the success of a player over the course of the next three years, there is a different pool of information that we have to pull from than if we were to simply attempt to predict where they are going to be drafted.


We also have to accept that there are inherently unknowable things that will occur between the time of our prediction and the expiration date on it. We can update our prediction accordingly, but it doesn’t mean our original prediction was a poor one. David Ojabo was projected as a first-round pick in February 2022 and then fell to the second round after tearing his Achilles at his pro day. Any prediction of Ojabo’s draft slot or future success prior to that injury may have accurately reflected his odds, but new and unknowable information dramatically changed the final result.


The role of time is largely overlooked when it comes to making accurate predictions and maintaining a quality hit rate, but it is one of the most important factors in determining whether or not we’re striking out with our swings. One useful framing: don’t view time itself as something that changes the outcome, but as a vehicle for uncertainty - the longer the window, the more unknowable events can arrive to change that outcome.


Think of it like this. You’re an archer lining up to fire your arrow downfield. Your target is about 60 yards away, but you’re a capable archer who can hit this target relatively well when no variables affect your shot. You draw back, let your arrow fly, and at about the 40 yard mark a slight gust of wind knocks your arrow off course. You just narrowly miss the target. What happened?


It wasn’t through a failure of skill that you missed your target. Had your target been at the 30 yard mark, you’d have nailed it. But because your shot was longer, because the gust of wind hit at a certain point in time, and because there are variables outside of your control, you missed.


To have aimed differently prior to the shot would have been to aim inaccurately to begin with. It would be through an act of luck that you hit, not an act of luck that you missed. If the gust of wind is truly unknowable, then you cannot account for it in your aim.

Working backwards in time to find out what we could have known helps us understand the true quality of our decision. Suppose our arrow has just missed. At the moment, it makes sense to be frustrated or confused about the result. But if we rewind anywhere behind the 40 yard mark, we were completely happy with our shot. We had done everything right and our arrow was on target.


One interesting example of this gust of wind in the wild is the emergence of Baker Mayfield as the first overall pick in his draft class over Sam Darnold, who seemingly led wire to wire until the day before the draft. The first night of the draft was on April 26th. Here’s a tally for how many times each of these players was mocked at #1 overall to the Browns each day prior to the draft. (Source)


April 23rd: Darnold 9, Mayfield 0
April 24th: Darnold 9, Mayfield 0
April 25th: Darnold 18, Mayfield 1
April 26th: Darnold 17, Mayfield 6


If you were betting on Darnold as the top quarterback you probably felt pretty confident until around Thursday morning. Even with the news of Mayfield’s potential surge, there was still an overwhelming majority of picks for Darnold on the day of the draft. If you had sampled teams across the league, Darnold likely would have been the top pick in the class based on all prior information available.


All it takes is one team to blow that up though, and the arrow took a last second diversion.


Dealing in Chance


One of the most insightful books I’ve read (or listened to) in the past year is Leonard Mlodinow’s The Drunkard’s Walk, and one piece that particularly stuck with me was his deconstruction of an abnormally long streak by a stockpicker.


Bill Miller, a portfolio manager at Legg Mason Capital, beat the S&P 500 for 15 years in a row. At face value, this seems like an impressive feat. It was the talk of the financial world that Miller’s streak lasted so long, as it had not been done before. The aforementioned Mauboussin thought so as well, calculating the odds of Miller’s feat at 1 in 2.3 million, which is an outstanding number for any individual.


Mlodinow viewed the situation in a different light, instead asking what the odds were of anyone, not just Miller, having a streak of that caliber by sheer chance. He calculated the odds of anyone matching Miller’s streak over exactly 1991 to 2005 at only 3%, but if you allow any possible 15-year period rather than strictly those years, the number rises to 75%.
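Mlodinow’s full calculation covers the real population of managers and decades of market history; the Monte Carlo sketch below uses hypothetical round numbers (1,000 managers, a 40-year window, a fair 50/50 shot at beating the index each year) just to show how plausible a luck-only streak becomes when no one has any skill at all:

```python
import random

random.seed(42)  # fixed seed so the estimate is repeatable

def has_streak(years=40, need=15, p_beat=0.5):
    """One skill-free manager: does a coin-flip career of `years` annual
    results ever contain `need` consecutive index-beating years?"""
    run = 0
    for _ in range(years):
        if random.random() < p_beat:
            run += 1
            if run >= need:
                return True
        else:
            run = 0
    return False

# Estimate the chance for a single manager, then scale to a field of
# 1,000 via the complement rule.
sims = 200_000
p_one = sum(has_streak() for _ in range(sims)) / sims
p_any = 1 - (1 - p_one) ** 1000
print(f"one manager: {p_one:.5f}   anyone in 1,000: {p_any:.2f}")
```

Even with these toy numbers, the chance that *somebody* in the field posts a 15-year streak by pure luck lands in “shouldn’t surprise anyone” territory, which is the heart of Mlodinow’s point.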


What does that mean? It means that, if you have enough people picking stocks, you’re going to get a person or two that looks particularly accurate when they are the beneficiary of both luck and a LOT of stockpickers participating. That’s not to downplay Miller’s feat. It’s to emphasize the role of chance in any particular outcome that occurs.


To use the shooting arrows example again, imagine now that you are joined by ten other people firing at this target. You may miss due to the gust of wind at the 40 yard mark, but a person or two might hit. Add more people and more hits will come. By simply increasing the number of shots taken, there’s a good chance that someone will continue to hit even when gusts of wind pull others off course. It’s not necessarily a feature of unique skill on their part, but instead simply the total number of people firing arrows.
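The arithmetic behind “more archers, more hits” is just the complement rule: one minus the chance that everyone misses. The 30% single-shot hit probability below is invented for illustration:

```python
def p_someone_hits(p_single, archers):
    """Probability that at least one of `archers` independent shots lands,
    computed as 1 minus the chance that every shot misses."""
    return 1 - (1 - p_single) ** archers

# A shaky 30% shot becomes a near-certainty once the field is crowded.
for n in (1, 10, 100):
    print(f"{n:>3} archers: {p_someone_hits(0.3, n):.4f}")
```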


Another interesting example of this would be March Madness. Over 36 million Americans fill out a bracket each year, but we hyperfixate on the handful of brackets that are still alive after the first rounds. It’s never some brilliant mathematician or media expert who has the last bracket standing. It’s almost always some random ESPN user with a name-and-numbers handle who nobody has ever heard of, who simply got lucky when a few games broke the other way. This is the same “shoot enough arrows and something will hit” idea at work.


The issue is that our brains don’t like chance and randomness in outcomes. When we make a prediction about the future, we view it a lot more like the smashing of the wineglass than the complex and unpredictable result of butterfly effects. We assume that our actions have more cause in the “cause and effect” loop than they usually do. We assume we have greater knowledge and control over things than we usually do. And the story of Bill Miller sounds much better as “stockpicking maestro continues 15-year streak” than “beneficiary of great luck continues to benefit for another year”.


This desire to link cause and effect can create some issues for us when evaluating how we would perform relative to chance. In the same way that a native tribesman can truly believe that his rain dance influenced the rain’s arrival, we can believe that the result of a player failing as a pro is something that we saw clearly and accurately predicted ahead of time, even if the player’s career could have turned out wildly different with another team, without an injury, or any other number of factors that could have affected him.


Sometimes it’s hot, dry, sunny, and the weatherman calls for it to stay that way. But the loud-mouthed neighbor who doesn’t know the slightest bit about meteorology yells, “It’s gonna rain later! I know it! I know it!”


It might end up raining later, but if you start taking all your advice from your neighbor on whether or not to bring an umbrella you can expect a lot more days of getting soaked.


Accepting the Weather and Closing Thoughts


One of the most difficult things to come to terms with, at least for me, was the idea that I will never be perfect at predicting how players’ careers turn out. I always thought that by digging a little harder, watching a little more, or honing my eye, I’d essentially be able to bring a crystal ball to the table. The unfortunate reality is that there is simply too much information, both visible and hidden, and too much calculation required to resemble anything even close to Laplace’s demon.


It’s difficult to accept that error is unavoidable and part of any prediction, especially as you zoom to different points on the timeline where more or less information has revealed itself. I spent the last two years watching players off their junior tape, and a large handful of them baffled me with how their hype took off - how they went from a PFA type for me to firmly draftable on a consensus media board. I’d like to think much of that is error on my part that can be improved, but I’m sure some of it is the wind blowing downrange.


I’ll leave you with some of the overarching points that I wanted to express in the article, the ones that prompted me to write this at all.


The time when you make a prediction is intrinsically linked to the potential accuracy of that prediction.


There are some problems that are inherently unpredictable because of their characteristics and domain, and that means that a 50% hit rate in one area won’t always be equal to a 50% hit rate in another.


Scouting and drafting involve skill and are not random or a crapshoot, as the “can you lose on purpose” test shows.


Making predictions is like shooting arrows. The longer your shot, the more variables can affect your accuracy.


A prediction that turns out to be true can still be a poor-quality prediction if it relied on improper information or lucky timing.


Don’t forget the role that chance plays in both your predictions and the predictions of others.



Resources Used/Quoted

