  • Mark Jarvis

Selection Pressure: Evolution in Roster Construction and Player Outcomes (Part 2)

Due to the length of the article, it has been broken into three separate posts. You can find Part 1 here and Part 3 here.

Table of Contents

Part 1

1. A Few Thoughts

An introduction.

2. The Most Boring Game

Using a hypothetical league’s field dimensions and player archetypes to illustrate how the environment shapes those within a system.

3. Less Boring Games

Real-life examples of different leagues and the subsequent changes in the game that arise from their varying rules.

4. Changing Chess Boards

Illustrating the importance of field dimensions by showing changes in a chess game if the board were to change.

5. What is Evolution?

A brief definition of evolution and its application to the rest of the article.

6. Fitness and Environmental Pressures

Defining fitness in the evolutionary sense through beetles.

7. Beaks and Population Shifts

Showcasing types of selection through the evolution of a finch population in the Galapagos.

8. The NFL’s Archean Eon

Reflections on the league's history, both in its rules and players.

9. The Small Supply of Big People

On the implications of a scarce supply of players with certain physical attributes.

Part 2

10. Rat Islands and the Species-Area Relationship

What island biogeography tells us about diversity and niches.

11. 53 Forms Most Beautiful

On adaptive radiation and the development of new species.

12. By Generation Not Time

Viewing evolution through the lens of generations and mutations rather than by days, months, or years.

13. Replaying the Tape of Time

How small differences in the past can create vastly different outcomes in the present.

14. Potters and Pigeon Fanciers

Artificial selection and its comparability to team selection processes.

15. Marble Racing, the Matthew Effect, and Genetic Drift

How cumulative advantage and randomness influence the population of players.

Part 3

16. Correlated Traits and Spandrels

On measurements that relate to each other and their value in success.

17. Fitness Landscapes

Using the metaphor of fitness landscapes to map player populations.

18. Going the Way of the Dodo

On the extinction of certain types of players.

19. Peacocks, Oxpeckers, and the El Farol Bar

The league as a complex adaptive system and the difficulty of understanding the whole through its parts.

20. Application and Closing Thoughts

Putting these ideas into practice and reflection.

Rat Islands and the Species-Area Relationship

“How different, after all, is a patch of British woods in a sea of agriculture from a patch of rock and dirt in an actual sea? And don’t the medians in the middle of the road on Broadway in Manhattan form a sort of archipelago amid the seas of glass and cement?”

- Rob Dunn, A Natural History of the Future

The mosquitoes in the London Underground railway system rarely encounter those above ground, although the two are distant relatives. Think of their intermingling as something like an occasional reunion at the great mosquito Thanksgiving dinner.

The nearly ancient subway system began operating in 1863, and it didn’t take long for mosquitoes to migrate into the subterranean environment. The underground population has evolved since then to the degree that it is now arguably a separate species from its aboveground relatives. To quote Rob Dunn’s A Natural History of the Future, “The aboveground species is adapted to feeding on birds. The belowground species is adapted to feeding on mammals (humans, rats, and the like). Females of the aboveground species require blood in order to lay eggs; females of the belowground species, where food is scarce, do not.”

He further discusses these differences, along with comparable examples of isolated populations evolving into new species within unique niches. Speciation (the evolution of distinct species from an original population) is, of course, an imperfect comparison for what happens when players of many types pass through various levels of athletic competition. There is no true transmission or mutation from one successful generation of players to the next, given that each player’s basic type is largely fixed, bestowed upon him by his parents. But studying these examples of speciation within isolated populations can help us understand the types of niches that open up in any given environment and how certain types come to fill them.

Nor is it just underground and aboveground mosquitoes diverging into new species to fit their niches. Any geographical or reproductive barrier can give rise to unique species adapted to their environments. First, we’ll discuss rats; then we’ll discuss how E.O. Wilson and Robert MacArthur’s work on island biogeography relates to Mauboussin’s towns and the NFL.

In his book’s chapter called Urban Galapagos, Dunn applies this idea of island biogeography to land masses of all forms, from actual islands to patches of urban greenery or the interiors of households. For species that have shorter lifespans and reproduce more quickly, genetic recombination and mutation are much more likely to take hold quickly within a population. When they do, speciation can occur in the same way underground mosquitoes diverged from their aboveground relatives.

"Rats are not the most likely group of organisms to evolve new species in cities. They have faster generation times and move less than do coyotes, but they are hardly snails. Yet recent research by my friend and collaborator Jason Munshi-South has shown that in some regions, geographically separate urban Norway rat populations are already diverging from one another, becoming ever more distinct–almost certainly as a function of the specifics of their cities, their climate, the available food, and other details.

This is true not only for rats from very widely separated cities–such as rats from Wellington, New Zealand, relative to those from New York City–but also for rats in different cities in the same region. Munshi-South has recently shown that the Norway rat populations of New York City are closely related and show almost no evidence of breeding with Norway rats in nearby cities.

More than that, the rats at one end of Manhattan appear to be diverging from those at the other end. Norway rats are less likely to travel through, eat in, mate in, or live in Midtown Manhattan, perhaps because Midtown has a lower density of permanent human residents than the other parts of Manhattan and hence also less of the rat food such inhabitants graciously, if inadvertently, provide. Whatever the reason, Midtown is, from a rat’s perspective, a kind of sea between two lovelier islands. Similarly, Norway rats in one part of New Orleans are isolated from those in another by waterways and are diverging as a result. Norway rats in parts of Vancouver, on the other hand, have become separated from those in other parts of Vancouver because of difficult-to-cross roads."

Among his figures explaining this phenomenon, Dunn includes an excellent depiction of Manhattan divided into green space and non-green space. For an animal with low mobility that lives around Inwood Hill Park at the northern tip of Manhattan, the green spaces at the far southern end, like The Battery and East River Park, may as well be a world away. A toad born in Inwood Hill Park may be able to travel up to five miles per hour, but crossing from one end of Manhattan to the other through a sea of people, vehicles, and predatory birds would prove a great challenge.

Such is the case for those aforementioned rats, which may travel only a short distance over their lifetimes. Their priorities are usually quite narrow.

Earlier in that same chapter, Dunn discusses Wilson and MacArthur’s work on the species-area relationship (he calls it one of the few genuine laws of ecology), which ties neatly into the idea of these urban islands. Wilson worked in Melanesia studying ants, and he recognized a key difference in the ants from island to island. The short and sweet explanation: bigger islands have more species than smaller islands.
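Ecologists usually write the species-area relationship as a power law, S = cA^z, where S is the number of species, A is the island’s area, and c and z are constants fitted to a particular group of islands. The sketch below assumes illustrative values of c = 10 and z = 0.25 (hypothetical numbers, not anything Wilson measured) just to show the shape of the curve: every tenfold increase in area multiplies the species count by the same factor, 10^z.

```python
import math

def species_count(area, c=10.0, z=0.25):
    """Species-area power law S = c * A**z (c and z are hypothetical)."""
    return c * area ** z

def log_log_slope(area_small, area_large, c=10.0, z=0.25):
    """Slope between two islands on a log-log plot; for a pure power law it equals z."""
    s_small = species_count(area_small, c, z)
    s_large = species_count(area_large, c, z)
    return math.log(s_large / s_small) / math.log(area_large / area_small)

for area in (1, 10, 100, 1000):
    print(f"area {area:>5}: ~{species_count(area):.1f} species")
# area     1: ~10.0 species
# area    10: ~17.8 species
# area   100: ~31.6 species
# area  1000: ~56.2 species
```

Species count keeps climbing with area, but each additional tenfold jump in area buys the same proportional gain rather than the same absolute gain.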

Here’s Dunn’s longer explanation of the theory of island biogeography that came out of Wilson’s work, which I’ll apply to roster and league sizes shortly.

"The theory had two key components. The first saw the probability of any particular species going extinct on an island as a function of the size of the island. MacArthur and Wilson thought that the chance that a species would go extinct from an island increased as the size of an island decreased. On smaller islands, populations of organisms were necessarily smaller, and so the chance that they would go extinct due to, say, one bad storm or one bad year was greater. What was more, the chance that a small island might not have enough of whatever the organisms needed was also greater. Time has lent support to the idea that there exists a general relationship between island area and extinction. The extinction rate of species on small islands tends to be higher than the extinction rate on larger islands, especially when those smaller islands have fewer kinds of habitats.

The theory’s second component addressed not the loss of species but, instead, their arrival. Species can colonize islands from elsewhere, whether by flying, floating, swimming, or catching a ride. Or they can evolve in situ. Wilson and MacArthur imagined that in both cases, the probability of such “arrivals” increases with the geographic area of islands. Species have a better chance of finding an island if it is bigger. Bigger islands are also more likely to have whatever special habitat, host, or other requirement a particular species needs. In addition, a bigger island might also provide more space for populations of a species to become sufficiently isolated from one another to evolve into different species."
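The two components combine into an equilibrium: arrivals slow down as an island fills up from a mainland pool of P species, extinctions speed up as more species are present, and the species count settles where the two rates cross. The sketch below uses the simplest version of that picture, with straight-line rates, plus a made-up area scaling (maximum immigration grows with the square root of area, maximum extinction shrinks with it). Under those assumptions, which are illustrative and not MacArthur and Wilson’s fitted curves, the equilibrium works out to S* = P·A/(A + 1).

```python
import math

POOL = 100  # hypothetical mainland pool of species that could colonize

def max_rates(area):
    # Hypothetical area scaling: bigger islands are easier to find (more arrivals)
    # and hold larger populations (fewer extinctions).
    return math.sqrt(area), 1 / math.sqrt(area)

def net_change(species, area, pool=POOL):
    i_max, e_max = max_rates(area)
    immigration = i_max * (1 - species / pool)  # fewer new arrivals as the island fills
    extinction = e_max * species / pool         # more residents means more losses
    return immigration - extinction

def equilibrium(area, pool=POOL):
    # Setting immigration equal to extinction gives S* = P * i_max / (i_max + e_max).
    i_max, e_max = max_rates(area)
    return pool * i_max / (i_max + e_max)

def simulate(area, steps=5_000, pool=POOL):
    species = 0.0  # start with an empty island
    for _ in range(steps):
        species += net_change(species, area, pool)
    return species

for area in (0.25, 1, 4, 9):
    print(f"area {area:>4}: equilibrium {equilibrium(area):5.1f}, "
          f"simulated {simulate(area):5.1f}")
```

With these made-up numbers, the equilibria come out to 20, 50, 80, and 90 species, and the step-by-step simulation settles on the same values: the bigger the island, the more species it holds at balance.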

This theory says something deeply valuable about the stability and diversity of species. We’ll look at islands from different angles later in this article, but another figure from Dunn’s book does an excellent job of illustrating the diversity that comes from larger areas: a comparison of the number of ant species found in New York parks and on Broadway medians.

Returning to the Dunn quote at the beginning of this section, if you look hard enough you’ll see islands well beyond patches of rock in a distant sea. Instead of Broadway medians or British woods surrounded by farmland, perhaps we can look at the species-area relationship through the lens of Mauboussin’s towns and league: the roster spots at a position, on a team as a whole, or throughout an entire league.

Suppose we were to rearrange Mauboussin’s league of 25 teams with 20 players each, still drawing from the same pool of 1,000 potential players within one of his towns. If there are only 20 “6s” to go around and 25 teams overall, some teams are left high and dry while others may have more than one 6, which would be an enormous advantage. If we rearrange the league into 5 teams of 100 players or 100 teams of 5 players, we once again see a tremendous shift in balance as the distribution of talent adjusts to the landscape of roster limits. In the 5-team scenario, every team is all but guaranteed to have at least one 6, while the 100-team scenario creates a huge disparity in talent in favor of whichever lucky teams land one of those coveted 6s. Each scenario dramatically changes what we can expect in terms of “different species” or “different types” of players, depending on the roster structure.
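The odds in these three scenarios can be worked out with a hypergeometric calculation. The sketch below assumes, as a simplification of my own, that all 20 of the 6s make the league and each lands in a uniformly random roster slot (no draft strategy or trades). Every scenario has the same 500 total roster spots; only the way they are carved into teams changes.

```python
from math import comb

SIXES = 20          # elite "6s" in the town (from the text)
LEAGUE_SLOTS = 500  # total roster spots, 25 teams x 20 players (from the text)

def prob_no_six(roster_size, sixes=SIXES, slots=LEAGUE_SLOTS):
    """Hypergeometric P(X = 0): a roster of this size ends up with no 6 at all.

    Assumes every 6 makes the league and lands in a uniformly random roster slot.
    """
    return comb(slots - roster_size, sixes) / comb(slots, sixes)

for teams, roster in ((5, 100), (25, 20), (100, 5)):
    p0 = prob_no_six(roster)
    print(
        f"{teams:>3} teams of {roster:>3}: P(no 6) = {p0:.3f}, "
        f"expected teams without a 6 = {teams * p0:.1f}, "
        f"guaranteed at least {max(0, teams - SIXES)}"
    )
```

Under these assumptions, a 100-player team goes without a 6 only about 1% of the time, roughly 11 of the 25 twenty-man teams come up empty, and more than 80 of the 100 five-man teams are expected to miss out (at least 80 must, since there are only 20 sixes).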

Or perhaps, instead of changing team size, we change the size of the league. Instead of taking 500 of the 1,000 players available in the town, suppose we take only 20 players to populate our league. What does that push toward a smaller league do to the diversity of talent? In that scenario, we’ve moved to a smaller island and driven all the 4s and 5s to extinction. If we were to expand the league, we’d see a shift toward greater diversity with the inclusion of 3s, 2s, and maybe even 1s.
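The league-size version of the thought experiment can be sketched by filling rosters best-first and checking which talent tiers survive. Only the 20 “6s” come from the text; the counts for the other tiers below are hypothetical, chosen so the town totals 1,000 players and a 500-player league reaches down to the 4s.

```python
# Hypothetical talent counts for one town of 1,000 players. Only the 20 "6s"
# come from the text; the other tiers are made-up numbers for illustration.
TOWN = {6: 20, 5: 80, 4: 400, 3: 300, 2: 150, 1: 50}

def tiers_in_league(league_size, town=TOWN):
    """Talent tiers that make the league if it fills rosters best-first."""
    remaining = league_size
    present = []
    for tier in sorted(town, reverse=True):  # best players are taken first
        if remaining <= 0:
            break
        present.append(tier)
        remaining -= min(remaining, town[tier])
    return present

for size in (20, 500, 900):
    print(f"league of {size:>3}: tiers {tiers_in_league(size)}")
# league of  20: tiers [6]
# league of 500: tiers [6, 5, 4]
# league of 900: tiers [6, 5, 4, 3, 2]
```

Shrinking the island to 20 roster spots leaves a monoculture of 6s, while growing it to 900 brings the 3s and 2s back into the population.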

Remember the earlier discussion of rosters in the early 1900s compared to today? Roster sizes were minuscule in the early days of the NFL. In the 1920s, teams carried only 16 players on their “active” gameday roster, compared to 46 today. The result was fewer niches, little specialization, and a general lack of diversity that made for a very different league from the modern game. Today, there are no more hybrids who play both sides of the ball. Kickers and punters are specialists, not dual-role players who handle both jobs at once or play other positions. For example, here’s Bill Belichick discussing the implications of roster size for positional evolution and specialists.

If we run the thought experiment of expanding the league’s rosters even further, what does that look like? As of now, the limits of a 53-man roster favor players who wear multiple hats. A defensive back who is good on special teams and good as a backup will often win a spot over one who is truly great on special teams but deficient as a backup. The backup center must have the versatility to play multiple positions given the limited space on the condensed island of offensive linemen; otherwise, he risks being replaced by a player with higher fitness. Likewise, a great returner often loses out to an adequate returner who would be far more useful offensively if a spree of injuries hit the wide receiver room.

If we were to expand rosters further, to 70 or 80 players, what further specialization would happen as niches open up for pure special teamers without backup ability, or pure backups without special teams value? What would the structure of an offensive line with more room look like? How much more liberal would teams be in developing players without giving them playing time?

While we can’t be sure of what niches would open or close in the event of roster size changes, it’s safe to assume that any change will influence the population in the same way we’d influence an island’s biodiversity by scaling up or down.

53 Forms Most Beautiful

“There is grandeur in this view of life, with its several powers, having been originally breathed into a few forms or into one; and that, whilst this planet has gone cycling on according to the fixed law of gravity, from so simple a beginning endless forms most beautiful and most wonderful have been, and are being, evolved.”

- Charles Darwin, The Origin of Species

Just 159 miles northwest of Daphne Major sits the lonely Wolf Island, and about 24 miles northwest of that is the even lonelier Darwin Island. We’ve already met Geospiza fortis, the medium ground finch that evolved on short notice at Daphne Major as seed distribution changed. It doesn’t live up here, but one of the birds that came from the same lineage does. The two are hard to tell apart at a glance, but Geospiza septentrionalis can be identified by what it eats. On these remote islands, the isolated population of finches has evolved to deal with the harsh conditions in one of the most peculiar ways of any bird. It’s a vampire.

Geospiza fortis

Geospiza septentrionalis

While sucking blood doesn’t make up its entire niche, the vampire finch has come to rely on this technique in times of resource scarcity. It climbs onto the back of a nesting booby and pecks aggressively enough to draw blood but not enough to drive the booby off, making it more of an annoyance than a true threat to life. This parasitic behavior is speculated to have originated in a mutually beneficial one, in which the finches pecked off the parasites riding along on the booby. Over time, however, the behavior evolved into something different: feeding on the bird itself.

The vampire finch, along with the full array of Geospiza birds, is an excellent example of adaptive radiation. They descend from a single common ancestor, but over many generations they have changed and branched in new directions in response to their environments. Without boobies to feed on, the vampire finch may never have turned to blood (and perhaps it would have died out). On a different island with more abundant resources, it may never have turned to blood regardless of the booby population.

From Rob Dunn’s book, on the concept of adaptive radiation.

“New species, Darwin argued, could evolve on islands in response to their isolation and to local conditions. The islands of the Galapagos archipelago, for example, were formed by volcanoes that rose up from the sea floor five hundred miles off the west coast of South America. A single species of medium-sized tortoise arrived on the islands and evolved into no fewer than fourteen species of giant tortoises, some bigger, some smaller, some darker, some lighter. A single species of mockingbird flew to the archipelago and evolved into three species, each on its own island. A single mutable and drab species of finch flew to the islands and evolved into thirteen species, now called Darwin’s finches. The finches differed, as Darwin noted, in their beaks, which had become, as he would write in The Voyage of the Beagle, “modified for different ends” by natural selection. One of the species of Darwin’s finches evolved in such a way so as to use its beak to access nectar, pollen, and seeds from cactuses. Another became a vampire, pecking with its beak at the backs of birds and other vertebrates for blood. Another two species evolved the ability to use their beaks to hold sticks with which they hunt for grubs. Several species evolved beaks suited to a reliance on seeds.”

This idea of modification for different ends, and of development into specific niches within the environment, is particularly striking when viewed through the lens of positional evolution. From the early days of athletes who played both sides of the ball, including hybrids as unusual as the offensive tackle-kicker (Lou Groza), true specialists have slowly spread across the league in a way that couldn’t have been foreseen in its earliest days.

Remember, in the same way that these species would not exist if confined to an island too small to sustain them (and therefore lacking niches for them to fill), the uniquely honed players of today’s game rely on their environment and its opening niches to provide them an opportunity. The twitchy, undersized nickel corner’s niche opened in part because of the league’s growing orientation toward the passing game and the presence of similarly sized and athletic slot receivers. Take away the environmental changes that opened that niche (rule changes, the evolution of play-calling strategy, other positions evolving, etc.) and the niche is erased.

Another aspect of diversity of types is that there is more than one way to adapt to fit into a specific niche or deal with a problem. In his book Improbable Destinies, Jonathan Losos discusses convergent evolution, when animals in different places and times evolve similar structures despite no common ancestor. But there are just as many cases where animals live in niches that are functionally the same without having converged to share similar body structures.

“A key reason for lack of convergence is that there may be more than one way to adapt to a problem posed by the environment. Think about the way vertebrate animals swim. Many use their tail for thrust, but not all tails are the same. Fish tails are vertically flattened and moved back and forth. Crocodiles swim in the same way. But whale tails are horizontally flattened and are moved up and down. Other animals, like eels and sea snakes, undulate their entire bodies. A few birds, such as cormorants and loons, can move speedily underwater by paddling ferociously with their web-footed hindlimbs. On the other hand, some species swim using modified forelimbs, like the flippers of sea lions and the wings of penguins.”

Losos goes on to discuss the definitions of convergent and non-convergent evolution. Perhaps calling the vertical and horizontal tail locomotion of animals non-convergent is a bit pedantic, given how similarly they function. A broader view may be more useful for defining non-convergent evolution.

“Non-convergence can result for another reason. Often there are different functional ways to adapt to an environmental condition. As an example, consider how potential prey species may adapt to the presence of a predator such as lions. One option is to evolve great sprinting ability to outrun them, but there are other options, too, like camouflage, passive defense, or active defense. The resulting adaptations are decidedly non-convergent, encompassing the horns of the cape buffalo, the body armor of the pangolin and tortoise, the long legs of the impala, the spines of the porcupine, the venom and precision projection of the spitting cobra, and the dappled pelage of the bushbuck.

Multiple solutions to the same selective problem are not limited to defense. Cheetahs and African wild dogs hunt the same prey, but the cat does so by short bursts of great speed, whereas wild dogs run more slowly, but for long periods, exhausting their prey and eventually bringing them down. The adaptations of the two are correspondingly different: the extremely long legs and flexible spine of the cheetah allows it to attain speeds of seventy miles per hour; the great stamina of wild dogs allows them to maintain a steady pace of thirty miles per hour for long enough to fatigue their prey (cheetahs can only sustain their sprints for a short distance).”

Would it be a stretch to compare these multiple solutions to the same selective problem to different player styles at a position? The 6’7” 350-pound offensive tackle with 36” arms may lack the maneuverability of a lighter and more precise player, but he fills the same niche as an offensive tackle thanks to his length and strength. A tackle with subpar measurements may be dinged by some teams for falling below standard thresholds, but his combination of movement skills, body control, and technical prowess allows him to do the job just as well.

The rail-thin safety who isn’t a hitter, like recent third-round pick J.T. Woods, may fall off his tackle attempts, but if he arrives a tenth of a second earlier than his counterparts, it may wash out in the final judgment of performance. The safety who runs a 4.68 40-yard dash may seem at first glance too slow to make it, but offsetting traits like processing and body control may allow him to be more of a wild dog than a cheetah.

Part of the beauty of speciation and adaptive radiation is that a niche may be filled not only by one specific type but by any of a variety of new types with the right blend of mutations to plug the metaphorical hole in the ecosystem. If the species-area relationship tells us about diversity relative to space, adaptive radiation and non-convergent evolution tell us to appreciate that diversity in every form within that space.

By Generation Not Time

“It turns out that Darwin and a century of biologists following him were wrong in one key respect: evolution does not always plod along at a snail's pace.”

- Jonathan B. Losos, Improbable Destinies: Fate, Chance, and the Future of Evolution

Day after day, Richard Lenski’s lab transfers samples of E. coli from 12 flasks into fresh flasks. These flasks are the equivalent of a simple pond, containing a mixture of glucose, potassium phosphate, and citrate, among other things. Glucose is the food source that drives these E. coli, and the bacteria race to gobble it up in a bacterial game of Hungry Hungry Hippos. They divide six or seven times each day, quickly using up all the available resources in each flask. This is why a small subset of each flask’s population needs to be transferred to a new flask daily.

Since the experiment’s humble beginnings in 1988, more than 75,000 generations of E. coli have passed through it. While other experiments have been running longer (such as an Illinois corn breeding experiment that has spanned more than a hundred years), this one is special in that the sheer number of generations evolving in the flasks outpaces anything you’d find in crop or animal breeding.
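The “six or seven” divisions per day follow from the transfer itself. Assuming the experiment’s daily 1:100 dilution (each day, 1% of a culture seeds a fresh flask), the survivors have to double about log2(100) times to regrow to the flask’s capacity. The sketch below turns that into a rough calendar estimate; it ignores any pauses in the daily schedule, so treat it as a back-of-the-envelope check rather than the lab’s own bookkeeping.

```python
import math

DILUTION = 100  # assumed daily 1:100 transfer: 1% of each culture seeds a fresh flask

# Regrowing 100-fold takes log2(100) doublings, roughly 6.64 generations a day.
GENS_PER_DAY = math.log2(DILUTION)

def years_of_transfers(generations, gens_per_day=GENS_PER_DAY):
    """Calendar years of uninterrupted daily transfers needed to reach a generation count."""
    return generations / gens_per_day / 365.25

print(f"{GENS_PER_DAY:.2f} generations per day")
print(f"75,000 generations ~ {years_of_transfers(75_000):.1f} years")
```

At roughly 6.64 generations a day, 75,000 generations works out to a little over 30 years of daily transfers, which lines up with an experiment that started in 1988.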

Generation 33,127 provided a shock in 2003. One of the flasks was discovered to be cloudy, whereas all the previous flasks had been clear. The initial assumption was contamination, but further testing revealed something remarkable: the bacteria had evolved to consume the citrate that E. coli do not normally consume, something none of the other flasks (effectively 11 alternative worlds) had managed. The population had gained this ability through mutations accumulated over the many generations of its existence.

Another neat feature of this experiment is that bacteria can be frozen and later revived none the worse for wear. This means Lenski’s lab has been able to save ancestral samples of the E. coli, for example from generation 1,000 or 2,000, and thaw them out to compete against more evolved variants in a race to make the most of the flask’s resources. It’s not so different from taking the 1930 New York Giants out of a time machine and running them onto the field against the present-day Giants. As you might expect, it’s a beatdown. The more evolved (or rather, the more fit) the bacteria are to their environment, the more they tend to outcompete their ancestors.
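Those head-to-head races are how fitness gets scored. In Lenski’s competition assays, relative fitness is typically computed as the ratio of the two competitors’ realized growth rates over one cycle, each rate being the natural log of final over initial density. The sketch below assumes that definition; the cell counts are hypothetical, not data from the experiment.

```python
import math

def relative_fitness(evolved_start, evolved_end, ancestor_start, ancestor_end):
    """Ratio of realized growth rates over one competition cycle.

    Each competitor's rate is ln(final density / initial density);
    w > 1 means the evolved strain outgrew its ancestor in the shared flask.
    """
    evolved_rate = math.log(evolved_end / evolved_start)
    ancestor_rate = math.log(ancestor_end / ancestor_start)
    return evolved_rate / ancestor_rate

# Hypothetical cells per mL after one day of head-to-head competition.
w = relative_fitness(1e6, 3e8, 1e6, 1e8)
print(f"relative fitness w = {w:.2f}")  # relative fitness w = 1.24
```

A value of 1.0 would mean the two strains grew equally well; anything above it is the bacterial version of the modern Giants running away with the game.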

One of the most insightful aspects of evolution at this level is that, while mutations may be rare in something like E. coli, the quick reproduction time produces vastly more opportunities to diverge. It is not necessarily time that drives evolution, but the number of generations that pass and how quickly an organism can adapt to its environment. Let’s put the evolution of a rapidly reproducing species under the microscope with an animal we discussed earlier.

With our London Underground mosquitoes, the average developmental time from egg to adult is 7 to 10 days; to be conservative, we’ll use the 10-day estimate. Suppose we count the number of generations possible for these mosquitoes since the reproductive isolation event (the creation of the London Underground). Approximately 58,600 days have passed since then, which, assuming the mosquitoes’ reproduction rate is stable, works out to 5,860 generations.

Speciation occurs on this scale of generations rather than on human time frames, so while 160 years may not seem long ago, contrast it with the timescale of human generations. Assuming a new human generation roughly every 25 years, you’d need 146,500 years to match the mosquito timescale. For comparison, some of the oldest rock art ever found is roughly half that old.
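The arithmetic behind those two paragraphs is short enough to check directly. This sketch uses only the figures from the text (58,600 days, a 7 to 10 day egg-to-adult time, 25-year human generations) and, like the text, treats egg-to-adult development as the full generation time.

```python
DAYS_SINCE_ISOLATION = 58_600  # ~160 years since the Underground opened in 1863 (text)
HUMAN_GENERATION_YEARS = 25    # human generation length used in the text

def generations_elapsed(days, days_per_generation):
    """Whole generations that fit in a span, assuming a constant generation time."""
    return days // days_per_generation

for gen_days in (10, 7):  # the 7-10 day egg-to-adult range from the text
    gens = generations_elapsed(DAYS_SINCE_ISOLATION, gen_days)
    print(f"{gen_days}-day generations: {gens:,} generations, "
          f"~{gens * HUMAN_GENERATION_YEARS:,} human years")
# 10-day generations: 5,860 generations, ~146,500 human years
# 7-day generations: 8,371 generations, ~209,275 human years
```

The 10-day figure is the conservative end; at the faster 7-day pace, the human equivalent stretches past 200,000 years.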

Another useful example of rapid evolution is the growth of E. coli on the MEGA-plate at the Kishony Lab, where researchers spread bacteria across a giant plate divided into bands of exponentially increasing antibiotic concentration. Despite those barriers, after about 10 days the bacteria had evolved enough resistance to overtake the entire plate, thanks to their quick rates of reproduction and mutation.

Applied to scouting, it’s worth thinking about how player types change through generations and “mutations” (or novel changes) that adapt to the environment. It’s not necessarily the span of years driving these changes, but rather shifts in selection pressures and preferences over a certain number of iterations.

The shift toward accepting smaller quarterbacks is one of the more notable trends of the past couple of years, and it follows this same path of beneficial mutations. Russell Wilson fell all the way to the third round in 2012 at 5’10 ⅝” (5105), but his performance seems to have helped open the door for a number of shorter quarterbacks.

Not only are teams more forgiving at the top of the draft with players like Bryce Young (5101), Kyler Murray (5101), or Tua Tagovailoa (6000), but similarly short quarterbacks are now filling the middle of the draft. Jake Haener (5115), Stetson Bennett (5113), and Jaren Hall (6001) are just a handful of quarterbacks in last year’s class who likely benefited from the success of Wilson.
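The four-digit heights in parentheses use the scouting shorthand shown for Wilson: the first digit is feet, the next two are inches, and the last is eighths of an inch, so 5105 means 5’10 ⅝”. A small converter (the function name is my own) makes the comparison between these quarterbacks explicit.

```python
def scouting_height_to_inches(code: str) -> float:
    """Convert a four-digit scouting height code to inches.

    First digit = feet, next two = inches, last = eighths of an inch,
    so "5105" is 5'10 5/8".
    """
    if len(code) != 4 or not code.isdigit():
        raise ValueError(f"expected four digits, got {code!r}")
    feet, inches, eighths = int(code[0]), int(code[1:3]), int(code[3])
    if inches > 11 or eighths > 7:
        raise ValueError(f"out-of-range inches or eighths in {code!r}")
    return feet * 12 + inches + eighths / 8

QBS = {
    "Russell Wilson": "5105",
    "Bryce Young": "5101",
    "Kyler Murray": "5101",
    "Tua Tagovailoa": "6000",
    "Jake Haener": "5115",
    "Stetson Bennett": "5113",
    "Jaren Hall": "6001",
}

# Shortest to tallest.
for name, code in sorted(QBS.items(), key=lambda kv: scouting_height_to_inches(kv[1])):
    print(f"{name:<16} {code}  {scouting_height_to_inches(code):.3f} in")
```

Sorted this way, Young and Murray come in half an inch below Wilson, and even the “tall” end of the list, Tagovailoa and Hall, barely clears six feet.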

It’s a copycat league, and when one mutation or change in selection improves fitness (as defined by a player’s success), it’s likely to send the population as a whole down a different path. If we were to place a copy of the league into each of those 12 flasks at Lenski’s lab, how many times would we see short quarterbacks prosper? Perhaps we saw them prosper in our world around generation 100, but we might have seen it happen at generation 50 or generation 150 if the experiment were run again elsewhere.
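One way to picture those 12 replays is as a waiting-time problem: each generation, a beneficial change has some small chance of taking hold, and the generation when it first does varies from flask to flask even though every flask starts identically. The sketch below assumes a hypothetical constant 1% chance per generation over 300 generations; real selection pressures shift over time, so this is only a caricature of the idea.

```python
import random

def first_breakthrough(p_per_generation, max_generations, rng):
    """First generation in which a hypothetical beneficial change takes hold, or None."""
    for generation in range(1, max_generations + 1):
        if rng.random() < p_per_generation:
            return generation
    return None

rng = random.Random(12)  # fixed seed so this set of replays is reproducible
replays = [first_breakthrough(0.01, 300, rng) for _ in range(12)]
print(replays)  # one entry per "flask"; None means it never happened in 300 generations
```

Changing the seed reshuffles which flasks get there early, which get there late, and which never get there at all, while the underlying odds stay exactly the same.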

Another point of note is that evolution is not a leap from one state to the next, just as the league didn’t come to accept short quarterbacks in the span of a year or two. As generations pass and samples grow larger, it becomes easier for the fit to outperform the unfit. These processes take time, and everything within the system interacts in ways that make it hard to identify developing trends and future fitness beyond broad strokes of the brush.

Replaying the Tape of Time

“When we realize that the actual outcome did not have to be, that any alteration in any step along the way would have unleashed a cascade down a different channel, we grasp the causal power of individual events. We can argue, lament, or exult over each detail–because each holds the power of transformation. Contingency is the affirmation of control by immediate events over destiny, the kingdom lost for want of a horseshoe nail. The Civil War is an especially poignant tragedy because a replay of the tape might have saved a half million lives for a thousand different reasons–and we would not find a statue of a soldier, with names of the dead engraved on the pedestal below, on every village green and before every county courthouse in old America.”

– Stephen Jay Gould, Wonderful Life

“But if life started with all its models present, and constructed a later history from just a few survivors, then we face a disturbing possibility. Suppose that only a few will prevail, but all have an equal chance. The history of any surviving set is sensible, but each leads to a world thoroughly different from any other.”

– Stephen Jay Gould, Wonderful Life

It’s November 8th, 1923, and we’re in Munich, Germany, at the Bürgerbräukeller, a large beer hall where Gustav Ritter von Kahr, the Bavarian state commissioner general, is giving a speech. A volatile group of self-proclaimed National Socialists (yeah, those ones) bursts into the hall, and their leader fires a pistol into the ceiling to get the crowd’s attention. He declares a national revolution and herds Kahr and his colleagues into a back room, demanding that they join the uprising against the Weimar Republic. Hundreds of Nazis occupy the building. Elsewhere in the city, the local government is beginning to smother the flames of revolution, but the outcome of the clash is still undetermined.

By morning, sensing that the Beer Hall Putsch is losing steam, the supporters of the coup d’état decide to march. The group of about 2,500 people heads toward the city center and the Bavarian defense ministry. At the Odeonsplatz, a square in the center of Munich, the marchers encounter a force of 130 state police officers. Shots ring out as the police exchange gunfire with the Nazis.

Max Erwin von Scheubner-Richter, one of Adolf Hitler’s closest collaborators, walks arm-in-arm with him during the march. He is struck by a bullet to the chest and dies immediately. Hitler falls with him, dislocating his right shoulder and believing himself to be shot. Ulrich Graf, Hitler’s bodyguard, is shot in the shoulder once before shielding Hitler from the gunfire. He is hit by five more bullets but survives, in part due to his size but also due to a great deal of luck. Erich Ludendorff, the famed World War I general and the putsch’s other leading figure, walks unharmed through the gunfire into the line of police, who refuse to fire on him. He is arrested. Hitler escapes to a nearby safe house and contemplates suicide, but is arrested a couple of days later.

This is just one of many instances where the branches of possible futures stretched in many directions, yet only one path was taken. The bullet that ended Scheubner-Richter’s life could have flown two feet in the other direction and into the chest of the eventual dictator. If Graf’s fortitude as a bodyguard had been slightly less impressive, or if the first shot had hit his head, then the five bullets that followed could have traveled uninterrupted into Hitler’s body.

Hitler avoids a lengthy jail sentence despite his treason and attempted coup, and by 1933 he takes power. The rest, as they say, is history. But there are plenty of instances like this along the path of history where a radically different outcome could have occurred.

He could have been accepted to the Academy of Fine Arts Vienna in 1907. He could have been killed many times over in World War I during his time as a runner. A shell exploding in the runner’s dugout in October 1916 left him hospitalized for two months, but it just as easily could have taken limb or life. A mustard gas attack in 1918 that left him temporarily blinded could have permanently blinded him or killed him. Of the roughly 15,000 people who were in his regiment, 2,700 were dead or missing by the end of the war.

Perhaps, instead of a brief incarceration, he is executed or imprisoned for decades for treason because a heavy-handed judge rules on the case. Perhaps we fast-forward several years and find him killed by one of his many would-be assassins. Two suitcase bombs planted at Nazi party headquarters in 1936 missed their target. Swiss student Maurice Bavaud missed his chance to shoot Hitler during a parade in 1938. Sixteen years after the failed coup at the Bürgerbräukeller, a carpenter named Georg Elser planted a bomb in the same building that was supposed to kill Hitler. He missed by 13 minutes: Hitler left the beer hall just before the explosion, which killed eight people and injured 62.

It may seem odd to talk about the failed Nazi coup of 1923, or the many would-be killers of a dictator, in an article about evolutionary processes in football. But among 20th-century events whose tendrils stretch to every corner of the world, World War II looms as one of the largest. In the same way that football’s history would have been reshaped had Pudge Heffelfinger’s father been fatally wounded at the Battle of Gettysburg, Hitler’s early death before World War II could have substantially changed the path of football’s substitution rules and the roster changes that followed them. And surely some great players who would have changed the course of the league never made it home from islands in the Pacific or rolling hills in Europe. Unfortunately, we don’t know these players, just as we wouldn’t know Heffelfinger if his father had been struck down on a hot July day in 1863.

Here is an excerpt from Timothy P. Brown of Football Archaeology on the substitution rules that came about as a result of the Second World War and their cascading effect on the game as a whole.

“Football might not have flipped its substitution rules if not for WWII. America began mobilizing for war in 1940, and with draftees and volunteers leaving campuses, concerns arose about the depth of college football rosters. To allow coaches to substitute for an injured or tired starter while also allowing them to return to the game, the 1941 rules committee approved unlimited substitutions, meaning players could enter and reenter the game whenever the ball was dead. Intended as a temporary rule, the rules committee and everyone else expected substitutes to enter one or a few at a time as short-term relief, and that is how coaches applied the rule until Fritz Crisler gambled on a new approach against a superior opponent.

Crisler was Michigan's coach in 1941 when he presided over the rules meeting that approved unlimited substitutions. Still, even he did not appreciate the door opened by the new rule. In 1945, however, his squad was filled with freshmen not yet of draft age and others designated as 4-Fs, or physically unfit for the armed forces. In the week leading up to their game with West Point and its future Heisman Trophy winners – Doc Blanchard and Glen Davis – Crisler created separate offensive and defensive units, swapping them with each change of possession. (A few top Wolverines played both ways.) Although outgunned, Michigan lost 28-7, Crisler's strategy caught other coaches' attention, and some copied his approach.”

He continues on the topic of Army’s later usage of the two-platoon system and the substitution rules in college compared to the NFL’s eventual unlimited substitutions.

“Nevertheless, many coaches and fans derided two-platoon football because it ran counter to the long-held ideal of the all-around athlete and sixty-minute man. Others argued that the waves of players entering and exiting the field confused fans. Some fans agreed, including those who booed Army when they platooned against Stanford at Yankee Stadium in 1948.

Meanwhile, the NFL had liberalized its substitution rules during WWII by allowing three subs whenever the ball was dead or by mass substitutions between quarters. The NFL moved to unlimited substitutions in 1950 and has retained the rule ever since. The NFL's increased popularity in the 1950s provided the cash to expand rosters and protect their star quarterbacks from potential injury playing defense. Separate offensive and defensive units became the norm, leading to specialist offensive and defensive coaches who focused on one side of the ball, creating and teaching more specialized techniques and concepts to players who could absorb those details. It is not a coincidence that the term "special teams" emerged in the mid-1950s as NFL teams increasingly mixed starters and substitutes on specialist punting and kicking units.”

Counterfactuals like the prevention of World War II, and the butterfly effects that branch off from them, are so numerous that you can find them throughout popular culture and punditry. The “what if” crowd never lacks ideas for how the history bus could have been sent careening around a different curve in the road.

Suppose I gave you a remote that let you replay a single play, rerun anew in real time from the exact same game state. On 3rd and 5 with 1:15 left in the 4th quarter of Super Bowl XLII, how many times would Eli Manning escape multiple would-be sacks and have his prayer answered by David Tyree’s helmet catch? On 4th and 20 with 0:58 left in the 4th quarter, how many times would Davis Mills narrowly put a pass over the fingertips of Colts safety Rodney Thomas and start a comeback that keeps Bryce Young out of town?

Any small change creates incomprehensibly complex trees of possible outcomes, both in human affairs and in the natural world. Gould’s term for this is contingency: the idea that whatever comes next is dependent (or rather, contingent) on what came before it.

"Historical explanations take the form of narrative: E, the phenomenon to be explained, arose because D came before, preceded by C, B, and A. If any of these earlier stages had not occurred, or had transpired in a different way, then E would not exist (or would be present in a substantially altered form, E', requiring a different explanation). Thus, E makes sense and can be explained rigorously as the outcome of A through D. But no law of nature enjoined E; any variant E' arising from an altered set of antecedents, would have been equally explicable, though massively different in form and effect.

I am not speaking of randomness (for E had to arise, as a consequence of A through D), but of the central principle of all history – contingency. A historical explanation does not rest on direct deductions from laws of nature, but on an unpredictable sequence of antecedent states, where any major change in any step of the sequence would have altered the final result. This final result is therefore dependent, or contingent, upon everything that came before – the unerasable and determining signature of history.

Many scientists and interested laypeople, caught by the stereotype of the "scientific method," find such contingent explanations less interesting or less "scientific," even when their appropriateness and essential correctness must be acknowledged. The South lost the Civil War with a kind of relentless inevitability once hundreds of particular events happened as they did – Pickett's charge failed, Lincoln won the election of 1864, etc., etc., etc. But wind the tape of American history back to the Louisiana Purchase, the Dred Scott decision, or even only to Fort Sumter, let it run again with just a few small and judicious changes (plus their cascade of consequences), and a different outcome, including the opposite resolution, might have occurred with equal relentlessness past a certain point."

– Stephen Jay Gould, Wonderful Life

We can pose questions about these counterfactuals to try to unravel how the league came to be what it is and why. There may be no causal link between Pudge Heffelfinger’s professional status and the development of the league as a whole, but it’s hard to argue that the path of World War II, and with it the substitution rules that emerged from the war, wouldn’t have been altered if an artillery shell in 1916 or a bullet in 1923 had diverged slightly and ended Hitler’s life.

If the powers that be on the rules committee had decided against modernizing the passing rules, we would have been sent tumbling down another line of history, one where the likes of Otto Graham and Johnny Unitas might not be hailed as crucial pioneers of the passing game. And what if Johnny Unitas, who was out of football after being released by the Steelers as a rookie, had been injured or failed to make the Baltimore Colts after his tryout? Then we would again take a path away from the one our own history followed.

What would have happened if the ball stayed in a less aerodynamic form, resembling more of a rugby ball than the slick and pointed Wilson ball? Would we have seen rule shifts to prioritize running even more throughout the 1930s and 1940s? Would we have ever seen the ball shape change again throughout league history?

We could go through these counterfactuals with the type of exploratory and imaginative detail of a school kid scribbling in his notebook rather than listening to the teacher.

However, the point of replaying the tape is not to paint vivid pictures of alternative leagues, as interesting as they may be. It’s to understand that everything within the league depends on the antecedents Gould describes. While niches may open and close as a function of the size of the league and changes in the environment, they are not predestined states that will always form. This is one of the main obstacles to predicting changes going forward, or to trying to “beat the line” by finding an adaptive solution that creates an edge in the present.

If we were to rewind the tape to the 1920s and create 100 copies of the league, how many times would we converge towards our present state? In some of these copies, the league may go bankrupt or be beaten out by competitors. In some, the ideal of the sixty-minute man and two-way player never goes out of style. Even where we do see a form of convergent evolution towards the niche-specific, pass-heavy game of the modern day, we could still be pulled in different directions, because the players who populate each league would vary in ways we could not predict.

In one league, we could see an early trend towards uniquely athletic quarterbacks who win more with their legs than by wheeling and dealing from within the pocket. All it would take is an early advantage: a few quarterbacks with 4.3 or 4.4 speed enter the population, succeed, and draw copycat interest from others looking to replicate that model. If you plug Lamar Jackson and Michael Vick into the 1940s or 1950s (setting aside the racial biases of the time), would it be unreasonable to think that their success could have changed the evolution of the quarterback position?
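The copycat dynamic described above can be played out as a toy simulation. This is a minimal sketch under loudly stated assumptions, not a model from this article: it borrows the Pólya urn, a standard toy model of path dependence, and assumes each league copy starts with one pocket passer and one running quarterback, after which every new starter copies an existing archetype with probability proportional to how many starters already use it. The function names and parameters (`copycat_league`, `replay_tape`, 200 new starters, the seed) are hypothetical.

```python
import random


def copycat_league(n_new_starters: int, rng: random.Random) -> float:
    """Simulate one hypothetical league copy under a Polya-urn copycat rule.

    The league starts with one pocket passer and one running quarterback.
    Each new starter copies an existing archetype with probability
    proportional to how many starters already play that way.
    Returns the final share of running quarterbacks.
    """
    pocket, running = 1, 1
    for _ in range(n_new_starters):
        if rng.random() < running / (pocket + running):
            running += 1  # early success breeds copycats
        else:
            pocket += 1
    return running / (pocket + running)


def replay_tape(n_copies: int = 100, n_new_starters: int = 200, seed: int = 1923) -> list:
    """Rewind the tape: run many league copies from identical starting conditions."""
    rng = random.Random(seed)
    return [copycat_league(n_new_starters, rng) for _ in range(n_copies)]


shares = replay_tape()
print(f"lowest running-QB share across copies:  {min(shares):.2f}")
print(f"highest running-QB share across copies: {max(shares):.2f}")
```

Every copy starts from the same 1:1 split and follows the same rule, yet the final shares scatter widely; for this particular urn, the long-run share is known to be uniformly distributed between 0 and 1. In the sketch, which archetype dominates is decided by the early draws, not by either archetype being better.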

Suppose we were to think of these leagues as their own independent islands, similar to how we viewed rosters as islands in the light of Rob Dunn’s discussion of island biogeography. We may leave the islands to their own devices and return after 100 years to find that once similar islands evolved in unique ways that would have been unpredictable in advance. Convergence may occur on some occasions, but those small differences would often blossom into the types of changes that can reshape the entire system.

To quote Losos again, now on the divergence that comes from evolution on different islands.

"Islands provide a grand cookbook of evolution. And the resulting concoctions inform us that there’s no telling what will come out of the oven. Change the ingredients or the order in which they’re added, turn up the heat, leave something out, use one pinch of salt instead of two, and the result may taste very different. Even when using the same recipe, seemingly innocuous events, like substituting one brand of flour for another or using your neighbor’s kitchen instead of your own, may make a big difference. The island cookbook is replete with tales of contingency and chance, the diversity of outcomes suggesting that predicting what will evolve on any given island is very difficult."

Perhaps we could view the evolution of any given league, and the one we live with today, in the form of an ever-expanding game of Plinko. We drop a puck, then as it ricochets off the various obstacles on the Plinko board it moves in ways that are both unpredictable and irreversible, until it inevitably lands in a final spot.

The real outcomes are more diverse than the somewhat limited Plinko board, but the same concept of dropping the puck and letting it move about isn’t far off from this hypothetical rewinding of the tape. One divergent event sends us down a new path that can never be undone.

I created a visual example that goes like this. Suppose we start in the same place and face five events, each with a 50/50 probability of occurring. We go through each event and physically map the path it takes, left or right. After these five events we have a map of our path through all possible outcomes, and each final landing spot has a 1/32 (3.125%) chance of occurring. Each set of black bars is the depiction of possible outcomes at the time of that event.

I ran through this scenario twice and mapped the path of each outcome. Suppose the first event is whether or not World War II unfolds as it did, setting our league’s history on the path toward the new substitution rules. Perhaps the second is the change in passing rules that allowed passes more frequently and closer to the line of scrimmage. You get the picture.
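The arithmetic behind the five-event tree can be checked with a short sketch. It assumes, as the example does, five independent events with 50/50 odds; labelling the branches `L` and `R` and the helper `random_path` are illustrative choices, not part of the original visual.

```python
import random
from fractions import Fraction
from itertools import product

N_EVENTS = 5  # five independent 50/50 branch points

# Every complete path is a sequence of five left/right turns.
paths = ["".join(turns) for turns in product("LR", repeat=N_EVENTS)]

# Each branch halves the probability, so every landing spot has (1/2)^5.
p_each = Fraction(1, 2) ** N_EVENTS

print(len(paths))             # 32
print(p_each, float(p_each))  # 1/32 0.03125


def random_path(rng: random.Random) -> str:
    """Replay the tape once: flip a fair coin at each of the five events."""
    return "".join(rng.choice("LR") for _ in range(N_EVENTS))


rng = random.Random()
print(random_path(rng), random_path(rng))  # two independent replays
```

Adding a sixth 50/50 event would double the landing spots to 64 and halve the chance of any single one, which is why even a modest number of branch points makes any particular history unlikely in advance.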

Another way to think about the sensitivity of the entire system to path-dependent occurrences like the substitution rules or the passing rule changes would be to ask what would happen if you took the NFL in its current form and completely redrew its rules before the start of the next season.

If we enter the 2023 season and Roger Goodell draws the new rules out of a hat, how poorly adapted would each team be? How long would it take for teams to alter and rebuild their rosters to deal with their new environment? This is an extreme example, of course, but suppose we simply outlawed running the ball. How many players who are currently on the street or in a smaller league would be called up to replace those who are optimal within the current system but suboptimal in a league without running? How many Pro Bowl-caliber players would go from high demand to looking for another line of work?

Along with the importance of path dependence and contingency, we’d see how adaptable those within the system are to the new environment. Some player types would easily transition from their old role to a new one with minimal dropoff in performance or even greater success. Some teams would be quick to react and adjust to the environment, while others would cling to ineffective beliefs and personnel. We’d see them dragged along in losing fashion until a new regime with a better approach comes in to fill their seats.

Perhaps the most critical point for scouting and team building is recognizing that the past was not predetermined to travel the path it did, and that the present is not marching along with some form of destiny guiding it. As such, adjusting properly to create the greatest chance of success, or to understand the system, requires viewing the current state as one among many possible alternatives. It requires a uniquely tuned, “state dependent” approach, not an absolutist one built on preconceptions that may be out of date or out of touch with the current environment.

Our ideals and standards of today could be nullified in a world where a bullet strayed a couple of feet to the side or a few different early trailblazers changed the prototype of a position. We would be wise to understand the delicate underlying frameworks on which we have built those ideals and standards.

Potters and Pigeon Fanciers

“The word which came to Jeremiah from the Lord, saying, Arise, and go down to the potter's house, and there I will cause thee to hear my words. Then I went down to the potter's house, and, behold, he wrought a work on the wheels. And the vessel that he made of clay was marred in the hand of the potter: so he made it again another vessel, as seemed good to the potter to make it. Then the word of the Lord came to me, saying, O house of Israel, cannot I do with you as this potter? saith the Lord. Behold, as the clay is in the potter's hand, so are ye in mine hand, O house of Israel.”

- Jeremiah 18:1-6, The King James Version

“How foolish I was not to take fanciers more seriously. Oblivious was I to the fact that many of these men (and women too), in their own way, know at least as much about birds as any museum ornithologist or field birder. In their highly skilled hands pigeons are but putty that can, within a few generations, be molded into any shape and remade in virtually any color. Fanciers can fast-forward evolution like an H.G. Wells time machine.”

- Katrina Van Grouw, Unnatural Selection

“Some of our greatest historical and artistic treasures we place with curators in museums; others we take for walks.”

- Roger Caras

New York banker Roswell Eldridge had a lot of money. He founded the Great Neck Bank on Long Island in 1906 and quickly accumulated a vast amount of wealth, spending it on things like a mansion and a yacht. Eldridge was also quite a dog lover. In 1925, he attended the Crufts Dog Show in England, but he was disappointed with the King Charles Spaniels that were shown.

In the 1700s and 1800s, the breed looked very different from the form Eldridge encountered in the early 1900s. Among his gripes about the modern breed were the flattened face with a squished nose and the dome-shaped head. It did not look like a true King Charles Spaniel to him. The traits he desired had essentially been bred out of the population over a century.

Painting of a King Charles Spaniel by Henry Bernard Chalon, 1800

King Charles Spaniels in 1915

Eldridge placed an advertisement in the Crufts schedule offering a £25 reward (equivalent to about $2,420 today) for the best of the “old type” spaniel as depicted in the paintings of the past. Within a relatively short period of time, a new breed club designed to emulate the standards of old emerged as many breeders pursued the reward. A breeder of short-faced spaniels named Mostyn Walker bred a dog named Ann’s Son, who became the standard of the new breed at the time. He matched the ideals of the old type nearly perfectly, although there was some dispute over the varying paintings of the past that included some spaniels with a slightly shorter muzzle.

Ann’s Son

The breed was finally recognized in 1945 after acceptance by the Kennel Club of Great Britain, and it took the name Cavalier King Charles Spaniel, which came from a painting called The Cavalier’s Pets (1845). Present-day breed standards more closely resemble the spaniels in that painting than Ann’s Son, but most of the breed’s key characteristics have been maintained over the past century. Skeletal remains show a very close resemblance between the Cavalier King Charles of today and the old-type King Charles of the 1800s, prior to the flattening of the face. In effect, the breeders managed to turn back the clock and remake the old type in less than a decade. As of 2022, the Cavalier King Charles was ranked the 14th most popular breed by the American Kennel Club. The original King Charles Spaniel is referred to as the English Toy Spaniel, and it ranked 134th.

A King Charles spaniel (left) and a Cavalier King Charles spaniel (right)

Illustration by Katrina Van Grouw, Unnatural Selection

Reading through the 900-word breed standard for the Cavalier King Charles, you’d think breeders were combining the specificity of an architectural design with the lavish descriptiveness of a television advertisement.

Height 12 to 13 inches at the withers; weight proportionate to height, between 13 and 18 pounds.

Bone moderate in proportion to size. Weedy and coarse specimens are to be equally penalized.

Length from base of stop to tip of nose about 1 ½ inches.

Nose pigment uniformly black without flesh marks and nostrils well developed. Lips well developed but not pendulous giving a clean finish.

The sweet, gentle, melting expression is an important breed characteristic. Eyes large, round, but not prominent and set well apart; color a warm, very dark brown, giving a lustrous, limpid look. Rims dark. There should be cushioning under the eyes which contributes to the soft expression.

The most peculiar, or perhaps the most morbid, part of artificial selection is how plastic animals prove to be under it. Within relatively few generations, a breeder can reshape an animal’s physical design in a way that would make nature blush. Breeders are expert potters, quickly reshaping their stock the way a guiding hand reshapes clay.

The droopy-faced, 200-pound English Mastiff and the bug-eyed, 5-pound Chihuahua are as physically different as dogs get, but a best guess places their divergence from a common ancestor only somewhere in the range of tens of thousands of years. The mastiff’s lineage came down the path of European working dogs, while the Chihuahua originated from the Techichi, a small, mute dog kept by the Toltecs (and later the Aztecs) and treated as both food and a spiritual guide into the afterlife. The Techichi is believed to have descended from dogs that crossed the Bering Land Bridge about 15,000 years ago. Almost every fancy pigeon breed originates from the rock pigeon, yet they come in such a diverse range of shapes, sizes, and colors that you would be forgiven for believing they hail from different ancestors.

Artificial selection, while capable of acting as a fast-forward button on selection processes, is also subject to a large degree of human error. The whims of human preference, and the limits of our understanding of the structures underlying an animal, define much of what can go wrong whenever selection is human-guided.

The Cavalier King Charles Spaniel, for instance, suffers from a variety of problems. World War II led to a decline in population for many kennels, which limited the genetic diversity of the breed and has contributed to some of the health problems the breed faces. Over 40% of the dogs die from heart problems. They have skull and spine deformities. About 30% of the breed has eye problems. They have hip and knee problems. In fact, Norway banned their breeding outright because no dog from the breed can be truly considered “healthy”.

There’s no shortage of dog breeders who think they know what a dog should look like, and they have no qualms about putting their hands around the metaphorical clay to reshape it to fit their vision. But where natural selection would often gradually filter out dogs with hearing, vision, or other physical issues that limit reproductive success, amateur breeders may ignore those concerns in favor of a look or type that is in vogue or that draws a large sum of money.

Take the bull terrier as another example of changes where the underlying implications may be missed by breeders looking for a certain aesthetic. Here is an excerpt from Katrina Van Grouw’s Unnatural Selection on the change in dogs and the bull terrier skull over the last century. Bolded for emphasis on the application for our purposes.

“I was suddenly reminded of a black-and-white photograph I’d seen reproduced in a magazine article about the history of Crufts Dog Show. It had looked like an everyday family dog, like a Golden retriever. But the caption said it was a Pyrenean mountain dog. I was puzzled—I’d always prided myself on my dog breed identification. It couldn’t be explained away as a bad representative of the breed because it had won a prize at Crufts. Then I’d noticed the date on the caption—this was how Pyrenean mountain dogs had looked fifty years earlier.

I was surprised to reflect that even the most idiosyncratic breeds must have looked very ordinary a hundred or so years ago. For example, only a slight shortening of the skull, giving an undershot lower jaw, distinguished the bulldog of the early nineteenth century, and even more remarkable still was the transformation of the Bull terrier. Bull terriers were historically rat-catching dogs— the “bull” in their name comes from the fact that they were created by crossbreeding bulldogs and terriers: bull + terrier. They’d had a straight skull originally, even with a fractionally upturned muzzle. Over time the muzzle has rotated downward through more than 45 degrees, giving them their unique charismatic profile. It’s still possible to find dogs very similar to the original, straight-nosed Bull terrier in Pakistan, the descendants of dogs that were imported by the British army during the Raj.

Of course, no one breeds animals for the appearance of their skull or skeleton. Many breeders have no idea what the underlying skeleton of their charges even looks like. They select animals for their usefulness or commercial properties or, in the case of dogs and other exhibition animals, according to recognized breed standards describing how the perfect example of that breed ought to look, rather like a “virtual type specimen.” Breed standards are revised every so often, keeping the elusive goal always just a little out of reach, unconsciously ensuring that the breed’s appearance is in constant flux, even though its name might stay the same.

The transformation might not always be permanent or in one direction. Growing awareness of animal welfare issues might call for a reversal of some features, or fashions might simply change. Dogs are an obvious example, but change is an inevitable result of selective breeding of all domesticated animals, and plants too, just as changes result from similar evolutionary forces in nature.”

Bull Terrier skulls, 1900 (top), 1960 (middle), and present day (bottom)

Illustration by Katrina Van Grouw, Unnatural Selection

A bull terrier from the early 1900s (left) compared to a modern bull terrier (right)

Another example of this gradual change in selective breeding is the growth in size of the budgerigar (or “budgie”). According to Van Grouw, the trend towards larger exhibition birds has led to growth in the size of budgerigars over the last century. The wild budgerigar was comfortably adapted to an environment where it crossed large swaths of dry grassland with limited water supplies, and perhaps the physical demands on a larger bird kept it from naturally growing more. In breeding, the budgie has both structurally increased in size and been bred for larger feathers, which creates the illusion of greater size.

Museum skin specimens of budgerigars increasing in size: 1908 (left - wild budgerigar), 1918 (middle left), 1983 (middle right), and 2012 (right)

Illustration by Katrina Van Grouw, Unnatural Selection

The definition of a certain type is tricky. It’s reminiscent of the paradox of the Ship of Theseus. Suppose we were to take a ship, then gradually over time as pieces of the ship decay we remove them and replace them with new pieces. At what point does the ship stop being the original ship? Does it ever, even if we eventually reach a point where nothing about the ship is the same? It would be hard to argue that the budgerigars or bull terriers of this century are the same as last century’s cast, but they carry on that name.

Whenever types change, there is a pushback from “traditionalists” who believe the change just doesn’t fit what is “right” for the type. To quote Van Grouw again, this time on the English bulldog.

“An excellent example is the English bulldog… The expressions ‘British bulldog,’ ‘Darwin’s bulldog,’ and so forth all conjures up images of the pedigree English bulldogs we’ve become accustomed to seeing in dog shows–a broad head, undershot jaw, and a short, upturned muzzle. In fact these phrases were coined at a time when bulldogs looked very different indeed. Anyone today seeing a bulldog of the mid-nineteenth century, with its relatively long, straight muzzle, would deny that it’s a bulldog at all. So when breeders have attempted to re-create this type of animal (not for the baiting of bulls but as a healthier, more active dog; fit, in theory if not in practice, for its original purpose), they’ve been opposed, ironically, by hard-core enthusiasts who claim that their creations ‘are just not bulldogs’!”

Does this type of traditionalist opposition sound familiar?

“That’s not a quarterback! He runs too much! He’s too small! He’s too quiet, not an outspoken leader of men!”

Preconceived standards like these exist throughout football selection processes.

Standards of intelligence led Steelers head coach Walt Kiesling to release Johnny Unitas because he viewed him as “not smart enough to play quarterback in the NFL”.

Standards of passing acumen and style played a role in a player like Lamar Jackson sliding to the back of the first round.

Standards of height played a role in Russell Wilson landing in the third round, and as those standards have been revised, shorter quarterbacks have been able to make the climb into the top of the draft mentioned earlier.

There is no point I can emphasize more strongly with regard to artificial selection (and, by extension, the selection of players) than this: optimization by human judgment has limitations.

All sorts of factors can influence our ability to optimize within the system, but let’s examine a few. Suppose we take two types of players and assign each a “fitness grade.” This fitness grade functions as the probability of a given outcome, which we will treat as binary: success or failure within the environment. Here is an example.

Type A has a 0.7 fitness grade, which means that it has a 70% chance of success.

Type B has a 0.6 fitness grade, which means that it has a 60% chance of success.

We can window-dress these to make the story more interesting. Suppose that type A is bigger, more athletic, and came from a better level of competition. Type A’s 30% chance of failure comes from the inherent underlying risk of injury or something unexpected derailing performance. Type B is more technically proficient but lacks the same size and athleticism. Its 40% chance of failure combines those same unexpected factors with its suboptimal traits relative to type A.

Because we are designing this and know the underlying odds, we can easily project our knowledge onto this system and pick type A every time. If I were to ask you to bet on this repeatedly while knowing the underlying probabilities, you would be wise to stick to A no matter what. But when we take away your knowledge of those probabilities and toss you blindly into the system, you will need a large sample and an observant approach to figure out whether to pick type A or type B.

Let’s suppose you were a general manager tasked with making the best picks possible between these two types. You have a limited number of picks (five). In one option you pick all type A; in the other you pick all type B. Each pick’s outcome is decided by a random number from 1 to 10. For type A, 1-7 counts as a success and 8-10 as a failure. For type B, 1-6 counts as a success and 7-10 as a failure.

In this small sample of picks, here are two possible outcomes among the many you could get in running this experiment.

Run 1

Type A

Success (6), Failure (9), Success (6), Success (1), Success (6) - 4/5 (80%)

Type B

Failure (7), Success (4), Failure (10), Failure (10), Success (1) - 2/5 (40%)

Run 2

Type A

Failure (9), Success (1), Success (4), Success (2), Failure (10) - 3/5 (60%)

Type B

Success (4), Success (5), Failure (10), Failure (7), Success (4) - 3/5 (60%)
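The dice mechanism above is easy to replay. The sketch below is a minimal, hypothetical simulation (the names `run_picks` and `replay` are illustrative, not from any tool) that rolls a ten-sided die for each pick using the 1-7 and 1-6 success thresholds from the example, then repeats the five-pick experiment many times to count how often type B ties or beats type A despite its lower odds.

```python
import random

# Success thresholds from the example: on a roll from 1 to 10, type A succeeds
# on 1-7 (70%) and type B succeeds on 1-6 (60%).
THRESHOLDS = {"A": 7, "B": 6}


def run_picks(player_type, n_picks, rng):
    """Roll once per pick and label each roll a success or a failure."""
    threshold = THRESHOLDS[player_type]
    rolls = [rng.randint(1, 10) for _ in range(n_picks)]
    results = ["Success" if roll <= threshold else "Failure" for roll in rolls]
    return rolls, results


def replay(n_runs, n_picks=5, seed=0):
    """Repeat the experiment and return the share of runs in which type B
    records at least as many successes as type A."""
    rng = random.Random(seed)
    b_ties_or_wins = 0
    for _ in range(n_runs):
        _, a_results = run_picks("A", n_picks, rng)
        _, b_results = run_picks("B", n_picks, rng)
        if b_results.count("Success") >= a_results.count("Success"):
            b_ties_or_wins += 1
    return b_ties_or_wins / n_runs


# One five-pick run per type, printed in the same format as the runs above.
demo_rng = random.Random(42)
for label in ("A", "B"):
    rolls, results = run_picks(label, 5, demo_rng)
    print(f"Type {label}:", ", ".join(f"{r} ({n})" for r, n in zip(results, rolls)))

print("Share of runs where B ties or beats A:", replay(10_000))
```

Across many replays, type B matches or outperforms type A in roughly half of the five-pick runs (the exact figure for this setup works out to about 49.6%), which is why a handful of picks says so little about the underlying odds.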

There are all sorts of ways you could misinterpret this and botch your strategy compared to the “all knowing” strategy of always picking Type A no matter what.

In the first run, you may leave thinking that the underlying odds are much different (80% vs. 40% success) than they truly are (70% vs. 60% success). In the second run, you may leave thinking that the odds of success are exactly the same, as they both finished with 60% success. Suppose you could change your picks between type A and type B during the process. If you had two type A failures back to back (9% chance), you would likely adapt and select type B. Perhaps you observe the first selection from type A fail while the first selection from type B succeeds (18% chance) and you form early beliefs about the two that are hard to let go of. Maybe you see two back to back type A failures and believe type A has “lost its luster” or something of the sort despite no change in the underlying probabilities.
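The two percentages in the paragraph above follow from multiplying per-pick probabilities, assuming each pick is independent of the others (as the dice setup implies):

```latex
\begin{aligned}
P(\text{A fails, then A fails}) &= 0.3 \times 0.3 = 0.09 \\
P(\text{A fails, B succeeds}) &= 0.3 \times 0.6 = 0.18
\end{aligned}
```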

Any given participant in this experiment, without knowledge of those true underlying probabilities, could come to vastly different beliefs about what works and doesn’t work. Their selection would be imperfect for a variety of reasons like the ones mentioned above. Some would be fired for bad luck rather than poor selections. Some would be praised for good luck rather than good selections.

When looking through their small windows, the people fired for bad luck could assume that it was genuinely poor selections that led to their termination. The people praised for good luck would truly believe they had developed an optimal strategy for picking. Both would be incorrect, but they would have a hard time knowing otherwise. The same goes for those who change decision rules when they see patterns in the noise or try to adapt to trends that are more fluctuations of randomness than underlying changes in the odds.

In natural selection, what matters is the propensity of a phenotype to survive and reproduce, not the result of any one individual. Artificial selection, however, allows those selecting to overweight the impact of single individuals, whether through personal experience or through observing the experiences of others.

There is also a matter of domain difficulty or likelihood of success for the “optimal” type. Even those with great traits that should thrive in the environment are still subject to forces that can and will lower their odds.

Suppose we were to drop the “world’s greatest genius” into the 14th century at conception, hoping that they would meet their potential. The 50% chance of being born a woman immediately halves their chances of getting to use that genius, given the sexism embedded in society. Their odds of dying in their first year of life are approximately 25%. Suppose we take a conservative estimate and assume that 10% of the population is nobility, into which our genius would need to be born if they were to reach their potential. Suppose we also assume that they land somewhere around the time of the Black Death, which killed 30% to 50% of the population. They may have a greater chance than anyone else in society of being a transformative genius, but when you tally up the odds you’re looking at approximately a 2% chance of getting through just those four obstacles of gender, infant mortality, nobility, and disease if you treat all of them as independent of each other. We would expect to have to drop about 50 copies of this genius just to get one through.
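The tally above is a product of four independent probabilities. The sketch below simply multiplies the article’s own figures (the constant and function names are illustrative) for both ends of the cited Black Death range.

```python
# The article's figures, treated as independent obstacles.
P_BORN_MALE = 0.5            # being born a woman is assumed to halve the chances
P_SURVIVE_FIRST_YEAR = 0.75  # roughly 25% died in their first year
P_BORN_NOBLE = 0.10          # conservative share of the population that is noble


def chance_of_getting_through(plague_mortality):
    """Probability of clearing all four obstacles for a given Black Death death rate."""
    return P_BORN_MALE * P_SURVIVE_FIRST_YEAR * P_BORN_NOBLE * (1 - plague_mortality)


for mortality in (0.30, 0.50):  # the 30% to 50% range cited above
    p = chance_of_getting_through(mortality)
    print(f"plague mortality {mortality:.0%}: {p:.2%} chance, "
          f"about {1 / p:.0f} copies per success")
```

The product lands between roughly 1.9% and 2.6%, so the “approximately 2%” and “about 50 copies” figures fit best at the deadlier end of the range (about 53 copies at 50% mortality, about 38 at 30%).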

The phenotype itself is no less impressive, as it would be the peak of opportunity for genius to emerge. Drop the world’s dumbest idiot into the 14th century and their odds of becoming a great genius of the time are certainly much lower, but in both scenarios the most likely outcome is a non-event, hidden by the properties of the system and unapparent to us. They exist in the same realm as all the potential football greats who never came home from World War II, but they are no less a part of the system even if we do not observe them.

Here is another example. Approximately 1.25 racehorses died for every 1,000 starts in 2022, according to The Jockey Club. If we were to rewind the tape to the 1973 Belmont Stakes and Secretariat broke a leg just before winning the race by 31 lengths, would he lose his value as likely the greatest racehorse of all time? He would likely have to be euthanized and he would sire no horses, but would his phenotype have been any less impressive? Would he have been any less equipped to dominate the Belmont if you were to replay the tape over and over again?

Even if there were a 30% chance of any horse dying on the track, Secretariat included, you would be foolish to select any other horse as your preferred type. Just as our type A is the ideal pick in a probabilistic world, so is Secretariat. Just because we can observe only a single outcome or very few outcomes does not mean we should stray from pursuing an ideal type when we have good reason to believe it is the best option the system allows.

The failings of artificial selection are everywhere around us. Take, for instance, the U.S. Army’s flawed process for selecting camouflage. In his book Animal Weapons, Emlen compares experiments on how well mice blend in at night with the military’s process of figuring out which camouflage would be optimal for its soldiers. Natural selection easily filters mice at night, as owls find mice that are either too dark or too light and eat them, removing them from the population. But the selection of soldier camouflage is subject to other constraints and can produce suboptimal solutions as a result.

“Obviously, blending with backgrounds is essential for soldier survival for precisely the same reasons that it is in mice (imagine conducting a night operation wearing white winter camo). In fact, in 2003 the U.S. Army used a process not unlike Kaufman’s experiment with owls to determine the most effective camouflage patterns for our troops. More than a dozen color and pattern types were assessed against urban, desert, and woodland environments, to identify uniforms least likely to stand out. Some of these tests were conducted at night, where they showed–just like Kaufmann–that being too dark on moonlit nights could be deadly. Modern enemy soldiers, it turns out, are a lot like owls. They have phenomenal nighttime vision thanks to the spread of night-vision goggles and other technologies. As a result, black has been eliminated from most camouflage patterns.

Ideally, the uniform selection process should have unfolded just like owls selecting for mismatched mice, with the population–in this case, the army–evolving toward the best camouflage possible. Unfortunately, politics and the economics of mass production intervened. Rather than choose several different types of uniforms, each the best available for a particular habitat, the army opted for a single Universal Camouflage Pattern (UCP).

This may have solved logistical problems of production and distribution, but it also caused our troops to sometimes stand out when they were supposed to be blending in. After all, the solution with mice was two colors, not one, and the reality of diverse combat habitats is that no one pattern blends well in all places.

It didn't take long for our troops to complain, and by 2009 it was obvious to everyone that the UCP was performing terribly in Afghanistan. The army then rushed to develop a new pattern, called "Operation Enduring Freedom Camouflage Pattern" (OCP) for soldiers deployed in Afghanistan, which it began issuing in 2010. Special Forces soldiers, incidentally, are not subject to the same constraints of mass production, and these units have diverse and effective uniforms to choose from depending on the mission. Military units in other countries also base pattern choices on advanced tests of detectability.

The brutal reality of life and death on the battlefield has provided a sort of natural selection for military uniforms. Many versions are tried, some perform better than others, and patterns performing the best are (usually) selected for further use. Despite various hiccups along the way, few would disagree that modern uniforms are vastly improved over those worn in earlier wars. WWII uniforms were better than those of WW1, and uniforms today are better than those used in Korea or Vietnam.”

While it may be difficult to do the equivalent of evolutionary experimentation (such as testing camouflage patterns or letting owls hunt mice at night), there may be some value in trying to find the “most fit” options by simulating a certain type of player’s outcomes over and over again. In our type A and type B experiment, if we were thrown blindly into the system, it would take a sample larger than five to know whether type A was truly better than type B. Experimentation, even if limited to simulation, would provide that larger sample.

Madden, for instance, has plenty of issues and would be an imperfect testing ground for these ideas. But suppose you could take something along the lines of Madden, insert 100 or even 1,000 players of the same type, and then simulate their career outcomes. Of course, there is a degree of fallibility based on the quality of the simulation, the quality of your inputs for a player type, and the general difficulty of predicting the league’s path over certain periods of time. However, tools like simulation may provide us with a window into the true range of possibilities for player outcomes. They give us a form of replayability like Lenski’s flasks.

Both simulation and an openness to system changes may also help guard us against becoming rooted in traditionalist ideas, the way the bulldog enthusiasts pushed back against the slimmer, fitter, “old type” bulldog. While artificial selection (selection of players by teams) may determine who gets to the field, natural selection (who wins and who loses) is going to determine what happens once the field is full. If we make suboptimal decisions in artificial selection, we will eventually be punished and outperformed in the long term by natural selection.
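To see how quickly a larger sample separates the two types, the sketch below computes exact binomial chances (no simulation) that type A finishes ahead of, level with, or behind type B when each gets n picks. It assumes every pick is independent with the 70% and 60% success rates from the example; `binomial_pmf` and `compare_types` are illustrative names.

```python
from math import comb


def binomial_pmf(n, p):
    """Chance of exactly k successes in n independent picks, for k = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def compare_types(n, p_a=0.7, p_b=0.6):
    """Exact chances that type A finishes ahead of, level with, or behind
    type B when each type gets n independent picks."""
    a = binomial_pmf(n, p_a)
    b = binomial_pmf(n, p_b)
    a_ahead = tie = b_ahead = 0.0
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            if i > j:
                a_ahead += pa * pb
            elif i == j:
                tie += pa * pb
            else:
                b_ahead += pa * pb
    return a_ahead, tie, b_ahead


for n in (1, 5, 50, 500):
    a_ahead, tie, b_ahead = compare_types(n)
    print(f"n={n:>3}: A ahead {a_ahead:.3f}, tie {tie:.3f}, B ahead {b_ahead:.3f}")
```

With one pick apiece, type A finishes strictly ahead only 28% of the time; with five picks it is barely better than a coin flip (about 50.4%); by a few hundred picks, type A almost always comes out on top. That is the kind of sample size repeated simulation could supply.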

The end goal of our selection must always remain center stage and any move towards a certain type must reflect the pursuit of that goal. There may be some fanciful aesthetic beauty captured in the molding of a clay pot, but the potter stops getting paid if his pots can’t hold water.

Marble Racing, the Matthew Effect, and Genetic Drift

“For unto every one that hath shall be given, and he shall have abundance: but from him that hath not shall be taken away even that which he hath.”

- Matthew 25:29, The King James Version

“I am, somehow, less interested in the weight and convolutions of Einstein’s brain than in the near certainty that people of equal talent have lived and died in cotton fields and sweatshops.”

- Stephen Jay Gould, The Panda's Thumb: More Reflections in Natural History

Let’s race some marbles.

Each type of marble has up to two traits: one that reflects speed (fitness) and one that has no effect. The traits are indicated by stripes. A blue stripe indicates higher speed. The other two stripes, yellow and green, have no influence on the speed of the marbles; they are purely aesthetic. A marble with only a yellow or only a green stripe is equally unfavorable, as both are slow. A blue/yellow or blue/green striped marble is equally favorable, as both are fast. Each race we will use different marbles of the same “type”, but they will share the same properties.

We will race our marbles by using a random number generator to determine the winner of any given race, drawing a number from 1 to 100. A draw of 1-90 selects a blue stripe marble and 91-100 selects a non-blue stripe marble, which reflects the higher odds of a blue marble winning the race. Within those ranges, 1-45 represents the blue/green stripe, 46-90 the blue/yellow stripe, 91-95 the green stripe, and 96-100 the yellow stripe.
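The number ranges above translate directly into a lookup. The sketch below is a minimal version (`marble_for` and `race` are illustrative names) that decodes a draw into a winner and can generate new races; it also decodes both sets of race results reported in this section.

```python
import random


def marble_for(number):
    """Map a draw from 1 to 100 to the winning marble type, using the ranges above."""
    if not 1 <= number <= 100:
        raise ValueError("draw must be between 1 and 100")
    if number <= 45:
        return "blue/green"
    if number <= 90:
        return "blue/yellow"
    if number <= 95:
        return "green"
    return "yellow"


def race(n_races, rng):
    """Draw one number per race and return the draws with their winners."""
    draws = [rng.randint(1, 100) for _ in range(n_races)]
    return draws, [marble_for(d) for d in draws]


# Decoding the two sets of races reported in this section.
print([marble_for(n) for n in (82, 50, 72, 80, 75)])
# ['blue/yellow', 'blue/yellow', 'blue/yellow', 'blue/yellow', 'blue/yellow']
print([marble_for(n) for n in (79, 80, 20, 26, 88)])
# ['blue/yellow', 'blue/yellow', 'blue/green', 'blue/green', 'blue/yellow']
print(race(5, random.Random(7)))
```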

We are going to race our marbles five times and see who wins, and then we will discuss some of the implications that come from the results.

Ready. Set. Race.

82. 50. 72. 80. 75.

All five race winners have blue/yellow stripes. Assume I ask you to tell me what happened and why these marbles won, but you don’t know the underlying meaning and distribution behind the stripes. If I asked you which marbles would win the next five races, which would you pick?

For the same reasons that the small sample size would have made it hard to tell the difference between type A and type B success earlier, you don’t know what exactly is driving the underlying outcome of winning the race. All you know is that the blue/yellow marbles won the five races and that they did so for some reason.

You don’t know whether it’s blue or yellow driving their success. You don’t know whether it was a function of skill (blue/yellow might be unbeatable) or luck (chance might have played a large role in the streak of wins).
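How unusual was that five-race sweep? Assuming each race is an independent draw under the ranges above, the chance is a simple power; the sketch below (illustrative names) computes it for a single type and for any type at all.

```python
# Per-race win probabilities implied by the 1-100 ranges.
WIN_PROB = {"blue/green": 0.45, "blue/yellow": 0.45, "green": 0.05, "yellow": 0.05}


def sweep_probability(marble, races=5):
    """Chance that one marble type wins every one of `races` independent races."""
    return WIN_PROB[marble] ** races


def any_sweep_probability(races=5):
    """Chance that some single type, whichever it is, wins every race."""
    return sum(p ** races for p in WIN_PROB.values())


print(f"{sweep_probability('blue/yellow'):.4f}")  # 0.0185
print(f"{sweep_probability('blue/green'):.4f}")   # 0.0185
print(f"{any_sweep_probability():.4f}")           # 0.0369
```

A blue/yellow sweep is roughly a 1-in-54 event (about 1.8%), but blue/green was exactly as likely to produce one, so the sweep by itself says nothing about which stripe carries the speed.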

Let’s rerun the five races again as an example of another set of possible outcomes before we dive further into the specifics of the first set of races and their implications.

Ready. Set. Race.

79. 80. 20. 26. 88.

In these five races we would see that the blue stripe matters more than either the yellow or green stripe. Other runs could produce a scenario where the green stripe dominates the yellow stripe, leading you to develop an early preference for green even though neither green nor yellow has any influence on fitness.

Here is the interesting part. What if your knowledge and preferences from the first five races influenced the following races?

Suppose you only saw the first five races (the blue/yellow dominance). I decide to put these four marbles up for auction before racing them again. Which marble would draw the highest price from bidders? It would presumably be the blue/yellow marble, followed by the yellow and blue/green marbles, and then the green marble without a sniff of success coming in with the lowest demand. The market would want blue/yellow marbles.

Suppose I expand the race track and allow for a fifth marble to race. You get to choose which type of marble gets to race and your goal is to pick the marble with the best odds. Would you add a blue/yellow marble to the track? I won’t go through the probabilities or simulate the race here, but it’s safe to say that if you add an additional track and a blue/yellow marble, the blue/yellow type is about twice as likely to win the race as the blue/green marble. If a blue/yellow marble wins again and I expand the tracks again, it’s likely that the next marble added will also be blue/yellow, further expanding the initial advantage.

Alternatively, I could keep the tracks the same but replace one of the other marbles with a blue/yellow marble, which is presumably going to be more successful in future races. Perhaps we eliminate the green marble first. No harm, no foul, as it only had a 5% chance of winning any given race. We race again. Blue/yellow wins again. If we remove the yellow marble, we aren’t missing much. But if we remove the blue/green marble, we eliminate the only real competitor from the system in favor of blue/yellow dominance.
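One way to reproduce the “about twice as likely” figure, under an assumption not stated in the text: each marble’s chance of winning is proportional to a speed weight taken from the 1-100 ranges, 45 for a blue-striped marble and 5 for a plain one. The sketch below (illustrative names) computes exact odds per type with and without the fifth marble.

```python
from fractions import Fraction

# Hypothetical speed weights inferred from the 1-100 ranges: a blue-striped
# marble counts 45 and a plain marble counts 5, and each marble's chance of
# winning is its weight divided by the total weight on the track.
WEIGHT = {"blue/green": 45, "blue/yellow": 45, "green": 5, "yellow": 5}


def win_odds_by_type(track):
    """Exact chance that each marble type wins, given the marbles on the track."""
    total = sum(WEIGHT[m] for m in track)
    odds = {}
    for marble in track:
        odds[marble] = odds.get(marble, 0) + Fraction(WEIGHT[marble], total)
    return odds


four_tracks = ["blue/green", "blue/yellow", "green", "yellow"]
five_tracks = four_tracks + ["blue/yellow"]  # the added fifth marble

for track in (four_tracks, five_tracks):
    odds = win_odds_by_type(track)
    print({m: f"{float(p):.1%}" for m, p in odds.items()})
```

With the fifth marble, the blue/yellow type’s combined chance rises to 18/29 (about 62%) against 9/29 (about 31%) for blue/green, which matches the “about twice as likely” figure under this weighting.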

It’s not through some function of blue/yellow superiority that it pulled away and succeeded in this scenario, but rather a result of cumulative advantage. This is somewhat like genetic drift, where a certain gene’s prevalence in the population can change and is sometimes completely eliminated due to factors like a small population increasing the impact of randomness.

Suppose we had 100 tracks for our marbles to fill, each type filling 25 tracks, rather than just the original four tracks with each type represented once. When we have fewer tracks, the random luck of whichever marble gets an initial advantage has a larger effect on which marbles get replaced. If we start with more tracks, then poor performers (those without blue) will be gradually removed through lack of fitness, but those with blue will be less subject to randomness.

Another way to view this is through the lens of the Matthew effect, a term coined by Robert K. Merton in reference to a verse in the Gospel of Matthew. You may know it by the more common aphorism “the rich get richer and the poor get poorer”. The principles of cumulative advantage exist in both genetic drift and the Matthew effect, although drift shows up most readily in the dynamics of small populations, while the Matthew effect looks more like an exponential upward trend. Think of the Matthew effect as adding new tracks with winning marbles, while drift is more like replacing existing tracks with winning marbles.
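The replacement scenario can be sketched as a Moran-style process, a standard drift model swapped in here for illustration: each race picks a winner with probability proportional to a hypothetical speed weight (45 for blue-striped marbles, 5 for plain ones, inferred from the 1-100 ranges), and a different track, chosen at random, is handed to a copy of the winner. The function name is illustrative. The sketch compares 4 tracks with 100 tracks.

```python
import random

# Hypothetical weights matching the 1-100 ranges: blue-striped 45, plain 5.
WEIGHT = {"blue/green": 45, "blue/yellow": 45, "green": 5, "yellow": 5}


def share_blue_green_eliminated(per_type, races, trials, seed=0):
    """Fraction of trials in which blue/green has vanished from every track.

    Each race picks a winner with probability proportional to its weight;
    a different track, chosen at random, is then handed to a copy of the winner.
    """
    rng = random.Random(seed)
    eliminated = 0
    for _ in range(trials):
        tracks = [marble for marble in WEIGHT for _ in range(per_type)]
        for _ in range(races):
            weights = [WEIGHT[m] for m in tracks]
            winner = rng.choices(range(len(tracks)), weights=weights)[0]
            loser = rng.randrange(len(tracks) - 1)
            if loser >= winner:
                loser += 1  # skip the winner's own track
            tracks[loser] = tracks[winner]
        if "blue/green" not in tracks:
            eliminated += 1
    return eliminated / trials


for per_type in (1, 25):  # 4 tracks versus 100 tracks
    share = share_blue_green_eliminated(per_type, races=60, trials=500)
    print(f"{per_type * 4} tracks: blue/green gone after 60 races in {share:.0%} of trials")
```

In this toy version, blue/green disappears within 60 races in a substantial share of the 4-track trials but essentially never in the 100-track trials, even though it is exactly as fast as blue/yellow in both.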

A great illustration of these principles of cumulative advantage is the Polya Urn model, which shows how randomness can dramatically influence the final outcome once it begins to take hold. To illustrate this, I’ll use two examples, each with three “replays of the tape”. Feel free to experiment with the model using various inputs though, as tinkering with the Polya Urn in different ways showcases the power of these self-reinforcing feedback loops within the system.

Take, first, the urn that has 100 yellow and 100 green marbles, both of which are equally fit for the environment (like our blue/yellow and blue/green marble racers). We pull a marble from the urn. Let’s suppose we pull a green marble. We then put the marble back in the urn, along with an additional green marble. The urn now has 100 yellow marbles and 101 green marbles. We repeat the sampling again. We are going to do this 100 times and see what the distribution of marbles looks like.

Here are the three replays of the tape in this scenario of 100 yellow and 100 green marbles.

Think of it like this. Our odds of pulling a green marble when sitting at 100 to 100 are exactly 50%. We pull said green marble, then add it back along with an additional green marble. At 101 to 100, our odds shift just very slightly to 50.25% for the next pull to be green. If we pull a yellow marble, we shift back down to 50% as we move 101 to 101. If we pull a green marble again, our odds shift upwards again slightly to 50.50%. Any given pull has a limited impact on population split as a whole in this large marble sample. Even an incredible streak of green wouldn’t change the underlying odds that much.
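A minimal sketch of the urn described above, assuming one extra marble is added per pull; `polya_urn` is an illustrative name. It also prints the next-pull odds quoted in the paragraph above and three replays of the 100/100 urn (the seeds are arbitrary, so the replays will not match the article’s).

```python
import random


def polya_urn(green, yellow, draws, rng):
    """Polya urn: pull a marble with probability proportional to its count,
    return it, and add one more marble of the same color. Returns the final
    counts and the green share after each pull."""
    shares = []
    for _ in range(draws):
        if rng.random() < green / (green + yellow):
            green += 1
        else:
            yellow += 1
        shares.append(green / (green + yellow))
    return green, yellow, shares


# Next-pull odds of green after one and two green pulls from a 100/100 urn.
print(f"{101 / 201:.2%}")  # 50.25%
print(f"{102 / 202:.2%}")  # 50.50%

# Three replays of the tape for the 100/100 urn.
for seed in range(3):
    g, y, _ = polya_urn(100, 100, 100, random.Random(seed))
    print(f"replay {seed + 1}: {g} green, {y} yellow ({g / (g + y):.1%} green)")
```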

Now let’s see what happens when we start with just one marble of each type and try the same urn exercise.

Remember, these marbles started with the exact same 50/50 odds entering the system, but depending on the size (or sample) of the starting population, we see a much greater range of possible outcomes. Once either color takes a significant advantage, it begins to run away. In some cases the marbles manage to stay relatively competitive as the sample climbs, but every additional marble entering the urn further “locks in” the system. If we reach a point where there are 700 yellow marbles to 300 green marbles, the “lock in” is greater than if we are at 70 yellow marbles to 30 green marbles.

When we pull a green marble from the 1 to 1 urn, our odds of a green marble in the following pull go from 50% to 66.67%. If we pull another one, we hit 75%. So on and so forth in a manner that quickly leads to complete domination of one color over the other for no reason other than the randomness of the initial pull and the subsequent feedback of following pulls.
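The contrast between the two starting urns can also be computed exactly rather than replayed. For an urn that adds one marble per pull, the chance of gaining k green marbles over n pulls has a closed form (a beta-binomial distribution); the sketch below evaluates it with exact fractions. The function names are illustrative.

```python
from fractions import Fraction
from math import comb


def rising(x, k):
    """Rising factorial x (x + 1) ... (x + k - 1)."""
    out = 1
    for i in range(k):
        out *= x + i
    return out


def polya_distribution(green, yellow, draws):
    """Exact probability that the urn gains k green marbles over `draws` pulls."""
    denom = rising(green + yellow, draws)
    return [Fraction(comb(draws, k) * rising(green, k) * rising(yellow, draws - k), denom)
            for k in range(draws + 1)]


def chance_green_share_at_least(green, yellow, draws, share):
    """Chance that green makes up at least `share` of the urn after all pulls."""
    total = green + yellow + draws
    dist = polya_distribution(green, yellow, draws)
    return float(sum(p for k, p in enumerate(dist) if (green + k) / total >= share))


for start in (100, 1):
    p = chance_green_share_at_least(start, start, 100, 0.70)
    print(f"start {start}/{start}: chance green ends with at least 70% = {p:.3f}")
```

For the one-and-one urn, every final count from 1 to 101 green marbles is equally likely (1/101 each), so green ends up holding at least 70% of the urn about 30% of the time (30/101), and yellow does so just as often. The 100-and-100 urn cannot reach a 70% green share in 100 pulls at all: even if every pull were green, it would finish at 200 of 300.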

The difficulty in parsing this from an observational standpoint (without knowledge of the underlying system) is that you are simply observing one color of marble getting run over by the other. It is not unlike watching the blue/green marble get beaten five times in a row by the blue/yellow marble. By the time the blue/yellow marble is winning 90% of the time (because there are more tracks with blue/yellow marbles), you would start to assume that it is inherently better, which is why it is more prominent on the race track and why it consistently beats the blue/green marble. Lock-in would make it hard to know otherwise. You could only see that there is no real difference between the two if you raced them head to head over and over again without the influence of the total population.

Here is a question. How closely correlated is height with the success of a quarterback at the NFL level?

If the selection results of NFL teams suggest anything, it’s that height has historically been pretty important. From 2013 to 2022, there were 748 quarterback prospects. Of these 748 quarterbacks, 537 were over 6’1” (71.79%) and 667 were over 6’0” (89.17%).

Of the 90 quarterbacks drafted in the span, 78 were over 6’1” (86.67%) and 87 were over 6’0” (96.67%).

So we know that teams tend to draft mostly taller quarterbacks. But should they? Is it because taller quarterbacks are inherently that much better? If I pull any 6’0” or taller quarterback and throw him onto the field, is there a 97% chance he will outperform the 5’10” or 5’11” guy?

I stumbled across the sentence below while researching genetic drift and how it functions compared to natural selection.

Selection can only act on what variation is already in a population; it cannot create variation.

This is the most unsettling thing I have ever found with regard to deeply held beliefs about which players funnel from college into the NFL and why.

If the college level provides 100 quarterbacks, roughly 89 of them over 6’0” like the prospect pool above, and 10 of them are NFL caliber, with that caliber distributed randomly regardless of height, what are our odds that 8 or 9 of them will be taller than 6’0”?

Which is more likely: that height so significantly affects the ability of a quarterback that it leads to almost 90% of quarterback prospects being over 6’0”, or that we’ve created a causal story to explain why tall quarterbacks are more prominent at the college level and by extension the NFL level? Are we punishing blue/green marbles for being less fit, or simply for not being the blue/yellow marbles with an established track record and a larger sample in the population?

When we think about that idea of selection acting on variation rather than creating it, we can apply the lens to selection throughout the high school, college, and NFL ranks. Each level of selection shapes the types of players that will progress to the next.
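Whether 8 or 9 tall quarterbacks out of 10 is surprising depends on how tall the college pool already is. As a hypothetical, the sketch below assumes 89 of the 100 college quarterbacks are over 6’0” (mirroring the roughly 89% share among prospects above) and that the 10 NFL-caliber players are drawn with no regard to height, which makes the count of tall players a hypergeometric draw. The function name is illustrative.

```python
from fractions import Fraction
from math import comb


def tall_among_selected(pool, tall_in_pool, selected):
    """Hypergeometric distribution: chance that exactly k of the `selected`
    players are tall when selection ignores height entirely."""
    total = comb(pool, selected)
    return [Fraction(comb(tall_in_pool, k) * comb(pool - tall_in_pool, selected - k), total)
            for k in range(selected + 1)]


# Hypothetical pool: 100 college quarterbacks, 89 of them over 6'0", 10 NFL caliber.
dist = tall_among_selected(pool=100, tall_in_pool=89, selected=10)
expected = sum(k * p for k, p in enumerate(dist))
print(f"expected tall quarterbacks among the 10: {float(expected):.1f}")
print(f"chance of 8 or more tall: {float(sum(dist[8:])):.2f}")
```

Under that assumption, the expected number of tall quarterbacks among the 10 is 8.9, and roughly nine draws in ten contain 8 or more, without height mattering at all.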

Why are there so few white cornerbacks? Is it because the athletic profile, the mental ability, or the technical prowess needed to play the position are in any way influenced by skin color? Or is it because an athletic white kid at the high school level is more likely to play a different position because of the norms and imperfect selection processes of a coach who may be teaching algebra as his day job?

Let’s say he is placed at safety or wide receiver in high school, two positions with similar athletic requirements to cornerback. When colleges come through and recruit him, is it likely that they will want to change his position and teach him to play cornerback? What about when the NFL comes calling after his illustrious career at a Power Five school as a safety or wide receiver? At that point the time and effort needed to teach him a new position would make it a wasteful endeavor.

The elbow nudging and “get a look at this” comments about a player like recent draft pick and white cornerback Riley Moss may seem relatively harmless to some folks, but they reflect deeply ingrained social norms and expectations that are embedded in the selection process. They are the same norms that can lead certain coaches to put talented quarterbacks of color at other positions, which damages the overall quality of quarterback play if those players are the most fit to play the position. From the NFL’s perspective these players aren’t observed, just like our unknown 14th century genius from the last chapter who died from the Black Death or wasn’t part of nobility. They are swept under the rug, but they are every bit a part of the system.

So not only have we discovered that we face significant historical contingency (what happened in the past of the system influences the present), but also that cumulative advantage can create a snowball effect that leads to vastly disproportionate outcomes that aren’t reflective of true underlying fitness. An experiment by Duncan Watts, Peter Dodds, and Matthew Salganik in 2006 captured the Matthew effect wonderfully.

They created Music Lab, a website that allowed participants to listen to, rate, and download songs from unknown bands, and it displayed how fickle these advantages can be. The participants were split into eight “social worlds” with their own independent rankings, along with two control groups that would have no rankings to rely on. Essentially, those in the control groups had no knowledge of what others were doing and browsed the record store alone. On the other hand, the social worlds would develop their own “popular songs” and then the forces of those early advantages would get a chance to strut their stuff.

The control groups were steady in their results, but the social worlds created vastly disparate outcomes. There were 48 songs in the experiment and many of them landed all over the board depending on early advantages. Highly ranked songs in the control group tended to do well, but there was no guarantee of success. One song in particular, Lockdown by 52metro, finished in first place in one world while finishing 40th in another. The song ranked 26th in the control groups.

Nobody listens to Elvis Presley or Queen and says to themselves, “I believe their popularity is largely a function of a self-reinforcing feedback loop. It’s just like preferential attachment in network theory!” They believe that these musicians are great because it makes intuitive sense that only a great musician could achieve such success, and if they weren’t that great then why is everyone listening to them? My friend listens to them, therefore I listen to them, therefore my other friends listen to them.

And while there may be some underlying characteristics of great musicians that warrant this popularity, some of it is simply being accessible and known to others. You listen to music on the radio and find songs through what others are listening to. You fall in love with a family recipe because it’s what the people you know and love are eating. You pick clothes based on how you want to represent yourself within a larger cultural zeitgeist. But if you were to rerun the tape, there’s no guarantee that Elvis would be Elvis, that one family recipe would continue to be handed down over another, or that bell-bottom jeans would become a cultural icon in the 1970s.

Perhaps short quarterbacks are truly less fit than tall ones. Perhaps the prominence of shorter quarterbacks in the 2023 class is an anomaly that will soon revert the other way. We can’t know one way or another, just like we can’t know about all the other non-observations within the system that never cross our line of sight. We only see the ratio of marbles, not a “known” measure of comparative fitness. But it gives us reasons to question whether fitness is accurately reflected in the artificial selection processes of teams.

Depending on the definition of fitness, dog breeders who select the Cavalier King Charles spaniel for its aesthetic “cuteness” and miss the severe heart defects may be flawed in their judgment. They’ll tell you they know what they are doing and that they have expertise on the subject.

So would most coaches or scouts.

Link to part 1

Link to part 3
