Sabermetrics: the mathematics of baseball
May 23, 2018
Money & Math
Sabermetrics is more than numbers on a page or numbers spoken about in Major League conference rooms. It is a window through which baseball fanatics, mathematicians, gamblers, Major League scouts, and Major League professionals can see the same game from a fresh perspective: a mathematical perspective. Sabermetrics combines baseball’s past and present to project the future (Great Scott!). Sabermetrics also combines passion and expression with love for the game of baseball.
Sabermetrics is one of the hidden gems about baseball that I am head-over-heels about. Every year, the day after the conclusion of the World Series I bust out a 20 x 23-inch Post-It note and record all 30 teams’ free agents. I create a color-coded chart that is filled with information on each player, their teams, and contract details. I update the chart every time a player gets traded or offered a contract. Just before Opening Day, when the chart is complete, I post it on the door of my room and bask in its glory. My offseason trade chart is filled with specifically colored names, numbers, contracts, and of course, the sources I got the information from. An entire box of markers later, I continue my sabermetric obsession by organizing each team’s starting lineups and players. I compare batting averages, earned run averages, and slugging percentages. Taking unorganized statistics and logically organizing them is my artistic expression.
I set off on a journey to explore the core definition of sabermetrics and how teams from the Major League level down to the high school level use sabermetrics to alter the way they play.
In 1999 the Oakland Athletics hired Paul DePodesta, a Harvard economics graduate, as Assistant General Manager to General Manager Billy Beane. The Athletics were coming off a losing season. DePodesta, who had held only one prior job in baseball, pitched the idea of using statistics alone to build a championship-winning team. Billy Beane was desperate and agreed to give the scouting department a makeover and stray from traditional scouting. Prior to DePodesta’s mathematical idea, players were drafted, traded for, or acquired based on past performance and, essentially, how many home runs they could hit. Paul DePodesta taught Beane that runs win games and runs are scored when men get on base; therefore, the statistic column scouts must pay attention to is not home runs but on-base percentage. Beane took this idea to the scouting department, where the old scouts begrudgingly agreed to comply with Beane’s request to acquire men with exceptional on-base percentages, no matter how old they were or how many home runs they hit (or failed to hit). In this moment sabermetrics took the mainstream stage.
Sabermetrics never won Oakland a World Series ring, but it did take the team to the postseason multiple times. Other teams around the League picked up on Oakland’s unique front office strategy and hired Paul DePodestas of their own. The Boston Red Sox transformed their team using sabermetrics, eventually earning a World Series Championship in 2004.
The other twenty-eight teams followed Oakland and Boston and jumped on the sabermetric bandwagon. Since then sabermetrics has become mainstream and trickled down to the college level and even the high school level of play.
In my investigation, I explore the essential meaning and functionalities of sabermetrics as well as how it is carried out mathematically. Sabermetrics is a tactic in the athletic arena, a lifestyle in the lives of coaches, and a spicy controversy in the eyes of scouts.
Let’s play ball!
Show Me the Facts
Sabermetrics is the advanced statistical analysis of baseball. Its numerical analysis is used to forecast baseball players’ careers and future performance, as well as to build successful teams based solely on players’ statistics. Multi-million dollar professional baseball organizations began straying away from traditional scouting reports and implementing “sabermetrics” to trade for, acquire, and draft lesser-known players who have a higher on-base percentage, leading to a higher win percentage. After reading Michael Lewis’s book Moneyball, I became intrigued with the idea of using statistics as an advantage in a deeply traditional game. Sabermetrics is more than just numbers used to predict the performance of players; it is the risky reality of some Major League front offices. I chose to research the use of sabermetrics in baseball because of the mysterious way it uses mathematics to potentially alter the outcome of a sporting event. Throughout publications surrounding sabermetrics, a common theme of using the data collected to “project rather than predict” was prominently displayed. The controversy between using sabermetrics and taking the traditional route of using scouting reports was covered at length throughout the various texts. This literature review will evaluate the common trends of analyzing raw data, errors in the system, and the usage of sabermetrics in Major League clubs, as well as the sources’ coverage of the controversial “tradition versus new age” debate found in currently available sources surrounding the topic of sabermetrics in Major League Baseball.
This scene in the popular film, Moneyball, reenacts the conflict that arose when the Oakland Athletics’ General Manager, Billy Beane, wanted to transition his scouting department from traditional scouting to sabermetric-based scouting.
In this clip from Moneyball, Billy Beane attempts to persuade his scouts that sabermetrics is the new wave of scouting. He asks his assistant, Paul DePodesta, to show them the math: hundreds of player statistic cards accompanied by calculations that show each player’s value to the team.
Manipulating Raw Data
Extracting data from a live baseball game to use as evidence in a case for signing the next biggest baseball star is risky business. What if there are mistakes in calculations? What if a player trips on the sidewalk, leading to a career-ending injury? What if the statistics are nonsense? The Society for American Baseball Research (more commonly known as SABR), Lindy’s Sports Baseball 2018 Preview publication, and mathematicians Gary Talsma and Jim Albert all tackle the issue of raw data usage in sabermetrics. SABR is the main source for sabermetric information on the statistical analysis of baseball. Drawing on sources such as MLB.com, Baseball Reference, The Lahman Database, and Retrosheet.org, SABR reviews how these professional outlets take data collected from baseball games and place it into sabermetric equations created by Bill James and other mathematicians to produce figures that can be used to compare players. Lahman created a database source that allows users to input their own raw data to achieve the output of sabermetric data. Baseball Reference and The Lahman Database display the equations and explain how to input the raw data collected, encouraging users to try sabermetrics at home. Lindy’s Sports Baseball 2018 Preview, meanwhile, is a 150-page magazine publication that displays the sabermetric information and analyzes it to create predictions for the upcoming season. The trend of teaching readers how to input data and manipulate it to achieve the desired output is apparent throughout these texts, and some go further by using the data to create predictions.
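To make the idea of placing raw data into sabermetric equations concrete, here is a minimal Python sketch of the basic rate statistics mentioned in this piece: batting average, on-base percentage, slugging percentage, and earned run average. The formulas are the standard definitions; the function names and the sample stat line are my own illustrations and do not belong to any real player or to any of the databases above.

```python
def batting_average(hits, at_bats):
    """AVG = H / AB."""
    return hits / at_bats if at_bats else 0.0


def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    chances = at_bats + walks + hit_by_pitch + sac_flies
    return (hits + walks + hit_by_pitch) / chances if chances else 0.0


def slugging_percentage(hits, doubles, triples, home_runs, at_bats):
    """SLG = total bases / AB, where singles are hits that were not extra-base hits."""
    singles = hits - doubles - triples - home_runs
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats if at_bats else 0.0


def earned_run_average(earned_runs, innings_pitched):
    """ERA = 9 * ER / IP (earned runs allowed per nine innings)."""
    return 9 * earned_runs / innings_pitched if innings_pitched else 0.0


# Hypothetical stat line, invented purely for illustration.
line = {"at_bats": 500, "hits": 150, "doubles": 30, "triples": 5,
        "home_runs": 20, "walks": 60, "hit_by_pitch": 5, "sac_flies": 5}

avg = batting_average(line["hits"], line["at_bats"])
obp = on_base_percentage(line["hits"], line["walks"], line["hit_by_pitch"],
                         line["at_bats"], line["sac_flies"])
slg = slugging_percentage(line["hits"], line["doubles"], line["triples"],
                          line["home_runs"], line["at_bats"])

print(f"AVG {avg:.3f}  OBP {obp:.3f}  SLG {slg:.3f}")
```

Notice that on-base percentage credits walks and hit-by-pitches, which batting average ignores. That difference is the heart of the Oakland story: a patient hitter can look ordinary by batting average and valuable by on-base percentage.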
Errors in the System
Mathematics and statistics are susceptible to errors, and errors are apparent even in the precise nature of sabermetrics. Stephen Marche of The New Yorker expands on the importance of the error statistic column in Major League Baseball in relation to the importance of errors in mathematics. A roundtable discussion on the question of sabermetrics’ impact on sports was featured in The Atlantic. Both The New Yorker and The Atlantic take up the question of validity. The two articles stormed past explaining the functionalities of sabermetrics and jumped immediately into the nitty-gritty: discussing the reality of error within the mathematics. There are no case studies or graphs within these two articles, just sentences flowing with opinions. This kind of discussion is important when evaluating a new statistical advancement: its errors, its potential positive (and negative) impacts, and its real-life function within the Major League setting. Open dialogue about sabermetrics and all of its flaws is a recurring trait in sabermetric-centered articles.
The Usage of Sabermetrics in Major League Organizations
Sabermetrics became popular in 2003 when Michael Lewis’s book Moneyball, which reports on the Oakland Athletics’ success with sabermetrics during the 2002 season, dropped in bookstores across the United States. Ever since Paul DePodesta teamed up with the Oakland Athletics using Bill James’s sabermetric equations, teams around Major League Baseball have been working sabermetrics into front office decisions. Sources such as ESPN.com, Steve Dilbeck of The Los Angeles Times, and R.J. Anderson of CBS Sports all report on similar instances of Major League teams utilizing sabermetrics in the front office. ESPN published an entire website analyzing how much teams in the four major sports (MLB, NBA, NHL, and NFL) utilized sabermetrics and how successful they were in the 2015 season. Their studies show that teams that leaned on sabermetrics more heavily were more likely to succeed in postseason play than teams that used it less. In contrast to ESPN, the two other sources reported in a more journalistic fashion on how teams were using sabermetrics to their advantage during preseason play and when drafting future professionals. Many sources took the same journalistic standpoint as The Los Angeles Times and CBS Sports, tackling sabermetrics by reporting what they witnessed and how a team’s actions affected its success. The main theme across all 20 sources was how Major League Baseball teams used sabermetrics to their advantage, viewed from a journalistic standpoint.
The Baseball Almanac and Baseball Prospectus introduce their publications with a disclaimer stating that the information in their pages is not intended for predictions related to trading or gambling; rather, their statistics are projections of how a player or team should perform in the future. A similar warning appears in Lindy’s Sports Baseball 2018 Preview, explaining to readers the precautions that should be taken before making any decisions based on advanced statistics. In the film Moneyball (based on the book), General Manager Billy Beane faces backlash from older scouts when he attempts to transition the club to a primarily sabermetrically driven front office. The scouts argue that baseball was created to be a traditional sport and should be left in its traditional setting, not risk being infiltrated by new age statistics. The argument about tradition versus new age statistics comes into play in The Atlantic’s roundtable discussion, with half the table in favor of continuing traditional baseball while the other half supports sabermetric advancements. The discussion continues and the table is open to anyone who opposes or supports sabermetrics, as most platforms allow readers to share their own opinions in the comments. The controversy is there and the debate floor is open.
Overall, the literature presented on the topic of sabermetrics spans from journalism-style reporting to mathematical research papers on the functionality of sabermetrics in Major League Baseball, but the message remains constant: sabermetrics works in moderation, and sabermetrics is controversial. Throughout my research I read work by mathematical and business baseball professionals; further research may be conducted from the viewpoint of Major League Baseball players, the ones who are producing the raw data. Major agreements within my research included sabermetrics’ functionality and its purpose: to project and not predict. No major disagreements were present other than analysts taking the side of tradition in baseball rather than using sabermetrics. In conclusion, the published research surrounding sabermetrics is advanced and holds depth in discussing the functionality, errors, application, and workings of sabermetrics, while also holding sabermetrics accountable for further discussion.
It’s All a Numbers Game
Following my research about sabermetrics, I viewed many websites that had players’ statistics and decided to try to do some baseball math of my own. Below is a key to statistical acronyms and a layout of Arizona Diamondbacks pitcher Zack Greinke’s statistics during the 2016, 2017, and 2018 seasons.
I specifically chose to demonstrate sabermetrics by using Zack Greinke’s statistics because he is one of the few professional baseball players who use sabermetrics to better their own game. Greinke watches film and does the math to craft a pitch list for each individual batter on the team he is facing. Researching what pitches they hit well, what they’re likely to swing at when the count is 3-2, or whether a player has only hit home runs off of knuckle curveballs allows Greinke to pitch specifically to each player, ultimately increasing his probability of winning.
Statistics courtesy of FanGraphs
I’m not experienced enough in statistics to successfully calculate advanced sabermetrics, although I found a website that calculates wins above replacement (WAR) and other advanced statistics through Microsoft Excel. Senior Stephan Park is the leading offensive player on the Webb Schools varsity baseball team.
Statistics courtesy of MaxPreps
Using his statistics from the 2018 season, courtesy of MaxPreps, I used Fansided.com’s WAR calculator to calculate Stephan Park’s WAR. The calculator is set to calculate statistics based on a Major League hitter’s schedule. Therefore, prior to inputting Stephan’s statistics, I had to translate his high school season statistics to their Major League equivalent.
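One simple way to do that kind of translation is to pro-rate counting statistics from a short high school season up to a 162-game Major League schedule. The Python sketch below assumes plain proportional scaling; it is not necessarily the exact method I used, the function names are hypothetical, and the sample numbers are invented for illustration rather than taken from Stephan’s MaxPreps page.

```python
MLB_GAMES = 162  # length of a Major League regular season


def prorate(stat_total, games_played, target_games=MLB_GAMES):
    """Scale a counting stat (hits, walks, home runs...) to a longer schedule."""
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    return stat_total * target_games / games_played


def prorate_line(stat_line, games_played, target_games=MLB_GAMES):
    """Apply the same proportional scaling to every counting stat in a dict."""
    return {name: prorate(total, games_played, target_games)
            for name, total in stat_line.items()}


# Hypothetical 27-game high school season (made-up numbers).
high_school = {"at_bats": 90, "hits": 30, "walks": 12, "home_runs": 3}

mlb_equivalent = prorate_line(high_school, games_played=27)
for name, value in mlb_equivalent.items():
    print(f"{name}: {value:.0f}")
```

Rate statistics such as batting average come out the same before and after this scaling, because the numerator and denominator grow by the same factor. What the scaling cannot account for is the gap in competition between high school and Major League pitching, so the result is best read as a rough projection.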
Calculating wins above replacement, commonly known as WAR, helps a team assess which players are the most vital for success. Wins above replacement sums up a player’s contribution to the team by analyzing their batting, baserunning, fielding, and pitching, using the calculation shown below (courtesy of Wikipedia). The higher the number produced, the more valuable the player is to the team’s chances of winning. In my calculation with Stephan Park’s statistics, he received a WAR of 2.0, meaning the Webb Gauls would be projected to win two more games with Stephan playing rather than a replacement player (hence, wins above replacement).
To gain further insight into sabermetrics, I contacted experts in the field of mathematics who also happen to be baseball fans. Andrew Neyer is a statistics teacher at The Webb Schools and an avid Cincinnati Reds fan. The other voice on the tape is Stephen Caddy, a mathematics teacher at The Webb Schools, host of JOCKTALK, and Canadian sports fan. I spoke to them about sabermetrics, its function, its purpose, and whether they would use sabermetrics.
In our conversation, Mr. Neyer, Mr. Caddy, and I discussed the flexibility of sabermetrics and how the way a team uses it shapes its impact. We touched on how statistics are used in mathematics and how they are used in a similar manner in the world of baseball. They also agreed that if they were General Managers, they would definitely use sabermetrics to gain an edge on the other team.
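For readers who want to see the arithmetic, here is a minimal Python sketch of the position-player WAR formula in the form FanGraphs publishes it: add up a player’s run contributions and divide by the number of runs it takes to buy one win. The run values below are invented so that the total works out to a 2.0 WAR; they are not Stephan’s actual component numbers, and the figure of 10 runs per win is only a common rule of thumb, since the real value shifts from season to season.

```python
def position_player_war(batting_runs, baserunning_runs, fielding_runs,
                        positional_adjustment, league_adjustment,
                        replacement_runs, runs_per_win=10.0):
    """WAR = (sum of run components) / runs per win.

    runs_per_win defaults to 10.0, a rule-of-thumb assumption.
    """
    if runs_per_win <= 0:
        raise ValueError("runs_per_win must be positive")
    total_runs = (batting_runs + baserunning_runs + fielding_runs
                  + positional_adjustment + league_adjustment
                  + replacement_runs)
    return total_runs / runs_per_win


# Hypothetical component values, chosen only to illustrate a 2.0 WAR.
war = position_player_war(
    batting_runs=12.0,           # runs added at the plate versus average
    baserunning_runs=1.0,        # runs added on the bases
    fielding_runs=2.0,           # runs saved in the field
    positional_adjustment=-3.0,  # easier positions are docked runs
    league_adjustment=1.0,       # small correction between leagues
    replacement_runs=7.0,        # credit for average vs. replacement level
)
print(f"WAR: {war:.1f}")
```

Read this way, a WAR of 2.0 means the player is worth roughly 20 runs more than a replacement-level player over the season, and those 20 runs convert to about two wins.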
Rounding the Bases
Throughout the entire experience (the whole nine innings!) of researching and exploring sabermetrics, I gained a deeper understanding of and appreciation for the game of baseball. I learned that sabermetrics is more elastic than I once thought: some teams use it only for pitching, while for others it is the whole game plan.
My favorite line in Moneyball comes when Billy Beane is speaking to Paul DePodesta about old baseball footage and becomes nostalgic about his old playing days. He leans back and whispers, “How can you not be romantic about baseball?” He’s right. Prior to this adventure, I admired baseball, but only at a surface level. Now, I understand what Beane is saying. Baseball is intricate; there are so many small details and inner workings that can only be appreciated through research. My research in sabermetrics made me enamored with baseball. It’s a beautiful sport. How can you not be romantic about it?
The philosophy of sabermetrics is to find the diamonds in the rough in the world of baseball through organized statistics. Sabermetrics is complex, yet as satisfying as watching a home run fly over the fence. The idea of projecting the future is captivating. And the baseball world is barely scratching the surface.