Journal of Quantitative Analysis in Sports
Copyright (c) 2009 University of California, San Francisco. All rights reserved.
http://www.bepress.com/jqas
Recent documents in Journal of Quantitative Analysis in Sports
Sun, 03 May 2009 07:02:10 PDT

Pythagoras and the National Hockey League
http://www.bepress.com/jqas/vol5/iss2/11
Fri, 01 May 2009 11:47:41 PDT
Bill James found a relationship between the win/loss percentage of a Major League Baseball team and the number of runs the team scores and allows over the course of a season. This paper investigates the nature of that relationship for the National Hockey League (NHL). We find the optimal form of James' model for the NHL using the absolute error criterion and demonstrate that far more complex forms of James' model yield little additional predictive power. We also provide empirical evidence that the relationship between win/loss percentage and goals scored and allowed varies relatively little across recent seasons.
Author: James J. Cochran
Category: Hockey

Scramble Teams for the Pinehurst Terrapin Classic
http://www.bepress.com/jqas/vol5/iss2/10
Fri, 01 May 2009 11:47:39 PDT
The Pinehurst Terrapin Classic (PTC) is a five-day, 16-player annual golf tournament that includes match-play and scramble rounds. Pairings must be created for each day of the tournament. They must ensure that every pair of participants plays on the same team at least once during the tournament, and that the teams for each scramble round are of comparable ability. Over the past four years, both a simple interactive, spreadsheet-based heuristic and an integer programming model have been developed to create the pairings. Both rely on very special properties of the structure of this tournament. In this paper, we describe this structure, the methods used to create the pairings, and our experience with these methods over the most recent years.
Author: Michael O. Ball
Category: Golf

A New Handicapping System for Golf
http://www.bepress.com/jqas/vol5/iss2/9
Fri, 01 May 2009 11:47:37 PDT
The official handicapping system of the Royal Canadian Golf Association (RCGA) is very similar to that of the United States Golf Association (USGA). Although these handicapping systems are complex and have been carefully studied, they do not take statistical theory into account. In 2000, the Handicap Research Committee of the RCGA was formed and charged with developing a new handicapping system. This paper outlines the proposed system, which continues to use the existing course ratings and slope ratings but relies on statistical theory to drive the methodology. We demonstrate that the proposed system has several advantages over existing systems, including fairness and improved interpretability. The proposed system is supported by both theory and data analyses. An investigation into the effects of equitable stroke control is also provided.
Author: Tim B. Swartz
Category: Golf

Using Simulation to Estimate the Impact of Baserunning Ability in Baseball
http://www.bepress.com/jqas/vol5/iss2/8
Fri, 01 May 2009 11:47:35 PDT
In baseball, an offensive team's run-scoring ability depends not only on the batting skills of its players but also on their baserunning abilities. Using a Monte Carlo simulation based on actual statistics of real players, we estimate the magnitude of the effect of baserunning skills on a team's run-scoring ability. Our results largely confirm previous non-academic estimates that the impact of baserunning on a team's run scoring is typically less than ±25 runs per season. However, using simple heuristic algorithms, we show that a team composed of the nine best (worst) baserunners could gain (lose) as many as 70 (55) runs per season due to baserunning.
Author: Ben S. Baumer
Category: Baseball

Keeping the Hitter Off Balance: Mixed Strategies in Baseball
http://www.bepress.com/jqas/vol5/iss2/7
Fri, 01 May 2009 11:47:33 PDT
Mixed strategies are a key component of game theory. Investigations into whether people use optimal mixed strategies have largely been limited to laboratory settings and have produced mixed results. Recently, the empirical framework has been extended to professional sports. This study uses pitch-level data from Major League Baseball games to test whether pitchers mix their pitches optimally. The study is limited to the first pitch of a plate appearance. It finds that pitchers mix optimally with respect to success on that first pitch, but it rejects the null hypothesis of optimal play with respect to the outcome of the plate appearance.
Author: Jesse Weinstein-Gould
Category: Baseball

"If the Team Doesn't Win, Nobody Wins": A Team-Level Analysis of Pay and Performance Relationships in Major League Baseball
http://www.bepress.com/jqas/vol5/iss2/6
Fri, 01 May 2009 11:47:31 PDT
This analysis of team-level Major League Baseball performance for the 1985 through 2001 seasons addresses four questions: (1) 'Is there a relationship between winning and performance?' (2) 'Is there a relationship between pay and performance?' (3) 'Is there a relationship between winning and pay?' and (4) 'Is there interaction between batting and pitching?' The findings are as follows: (1) the relationship between performance and winning is significant.
Pitching explains 2/3 of the variance, with batting covering the other 1/3; (2) the pay and performance relationship is significant, but its practical importance is low, because non-performance factors exert a stronger influence on pay levels; (3) the pay and winning relationship is significant, but becomes non-significant when performance variables are used to predict winning; and (4) the batting and pitching interaction is significant but weak, with limited effects. This type of analysis should help teams be managed more effectively than may presently be the case.
Author: Nicholas S. Miceli
Category: Baseball

Modeling Baseball Player Ability with a Nested Dirichlet Distribution
http://www.bepress.com/jqas/vol5/iss2/5
Fri, 01 May 2009 11:47:28 PDT
In this paper we introduce the nested Dirichlet probability distribution and propose a method of using it to model Major League Baseball (MLB) player abilities. To do so, we define fourteen distinct outcome types for any typical plate appearance (excluding intentional walks and bunt attempts). We assume that every player has an underlying fourteen-dimensional ability vector, x, where each element represents the probability that the player will experience the corresponding outcome type in any typical plate appearance. We then use the method of maximum likelihood to fit a nested Dirichlet joint prior distribution on x for all MLB batters (excluding pitchers) over the period 2003-2006. Because the nested Dirichlet (like the Dirichlet distribution) is a conjugate prior for multinomial data, this model also yields a nested Dirichlet posterior distribution for every player.
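The conjugacy property just described can be illustrated with the ordinary Dirichlet distribution, which the abstract notes shares this property with the nested version. The sketch below is not the paper's nested model. It uses a hypothetical three-outcome split (hit, walk, out) instead of the fourteen outcome types, along with made-up prior pseudo-counts and observed counts. With a Dirichlet(alpha) prior and multinomial counts n, the posterior is Dirichlet(alpha + n).

```python
from typing import Dict


def dirichlet_posterior(prior: Dict[str, float], counts: Dict[str, int]) -> Dict[str, float]:
    """Dirichlet(alpha) prior + multinomial counts n -> Dirichlet(alpha + n) posterior."""
    return {outcome: alpha + counts.get(outcome, 0) for outcome, alpha in prior.items()}


def dirichlet_mean(alpha: Dict[str, float]) -> Dict[str, float]:
    """Mean of a Dirichlet distribution: each parameter divided by their sum."""
    total = sum(alpha.values())
    return {outcome: a / total for outcome, a in alpha.items()}


# Hypothetical prior pseudo-counts and one player's observed plate-appearance outcomes
# (illustrative numbers only, not fitted values from the paper).
prior = {"hit": 25.0, "walk": 8.0, "out": 67.0}
observed = {"hit": 30, "walk": 10, "out": 60}

posterior = dirichlet_posterior(prior, observed)
print(posterior)                  # {'hit': 55.0, 'walk': 18.0, 'out': 127.0}
print(dirichlet_mean(posterior))  # posterior mean ability vector
```

In this example the posterior mean hit rate (55/200 = 0.275) lies between the prior mean (0.25) and the player's raw rate (0.30). This shrinkage is what a conjugate prior provides in closed form.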
We also present extensions that incorporate age effects and year-to-year variance in players' underlying abilities. These extensions improve the model's predictive power while maintaining a nested Dirichlet posterior, and they lead to surprising new evidence that the underlying abilities of players (not just their statistical performances) are mean-reverting in some sense. We evaluate the posteriors generated by this extended model as a forecasting tool against future results. The model's accuracy is competitive with popular projection systems, and the model provides a reasonable estimate of posterior uncertainty. Finally, we discuss further ideas for extending the model as well as some key applications.
Author: Brad Null
Category: Baseball

Chasing DiMaggio: Streaks in Simulated Seasons Using Non-Constant At-Bats
http://www.bepress.com/jqas/vol5/iss2/4
Fri, 01 May 2009 11:47:26 PDT
On March 30, 2008, The New York Times published the article "A Journey to Baseball's Alternate Universe" by Samuel Arbesman and Steven Strogatz. They simulated baseball's entire history 10,000 times to estimate how likely it was for anyone in baseball history to achieve a hitting streak at least as long as Joe DiMaggio's 56-game streak of 1941. Arbesman and Strogatz treated a player's at-bats per game as a constant across all games in a season, which greatly overestimates the probability of long streaks. The simulations in this paper instead treat at-bats in a game as a random variable: for each player in each season, the number of at-bats for each simulated game was bootstrapped. The number of hits for player i in season j in game k is a binomial random variable. Its number of trials equals the number of at-bats the player gets in game k, and its probability of success equals that player's batting average for that season.
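The per-player simulation step described above can be sketched as follows, under stated assumptions. This is an illustration, not the author's code. It simulates one hypothetical player-season rather than all of baseball history. At-bats for each game are bootstrapped from a hypothetical list of observed per-game at-bat counts, and hits are drawn as Binomial(at-bats, batting average). As a simplification, any game without a hit ends the streak. That ignores the official-scoring exceptions for games in which the batter has no official at-bat.

```python
import random
from typing import List, Sequence


def longest_streak(hits_per_game: Sequence[int]) -> int:
    """Length of the longest run of consecutive games with at least one hit."""
    best = current = 0
    for hits in hits_per_game:
        current = current + 1 if hits > 0 else 0  # a hitless game resets the streak
        best = max(best, current)
    return best


def simulate_season(observed_abs: Sequence[int], avg: float, n_games: int,
                    rng: random.Random) -> List[int]:
    """Hits per game for one simulated season.

    At-bats for each game are bootstrapped (resampled with replacement) from the
    player's observed per-game at-bat counts. Hits are Binomial(at-bats, avg),
    drawn here as independent Bernoulli trials.
    """
    season = []
    for _ in range(n_games):
        at_bats = rng.choice(observed_abs)
        season.append(sum(rng.random() < avg for _ in range(at_bats)))
    return season


def streak_probability(observed_abs: Sequence[int], avg: float, n_games: int,
                       target: int, n_sims: int, seed: int = 0) -> float:
    """Fraction of simulated seasons containing a hitting streak of >= target games."""
    rng = random.Random(seed)
    successes = sum(
        longest_streak(simulate_season(observed_abs, avg, n_games, rng)) >= target
        for _ in range(n_sims)
    )
    return successes / n_sims


# Hypothetical inputs: observed per-game at-bat counts and a .330 season average.
observed_abs = [4, 4, 5, 3, 4, 2, 5, 4, 4, 3]
print(streak_probability(observed_abs, 0.330, 154, 56, n_sims=2000))
```

Passing a single constant count (for example `[4]`) as `observed_abs` approximates the constant at-bats assumption. This makes it easy to compare the two treatments for the same hypothetical player.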
Using non-constant at-bats in the simulation reduced the percentage of simulated baseball histories containing a hitting streak of at least 56 games from 42% (Arbesman and Strogatz) to approximately 2.5%.
Author: David M. Rockoff
Category: Baseball

Assessing Methods for College Football Rankings
http://www.bepress.com/jqas/vol5/iss2/3
Fri, 01 May 2009 11:47:24 PDT
With the advent of the Bowl Championship Series (BCS), much emphasis has been placed on ranking teams. We consider several mathematical methods for ranking college football teams based on point differential, including least squares with fixed or mixed effects. We also consider modifications such as truncating or censoring the result (as in Harville's method) to adjust for the possibility of teams running up the score. We assess the predictive performance of these models using leave-one-out cross-validation. The methods and analyses are applied to all major NCAA football data from 1930 to 2007.
Author: Ryan Gill
Category: Football

Optimizing Football Game Play Calling
http://www.bepress.com/jqas/vol5/iss2/2
Fri, 01 May 2009 11:47:21 PDT
Play-calling strategies during football games are extremely important to a team's success. In the past, coaches and players have subjectively determined the plays to call based on past experiences, personal biases, and various observable factors. This research quantifies these decisions using game-theoretic techniques, updating optimal decision policies as new information becomes available during a game. A decision maker changes his perceived optimal strategy based on what is known about the opponent's strategy at the time of the decision. Additionally, utility theory is used to capture the different risk preferences of the decision makers. Furthermore, we use design of experiments and response surface methodology to optimize the risk strategies of each decision maker.
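The smallest possible case of the game-theoretic core of this approach is a 2x2 zero-sum game solved for optimal mixed strategies. The sketch below is not the paper's model. It ignores the utility-based risk preferences, in-game updating, and response surface optimization described in the abstract. The payoffs are hypothetical expected yards for a run or a pass against a defense set for the run or for the pass. When the game has no saddle point, the offense runs with probability p = (d - c)/(a - b - c + d), which leaves the defense indifferent between its two options.

```python
from fractions import Fraction
from typing import Tuple


def solve_2x2_zero_sum(a: int, b: int, c: int, d: int) -> Tuple[Fraction, Fraction, Fraction]:
    """Optimal mixed strategies for the zero-sum game [[a, b], [c, d]].

    Entries are payoffs to the row player (the offense). Returns (p, q, value):
    p = probability the row player picks row 1, q = probability the column
    player picks column 1, value = expected payoff under optimal play.
    """
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:
        raise ValueError("saddle point: a pure strategy is optimal")
    denom = a - b - c + d  # nonzero whenever the 2x2 game has no saddle point
    p = Fraction(d - c, denom)
    q = Fraction(d - b, denom)
    value = Fraction(a * d - b * c, denom)
    return p, q, value


# Hypothetical expected yards. Rows: offense runs / passes.
# Columns: defense sets for the run / for the pass.
p, q, v = solve_2x2_zero_sum(2, 6, 9, 3)
print(p, q, v)  # 3/5 3/10 24/5
```

Exact fractions make it easy to check the indifference property. Against either defensive choice, running 3/5 of the time yields the same 24/5 = 4.8 expected yards.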
By exploring the interaction of two football teams' risk preferences, we can suggest optimal risk strategies in the form of a varying mixed strategy. The techniques presented can be used in three ways. Before a game, they support a preliminary analysis that forecasts the decisions a coach or player may encounter. During a game, they can optimize each play called. After a game, they serve as a retrospective (posterior) analysis that dissects the decisions made and determines the effectiveness of the plays called. The procedures can readily be adapted to help football teams, or teams in other sports, make better decisions through quantitative modeling and statistical analysis. A numerical example demonstrates the usefulness of the solution approach.
Author: Jeremy D. Jordan
Category: Football

A Statistical Analysis of NFL Quarterback Rating Variables
http://www.bepress.com/jqas/vol5/iss2/1
Fri, 01 May 2009 11:47:19 PDT
Using data from the 1960-2007 NFL seasons, we examine the quarterback rating and its four component variables: average yards per attempt, completion percentage, interception percentage, and touchdown percentage. We test for structural breaks in the means and standard deviations of each variable. The analysis finds evidence of structural breaks in the series, likely associated with rule changes designed to promote the passing game and with the implementation of the salary cap. Taken as a whole, the break test results suggest that comparing quarterbacks from different regimes is inappropriate unless the regime differences are taken into account. Quarterback performance appears to have improved while its volatility fell. This suggests that the relative difference between above-average and average quarterbacks has been reduced.
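For reference, the quarterback (passer) rating examined here follows the NFL's published formula. Each of the four variables is linearly rescaled, capped to the range 0 to 2.375, summed, divided by 6, and multiplied by 100, so the maximum possible rating is 158.3. The Python sketch below implements that formula from raw passing totals. The function and argument names are illustrative, and the example stat line is hypothetical.

```python
def passer_rating(completions: int, attempts: int, yards: int,
                  touchdowns: int, interceptions: int) -> float:
    """NFL passer rating computed from raw passing totals."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")

    def clamp(x: float) -> float:
        # Each component is capped to the range [0, 2.375].
        return max(0.0, min(x, 2.375))

    comp = clamp((completions / attempts - 0.3) * 5)     # completion percentage
    ypa = clamp((yards / attempts - 3) * 0.25)           # average yards per attempt
    td = clamp(touchdowns / attempts * 20)               # touchdown percentage
    ints = clamp(2.375 - interceptions / attempts * 25)  # interception percentage
    return (comp + ypa + td + ints) / 6 * 100


# Hypothetical stat line: 20 of 30 for 250 yards, 2 touchdowns, 1 interception.
print(round(passer_rating(20, 30, 250, 2, 1), 1))  # 100.7
```

The caps explain why the components cannot be compared linearly at the extremes. Once a component reaches 0 or 2.375, further changes in that variable no longer move the rating.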
Using graph theory and the information gleaned from the structural break tests, we examine the causal relationships among the four quarterback rating variables over the most recent stable period, 2000-2007. The causal analysis shows that, over the course of a season, completion percentage is commonly caused by interception percentage and average yards per attempt. It also shows that touchdown percentage causes average yards per attempt. We offer possible explanations for the findings and suggest avenues for future research.
Author: Derek Stimel
Category: Football