Journal of Quantitative Analysis in Sports
Copyright (c) 2008 University of California, San Francisco. All rights reserved.
http://www.bepress.com/jqas
Recent documents in Journal of Quantitative Analysis in Sports
Wed, 30 Apr 2008 02:38:20 PDT

Probability Formulas and Statistical Analysis in Tennis
http://www.bepress.com/jqas/vol4/iss2/15
Mon, 28 Apr 2008 09:33:29 PDT
In this paper an expression for the probability of winning a game in a tennis match is derived under the assumption that the outcome of each point is identically and independently distributed. Important properties of the formula are evaluated and presented pictorially. The accuracy of this formula is tested by comparing observed proportions against predicted values using data from the 2007 Wimbledon Tennis Championships. We also derive expressions for the probability of several other milestones in a tennis match including winning a tiebreaker, winning a set, winning a match, and recovering from a break of serve down to win a set. The resulting "tennis formulas" are used to evaluate the implications of possible rule changes, to demonstrate how broadcasts of tennis matches could be made more interesting and informative, and to potentially improve a player's chance of winning a match.
A. James O'Malley
Tennis

Skill Evaluation in Women's Volleyball
http://www.bepress.com/jqas/vol4/iss2/14
Mon, 28 Apr 2008 09:33:25 PDT
The Brigham Young University Women's Volleyball Team recorded and rated all skills (pass, set, attack, etc.) and recorded rally outcomes (point for BYU, rally continues, point for opponent) for the entire 2006 home volleyball season. Only sequences of events occurring on BYU's side of the net were considered. Events followed one of these general patterns: serve-outcome, pass-set-attack-outcome, or block-dig-set-attack-outcome.
These sequences of events were assumed to be first-order Markov chains where the quality of each contact depended only on the quality of the previous contact but not explicitly on contacts further removed in the sequence. We represented these sequences in an extensive matrix of transition probabilities where the elements of the matrix were the probabilities of moving from one state to another. Each row of the count matrix, consisting of the number of times play moved from one transition state to another during the season, was assumed to have a multinomial distribution. A Dirichlet prior was formulated for each row, so posterior estimates of the transition probabilities were then available using Gibbs sampling. The different paths in the transition probability matrix were followed through the possible sequences of events at each step of the MCMC process to compute the posterior probability density that a perfect pass results in a point, a perfect set results in a point, etc. These posterior probability densities are used to address questions about skill performance in BYU Women's Volleyball.
Lindsay W. Florence
Other Sport

Composite Poisson Models for Goal Scoring
http://www.bepress.com/jqas/vol4/iss2/13
Mon, 28 Apr 2008 09:33:21 PDT
Goal scoring in sports such as hockey and soccer is often modeled as a Poisson process. We work with a Poisson model where the mean goals scored by the home team is the sum of parameters for the home team's offense, the road team's defense, and a home advantage. The mean goals for the road team is the sum of parameters for the road team's offense and for the home team's defense. The best teams have a large offensive parameter value and a small defensive parameter value. A level-2 model connects the offensive and defensive parameters for the k teams.
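The additive mean structure just described can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's code: the team labels and parameter values are invented, the function names are hypothetical, and the win probability additionally assumes that the two teams' goal counts are independent.

```python
import math

# Hypothetical parameter values, for illustration only (not from the paper).
offense = {"A": 1.8, "B": 1.4}   # larger is better
defense = {"A": 0.9, "B": 1.3}   # smaller is better
home_adv = 0.3


def expected_goals(home, road):
    """Return (mean home goals, mean road goals) under the additive model."""
    mu_home = offense[home] + defense[road] + home_adv
    mu_road = offense[road] + defense[home]
    return mu_home, mu_road


def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu)."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


def home_win_probability(home, road, max_goals=25):
    """P(home goals > road goals), assuming independent Poisson counts.

    The sum is truncated at max_goals, which is ample for hockey- or
    soccer-sized means.
    """
    mu_h, mu_r = expected_goals(home, road)
    return sum(
        poisson_pmf(h, mu_h) * poisson_pmf(r, mu_r)
        for h in range(max_goals + 1)
        for r in range(h)
    )
```

With these made-up values, team A at home against team B has mean goals of about 3.4 against 2.3, so `home_win_probability("A", "B")` comes out above one half.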
Parameter inference is made by imagining that goals can be classified as being strictly due to offense, to (lack of) defense, or to home-field advantage. Though not a realistic description, such a breakdown is consistent with our model assumptions and the literature, and we can work out the conditional distributions and generate random partitions to facilitate inference about the team parameters. We use the conditional Binomial distribution, given the Poisson totals and the current parameter values, to partition each observed goal total at each iteration in an MCMC algorithm.
Phil Everson
Hockey, Other Sport

Improving Golf Instruction with the iClub Motion Capture Technology
http://www.bepress.com/jqas/vol4/iss2/12
Mon, 28 Apr 2008 09:33:18 PDT
A new 3D motion capture technology is changing Golf instruction and research, bringing us closer than ever to understanding the complex yet coordinated motions involved in a successful Golf swing. This paper highlights a study undertaken by the Golf Advantage School at Pinehurst using this new technology. Two separate drills were examined for how effective they were at increasing the total Hip Rotation in the backswing of normal Golfers. These two drills were the "Feet Together" drill and the "Right Foot Back" drill. Using equipment provided by iClub(TM) Inc. and performing the statistical analysis with StatXact® 8 software, we show that the "Feet Together" drill increased Hip Rotation by 2.5 degrees (with a 95% confidence interval from -0.5 to 5.33 degrees), and that the "Right Foot Back" drill increased Hip Rotation by 4.2 degrees (with a 95% confidence interval from 0.84 to 8.33 degrees). A pooled analysis of the data from both drills, stratified by type of drill, yielded a statistically significant effect (p = 0.00295, 2-sided).
The results also suggest that the "Right Foot Back" drill, which was initially developed as an easier alternative to the "Feet Together" drill, may actually also be more effective. These scientifically supported insights are indicative of the power of the new motion capture technology to generate previously unattainable data for improving athletic performance.
Arun M. Mehta
Golf

Probability and Statistical Models for Racing
http://www.bepress.com/jqas/vol4/iss2/11
Mon, 28 Apr 2008 09:33:14 PDT
Racing data provides a rich source of analysis for quantitative researchers to study multi-entry competitions. This paper first explores statistical modeling to investigate the favorite-longshot betting bias using world-wide horse race data. The result shows that the bias phenomenon is not universal. Economic interpretation using utility theory will also be provided. Additionally, previous literature has proposed various probability distributions to model racing running time in order to estimate higher order probabilities such as probabilities of finishing second and third. We extend the normal distribution assumption to include certain correlation and variance structure and apply the extended model to actual data. While horse race data is used in this paper, the methodologies can be applied to other types of racing data such as cars and dogs.
Victor S. Lo
Other Sport

Isolating the Effect of Individual Linemen on the Passing Game in the National Football League
http://www.bepress.com/jqas/vol4/iss2/10
Mon, 28 Apr 2008 09:33:09 PDT
Protecting the quarterback is an integral part of the passing game in the National Football League, yet the relationship between the abilities of an individual lineman and the effectiveness of a passing game remains unexplored.
One of the principal reasons for this lack of study is the absence of publicly available data that is needed in order to track the performance of a specific lineman. In order to create the relevant data, the first 3 games of the 2007 NFL season for seven different teams were charted. The performance of each lineman was recorded on every pass play, as well as the amount of undisturbed time the quarterback was given (time in the pocket) to make a throw. These data were used in a series of regressions to determine how likely a lineman was to successfully hold his block in relation to the time it took for the quarterback to throw the ball, for each lineman in the sample. These data were also used to estimate the correlation between successful blocking and completion rate. The results of these regressions were then used to simulate the effects that different linemen have on the passing game. The trade in the offseason between the New York Jets and Washington Redskins which sent left guard Pete Kendall to Washington was examined. The analysis finds that the Jets lost approximately 3 percentage points on their completion rate due to the trade.
Benjamin C. Alamar
Football

The Passing Premium Puzzle Revisited
http://www.bepress.com/jqas/vol4/iss2/9
Mon, 28 Apr 2008 09:33:06 PDT
The passing premium puzzle states that NFL teams do not call enough passing plays, despite rule changes since the late 1970s that have increased the expected return to passing. This paper develops a simple portfolio model to determine how a coach could choose an optimal share of running and passing plays to maximize the expected yardage return from the portfolio. Coaches are assumed to be risk-averse so that they perceive a tradeoff between a higher expected return to passing and running, and a higher variance of yardage to each.
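One way to make this tradeoff concrete is a textbook mean-variance calculation. The abstract does not give the paper's actual objective, so everything below is an assumption made for illustration: the quadratic utility, the risk-aversion parameter `gamma`, the function name, and the yardage figures are all hypothetical.

```python
def optimal_run_share(mu_run, mu_pass, var_run, var_pass, gamma, cov=0.0):
    """Share of running plays w in [0, 1] that maximizes the assumed utility

        U(w) = w*mu_run + (1 - w)*mu_pass
               - (gamma / 2) * (w**2 * var_run
                                + (1 - w)**2 * var_pass
                                + 2 * w * (1 - w) * cov)

    Setting dU/dw = 0 gives a closed form, clipped to the feasible range.
    gamma > 0 is the (hypothetical) coefficient of risk aversion.
    """
    denom = var_run + var_pass - 2.0 * cov
    if denom <= 0:
        raise ValueError("variance of the run-pass yardage difference must be positive")
    w = ((mu_run - mu_pass) / gamma + var_pass - cov) / denom
    return min(1.0, max(0.0, w))


# Made-up inputs: runs average 4 yards (sd 5), passes average 6 yards (sd 10).
share = optimal_run_share(mu_run=4.0, mu_pass=6.0,
                          var_run=25.0, var_pass=100.0, gamma=0.1)
```

In this sketch a less risk-averse coach (smaller `gamma`) shifts toward the higher-mean, higher-variance play type, while a very risk-averse coach approaches the minimum-variance mix.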
The model is tested by computing the optimal share of running plays and comparing it to the actual share of running plays for the 2006 NFL season. It is also demonstrated that a tradeoff does exist between expected yardage return and risk which is the basis for the portfolio model. Finally, portfolio selection is shown to, at least partly, determine winning percentage.
Duane W. Rockerbie
Football

A Simple and Flexible Rating Method for Predicting Success in the NCAA Basketball Tournament: Updated Results from 2007
http://www.bepress.com/jqas/vol4/iss2/8
Mon, 28 Apr 2008 09:33:02 PDT
This paper first presents a brief review of potential rating tools and methods for predicting success in the NCAA basketball tournament, including those methods (such as the Ratings Percentage Index, or RPI) that receive a great deal of weight in selecting and seeding teams for the tournament. The paper then proposes a simple and flexible rating method based on ordinal logistic regression and expectation (the OLRE method) that is designed to predict success for those teams selected to participate in the NCAA tournament. A simulation based on the parametric Bradley-Terry model for paired comparisons is used to demonstrate the ability of the computationally simple OLRE method to predict success in the tournament, using actual NCAA tournament data from 2006 and 2007. Given that the proposed method can incorporate several different predictors of success in the NCAA tournament when calculating a rating, and is shown to have better predictive power than a model-based approach, it should be considered as an alternative to other rating methods currently used to assign seeds and regions to the teams selected to play in the tournament. The predictive power of the model-based simulation approach is also discussed, given the success of this approach in 2007. The paper concludes with limitations and directions for future work in this area.
Brady T. West
Basketball

Racial Bias in the NBA: Implications in Betting Markets
http://www.bepress.com/jqas/vol4/iss2/7
Mon, 28 Apr 2008 09:32:58 PDT
Recent studies have documented the existence of an own-race bias on the part of sports officials. In this paper we explore the implications of these biases on betting markets. We use data from the 1991/92 - 2004/05 NBA regular seasons to show that a betting strategy exploiting own-race biases by referees would systematically beat the spread.
Tim Larsen
Basketball

The Role of Rest in the NBA Home-Court Advantage
http://www.bepress.com/jqas/vol4/iss2/6
Mon, 28 Apr 2008 09:32:55 PDT
To date, the factors which lead to the very large home court advantage characteristic of the NBA have not yet been well isolated. This study analyzes the relationship between that home court advantage and the comparatively fewer days of rest between games that the NBA schedule imposes on visiting teams. A statistical model has been developed and applied to the NBA data for the 2004-2005 and 2005-2006 seasons to estimate the importance of the effect of rest on the magnitude of the home court advantage. The results indicate that lack of rest for the road team, while not a dominant factor, is an important contributor to the home court advantage in the NBA.
Oliver A. Entine
Basketball

In Search of the "Last-Ups" Advantage in Baseball: A Game-Theoretic Approach
http://www.bepress.com/jqas/vol4/iss2/5
Mon, 28 Apr 2008 09:32:52 PDT
Received wisdom in baseball takes it as a given that it is an advantage to have the last turn at bat in a baseball game. This belief is supported, implicitly or explicitly, by an argument that the team on offense benefits by knowing with certainty the number of runs it must score in the final inning.
Because the discrete nature of plays in baseball lends itself naturally to a model of a baseball contest as a zero-sum Markov game, this hypothesis can be tested formally. In a model where teams may employ the bunt, stolen base, and intentional walk, there is no significant quantitative advantage conferred by the order in which teams bat, and in some cases batting first may be of slight advantage. In practice, the answer to the question may be determined by actions more subtle than previously considered, such as the extent to which the defensive team can influence the distribution of run-scoring by pitch selection or fielder positioning.
Theodore L. Turocy
Baseball

Improving Major League Baseball Park Factor Estimates
http://www.bepress.com/jqas/vol4/iss2/4
Mon, 28 Apr 2008 09:32:46 PDT
The study of Park Factors (PF) is essential to the correct evaluation of player performance in Major League Baseball. We have identified two important problems with the commonly used formula which has been popularized by ESPN: it produces variable results due to unbalanced scheduling, and it has an inherent inflationary bias. To address these problems, we develop a new estimator for Park Factors using an ANOVA weighted fixed-effects model for run generation. Using simulated data, in addition to run data from 2000 through 2006, we show that this new estimator does not have the biases of the old estimator. From a strategic viewpoint, accurate PF values are needed to properly evaluate free agents and trade proposals, as well as to compare players for postseason awards. We develop a method to adjust statistics using Park Factors called a Neutral Park Adjustment (NPA), which takes into account the Park Factors of the entire schedule of a player, not simply their home park.
Rohit A. Acharya
Baseball

Why On-Base Percentage is a Better Indicator of Future Performance than Batting Average: An Algebraic Proof
http://www.bepress.com/jqas/vol4/iss2/3
Mon, 28 Apr 2008 09:32:43 PDT
Batting Average (AVG) and On-Base Percentage (OBP) are two of the most commonly cited statistics in baseball. Existing research has demonstrated that for a team, OBP is more closely correlated to runs scored than is AVG, and secondly, for players, OBP is more closely correlated over time than is AVG. We offer an algebraic explanation for the latter phenomenon. Specifically, we will prove that batting average depends more heavily upon a particularly unpredictable variable, hits per balls in play (HPBP), than does OBP. This result will explain why for both batters and pitchers, on-base percentage is a better indicator of future performance than batting average.
Ben S. Baumer
Baseball

Estimating Situational Effects on OPS
http://www.bepress.com/jqas/vol4/iss2/2
Mon, 28 Apr 2008 09:32:40 PDT
"What is the offensive value of Player A?" Of all the metrics that sabermetricians have developed to attempt to answer that question, OPS (on base percentage plus slugging percentage) has been one of the first for the mainstream media to slowly embrace as an alternative to batting average. Looking at statistics for each team on ESPN.com, one sees that the batting statistics are sorted by OPS as the default sort. What if the question asked was "What is the offensive value of Player A in Situation B versus Situation C?" In 1994, Jim Albert used the Gibbs sampler to estimate the effect different in-game situations had on batting average. One example of such a situation is a player's breakdown statistics in home and away games. By employing the Gibbs sampler on each component of OPS, one can compute the situational effect on a player's OPS.
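The arithmetic behind a raw home/away OPS split can be sketched as follows. This uses the standard definitions of on-base percentage and slugging percentage; the class name, field names, and batting lines are hypothetical, and the raw difference computed here is not the Gibbs-sampler estimate the abstract refers to, which would smooth such noisy splits.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Counting stats for one player in one situation (hypothetical fields)."""
    ab: int            # at-bats
    singles: int
    doubles: int
    triples: int
    home_runs: int
    walks: int
    hit_by_pitch: int
    sac_flies: int

    def obp(self):
        """On-base percentage: (H + BB + HBP) / (AB + BB + HBP + SF)."""
        hits = self.singles + self.doubles + self.triples + self.home_runs
        reached = hits + self.walks + self.hit_by_pitch
        chances = self.ab + self.walks + self.hit_by_pitch + self.sac_flies
        return reached / chances

    def slg(self):
        """Slugging percentage: total bases / AB."""
        total_bases = (self.singles + 2 * self.doubles
                       + 3 * self.triples + 4 * self.home_runs)
        return total_bases / self.ab

    def ops(self):
        """On-base plus slugging."""
        return self.obp() + self.slg()


def raw_ops_split(situation_b, situation_c):
    """Unadjusted OPS difference between two situations (e.g. home minus away)."""
    return situation_b.ops() - situation_c.ops()


# Made-up home and away lines for one player.
home = BattingLine(ab=100, singles=20, doubles=5, triples=0,
                   home_runs=5, walks=10, hit_by_pitch=0, sac_flies=0)
away = BattingLine(ab=100, singles=18, doubles=4, triples=0,
                   home_runs=3, walks=8, hit_by_pitch=0, sac_flies=0)
home_minus_away = raw_ops_split(home, away)
```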
The data will consist of the hitting performance of major league regulars during the 2006 season who qualified for the batting title. Part of the appeal of OPS is that it is simpler to calculate than other more complicated metrics developed by sabermetricians; however, the raw value of OPS does have limitations such as not taking into consideration ballpark effects or the differences between the two leagues.
Philip A. Yates
Baseball

New England Symposium on Statistics in Sports
http://www.bepress.com/jqas/vol4/iss2/1
Mon, 28 Apr 2008 09:32:36 PDT
The organizers of the 2007 New England Symposium on Statistics in Sports proudly introduce an issue of JQAS focused on papers presented at the conference.
Scott Evans
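As a worked illustration of the tennis entry in this issue, the probability of winning a game when every point is won independently with the same probability p follows from counting scorelines. The sketch below is the standard derivation under that assumption, with deuce handled as a geometric series; it is not taken from the paper, and the function name is hypothetical.

```python
def game_win_probability(p):
    """P(win a standard tennis game) when each point is won with probability p.

    Win to love, 15, or 30: the winner takes the last point, and the loser's
    0, 1, or 2 points can fall anywhere among the earlier points, giving
    1, 4, and 10 orderings. Deuce (3-3) is reached in 20 orderings, after
    which the game is won with probability p^2 / (p^2 + q^2).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    q = 1.0 - p
    before_deuce = p ** 4 * (1 + 4 * q + 10 * q ** 2)
    reach_deuce = 20 * p ** 3 * q ** 3
    win_from_deuce = p ** 2 / (p ** 2 + q ** 2)
    return before_deuce + reach_deuce * win_from_deuce
```

A small edge on each point is amplified over a game: a player who wins 60% of points wins roughly 74% of games under this model.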