On what counts do batters swing?

Recently I’ve heard a couple of claims about when batters should swing: many hitters take the first pitch of an at-bat to get a feel for the pitcher, and all hitters should take on 3-0.

I want to look at the rate at which batters swing based on the count.

library(magrittr)
library(ggplot2)
## Registered S3 methods overwritten by 'ggplot2':
##   method         from 
##   [.quosures     rlang
##   c.quosures     rlang
##   print.quosures rlang
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Load the data as d

d <- readRDS("MLB2018.rds")

We can use type to see whether the batter put the ball into play.

d$type %>% table
## .
##      B      S      X 
## 261526 333381 126283

To see which descriptions fall under X, S, and B:

d$description[d$type=="X"] %>% table
## .
##        hit_into_play hit_into_play_no_out  hit_into_play_score 
##                81802                28514                15967
d$description[d$type=="S"] %>% table
## .
##           called_strike                    foul               foul_bunt 
##                  121518                  125753                    1935 
##           foul_pitchout                foul_tip    hit_into_play_no_out 
##                       1                    6160                      12 
##             missed_bunt       swinging_pitchout         swinging_strike 
##                     402                       1                   71839 
## swinging_strike_blocked 
##                    5760
d$description[d$type=="B"] %>% table
## .
##         ball blocked_ball hit_by_pitch     pitchout 
##       242587        16931         1922           86

Now we will add a column for whether the batter swung or not. They swung unless the pitch was a ball (type B) or a called strike (type S with description called_strike).

d %<>% mutate(swung = !(type=="B" | description=="called_strike"))
table(d$swung, d$description)
##        
##           ball blocked_ball called_strike   foul foul_bunt foul_pitchout
##   FALSE 242587        16931        121518      0         0             0
##   TRUE       0            0             0 125753      1935             1
##        
##         foul_tip hit_by_pitch hit_into_play hit_into_play_no_out
##   FALSE        0         1922             0                    0
##   TRUE      6160            0         81802                28526
##        
##         hit_into_play_score missed_bunt pitchout swinging_pitchout
##   FALSE                   0           0       86                 0
##   TRUE                15967         402        0                 1
##        
##         swinging_strike swinging_strike_blocked
##   FALSE               0                       0
##   TRUE            71839                    5760
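To make the swing rule concrete, here is a minimal sketch on a few made-up pitches. The toy data frame is hypothetical (it is not taken from MLB2018.rds) and only has the two columns the rule needs. It also checks that the rule is the same as saying "not a ball and not a called strike".

```r
library(dplyr)

# Hypothetical toy pitches, for illustration only (not from MLB2018.rds).
toy <- tibble(
  type        = c("B", "S", "S", "S", "X"),
  description = c("ball", "called_strike", "foul",
                  "swinging_strike", "hit_into_play")
)

toy <- toy %>%
  mutate(
    # Same rule as above: no swing on balls or called strikes.
    swung     = !(type == "B" | description == "called_strike"),
    # Equivalent form by De Morgan's law.
    swung_alt = type != "B" & description != "called_strike"
  )

toy
# Both columns should agree on every row.
all(toy$swung == toy$swung_alt)
```

Only the foul, the swinging strike, and the ball in play are flagged as swings here.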

I’ll remove the hit_by_pitch rows, but everything else looks okay.

d %<>% filter(description!= "hit_by_pitch")

Now we will add a variable for the count.

d %<>% mutate(count=paste(balls, strikes, sep='-'))
table(d$count)
## 
##    0-0    0-1    0-2    1-0    1-1    1-2    2-0    2-1    2-2    3-0 
## 184418  92088  46899  71985  73837  68847  24517  38495  59116   7552 
##    3-1    3-2    4-2 
##  16042  35464      8

I’ll remove the cases with 4 balls; I’m not sure where they come from.

d %<>% filter(balls != 4) 

Now we can group by count and look at swing percentage.

d %>% group_by(count) %>% summarize(swingpct = sum(swung) / n())
## # A tibble: 12 x 2
##    count swingpct
##    <chr>    <dbl>
##  1 0-0      0.290
##  2 0-1      0.475
##  3 0-2      0.514
##  4 1-0      0.422
##  5 1-1      0.536
##  6 1-2      0.578
##  7 2-0      0.432
##  8 2-1      0.585
##  9 2-2      0.651
## 10 3-0      0.108
## 11 3-1      0.559
## 12 3-2      0.724
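The table is easier to scan as a chart. Below is a rough sketch of a bar chart of swing rate by count with ggplot2. To keep it self-contained, the rates are typed in from the summary above rather than recomputed from d; the names swing and p are just illustrative.

```r
library(ggplot2)

# Swing rates copied from the summary table above.
swing <- data.frame(
  count    = c("0-0", "0-1", "0-2", "1-0", "1-1", "1-2",
               "2-0", "2-1", "2-2", "3-0", "3-1", "3-2"),
  swingpct = c(0.290, 0.475, 0.514, 0.422, 0.536, 0.578,
               0.432, 0.585, 0.651, 0.108, 0.559, 0.724)
)

# One bar per count; counts sort alphabetically, i.e. by balls, then strikes.
p <- ggplot(swing, aes(x = count, y = swingpct)) +
  geom_col() +
  labs(x = "Count (balls-strikes)", y = "Swing rate")

print(p)
```

With the full data, the same plot could be built by piping the summarize() result straight into ggplot().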

As expected, the swing rate is lowest on 3-0, followed by 0-0. It is highest on 3-2 (0.724), well above 2-2 (0.651), but I don’t have any good intuition on why this should be higher than 2-2, 1-2, or 0-2.
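One rough way to see how the 3-2 vs 2-2 gap compares to sampling noise is a two-proportion standard error. This is only a sketch: it uses the pitch totals and swing rates printed in the tables above, and it assumes pitches are independent, which they are not really (the same batters and pitchers show up many times).

```r
# Pitch totals and swing rates for 2-2 and 3-2, copied from the tables above.
n <- c("2-2" = 59116, "3-2" = 35464)
p <- c("2-2" = 0.651, "3-2" = 0.724)

# Difference in swing rate between 3-2 and 2-2.
diff <- p[["3-2"]] - p[["2-2"]]

# Standard error of the difference, assuming independent pitches.
se <- sqrt(sum(p * (1 - p) / n))

# Difference measured in standard errors.
z <- diff / se

c(diff = diff, se = se, z = z)
```

Under that independence assumption the gap is many standard errors wide, so it is unlikely to be noise, though this says nothing about why batters swing more on 3-2.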