Expected runs for every game state

A state in a baseball game is defined by the number of outs and which bases are occupied. We could also include the count, inning, etc., but we’re starting with the simplest version. For a given state, there is an expected number of runs a team will score in the remainder of that inning. Tables of these values are readily available online. I want to try to recreate them using data from the mlbgameday package. It will require joining several tables and may be challenging.
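With three possible out counts and eight ways to occupy the bases, this simplest version has 24 states. A minimal sketch that enumerates them in base R; the column names and the label format are illustrative and do not come from mlbgameday:

```r
# Enumerate the 24 base-out states: 3 out counts x 2^3 base occupancies.
# Column names (outs, on_1b, on_2b, on_3b) are illustrative only.
states <- expand.grid(
  on_1b = c(FALSE, TRUE),
  on_2b = c(FALSE, TRUE),
  on_3b = c(FALSE, TRUE),
  outs  = 0:2
)

# A compact label such as "1_3" for runners on first and third
states$bases <- paste0(
  ifelse(states$on_1b, "1", "_"),
  ifelse(states$on_2b, "2", "_"),
  ifelse(states$on_3b, "3", "_")
)

nrow(states)
```

The goal of the rest of the post is one expected-runs number for each of these rows.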

library(mlbgameday)
library(magrittr)
library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Get data.

# Takes hours to get all the data for the year
dat <- get_payload(start = "2018-01-01", end = "2018-12-31")
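Since the download takes hours, it may be worth caching the result on disk. The sketch below is one way to do that with saveRDS and readRDS; the helper name load_or_fetch and the file name are hypothetical, and the get_payload call is only shown in a comment.

```r
# Hypothetical helper: load a cached object if it exists, otherwise
# run the slow fetch once and save the result for next time.
load_or_fetch <- function(path, fetch) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- fetch()
  saveRDS(result, path)
  result
}

# Intended use (the file name is an assumption):
# dat <- load_or_fetch("gameday_2018.rds", function() {
#   get_payload(start = "2018-01-01", end = "2018-12-31")
# })
```

Passing the fetch step as a function means the slow call only runs when no cached file is found.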

The at-bat data doesn’t include where the base runners start and end, but the pitch data shows where the runners start.

dat$pitch %>% head
##               des           des_es id type    tfs             tfs_zulu
## 1            Ball        Bola mala  3    B 200949 2018-02-21T20:09:49Z
## 2            Ball        Bola mala  4    B 201003 2018-02-21T20:10:03Z
## 3            Ball        Bola mala  5    B 201017 2018-02-21T20:10:17Z
## 4   Called Strike   Strike cantado  6    S 201033 2018-02-21T20:10:33Z
## 5 Swinging Strike Strike tirándole  7    S 201047 2018-02-21T20:10:47Z
## 6 In play, out(s) En juego, out(s)  8    X 201104 2018-02-21T20:11:04Z
##        x                    y         sv_id start_speed end_speed   sz_top
## 1  71.59 2018-02-21T20:09:49Z 180221_200949        91.0      81.3 3.506856
## 2 103.27 2018-02-21T20:10:03Z 180221_201003        91.2      82.0 3.568236
## 3  91.60 2018-02-21T20:10:17Z 180221_201017        90.6      80.9 3.472593
## 4 119.20 2018-02-21T20:10:33Z 180221_201033        91.5      81.9 3.479696
## 5  87.61 2018-02-21T20:10:47Z 180221_201047        92.4      82.1 3.691824
## 6 154.65 2018-02-21T20:11:04Z 180221_201104        93.9      84.7 3.691824
##     sz_bot      pfx_x     pfx_z         px       pz        x0 y0       z0
## 1 1.657627 -1.1335109 12.205505  1.1915198 4.022879 -1.357341 50 6.134092
## 2 1.603569 -1.1255583 11.926857  0.3602895 3.753873 -1.448721 50 6.055660
## 3 1.539733 -1.4003827 12.924600  0.6666075 3.220051 -1.387051 50 6.057518
## 4 1.551208 -5.4802387  9.984281 -0.0577412 2.803866 -1.541164 50 5.965382
## 5 1.429535 -0.6695584 12.734720  0.7708471 3.111694 -1.411845 50 6.019451
## 6 1.429535 -5.6703425 10.369535 -0.9877447 2.142914 -1.501145 50 5.937560
##        vx0       vy0       vz0         ax       ay         az break_y
## 1 6.998528 -132.1595 -3.391877  -1.980667 31.32489 -10.846471    23.7
## 2 5.103719 -132.6797 -3.885679  -1.990105 30.53986 -11.086111    23.7
## 3 5.773281 -131.4931 -5.428116  -2.422168 31.03215  -9.819048    23.7
## 4 5.726707 -132.8034 -5.467920  -9.682306 31.28585 -14.534148    23.7
## 5 5.962360 -134.0980 -5.849144  -1.197000 33.92179  -9.407615    23.7
## 6 3.344477 -136.4020 -7.761765 -10.550289 33.48019 -12.880400    23.7
##   break_angle break_length pitch_type type_confidence zone nasty spin_dir
## 1         1.7          2.7         FF           0.911   12    58  185.304
## 2         4.0          2.8         FF           0.915   12    46  185.389
## 3         5.6          2.6         FF           0.893    3    49  186.182
## 4        27.9          4.1         FF           0.904    5    20  208.761
## 5        -1.5          2.5         FF           0.917   12    48  183.008
## 6        33.5          3.9         FF           0.893   13    49  208.670
##   spin_rate cc mt
## 1  2351.306      
## 2  2316.095      
## 3  2480.889      
## 4  2198.229      
## 5  2466.456      
## 6  2338.852      
##                                                                                                                     url
## 1 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 2 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 3 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 4 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 5 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 6 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
##   inning_side inning next_ num on_1b on_2b on_3b count
## 1         top      1     Y   1    NA    NA    NA   0-0
## 2         top      1     Y   1    NA    NA    NA   1-0
## 3         top      1     Y   1    NA    NA    NA   2-0
## 4         top      1     Y   1    NA    NA    NA   3-0
## 5         top      1     Y   1    NA    NA    NA   3-1
## 6         top      1     Y   1    NA    NA    NA   3-2
##                     gameday_link code event_num
## 1 gid_2018_02_21_asubbc_arimlb_1    B         3
## 2 gid_2018_02_21_asubbc_arimlb_1    B         4
## 3 gid_2018_02_21_asubbc_arimlb_1    B         5
## 4 gid_2018_02_21_asubbc_arimlb_1    C         6
## 5 gid_2018_02_21_asubbc_arimlb_1    S         7
## 6 gid_2018_02_21_asubbc_arimlb_1    X         8
##                              play_guid
## 1 ae276dd4-116a-4c66-b9f6-652959a7ed25
## 2 e9c938f2-bd89-455e-852b-e7a143c029df
## 3 b92cc24e-2657-4e23-ae9b-611aee8b3d0d
## 4 259ce2d0-0d67-45ac-8c7d-cd56d9d65a9f
## 5 10f9568e-cb4e-4dc9-83af-61833eb14da1
## 6 622eacfe-c74e-4157-b3b0-d1fd0ca579fc

We’ll have to use the runner table to track how base runners advance.

dat$runner %>% head
##       id start end      event score  rbi earned
## 1 666183        1B       Walk  <NA> <NA>   <NA>
## 2 666183    1B     Pickoff 1B  <NA> <NA>   <NA>
## 3 656637        1B     Single  <NA> <NA>   <NA>
## 4 656637    1B  2B     Single  <NA> <NA>   <NA>
## 5 679523        1B     Single  <NA> <NA>   <NA>
## 6 679523    1B  2B     Single  <NA> <NA>   <NA>
##                                                                                                                     url
## 1 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 2 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 3 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 4 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 5 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 6 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
##   inning_side inning next_ num                   gameday_link event_num
## 1         top      1     Y   3 gid_2018_02_21_asubbc_arimlb_1        24
## 2         top      1     Y   4 gid_2018_02_21_asubbc_arimlb_1        30
## 3         top      2     Y  10 gid_2018_02_21_asubbc_arimlb_1        95
## 4         top      2     Y  11 gid_2018_02_21_asubbc_arimlb_1       101
## 5         top      2     Y  11 gid_2018_02_21_asubbc_arimlb_1       101
## 6         top      2     Y  12 gid_2018_02_21_asubbc_arimlb_1       114

First let’s join atbat and pitch, same as before.

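# Joining on play_guid keeps only the final pitch of each at bat
# Note: only des_es has a minus sign, so it is the only column dropped here;
# prefix the other names with "-" to drop them as well
d2 <- inner_join(dat$pitch, dat$atbat, by=c("num", "gameday_link", "play_guid")) %>% 
  select(-des_es, atbat_des_es, event2_es, url.x, url.y)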
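To see why joining on play_guid leaves one row per at bat, here is a toy version in base R. It assumes, as the comment in the code suggests, that each at-bat record carries the play_guid of its last pitch. The values are made up, and merge() stands in for inner_join().

```r
# Toy data: two at bats, five pitches. Names mirror the post's columns,
# but the values are made up.
pitch <- data.frame(
  num          = c(1, 1, 1, 2, 2),
  gameday_link = "gid_toy",
  play_guid    = c("a", "b", "c", "d", "e"),
  des          = c("Ball", "Called Strike", "In play, out(s)",
                   "Ball", "In play, no out"),
  stringsAsFactors = FALSE
)

# Assumption: each at bat stores the play_guid of its last pitch
atbat <- data.frame(
  num          = c(1, 2),
  gameday_link = "gid_toy",
  play_guid    = c("c", "e"),
  event        = c("Groundout", "Single"),
  stringsAsFactors = FALSE
)

# merge() with its default all = FALSE is base R's inner join
last_pitch <- merge(pitch, atbat, by = c("num", "gameday_link", "play_guid"))
last_pitch
```

Only the pitches whose play_guid also appears in the at-bat table survive, so the five pitches collapse to two rows.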

Now we are going to join this with separate data frames for runners that start on each base. We’ll have to make changes to the runner df before doing so.

df1b <- dat$runner %>% filter(start=="1B") %>% mutate(on_1b=id, score_on_1b=score, on_1b_end=end) %>% 
  select(on_1b, score_on_1b, on_1b_end, num, gameday_link)
inner_join(d2, df1b, c("num", "gameday_link", "on_1b")) %>% head
##               des  id type    tfs             tfs_zulu      x
## 1 In play, no out  98    X 203038 2018-02-21T20:30:38Z  77.66
## 2 In play, no out 110    X 203257 2018-02-21T20:32:57Z 133.97
## 3 In play, run(s) 120    X 203439 2018-02-21T20:34:39Z 106.42
## 4 In play, out(s) 135    X 203703 2018-02-21T20:37:03Z 120.92
## 5            Ball 152    B 204115 2018-02-21T20:41:15Z 175.29
## 6 In play, out(s) 173    X 204714 2018-02-21T20:47:14Z 126.95
##                      y         sv_id start_speed end_speed   sz_top
## 1 2018-02-21T20:30:38Z 180221_203038        79.9      72.4 3.714936
## 2 2018-02-21T20:32:57Z 180221_203257        87.1      79.2 3.779927
## 3 2018-02-21T20:34:39Z 180221_203439        74.4      67.5 3.530531
## 4 2018-02-21T20:37:03Z 180221_203703        88.1      79.5 3.548632
## 5 2018-02-21T20:41:15Z 180221_204115        88.1      80.7 3.503302
## 6 2018-02-21T20:47:14Z 180221_204714        92.6      84.8 3.779927
##     sz_bot     pfx_x     pfx_z         px       pz        x0 y0       z0
## 1 1.441594 11.338361  1.209846  1.0320467 1.966430  2.737342 50 4.705749
## 2 1.472084  8.886973  5.845162 -0.4449158 2.056990  2.595478 50 4.397702
## 3 1.348389 -7.928245 -2.987633  0.2747256 1.696184  2.874654 50 4.642148
## 4 1.357439  9.389828  4.680869 -0.1026762 1.985936  2.739222 50 4.329306
## 5 1.669040 10.924321  5.512149 -1.5652272 1.039354  2.488585 50 4.168845
## 6 1.472084 -6.602327  2.831185 -0.2616575 1.972476 -2.707562 50 4.873596
##          vx0       vy0        vz0        ax       ay        az break_y
## 1  -7.245731 -115.8948  0.4451926  15.30967 23.12209 -30.54045    23.8
## 2 -10.470069 -126.2775 -1.2914091  14.38623 25.10273 -22.71190    23.8
## 3  -3.336862 -108.0486  2.1486741  -9.26001 20.93519 -35.66354    23.8
## 4 -10.221188 -127.8784 -1.0378124  15.44161 28.06494 -24.47634    23.7
## 5 -13.804698 -127.4231 -3.3198851  18.06576 24.75311 -23.05850    23.8
## 6   8.769351 -134.6005 -2.6182950 -12.08440 29.84615 -26.99206    23.8
##   break_angle break_length pitch_type type_confidence zone nasty spin_dir
## 1       -24.6         10.1         CH           0.884   14    49   96.091
## 2       -28.3          6.5         CH           0.645    7    34  123.334
## 3        15.5         12.7         CU           0.887    9    24  290.649
## 4       -28.2          6.9         FT           0.793    8    32  116.497
## 5       -33.2          6.9         FT           0.671   13    26  116.775
## 6        20.4          6.5         FT           0.925    7    40  246.789
##   spin_rate cc mt
## 1  1927.359      
## 2  1978.267      
## 3  1328.706      
## 4  1957.459      
## 5  2303.891      
## 6  1417.211      
##                                                                                                                   url.x
## 1 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 2 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 3 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 4 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 5 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 6 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
##   inning_side.x inning.x next_.x num  on_1b  on_2b  on_3b count
## 1           top        2       Y  11 656637     NA     NA   0-1
## 2           top        2       Y  12 679523 656637     NA   3-2
## 3           top        2       Y  13 676422 679523 656637   0-2
## 4           top        2       Y  14 676525 676422 679523   3-2
## 5           top        2       Y  15 679522 676525 676422   3-2
## 6           top        2       Y  17 679521 679522 676525   2-1
##                     gameday_link code event_num.x
## 1 gid_2018_02_21_asubbc_arimlb_1    D          98
## 2 gid_2018_02_21_asubbc_arimlb_1    D         110
## 3 gid_2018_02_21_asubbc_arimlb_1    E         120
## 4 gid_2018_02_21_asubbc_arimlb_1    X         135
## 5 gid_2018_02_21_asubbc_arimlb_1    B         152
## 6 gid_2018_02_21_asubbc_arimlb_1    X         173
##                              play_guid pitcher batter b s o start_tfs
## 1 5cc74538-37ee-43be-828f-f0a3aca56eca  608355 679523 0 1 0    203017
## 2 a58ed393-cccc-4d90-b17e-ed95326a1026  608355 676422 3 2 0    203114
## 3 5f30e09c-d8be-4fdd-861f-6077b15d82d8  608355 676525 0 2 0    203335
## 4 ef690705-10d4-4693-bfc5-4666da217703  608355 679522 3 2 1    203515
## 5 1eb020a2-ad04-4455-93fa-4ab6149e6360  608355 679521 4 2 1    203828
## 6 3d85b855-a0d8-43d7-a8fb-cf6162f37ec2  524349 669398 2 1 3    204555
##         start_tfs_zulu stand b_height p_throws
## 1 2018-02-21T20:30:17Z     R      6-4        L
## 2 2018-02-21T20:31:14Z     L      6-4        L
## 3 2018-02-21T20:33:35Z     R     5-10        L
## 4 2018-02-21T20:35:15Z     L     5-11        L
## 5 2018-02-21T20:38:28Z     R      6-2        L
## 6 2018-02-21T20:45:55Z     L      6-4        R
##                                                                                                                                                                                            atbat_des
## 1                                                                                                      Luke Leisenring singles on a line drive to left fielder Ramon Flores.   Taylor Lane to 2nd.  
## 2                                                                           Zach Hogueisson singles on a line drive to left fielder Ramon Flores.   Taylor Lane to 3rd.    Luke Leisenring to 2nd.  
## 3                                                    Scott Mehan singles on a line drive to left fielder Ramon Flores.   Taylor Lane scores.    Luke Leisenring to 3rd.    Zach Hogueisson to 2nd.  
## 4 Hunter Jump grounds into a force out, shortstop Kristopher Negron to catcher Josh Thole.   Luke Leisenring out at home.    Zach Hogueisson to 3rd.    Scott Mehan to 2nd.    Hunter Jump to 1st.  
## 5                                                                                                      Myles Denson walks.   Zach Hogueisson scores.    Scott Mehan to 3rd.    Hunter Jump to 2nd.  
## 6                                                                                                                                      Gage Workman grounds out to first baseman Christian Walker.  
##                                                                                                                                                                                                     atbat_des_es
## 1                                                                                                             Luke Leisenring pega sencillo con línea a jardinero izquierdo Ramon Flores.   Taylor Lane a 2da.  
## 2                                                                                   Zach Hogueisson pega sencillo con línea a jardinero izquierdo Ramon Flores.   Taylor Lane a 3ra.    Luke Leisenring a 2da.  
## 3                                                                Scott Mehan pega sencillo con línea a jardinero izquierdo Ramon Flores.   Taylor Lane anota  Luke Leisenring a 3ra.    Zach Hogueisson a 2da.  
## 4 Hunter Jump batea rodado batea para out forzado, campo corto Kristopher Negron a receptor Josh Thole.   Luke Leisenring a cabo a home.    Zach Hogueisson a 3ra.    Scott Mehan a 2da.    Hunter Jump a 1ra.  
## 5                                                                                                        Myles Denson recibe base por bolas.   Zach Hogueisson anota  Scott Mehan a 3ra.    Hunter Jump a 2da.  
## 6                                                                                                                                            Gage Workman batea rodado de out a primera base Christian Walker.  
##       event home_team_runs away_team_runs
## 1    Single              2              0
## 2    Single              2              0
## 3    Single              2              1
## 4  Forceout              2              1
## 5      Walk              2              2
## 6 Groundout              2              2
##                                                                                                                   url.y
## 1 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 2 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 3 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 4 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 5 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 6 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
##   inning_side.y inning.y next_.y event2 event3 batter_name   pitcher_name
## 1           top        2       Y   <NA>   <NA>        <NA>           <NA>
## 2           top        2       Y   <NA>   <NA>        <NA>           <NA>
## 3           top        2       Y   <NA>   <NA>        <NA>           <NA>
## 4           top        2       Y   <NA>   <NA>        <NA>           <NA>
## 5           top        2       Y   <NA>   <NA>        <NA>           <NA>
## 6           top        2       Y   <NA>   <NA>        <NA> Andury Acevedo
##   event4       date         end_tfs_zulu event_num.y        event_es
## 1   <NA> 2018-02-21 2018-02-21T20:30:51Z         101        Sencillo
## 2   <NA> 2018-02-21 2018-02-21T20:33:15Z         114        Sencillo
## 3   <NA> 2018-02-21 2018-02-21T20:34:50Z         126        Sencillo
## 4   <NA> 2018-02-21 2018-02-21T20:37:16Z         140     Out Forzado
## 5   <NA> 2018-02-21 2018-02-21T20:41:20Z         158  Base por Bolas
## 6   <NA> 2018-02-21 2018-02-21T20:47:21Z         175 Roletazo de Out
##   event2_es score_on_1b on_1b_end
## 1      <NA>        <NA>        2B
## 2      <NA>        <NA>        2B
## 3      <NA>        <NA>        2B
## 4      <NA>        <NA>        2B
## 5      <NA>        <NA>        2B
## 6      <NA>        <NA>

Looks okay, let’s do all bases.

df2b <- dat$runner %>% filter(start=="2B") %>% mutate(on_2b=id, score_on_2b=score, on_2b_end=end) %>% 
  select(on_2b, score_on_2b, on_2b_end, num, gameday_link)
df3b <- dat$runner %>% filter(start=="3B") %>% mutate(on_3b=id, score_on_3b=score, on_3b_end=end) %>% 
  select(on_3b, score_on_3b, on_3b_end, num, gameday_link)
d2r1 <- left_join(d2, df1b, c("num", "gameday_link", "on_1b"))
d2r12 <- left_join(d2r1, df2b, c("num", "gameday_link", "on_2b"))
d2r123 <- left_join(d2r12, df3b, c("num", "gameday_link", "on_3b"))

The joins produce some duplicated rows that I don’t think should be there, so I’ll remove exact duplicates.

d3 <- d2r123[!duplicated(d2r123),]
d3 %>% head
##               des  id type    tfs             tfs_zulu      x
## 1 In play, out(s)   8    X 201104 2018-02-21T20:11:04Z 154.65
## 2 In play, out(s)  15    X 201233 2018-02-21T20:12:33Z  93.63
## 3            Ball  22    B 201359 2018-02-21T20:13:59Z 120.98
## 4 In play, no out  93    X 202949 2018-02-21T20:29:49Z  75.08
## 5 In play, no out  98    X 203038 2018-02-21T20:30:38Z  77.66
## 6 In play, no out 110    X 203257 2018-02-21T20:32:57Z 133.97
##                      y         sv_id start_speed end_speed   sz_top
## 1 2018-02-21T20:11:04Z 180221_201104        93.9      84.7 3.691824
## 2 2018-02-21T20:12:33Z 180221_201233        93.6      85.1 3.777274
## 3 2018-02-21T20:13:59Z 180221_201359        91.8      82.1 3.730258
## 4 2018-02-21T20:29:49Z 180221_202949        87.3      78.4 3.774226
## 5 2018-02-21T20:30:38Z 180221_203038        79.9      72.4 3.714936
## 6 2018-02-21T20:32:57Z 180221_203257        87.1      79.2 3.779927
##     sz_bot      pfx_x     pfx_z         px       pz        x0 y0       z0
## 1 1.429535 -5.6703425 10.369535 -0.9877447 2.142914 -1.501145 50 5.937560
## 2 1.470100 -4.2230981 10.697804  0.6130764 3.402739 -1.303427 50 6.094880
## 3 1.657475 -0.8684963 10.447090 -0.1045714 5.299412 -1.396811 50 6.300174
## 4 1.467724  9.6957339  6.188114  1.0996425 2.613556  2.786671 50 4.543479
## 5 1.441594 11.3383615  1.209846  1.0320467 1.966430  2.737342 50 4.705749
## 6 1.472084  8.8869725  5.845162 -0.4449158 2.056990  2.595478 50 4.397702
##          vx0       vy0         vz0         ax       ay        az break_y
## 1   3.344477 -136.4020 -7.76176468 -10.550289 33.48019 -12.88040    23.7
## 2   6.609214 -135.9674 -4.94373382  -7.884881 30.54500 -12.20034    23.8
## 3   3.689997 -133.4538 -0.06339038  -1.554547 30.72924 -13.47450    23.7
## 4  -7.319277 -126.6819 -0.32336228  15.537905 29.22415 -22.25728    23.7
## 5  -7.245731 -115.8948  0.44519255  15.309674 23.12209 -30.54045    23.8
## 6 -10.470069 -126.2775 -1.29140914  14.386231 25.10273 -22.71190    23.8
##   break_angle break_length pitch_type type_confidence zone nasty spin_dir
## 1        33.5          3.9         FF           0.893   13    49  208.670
## 2        25.8          3.2         FF           0.925    3    46  201.541
## 3         3.0          3.1         FF           0.924   11    19  184.750
## 4       -31.9          6.7         FT           0.575   14    50  122.548
## 5       -24.6         10.1         CH           0.884   14    49   96.091
## 6       -28.3          6.5         CH           0.645    7    34  123.334
##   spin_rate cc mt
## 1  2338.852      
## 2  2291.256      
## 3  2039.843      
## 4  2110.954      
## 5  1927.359      
## 6  1978.267      
##                                                                                                                   url.x
## 1 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 2 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 3 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 4 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 5 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 6 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
##   inning_side.x inning.x next_.x num  on_1b  on_2b on_3b count
## 1           top        1       Y   1     NA     NA    NA   3-2
## 2           top        1       Y   2     NA     NA    NA   2-1
## 3           top        1       Y   3     NA     NA    NA   3-0
## 4           top        2       Y  10     NA     NA    NA   1-0
## 5           top        2       Y  11 656637     NA    NA   0-1
## 6           top        2       Y  12 679523 656637    NA   3-2
##                     gameday_link code event_num.x
## 1 gid_2018_02_21_asubbc_arimlb_1    X           8
## 2 gid_2018_02_21_asubbc_arimlb_1    X          15
## 3 gid_2018_02_21_asubbc_arimlb_1    B          22
## 4 gid_2018_02_21_asubbc_arimlb_1    D          93
## 5 gid_2018_02_21_asubbc_arimlb_1    D          98
## 6 gid_2018_02_21_asubbc_arimlb_1    D         110
##                              play_guid pitcher batter b s o start_tfs
## 1 622eacfe-c74e-4157-b3b0-d1fd0ca579fc  664199 675961 3 2 1    200935
## 2 c346ed75-356c-4231-a4a7-0f929573894f  664199 669398 2 1 2    201134
## 3 88dabc5d-5cef-4966-af15-ced53ce18e29  664199 666183 4 0 2    201305
## 4 f37cc792-decd-41fa-a3dc-3f1be9b06d26  608355 656637 1 0 0    202936
## 5 5cc74538-37ee-43be-828f-f0a3aca56eca  608355 679523 0 1 0    203017
## 6 a58ed393-cccc-4d90-b17e-ed95326a1026  608355 676422 3 2 0    203114
##         start_tfs_zulu stand b_height p_throws
## 1 2018-02-21T20:09:35Z     R      6-2        R
## 2 2018-02-21T20:11:34Z     L      6-4        R
## 3 2018-02-21T20:13:05Z     L      6-5        R
## 4 2018-02-21T20:29:36Z     R      6-4        L
## 5 2018-02-21T20:30:17Z     R      6-4        L
## 6 2018-02-21T20:31:14Z     L      6-4        L
##                                                                                                                  atbat_des
## 1                                                             Alika Williams pops out to second baseman Ildemaro Vargas.  
## 2                      Gage Workman grounds out softly, third baseman Jack Reinheimer to first baseman Christian Walker.  
## 3                                                                                                   Hunter Bishop walks.  
## 4                                                Taylor Lane singles on a line drive to right fielder Jeremy Hazelbaker.  
## 5                            Luke Leisenring singles on a line drive to left fielder Ramon Flores.   Taylor Lane to 2nd.  
## 6 Zach Hogueisson singles on a line drive to left fielder Ramon Flores.   Taylor Lane to 3rd.    Luke Leisenring to 2nd.  
##                                                                                                                   atbat_des_es
## 1                                                      Alika Williams batea elevadito de out a segunda base Ildemaro Vargas.  
## 2                 Gage Workman batea rodado de out suavemente, tercera base Jack Reinheimer a primera base Christian Walker.  
## 3                                                                                       Hunter Bishop recibe base por bolas.  
## 4                                                 Taylor Lane pega sencillo con línea a jardinero derecho Jeremy Hazelbaker.  
## 5                           Luke Leisenring pega sencillo con línea a jardinero izquierdo Ramon Flores.   Taylor Lane a 2da.  
## 6 Zach Hogueisson pega sencillo con línea a jardinero izquierdo Ramon Flores.   Taylor Lane a 3ra.    Luke Leisenring a 2da.  
##       event home_team_runs away_team_runs
## 1   Pop Out              0              0
## 2 Groundout              0              0
## 3      Walk              0              0
## 4    Single              2              0
## 5    Single              2              0
## 6    Single              2              0
##                                                                                                                   url.y
## 1 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 2 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 3 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 4 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 5 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 6 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
##   inning_side.y inning.y next_.y event2 event3 batter_name pitcher_name
## 1           top        1       Y   <NA>   <NA>        <NA>         <NA>
## 2           top        1       Y   <NA>   <NA>        <NA>         <NA>
## 3           top        1       Y   <NA>   <NA>        <NA>         <NA>
## 4           top        2       Y   <NA>   <NA>        <NA>         <NA>
## 5           top        2       Y   <NA>   <NA>        <NA>         <NA>
## 6           top        2       Y   <NA>   <NA>        <NA>         <NA>
##   event4       date         end_tfs_zulu event_num.y        event_es
## 1   <NA> 2018-02-21 2018-02-21T20:11:14Z          10  Elevado de Out
## 2   <NA> 2018-02-21 2018-02-21T20:12:40Z          17 Roletazo de Out
## 3   <NA> 2018-02-21 2018-02-21T20:14:02Z          24  Base por Bolas
## 4   <NA> 2018-02-21 2018-02-21T20:29:59Z          95        Sencillo
## 5   <NA> 2018-02-21 2018-02-21T20:30:51Z         101        Sencillo
## 6   <NA> 2018-02-21 2018-02-21T20:33:15Z         114        Sencillo
##   event2_es score_on_1b on_1b_end score_on_2b on_2b_end score_on_3b
## 1      <NA>        <NA>      <NA>        <NA>      <NA>        <NA>
## 2      <NA>        <NA>      <NA>        <NA>      <NA>        <NA>
## 3      <NA>        <NA>      <NA>        <NA>      <NA>        <NA>
## 4      <NA>        <NA>      <NA>        <NA>      <NA>        <NA>
## 5      <NA>        <NA>        2B        <NA>      <NA>        <NA>
## 6      <NA>        <NA>        2B        <NA>        3B        <NA>
##   on_3b_end
## 1      <NA>
## 2      <NA>
## 3      <NA>
## 4      <NA>
## 5      <NA>
## 6      <NA>
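One way duplicated rows like these can arise is a lookup table with repeated keys: if the runner table holds two identical records for the same runner in the same at bat, a left join returns that at bat twice. Whether that is what happens in the 2018 data is an assumption. The toy sketch below uses made-up values and base R’s merge() in place of left_join().

```r
# Toy runner table: the runner on first has two identical records
# in at bat 2 (made-up values, not taken from the 2018 data).
df1b_toy <- data.frame(
  num          = c(2, 2, 3),
  gameday_link = "gid_toy",
  on_1b        = c(101, 101, 102),
  on_1b_end    = c("2B", "2B", ""),
  stringsAsFactors = FALSE
)
d2_toy <- data.frame(
  num          = c(1, 2, 3),
  gameday_link = "gid_toy",
  on_1b        = c(NA, 101, 102),
  event        = c("Walk", "Single", "Groundout"),
  stringsAsFactors = FALSE
)

keys <- c("num", "gameday_link", "on_1b")

# all.x = TRUE makes merge() behave like a left join
joined <- merge(d2_toy, df1b_toy, by = keys, all.x = TRUE)
nrow(joined)     # more rows than d2_toy because of the repeated key

# Deduplicating the lookup table first keeps one row per at bat
joined_ok <- merge(d2_toy, unique(df1b_toy), by = keys, all.x = TRUE)
nrow(joined_ok)
```

Checking the runner frames for repeated keys before joining would show whether dropping exact duplicates afterwards is enough, or whether some at bats still appear more than once with different end bases.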

To make sure d3 makes sense, check the number of batters in each half inning.

d3 %>% group_by(inning.x, inning_side.x, gameday_link) %>% summarize(N=n()) %>% with(table(N))
## N
##     1     2     3     4     5     6     7     8     9    10    11    12 
##    26    38 17747 13359  8724  4547  2894  1635  1272   739   500   413 
##    13    14    15    16    17    18    19    20    21    22    23    24 
##   244   204   165   105    92    63    47    36    22    19    12    13 
##    25    26    27    28    29    30    31    32    33    34    38 
##     7     6     4     3     2     2     2     3     2     1     1

The half inning with 38 batters is an error. For the most part, though, these numbers look right, so I’m not going to spend time on it.
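If I did want to look into the odd cases, a sketch like the following would list the half innings with implausible batter counts. It runs on a toy at-bat table with made-up values, and the thresholds (fewer than 3 or more than 30 batters) are assumptions.

```r
# Toy at-bat table: one row per batter (made-up values)
ab <- data.frame(
  gameday_link = c(rep("gid_a", 4), rep("gid_b", 2)),
  inning       = 1,
  inning_side  = "top",
  stringsAsFactors = FALSE
)

# Count batters per half inning
counts <- aggregate(
  N ~ gameday_link + inning + inning_side,
  data = transform(ab, N = 1),
  FUN = sum
)

# Thresholds are assumptions: fewer than 3 batters can still be legitimate
# (for example a walk-off), so treat these as rows to inspect, not to delete.
suspect <- counts[counts$N < 3 | counts$N > 30, ]
suspect
```

On the real data the same filter would pull out the gameday_link values worth opening by hand.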

Now we need to find the number of runs scored from before each play until the end of the inning. I’m not sure the code below gets everything right, for example when a run scores on the last play of the inning.

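# Convert to numeric before max(); on character scores max() would
# compare lexicographically, so "9" would beat "10"
runsatendofinning <- d3 %>% 
  group_by(inning.x, inning_side.x, gameday_link) %>% 
  summarize(runsatend = if (inning_side.x[1] == "top") {
    max(as.numeric(away_team_runs))
  } else {
    max(as.numeric(home_team_runs))
  })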
runsatendofinning %>% head
## # A tibble: 6 x 4
## # Groups:   inning.x, inning_side.x [1]
##   inning.x inning_side.x gameday_link                   runsatend
##      <dbl> <chr>         <chr>                              <dbl>
## 1        1 bottom        gid_2018_02_21_asubbc_arimlb_1         2
## 2        1 bottom        gid_2018_02_22_bocbbc_bosmlb_2         2
## 3        1 bottom        gid_2018_02_22_flsbbc_detmlb_1         0
## 4        1 bottom        gid_2018_02_22_neubbc_bosmlb_1         7
## 5        1 bottom        gid_2018_02_22_umgbbc_minmlb_1         1
## 6        1 bottom        gid_2018_02_22_utabbc_phimlb_1         1
runsatendofinning$runsatend %>% table
## .
##     0     1     2     3     4     5     6     7     8     9    10    11 
## 14891  9607  7808  6001  4624  3296  2358  1450  1170   938   260   194 
##    12    13    14    15    16    17    18    19    20    21    22    24 
##   115    77    49    43    15    16     9    15     5     4     2     1 
##    25 
##     1

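I don’t think 25 runs were actually scored in an inning. Note that runsatend is the batting team’s cumulative score at the end of the half inning, not the runs scored within it, so 25 may simply be a lopsided game total. It is still worth checking.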
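In the sample output earlier, the at bat where “Taylor Lane scores” already shows away_team_runs of 1, so the score columns appear to be recorded after each at bat. Getting runs from before each play therefore needs the previous at bat’s score. A minimal sketch on toy data for the away team only, with made-up values and base R instead of dplyr:

```r
# Toy data for one away team: cumulative score AFTER each at bat
# (made-up values; column names follow the post).
ab <- data.frame(
  gameday_link   = "gid_toy",
  inning         = c(1, 1, 1, 1, 2, 2, 2, 2),
  inning_side    = "top",
  num            = c(1, 2, 3, 4, 9, 10, 11, 12),
  away_team_runs = c(0, 1, 1, 1, 1, 3, 3, 3),
  stringsAsFactors = FALSE
)
ab <- ab[order(ab$gameday_link, ab$num), ]

# Score before each at bat: the previous at bat's score, or 0 at the start
# of the game. ave() applies the lag separately within each game.
ab$runs_before <- ave(
  ab$away_team_runs, ab$gameday_link,
  FUN = function(x) c(0, head(x, -1))
)

# Score at the end of each half inning
ab$runs_at_end <- ave(
  ab$away_team_runs, ab$gameday_link, ab$inning, ab$inning_side,
  FUN = max
)

# Runs scored from just before this at bat through the end of the half inning
ab$runs_to_end <- ab$runs_at_end - ab$runs_before
ab[c("inning", "num", "runs_before", "runs_at_end", "runs_to_end")]
```

Here the second at bat drives in a run, and its runs_to_end is 1 rather than 0 because the subtraction uses the score before the play. On the full data the same lag would be applied to home_team_runs for the bottom halves.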

Now we join this with the previous data.

d4 <- inner_join(d3, runsatendofinning, c("inning.x", "inning_side.x", "gameday_link"))
d4 <- d4 %>% 
  mutate(runsfromthispointtoendofinning=runsatend - (ifelse(inning_side.x=="top", as.numeric(away_team_runs), as.numeric(home_team_runs))))
d4 %>% head
##               des  id type    tfs             tfs_zulu      x
## 1 In play, out(s)   8    X 201104 2018-02-21T20:11:04Z 154.65
## 2 In play, out(s)  15    X 201233 2018-02-21T20:12:33Z  93.63
## 3            Ball  22    B 201359 2018-02-21T20:13:59Z 120.98
## 4 In play, no out  93    X 202949 2018-02-21T20:29:49Z  75.08
## 5 In play, no out  98    X 203038 2018-02-21T20:30:38Z  77.66
## 6 In play, no out 110    X 203257 2018-02-21T20:32:57Z 133.97
##                      y         sv_id start_speed end_speed   sz_top
## 1 2018-02-21T20:11:04Z 180221_201104        93.9      84.7 3.691824
## 2 2018-02-21T20:12:33Z 180221_201233        93.6      85.1 3.777274
## 3 2018-02-21T20:13:59Z 180221_201359        91.8      82.1 3.730258
## 4 2018-02-21T20:29:49Z 180221_202949        87.3      78.4 3.774226
## 5 2018-02-21T20:30:38Z 180221_203038        79.9      72.4 3.714936
## 6 2018-02-21T20:32:57Z 180221_203257        87.1      79.2 3.779927
##     sz_bot      pfx_x     pfx_z         px       pz        x0 y0       z0
## 1 1.429535 -5.6703425 10.369535 -0.9877447 2.142914 -1.501145 50 5.937560
## 2 1.470100 -4.2230981 10.697804  0.6130764 3.402739 -1.303427 50 6.094880
## 3 1.657475 -0.8684963 10.447090 -0.1045714 5.299412 -1.396811 50 6.300174
## 4 1.467724  9.6957339  6.188114  1.0996425 2.613556  2.786671 50 4.543479
## 5 1.441594 11.3383615  1.209846  1.0320467 1.966430  2.737342 50 4.705749
## 6 1.472084  8.8869725  5.845162 -0.4449158 2.056990  2.595478 50 4.397702
##          vx0       vy0         vz0         ax       ay        az break_y
## 1   3.344477 -136.4020 -7.76176468 -10.550289 33.48019 -12.88040    23.7
## 2   6.609214 -135.9674 -4.94373382  -7.884881 30.54500 -12.20034    23.8
## 3   3.689997 -133.4538 -0.06339038  -1.554547 30.72924 -13.47450    23.7
## 4  -7.319277 -126.6819 -0.32336228  15.537905 29.22415 -22.25728    23.7
## 5  -7.245731 -115.8948  0.44519255  15.309674 23.12209 -30.54045    23.8
## 6 -10.470069 -126.2775 -1.29140914  14.386231 25.10273 -22.71190    23.8
##   break_angle break_length pitch_type type_confidence zone nasty spin_dir
## 1        33.5          3.9         FF           0.893   13    49  208.670
## 2        25.8          3.2         FF           0.925    3    46  201.541
## 3         3.0          3.1         FF           0.924   11    19  184.750
## 4       -31.9          6.7         FT           0.575   14    50  122.548
## 5       -24.6         10.1         CH           0.884   14    49   96.091
## 6       -28.3          6.5         CH           0.645    7    34  123.334
##   spin_rate cc mt
## 1  2338.852      
## 2  2291.256      
## 3  2039.843      
## 4  2110.954      
## 5  1927.359      
## 6  1978.267      
##                                                                                                                   url.x
## 1 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 2 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 3 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 4 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 5 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 6 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
##   inning_side.x inning.x next_.x num  on_1b  on_2b on_3b count
## 1           top        1       Y   1     NA     NA    NA   3-2
## 2           top        1       Y   2     NA     NA    NA   2-1
## 3           top        1       Y   3     NA     NA    NA   3-0
## 4           top        2       Y  10     NA     NA    NA   1-0
## 5           top        2       Y  11 656637     NA    NA   0-1
## 6           top        2       Y  12 679523 656637    NA   3-2
##                     gameday_link code event_num.x
## 1 gid_2018_02_21_asubbc_arimlb_1    X           8
## 2 gid_2018_02_21_asubbc_arimlb_1    X          15
## 3 gid_2018_02_21_asubbc_arimlb_1    B          22
## 4 gid_2018_02_21_asubbc_arimlb_1    D          93
## 5 gid_2018_02_21_asubbc_arimlb_1    D          98
## 6 gid_2018_02_21_asubbc_arimlb_1    D         110
##                              play_guid pitcher batter b s o start_tfs
## 1 622eacfe-c74e-4157-b3b0-d1fd0ca579fc  664199 675961 3 2 1    200935
## 2 c346ed75-356c-4231-a4a7-0f929573894f  664199 669398 2 1 2    201134
## 3 88dabc5d-5cef-4966-af15-ced53ce18e29  664199 666183 4 0 2    201305
## 4 f37cc792-decd-41fa-a3dc-3f1be9b06d26  608355 656637 1 0 0    202936
## 5 5cc74538-37ee-43be-828f-f0a3aca56eca  608355 679523 0 1 0    203017
## 6 a58ed393-cccc-4d90-b17e-ed95326a1026  608355 676422 3 2 0    203114
##         start_tfs_zulu stand b_height p_throws
## 1 2018-02-21T20:09:35Z     R      6-2        R
## 2 2018-02-21T20:11:34Z     L      6-4        R
## 3 2018-02-21T20:13:05Z     L      6-5        R
## 4 2018-02-21T20:29:36Z     R      6-4        L
## 5 2018-02-21T20:30:17Z     R      6-4        L
## 6 2018-02-21T20:31:14Z     L      6-4        L
##                                                                                                                  atbat_des
## 1                                                             Alika Williams pops out to second baseman Ildemaro Vargas.  
## 2                      Gage Workman grounds out softly, third baseman Jack Reinheimer to first baseman Christian Walker.  
## 3                                                                                                   Hunter Bishop walks.  
## 4                                                Taylor Lane singles on a line drive to right fielder Jeremy Hazelbaker.  
## 5                            Luke Leisenring singles on a line drive to left fielder Ramon Flores.   Taylor Lane to 2nd.  
## 6 Zach Hogueisson singles on a line drive to left fielder Ramon Flores.   Taylor Lane to 3rd.    Luke Leisenring to 2nd.  
##                                                                                                                   atbat_des_es
## 1                                                      Alika Williams batea elevadito de out a segunda base Ildemaro Vargas.  
## 2                 Gage Workman batea rodado de out suavemente, tercera base Jack Reinheimer a primera base Christian Walker.  
## 3                                                                                       Hunter Bishop recibe base por bolas.  
## 4                                                 Taylor Lane pega sencillo con línea a jardinero derecho Jeremy Hazelbaker.  
## 5                           Luke Leisenring pega sencillo con línea a jardinero izquierdo Ramon Flores.   Taylor Lane a 2da.  
## 6 Zach Hogueisson pega sencillo con línea a jardinero izquierdo Ramon Flores.   Taylor Lane a 3ra.    Luke Leisenring a 2da.  
##       event home_team_runs away_team_runs
## 1   Pop Out              0              0
## 2 Groundout              0              0
## 3      Walk              0              0
## 4    Single              2              0
## 5    Single              2              0
## 6    Single              2              0
##                                                                                                                   url.y
## 1 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 2 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 3 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 4 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 5 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 6 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
##   inning_side.y inning.y next_.y event2 event3 batter_name pitcher_name
## 1           top        1       Y   <NA>   <NA>        <NA>         <NA>
## 2           top        1       Y   <NA>   <NA>        <NA>         <NA>
## 3           top        1       Y   <NA>   <NA>        <NA>         <NA>
## 4           top        2       Y   <NA>   <NA>        <NA>         <NA>
## 5           top        2       Y   <NA>   <NA>        <NA>         <NA>
## 6           top        2       Y   <NA>   <NA>        <NA>         <NA>
##   event4       date         end_tfs_zulu event_num.y        event_es
## 1   <NA> 2018-02-21 2018-02-21T20:11:14Z          10  Elevado de Out
## 2   <NA> 2018-02-21 2018-02-21T20:12:40Z          17 Roletazo de Out
## 3   <NA> 2018-02-21 2018-02-21T20:14:02Z          24  Base por Bolas
## 4   <NA> 2018-02-21 2018-02-21T20:29:59Z          95        Sencillo
## 5   <NA> 2018-02-21 2018-02-21T20:30:51Z         101        Sencillo
## 6   <NA> 2018-02-21 2018-02-21T20:33:15Z         114        Sencillo
##   event2_es score_on_1b on_1b_end score_on_2b on_2b_end score_on_3b
## 1      <NA>        <NA>      <NA>        <NA>      <NA>        <NA>
## 2      <NA>        <NA>      <NA>        <NA>      <NA>        <NA>
## 3      <NA>        <NA>      <NA>        <NA>      <NA>        <NA>
## 4      <NA>        <NA>      <NA>        <NA>      <NA>        <NA>
## 5      <NA>        <NA>        2B        <NA>      <NA>        <NA>
## 6      <NA>        <NA>        2B        <NA>        3B        <NA>
##   on_3b_end runsatend runsfromthispointtoendofinning
## 1      <NA>         0                              0
## 2      <NA>         0                              0
## 3      <NA>         0                              0
## 4      <NA>         2                              2
## 5      <NA>         2                              2
## 6      <NA>         2                              2
d4$runsfromthispointtoendofinning %>% table
## .
##     -8     -7     -6     -5     -4     -3     -2     -1      0      1 
##      2     17     29     85    209    321    536    551 205506  25044 
##      2      3      4      5      6      7      8      9 
##  13407   6545   3207   1162    391    153     36     11

There’s clearly an issue here, since some plays show a negative number of runs scored over the rest of the inning.
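One possible cause, not verified against the full data: if home_team_runs and away_team_runs are stored as character (the as.numeric() calls above suggest they might be), then max() inside summarize() compares them as text, and "9" sorts after "10". The sketch below uses a made-up vector of running scores to show how that would produce negative values.

```r
# Hypothetical running scores for one half-inning, stored as text
scores_chr <- c("8", "9", "10", "11")

# max() on a character vector compares strings, so "9" beats "10" and "11"
max_as_text <- max(scores_chr)

# Converting to numeric first gives the real maximum
max_as_number <- max(as.numeric(scores_chr))

# Runs remaining, computed both ways
runs_remaining_text   <- as.numeric(max_as_text) - as.numeric(scores_chr)
runs_remaining_number <- max_as_number - as.numeric(scores_chr)

print(runs_remaining_text)    # goes negative once the score passes 9
print(runs_remaining_number)  # never negative
```

If this is what is happening, moving as.numeric() inside max(), as in max(as.numeric(away_team_runs)), would be the change to try. It may not account for all of the negative values.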

d4 %>% filter(runsfromthispointtoendofinning>=0) %>% 
  mutate(someone_on_1b=!is.na(on_1b), someone_on_2b=!is.na(on_2b), someone_on_3b=!is.na(on_3b)) %>% 
  group_by(someone_on_1b, someone_on_2b, someone_on_3b, o) %>% summarize(meanruns=mean(runsfromthispointtoendofinning))
## # A tibble: 32 x 5
## # Groups:   someone_on_1b, someone_on_2b, someone_on_3b [8]
##    someone_on_1b someone_on_2b someone_on_3b     o meanruns
##    <lgl>         <lgl>         <lgl>         <dbl>    <dbl>
##  1 FALSE         FALSE         FALSE             0    0.861
##  2 FALSE         FALSE         FALSE             1    0.335
##  3 FALSE         FALSE         FALSE             2    0.137
##  4 FALSE         FALSE         FALSE             3    0    
##  5 FALSE         FALSE         TRUE              0    1.24 
##  6 FALSE         FALSE         TRUE              1    0.843
##  7 FALSE         FALSE         TRUE              2    0.265
##  8 FALSE         FALSE         TRUE              3    0    
##  9 FALSE         TRUE          FALSE             0    1.28 
## 10 FALSE         TRUE          FALSE             1    0.791
## # ... with 22 more rows

These numbers are wrong. You can check the true values here on FanGraphs. I think the problem is with the data. The runs and outs seem to be from the end of each play, when I want them from the beginning. And the negative runs are a big concern that I didn’t fix.

Why I’m failing

The runs and outs are shown after each event, but the baserunners shown are from before the event. We want the state either before or after each at bat, but the data mixes the two.
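One way to line everything up on the before-play state is to shift the outs and runs back by one at bat with dplyr's lag(). This is only a sketch on made-up data: the data frame toy, the column runs_after, and the new columns runs_before, outs_before and runs_to_end are all hypothetical names, and it assumes one row per at bat, in order, for a single batting team.

```r
library(dplyr)

# Hypothetical toy data: outs and the batting team's running score are
# recorded AFTER each play, as they appear to be in the real data
toy <- data.frame(
  gameday_link = "game_1",
  inning       = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  inning_side  = "top",
  num          = 1:9,
  o            = c(0, 0, 1, 2, 3, 1, 1, 2, 3),
  runs_after   = c(0, 2, 2, 2, 2, 2, 3, 3, 3),
  stringsAsFactors = FALSE
)

aligned <- toy %>%
  arrange(gameday_link, num) %>%
  # Score before a play = score after the same team's previous play
  group_by(gameday_link, inning_side) %>%
  mutate(runs_before = lag(runs_after, default = 0)) %>%
  # Outs before a play = outs after the previous play in the same half-inning
  group_by(gameday_link, inning, inning_side) %>%
  mutate(
    outs_before = lag(o, default = 0),
    runs_to_end = max(runs_after) - runs_before
  ) %>%
  ungroup()

print(aligned)
```

With the shift applied, outs_before only takes the values 0, 1 and 2, and it matches the on_1b, on_2b and on_3b columns, which already describe the bases before the play.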

d4$runbattingteam <- ifelse(d4$inning_side.x=="top", d4$away_team_runs, d4$home_team_runs)
d4 %>% filter(event=="Home Run") %>% with(table(as.numeric(runbattingteam)))
## 
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15 
## 1884 1667 1344 1157  893  643  473  342  291  162  101   58   48   27   11 
##   16   17   18   19   20   21   24 
##   13    3    4    2    3    3    1
d4 %>% filter(event=="Groundout") %>% with(table(as.numeric(o)))
## 
##     1     2     3 
## 14262 11853 14003

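The batting team never has 0 runs on a home run, and there are never 0 outs on a groundout, so the runs and outs must be recorded after the play. Below we see that there are baserunners on home run plays, meaning the baserunners are from before the play.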

d4 %>% filter(event=="Home Run") %>% with(table(!is.na(on_1b)))
## 
## FALSE  TRUE 
##  6488  2642

But maybe I can use the on_xb_end columns instead?

d4 %>% filter(runsfromthispointtoendofinning>=0) %>% 
  mutate(someone_on_1b_end=!is.na(on_1b_end), someone_on_2b_end=!is.na(on_2b_end), someone_on_3b_end=!is.na(on_3b_end)) %>% 
  group_by(someone_on_1b_end, someone_on_2b_end, someone_on_3b_end, o) %>% summarize(meanruns=mean(runsfromthispointtoendofinning))
## # A tibble: 32 x 5
## # Groups:   someone_on_1b_end, someone_on_2b_end, someone_on_3b_end [8]
##    someone_on_1b_end someone_on_2b_end someone_on_3b_end     o meanruns
##    <lgl>             <lgl>             <lgl>             <dbl>    <dbl>
##  1 FALSE             FALSE             FALSE                 0    0.888
##  2 FALSE             FALSE             FALSE                 1    0.417
##  3 FALSE             FALSE             FALSE                 2    0.207
##  4 FALSE             FALSE             FALSE                 3    0    
##  5 FALSE             FALSE             TRUE                  0    0.840
##  6 FALSE             FALSE             TRUE                  1    0.524
##  7 FALSE             FALSE             TRUE                  2    0.220
##  8 FALSE             FALSE             TRUE                  3    0    
##  9 FALSE             TRUE              FALSE                 0    1.21 
## 10 FALSE             TRUE              FALSE                 1    0.865
## # ... with 22 more rows

No, this is terribly wrong. on_1b_end does not say whether someone is on first at the end of the play. It says where the player who started on first ended up.

d4$on_1b_end %>% table
## .
##          2B    3B 
## 32445 20330  5162

It doesn’t even say “1B” if they don’t leave the base? That’s not helpful.
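Given that, the three columns for first base could be combined into a single label for what happened to the runner who started there. The sketch below uses made-up rows and a hypothetical helper, is_blank(). It assumes blank values can show up as either NA or an empty string (the unlabeled category in the table above suggests the latter) and that "T" in score_on_1b means the runner scored. It does not try to tell a runner who stayed on first apart from one who was put out.

```r
# Hypothetical rows: runner on first before the play, where that runner
# ended up, and whether that runner scored
toy <- data.frame(
  on_1b       = c(NA, "656637", "656637", "656637", "656637"),
  on_1b_end   = c(NA, "2B", "3B", "", NA),
  score_on_1b = c(NA, NA, NA, "T", NA),
  stringsAsFactors = FALSE
)

# Treat both NA and "" as blank (an assumption about how the data is coded)
is_blank <- function(x) is.na(x) | x == ""

runner_from_1b <- with(toy, ifelse(
  is_blank(on_1b), "no runner",
  ifelse(
    !is_blank(score_on_1b), "scored",
    ifelse(!is_blank(on_1b_end), on_1b_end, "stayed or out")
  )
))

print(runner_from_1b)
```

The "stayed or out" label is the gap: on_1b_end alone does not settle which one it was.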

d4$score_on_1b %>% table
## .
##    T 
## 4517

Can I just calculate where they all end up?

d4 %>% filter(runsfromthispointtoendofinning>=0) %>% 
    mutate(someone_on_1b_end=event=="Single" | (is.na(on_1b_end) & is.na(score_on_1b)),
           someone_on_2b_end=event=="Double" | (is.na(on_2b_end) & is.na(score_on_2b)) | (!is.na(on_1b_end) & on_1b_end=="2B"),
           someone_on_3b_end=event=="Triple" | (is.na(on_3b_end) & is.na(score_on_3b)) | (!is.na(on_2b_end) & on_2b_end=="3B") | (!is.na(on_1b_end) & on_1b_end=="3B")
    ) %>% 
    group_by(someone_on_1b_end, someone_on_2b_end, someone_on_3b_end, o) %>% summarize(meanruns=mean(runsfromthispointtoendofinning), N=n()) 
## # A tibble: 32 x 6
## # Groups:   someone_on_1b_end, someone_on_2b_end, someone_on_3b_end [8]
##    someone_on_1b_end someone_on_2b_e~ someone_on_3b_e~     o meanruns     N
##    <lgl>             <lgl>            <lgl>            <dbl>    <dbl> <int>
##  1 FALSE             FALSE            FALSE                0   0.0526    19
##  2 FALSE             FALSE            FALSE                1   0.121     58
##  3 FALSE             FALSE            FALSE                2   0.0957   115
##  4 FALSE             FALSE            FALSE                3   0       2132
##  5 FALSE             FALSE            TRUE                 0   0.833    144
##  6 FALSE             FALSE            TRUE                 1   0.683    480
##  7 FALSE             FALSE            TRUE                 2   0.338   1058
##  8 FALSE             FALSE            TRUE                 3   0       5545
##  9 FALSE             TRUE             FALSE                0   0.908     87
## 10 FALSE             TRUE             FALSE                1   0.639    360
## # ... with 22 more rows

No, this is a mess and terribly wrong.

Conclusion

I failed to recreate expected runs based on the state (number of outs and positions of baserunners). I got numbers that were off by a significant amount, but seemingly correlated with the true values. The data has the baserunners from before the play, but the outs and runs from after the play, making the data very hard to use. It also doesn’t say where the batter ended up on base. Even with this data, I should have been able to figure it out, but it’s not worth the effort now.