A state in a baseball game is defined by the number of outs and which bases have baserunners. We could also include the count, the inning, and so on, but we’re starting with the simplest version. For a given state, there is an expected number of runs a team will score in the remainder of that inning. These tables are readily available online. I want to try to recreate them using data from the mlbgameday package. It will require joining several tables and may be challenging.
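Once each plate appearance is labelled with its state, a run expectancy table is simply an average grouped by state. The sketch below runs that calculation on a handful of made-up plate appearances. It assumes we already know three things for each one: the base state before the play, the outs before the play, and the runs scored from then to the end of the inning. The `bases` code ("100" = runner on first only, and so on) and the column names are hypothetical.

```r
# Made-up plate appearances, one row each, with the state *before* the play
pa <- data.frame(
  bases     = c("000", "100", "000", "100", "010", "000", "000", "000"),
  outs      = c(0, 0, 1, 1, 2, 0, 1, 2),
  runs_rest = c(1, 2, 0, 1, 0, 0, 0, 0)   # runs from this point to end of inning
)

# Expected runs for each (base state, outs) pair is the mean of runs_rest.
# tapply() lays the result out as a bases-by-outs matrix, NA for unseen states.
re_table <- tapply(pa$runs_rest, list(pa$bases, pa$outs), mean)
re_table
```

With a full season of data, every one of the 8 x 3 cells would be filled.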
library(mlbgameday)
library(magrittr)
library(ggplot2)
library(dplyr)
Get the data.
# Takes hours to get all the data for the year
dat <- get_payload(start = "2018-01-01", end = "2018-12-31")
The data for each at bat doesn’t include where the base runners start and end. The pitch data does show where the runners start:
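The pitch data records the runner on each base as a player ID, with NA when the base is empty (as in the `on_1b`, `on_2b` and `on_3b` columns of the output below). A hypothetical helper like the one sketched here could collapse the three columns into a single base-state code. It assumes the NA convention holds throughout the data.

```r
# Hypothetical helper: "1" if a runner ID is present on that base, "0" if NA.
# Assumes empty bases are NA (not an empty string) in all three columns.
base_state <- function(on_1b, on_2b, on_3b) {
  paste0(as.integer(!is.na(on_1b)),
         as.integer(!is.na(on_2b)),
         as.integer(!is.na(on_3b)))
}

# Runner IDs taken from the printed output further down
base_state(on_1b = c(NA, 656637, 679523),
           on_2b = c(NA, NA, 656637),
           on_3b = c(NA, NA, NA))
# "000" "100" "110"
```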
dat$pitch %>% head
## des des_es id type tfs tfs_zulu
## 1 Ball Bola mala 3 B 200949 2018-02-21T20:09:49Z
## 2 Ball Bola mala 4 B 201003 2018-02-21T20:10:03Z
## 3 Ball Bola mala 5 B 201017 2018-02-21T20:10:17Z
## 4 Called Strike Strike cantado 6 S 201033 2018-02-21T20:10:33Z
## 5 Swinging Strike Strike tirándole 7 S 201047 2018-02-21T20:10:47Z
## 6 In play, out(s) En juego, out(s) 8 X 201104 2018-02-21T20:11:04Z
## x y sv_id start_speed end_speed sz_top
## 1 71.59 2018-02-21T20:09:49Z 180221_200949 91.0 81.3 3.506856
## 2 103.27 2018-02-21T20:10:03Z 180221_201003 91.2 82.0 3.568236
## 3 91.60 2018-02-21T20:10:17Z 180221_201017 90.6 80.9 3.472593
## 4 119.20 2018-02-21T20:10:33Z 180221_201033 91.5 81.9 3.479696
## 5 87.61 2018-02-21T20:10:47Z 180221_201047 92.4 82.1 3.691824
## 6 154.65 2018-02-21T20:11:04Z 180221_201104 93.9 84.7 3.691824
## sz_bot pfx_x pfx_z px pz x0 y0 z0
## 1 1.657627 -1.1335109 12.205505 1.1915198 4.022879 -1.357341 50 6.134092
## 2 1.603569 -1.1255583 11.926857 0.3602895 3.753873 -1.448721 50 6.055660
## 3 1.539733 -1.4003827 12.924600 0.6666075 3.220051 -1.387051 50 6.057518
## 4 1.551208 -5.4802387 9.984281 -0.0577412 2.803866 -1.541164 50 5.965382
## 5 1.429535 -0.6695584 12.734720 0.7708471 3.111694 -1.411845 50 6.019451
## 6 1.429535 -5.6703425 10.369535 -0.9877447 2.142914 -1.501145 50 5.937560
## vx0 vy0 vz0 ax ay az break_y
## 1 6.998528 -132.1595 -3.391877 -1.980667 31.32489 -10.846471 23.7
## 2 5.103719 -132.6797 -3.885679 -1.990105 30.53986 -11.086111 23.7
## 3 5.773281 -131.4931 -5.428116 -2.422168 31.03215 -9.819048 23.7
## 4 5.726707 -132.8034 -5.467920 -9.682306 31.28585 -14.534148 23.7
## 5 5.962360 -134.0980 -5.849144 -1.197000 33.92179 -9.407615 23.7
## 6 3.344477 -136.4020 -7.761765 -10.550289 33.48019 -12.880400 23.7
## break_angle break_length pitch_type type_confidence zone nasty spin_dir
## 1 1.7 2.7 FF 0.911 12 58 185.304
## 2 4.0 2.8 FF 0.915 12 46 185.389
## 3 5.6 2.6 FF 0.893 3 49 186.182
## 4 27.9 4.1 FF 0.904 5 20 208.761
## 5 -1.5 2.5 FF 0.917 12 48 183.008
## 6 33.5 3.9 FF 0.893 13 49 208.670
## spin_rate cc mt
## 1 2351.306
## 2 2316.095
## 3 2480.889
## 4 2198.229
## 5 2466.456
## 6 2338.852
## url
## 1 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 2 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 3 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 4 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 5 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 6 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## inning_side inning next_ num on_1b on_2b on_3b count
## 1 top 1 Y 1 NA NA NA 0-0
## 2 top 1 Y 1 NA NA NA 1-0
## 3 top 1 Y 1 NA NA NA 2-0
## 4 top 1 Y 1 NA NA NA 3-0
## 5 top 1 Y 1 NA NA NA 3-1
## 6 top 1 Y 1 NA NA NA 3-2
## gameday_link code event_num
## 1 gid_2018_02_21_asubbc_arimlb_1 B 3
## 2 gid_2018_02_21_asubbc_arimlb_1 B 4
## 3 gid_2018_02_21_asubbc_arimlb_1 B 5
## 4 gid_2018_02_21_asubbc_arimlb_1 C 6
## 5 gid_2018_02_21_asubbc_arimlb_1 S 7
## 6 gid_2018_02_21_asubbc_arimlb_1 X 8
## play_guid
## 1 ae276dd4-116a-4c66-b9f6-652959a7ed25
## 2 e9c938f2-bd89-455e-852b-e7a143c029df
## 3 b92cc24e-2657-4e23-ae9b-611aee8b3d0d
## 4 259ce2d0-0d67-45ac-8c7d-cd56d9d65a9f
## 5 10f9568e-cb4e-4dc9-83af-61833eb14da1
## 6 622eacfe-c74e-4157-b3b0-d1fd0ca579fc
We’ll have to use the runner table to track the progress of base runners.
dat$runner %>% head
## id start end event score rbi earned
## 1 666183 1B Walk <NA> <NA> <NA>
## 2 666183 1B Pickoff 1B <NA> <NA> <NA>
## 3 656637 1B Single <NA> <NA> <NA>
## 4 656637 1B 2B Single <NA> <NA> <NA>
## 5 679523 1B Single <NA> <NA> <NA>
## 6 679523 1B 2B Single <NA> <NA> <NA>
## url
## 1 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 2 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 3 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 4 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 5 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 6 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## inning_side inning next_ num gameday_link event_num
## 1 top 1 Y 3 gid_2018_02_21_asubbc_arimlb_1 24
## 2 top 1 Y 4 gid_2018_02_21_asubbc_arimlb_1 30
## 3 top 2 Y 10 gid_2018_02_21_asubbc_arimlb_1 95
## 4 top 2 Y 11 gid_2018_02_21_asubbc_arimlb_1 101
## 5 top 2 Y 11 gid_2018_02_21_asubbc_arimlb_1 101
## 6 top 2 Y 12 gid_2018_02_21_asubbc_arimlb_1 114
First let’s join atbat and pitch, same as before.
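# Joining on play_guid keeps only the final pitch of each at bat
d2 <- inner_join(dat$pitch, dat$atbat, by=c("num", "gameday_link", "play_guid")) %>%
  # Drop the Spanish-language and URL columns; each column needs its own minus sign,
  # otherwise only des_es is dropped
  select(-des_es, -atbat_des_es, -event2_es, -url.x, -url.y)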
Now we’ll join this with separate data frames for the runners who start on each base. We’ll have to reshape the runner data frame before doing so.
df1b <- dat$runner %>% filter(start=="1B") %>% mutate(on_1b=id, score_on_1b=score, on_1b_end=end) %>%
select(on_1b, score_on_1b, on_1b_end, num, gameday_link)
inner_join(d2, df1b, c("num", "gameday_link", "on_1b")) %>% head
## des id type tfs tfs_zulu x
## 1 In play, no out 98 X 203038 2018-02-21T20:30:38Z 77.66
## 2 In play, no out 110 X 203257 2018-02-21T20:32:57Z 133.97
## 3 In play, run(s) 120 X 203439 2018-02-21T20:34:39Z 106.42
## 4 In play, out(s) 135 X 203703 2018-02-21T20:37:03Z 120.92
## 5 Ball 152 B 204115 2018-02-21T20:41:15Z 175.29
## 6 In play, out(s) 173 X 204714 2018-02-21T20:47:14Z 126.95
## y sv_id start_speed end_speed sz_top
## 1 2018-02-21T20:30:38Z 180221_203038 79.9 72.4 3.714936
## 2 2018-02-21T20:32:57Z 180221_203257 87.1 79.2 3.779927
## 3 2018-02-21T20:34:39Z 180221_203439 74.4 67.5 3.530531
## 4 2018-02-21T20:37:03Z 180221_203703 88.1 79.5 3.548632
## 5 2018-02-21T20:41:15Z 180221_204115 88.1 80.7 3.503302
## 6 2018-02-21T20:47:14Z 180221_204714 92.6 84.8 3.779927
## sz_bot pfx_x pfx_z px pz x0 y0 z0
## 1 1.441594 11.338361 1.209846 1.0320467 1.966430 2.737342 50 4.705749
## 2 1.472084 8.886973 5.845162 -0.4449158 2.056990 2.595478 50 4.397702
## 3 1.348389 -7.928245 -2.987633 0.2747256 1.696184 2.874654 50 4.642148
## 4 1.357439 9.389828 4.680869 -0.1026762 1.985936 2.739222 50 4.329306
## 5 1.669040 10.924321 5.512149 -1.5652272 1.039354 2.488585 50 4.168845
## 6 1.472084 -6.602327 2.831185 -0.2616575 1.972476 -2.707562 50 4.873596
## vx0 vy0 vz0 ax ay az break_y
## 1 -7.245731 -115.8948 0.4451926 15.30967 23.12209 -30.54045 23.8
## 2 -10.470069 -126.2775 -1.2914091 14.38623 25.10273 -22.71190 23.8
## 3 -3.336862 -108.0486 2.1486741 -9.26001 20.93519 -35.66354 23.8
## 4 -10.221188 -127.8784 -1.0378124 15.44161 28.06494 -24.47634 23.7
## 5 -13.804698 -127.4231 -3.3198851 18.06576 24.75311 -23.05850 23.8
## 6 8.769351 -134.6005 -2.6182950 -12.08440 29.84615 -26.99206 23.8
## break_angle break_length pitch_type type_confidence zone nasty spin_dir
## 1 -24.6 10.1 CH 0.884 14 49 96.091
## 2 -28.3 6.5 CH 0.645 7 34 123.334
## 3 15.5 12.7 CU 0.887 9 24 290.649
## 4 -28.2 6.9 FT 0.793 8 32 116.497
## 5 -33.2 6.9 FT 0.671 13 26 116.775
## 6 20.4 6.5 FT 0.925 7 40 246.789
## spin_rate cc mt
## 1 1927.359
## 2 1978.267
## 3 1328.706
## 4 1957.459
## 5 2303.891
## 6 1417.211
## url.x
## 1 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 2 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 3 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 4 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 5 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 6 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## inning_side.x inning.x next_.x num on_1b on_2b on_3b count
## 1 top 2 Y 11 656637 NA NA 0-1
## 2 top 2 Y 12 679523 656637 NA 3-2
## 3 top 2 Y 13 676422 679523 656637 0-2
## 4 top 2 Y 14 676525 676422 679523 3-2
## 5 top 2 Y 15 679522 676525 676422 3-2
## 6 top 2 Y 17 679521 679522 676525 2-1
## gameday_link code event_num.x
## 1 gid_2018_02_21_asubbc_arimlb_1 D 98
## 2 gid_2018_02_21_asubbc_arimlb_1 D 110
## 3 gid_2018_02_21_asubbc_arimlb_1 E 120
## 4 gid_2018_02_21_asubbc_arimlb_1 X 135
## 5 gid_2018_02_21_asubbc_arimlb_1 B 152
## 6 gid_2018_02_21_asubbc_arimlb_1 X 173
## play_guid pitcher batter b s o start_tfs
## 1 5cc74538-37ee-43be-828f-f0a3aca56eca 608355 679523 0 1 0 203017
## 2 a58ed393-cccc-4d90-b17e-ed95326a1026 608355 676422 3 2 0 203114
## 3 5f30e09c-d8be-4fdd-861f-6077b15d82d8 608355 676525 0 2 0 203335
## 4 ef690705-10d4-4693-bfc5-4666da217703 608355 679522 3 2 1 203515
## 5 1eb020a2-ad04-4455-93fa-4ab6149e6360 608355 679521 4 2 1 203828
## 6 3d85b855-a0d8-43d7-a8fb-cf6162f37ec2 524349 669398 2 1 3 204555
## start_tfs_zulu stand b_height p_throws
## 1 2018-02-21T20:30:17Z R 6-4 L
## 2 2018-02-21T20:31:14Z L 6-4 L
## 3 2018-02-21T20:33:35Z R 5-10 L
## 4 2018-02-21T20:35:15Z L 5-11 L
## 5 2018-02-21T20:38:28Z R 6-2 L
## 6 2018-02-21T20:45:55Z L 6-4 R
## atbat_des
## 1 Luke Leisenring singles on a line drive to left fielder Ramon Flores. Taylor Lane to 2nd.
## 2 Zach Hogueisson singles on a line drive to left fielder Ramon Flores. Taylor Lane to 3rd. Luke Leisenring to 2nd.
## 3 Scott Mehan singles on a line drive to left fielder Ramon Flores. Taylor Lane scores. Luke Leisenring to 3rd. Zach Hogueisson to 2nd.
## 4 Hunter Jump grounds into a force out, shortstop Kristopher Negron to catcher Josh Thole. Luke Leisenring out at home. Zach Hogueisson to 3rd. Scott Mehan to 2nd. Hunter Jump to 1st.
## 5 Myles Denson walks. Zach Hogueisson scores. Scott Mehan to 3rd. Hunter Jump to 2nd.
## 6 Gage Workman grounds out to first baseman Christian Walker.
## atbat_des_es
## 1 Luke Leisenring pega sencillo con línea a jardinero izquierdo Ramon Flores. Taylor Lane a 2da.
## 2 Zach Hogueisson pega sencillo con línea a jardinero izquierdo Ramon Flores. Taylor Lane a 3ra. Luke Leisenring a 2da.
## 3 Scott Mehan pega sencillo con línea a jardinero izquierdo Ramon Flores. Taylor Lane anota Luke Leisenring a 3ra. Zach Hogueisson a 2da.
## 4 Hunter Jump batea rodado batea para out forzado, campo corto Kristopher Negron a receptor Josh Thole. Luke Leisenring a cabo a home. Zach Hogueisson a 3ra. Scott Mehan a 2da. Hunter Jump a 1ra.
## 5 Myles Denson recibe base por bolas. Zach Hogueisson anota Scott Mehan a 3ra. Hunter Jump a 2da.
## 6 Gage Workman batea rodado de out a primera base Christian Walker.
## event home_team_runs away_team_runs
## 1 Single 2 0
## 2 Single 2 0
## 3 Single 2 1
## 4 Forceout 2 1
## 5 Walk 2 2
## 6 Groundout 2 2
## url.y
## 1 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 2 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 3 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 4 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 5 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 6 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## inning_side.y inning.y next_.y event2 event3 batter_name pitcher_name
## 1 top 2 Y <NA> <NA> <NA> <NA>
## 2 top 2 Y <NA> <NA> <NA> <NA>
## 3 top 2 Y <NA> <NA> <NA> <NA>
## 4 top 2 Y <NA> <NA> <NA> <NA>
## 5 top 2 Y <NA> <NA> <NA> <NA>
## 6 top 2 Y <NA> <NA> <NA> Andury Acevedo
## event4 date end_tfs_zulu event_num.y event_es
## 1 <NA> 2018-02-21 2018-02-21T20:30:51Z 101 Sencillo
## 2 <NA> 2018-02-21 2018-02-21T20:33:15Z 114 Sencillo
## 3 <NA> 2018-02-21 2018-02-21T20:34:50Z 126 Sencillo
## 4 <NA> 2018-02-21 2018-02-21T20:37:16Z 140 Out Forzado
## 5 <NA> 2018-02-21 2018-02-21T20:41:20Z 158 Base por Bolas
## 6 <NA> 2018-02-21 2018-02-21T20:47:21Z 175 Roletazo de Out
## event2_es score_on_1b on_1b_end
## 1 <NA> <NA> 2B
## 2 <NA> <NA> 2B
## 3 <NA> <NA> 2B
## 4 <NA> <NA> 2B
## 5 <NA> <NA> 2B
## 6 <NA> <NA>
Looks okay, so let’s do all three bases.
df2b <- dat$runner %>% filter(start=="2B") %>% mutate(on_2b=id, score_on_2b=score, on_2b_end=end) %>%
select(on_2b, score_on_2b, on_2b_end, num, gameday_link)
df3b <- dat$runner %>% filter(start=="3B") %>% mutate(on_3b=id, score_on_3b=score, on_3b_end=end) %>%
select(on_3b, score_on_3b, on_3b_end, num, gameday_link)
d2r1 <- left_join(d2, df1b, c("num", "gameday_link", "on_1b"))
d2r12 <- left_join(d2r1, df2b, c("num", "gameday_link", "on_2b"))
d2r123 <- left_join(d2r12, df3b, c("num", "gameday_link", "on_3b"))
Something odd is happening: some rows are duplicated, and I don’t think they should be. I’ll remove them.
d3 <- d2r123[!duplicated(d2r123),]
d3 %>% head
## des id type tfs tfs_zulu x
## 1 In play, out(s) 8 X 201104 2018-02-21T20:11:04Z 154.65
## 2 In play, out(s) 15 X 201233 2018-02-21T20:12:33Z 93.63
## 3 Ball 22 B 201359 2018-02-21T20:13:59Z 120.98
## 4 In play, no out 93 X 202949 2018-02-21T20:29:49Z 75.08
## 5 In play, no out 98 X 203038 2018-02-21T20:30:38Z 77.66
## 6 In play, no out 110 X 203257 2018-02-21T20:32:57Z 133.97
## y sv_id start_speed end_speed sz_top
## 1 2018-02-21T20:11:04Z 180221_201104 93.9 84.7 3.691824
## 2 2018-02-21T20:12:33Z 180221_201233 93.6 85.1 3.777274
## 3 2018-02-21T20:13:59Z 180221_201359 91.8 82.1 3.730258
## 4 2018-02-21T20:29:49Z 180221_202949 87.3 78.4 3.774226
## 5 2018-02-21T20:30:38Z 180221_203038 79.9 72.4 3.714936
## 6 2018-02-21T20:32:57Z 180221_203257 87.1 79.2 3.779927
## sz_bot pfx_x pfx_z px pz x0 y0 z0
## 1 1.429535 -5.6703425 10.369535 -0.9877447 2.142914 -1.501145 50 5.937560
## 2 1.470100 -4.2230981 10.697804 0.6130764 3.402739 -1.303427 50 6.094880
## 3 1.657475 -0.8684963 10.447090 -0.1045714 5.299412 -1.396811 50 6.300174
## 4 1.467724 9.6957339 6.188114 1.0996425 2.613556 2.786671 50 4.543479
## 5 1.441594 11.3383615 1.209846 1.0320467 1.966430 2.737342 50 4.705749
## 6 1.472084 8.8869725 5.845162 -0.4449158 2.056990 2.595478 50 4.397702
## vx0 vy0 vz0 ax ay az break_y
## 1 3.344477 -136.4020 -7.76176468 -10.550289 33.48019 -12.88040 23.7
## 2 6.609214 -135.9674 -4.94373382 -7.884881 30.54500 -12.20034 23.8
## 3 3.689997 -133.4538 -0.06339038 -1.554547 30.72924 -13.47450 23.7
## 4 -7.319277 -126.6819 -0.32336228 15.537905 29.22415 -22.25728 23.7
## 5 -7.245731 -115.8948 0.44519255 15.309674 23.12209 -30.54045 23.8
## 6 -10.470069 -126.2775 -1.29140914 14.386231 25.10273 -22.71190 23.8
## break_angle break_length pitch_type type_confidence zone nasty spin_dir
## 1 33.5 3.9 FF 0.893 13 49 208.670
## 2 25.8 3.2 FF 0.925 3 46 201.541
## 3 3.0 3.1 FF 0.924 11 19 184.750
## 4 -31.9 6.7 FT 0.575 14 50 122.548
## 5 -24.6 10.1 CH 0.884 14 49 96.091
## 6 -28.3 6.5 CH 0.645 7 34 123.334
## spin_rate cc mt
## 1 2338.852
## 2 2291.256
## 3 2039.843
## 4 2110.954
## 5 1927.359
## 6 1978.267
## url.x
## 1 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 2 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 3 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 4 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 5 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 6 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## inning_side.x inning.x next_.x num on_1b on_2b on_3b count
## 1 top 1 Y 1 NA NA NA 3-2
## 2 top 1 Y 2 NA NA NA 2-1
## 3 top 1 Y 3 NA NA NA 3-0
## 4 top 2 Y 10 NA NA NA 1-0
## 5 top 2 Y 11 656637 NA NA 0-1
## 6 top 2 Y 12 679523 656637 NA 3-2
## gameday_link code event_num.x
## 1 gid_2018_02_21_asubbc_arimlb_1 X 8
## 2 gid_2018_02_21_asubbc_arimlb_1 X 15
## 3 gid_2018_02_21_asubbc_arimlb_1 B 22
## 4 gid_2018_02_21_asubbc_arimlb_1 D 93
## 5 gid_2018_02_21_asubbc_arimlb_1 D 98
## 6 gid_2018_02_21_asubbc_arimlb_1 D 110
## play_guid pitcher batter b s o start_tfs
## 1 622eacfe-c74e-4157-b3b0-d1fd0ca579fc 664199 675961 3 2 1 200935
## 2 c346ed75-356c-4231-a4a7-0f929573894f 664199 669398 2 1 2 201134
## 3 88dabc5d-5cef-4966-af15-ced53ce18e29 664199 666183 4 0 2 201305
## 4 f37cc792-decd-41fa-a3dc-3f1be9b06d26 608355 656637 1 0 0 202936
## 5 5cc74538-37ee-43be-828f-f0a3aca56eca 608355 679523 0 1 0 203017
## 6 a58ed393-cccc-4d90-b17e-ed95326a1026 608355 676422 3 2 0 203114
## start_tfs_zulu stand b_height p_throws
## 1 2018-02-21T20:09:35Z R 6-2 R
## 2 2018-02-21T20:11:34Z L 6-4 R
## 3 2018-02-21T20:13:05Z L 6-5 R
## 4 2018-02-21T20:29:36Z R 6-4 L
## 5 2018-02-21T20:30:17Z R 6-4 L
## 6 2018-02-21T20:31:14Z L 6-4 L
## atbat_des
## 1 Alika Williams pops out to second baseman Ildemaro Vargas.
## 2 Gage Workman grounds out softly, third baseman Jack Reinheimer to first baseman Christian Walker.
## 3 Hunter Bishop walks.
## 4 Taylor Lane singles on a line drive to right fielder Jeremy Hazelbaker.
## 5 Luke Leisenring singles on a line drive to left fielder Ramon Flores. Taylor Lane to 2nd.
## 6 Zach Hogueisson singles on a line drive to left fielder Ramon Flores. Taylor Lane to 3rd. Luke Leisenring to 2nd.
## atbat_des_es
## 1 Alika Williams batea elevadito de out a segunda base Ildemaro Vargas.
## 2 Gage Workman batea rodado de out suavemente, tercera base Jack Reinheimer a primera base Christian Walker.
## 3 Hunter Bishop recibe base por bolas.
## 4 Taylor Lane pega sencillo con línea a jardinero derecho Jeremy Hazelbaker.
## 5 Luke Leisenring pega sencillo con línea a jardinero izquierdo Ramon Flores. Taylor Lane a 2da.
## 6 Zach Hogueisson pega sencillo con línea a jardinero izquierdo Ramon Flores. Taylor Lane a 3ra. Luke Leisenring a 2da.
## event home_team_runs away_team_runs
## 1 Pop Out 0 0
## 2 Groundout 0 0
## 3 Walk 0 0
## 4 Single 2 0
## 5 Single 2 0
## 6 Single 2 0
## url.y
## 1 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 2 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 3 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 4 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 5 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 6 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## inning_side.y inning.y next_.y event2 event3 batter_name pitcher_name
## 1 top 1 Y <NA> <NA> <NA> <NA>
## 2 top 1 Y <NA> <NA> <NA> <NA>
## 3 top 1 Y <NA> <NA> <NA> <NA>
## 4 top 2 Y <NA> <NA> <NA> <NA>
## 5 top 2 Y <NA> <NA> <NA> <NA>
## 6 top 2 Y <NA> <NA> <NA> <NA>
## event4 date end_tfs_zulu event_num.y event_es
## 1 <NA> 2018-02-21 2018-02-21T20:11:14Z 10 Elevado de Out
## 2 <NA> 2018-02-21 2018-02-21T20:12:40Z 17 Roletazo de Out
## 3 <NA> 2018-02-21 2018-02-21T20:14:02Z 24 Base por Bolas
## 4 <NA> 2018-02-21 2018-02-21T20:29:59Z 95 Sencillo
## 5 <NA> 2018-02-21 2018-02-21T20:30:51Z 101 Sencillo
## 6 <NA> 2018-02-21 2018-02-21T20:33:15Z 114 Sencillo
## event2_es score_on_1b on_1b_end score_on_2b on_2b_end score_on_3b
## 1 <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> 2B <NA> <NA> <NA>
## 6 <NA> <NA> 2B <NA> 3B <NA>
## on_3b_end
## 1 <NA>
## 2 <NA>
## 3 <NA>
## 4 <NA>
## 5 <NA>
## 6 <NA>
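On the duplicated rows removed above, here is a likely mechanism, which I have not verified against the real data. A left join returns one output row per matching row, so if the runner table holds more than one record for the same runner, base and at bat, that at bat gets repeated. The toy example below illustrates this with made-up values, using base R `merge()` in place of `left_join()`. It also shows that `!duplicated()` only removes rows that are identical in every column.

```r
# Made-up at-bat rows: the runner on first at the start of at bats 11 and 12
ab <- data.frame(num = c(11, 12), on_1b = c(656637, 679523))

# Made-up runner rows with two records per runner and at bat:
# an exact repeat for at bat 11, and two *different* records for at bat 12
runners <- data.frame(
  num       = c(11, 11, 12, 12),
  on_1b     = c(656637, 656637, 679523, 679523),
  on_1b_end = c("2B", "2B", "", "2B")
)

# Left join (base R equivalent of dplyr::left_join): every match yields a row
joined <- merge(ab, runners, by = c("num", "on_1b"), all.x = TRUE)
nrow(joined)    # 4 rows for 2 at bats

# duplicated() only drops rows that are identical in every column
deduped <- joined[!duplicated(joined), ]
nrow(deduped)   # 3: at bat 12 still appears twice
```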
To make sure d3 makes sense, check the number of batters in each half-inning.
d3 %>% group_by(inning.x, inning_side.x, gameday_link) %>% summarize(N=n()) %>% with(table(N))
## N
## 1 2 3 4 5 6 7 8 9 10 11 12
## 26 38 17747 13359 8724 4547 2894 1635 1272 739 500 413
## 13 14 15 16 17 18 19 20 21 22 23 24
## 244 204 165 105 92 63 47 36 22 19 12 13
## 25 26 27 28 29 30 31 32 33 34 38
## 7 6 4 3 2 2 2 3 2 1 1
The inning with 38 batters is an error, but for the most part these numbers look right, so I’m not going to spend time on it.
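Another sanity check, sketched below with made-up data, is that the out count reaches 3 in every half-inning. This assumes `o` is the number of outs after the play, which the groundout table later in this post suggests. Walk-off half-innings can legitimately end with fewer than 3 outs. In the real data the grouping key would combine `gameday_link`, `inning.x` and `inning_side.x`; here it is a single hypothetical `half_inning` column.

```r
# Made-up plate appearances; o is the out count after each play
pa <- data.frame(
  half_inning = c("g1_1_top", "g1_1_top", "g1_1_top", "g1_1_top",
                  "g1_1_bottom", "g1_1_bottom", "g1_1_bottom",
                  "g1_9_bottom"),
  o = c(1, 1, 2, 3, 1, 2, 3, 0)
)

# Largest out count seen in each half-inning
final_outs <- tapply(pa$o, pa$half_inning, max)

# Half-innings that never reach 3 outs: walk-offs or data problems
short <- final_outs[final_outs != 3]
short
```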
Now we need the number of runs scored from before each play until the end of the inning. I’m not sure this gets everything right, for example when a run scores on the last play of the inning.
runsatendofinning <- d3 %>%
  group_by(inning.x, inning_side.x, gameday_link) %>%
  # In the top of an inning the away team bats; in the bottom, the home team
  summarize(runsatend = as.numeric(if (inning_side.x[1] == "top") {
    max(away_team_runs)
  } else {
    max(home_team_runs)
  }))
runsatendofinning %>% head
## # A tibble: 6 x 4
## # Groups: inning.x, inning_side.x [1]
## inning.x inning_side.x gameday_link runsatend
## <dbl> <chr> <chr> <dbl>
## 1 1 bottom gid_2018_02_21_asubbc_arimlb_1 2
## 2 1 bottom gid_2018_02_22_bocbbc_bosmlb_2 2
## 3 1 bottom gid_2018_02_22_flsbbc_detmlb_1 0
## 4 1 bottom gid_2018_02_22_neubbc_bosmlb_1 7
## 5 1 bottom gid_2018_02_22_umgbbc_minmlb_1 1
## 6 1 bottom gid_2018_02_22_utabbc_phimlb_1 1
runsatendofinning$runsatend %>% table
## .
## 0 1 2 3 4 5 6 7 8 9 10 11
## 14891 9607 7808 6001 4624 3296 2358 1450 1170 938 260 194
## 12 13 14 15 16 17 18 19 20 21 22 24
## 115 77 49 43 15 16 9 15 5 4 2 1
## 25
## 1
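Note that runsatend is the batting team’s total score in the game at the end of the half-inning, not the runs scored in that inning, so 25 means a team had 25 runs by then. That still seems unlikely and is probably an error I should fix.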
Now we join this with the previous data.
d4 <- inner_join(d3, runsatendofinning, c("inning.x", "inning_side.x", "gameday_link"))
d4 <- d4 %>%
mutate(runsfromthispointtoendofinning=runsatend - (ifelse(inning_side.x=="top", as.numeric(away_team_runs), as.numeric(home_team_runs))))
d4 %>% head
## des id type tfs tfs_zulu x
## 1 In play, out(s) 8 X 201104 2018-02-21T20:11:04Z 154.65
## 2 In play, out(s) 15 X 201233 2018-02-21T20:12:33Z 93.63
## 3 Ball 22 B 201359 2018-02-21T20:13:59Z 120.98
## 4 In play, no out 93 X 202949 2018-02-21T20:29:49Z 75.08
## 5 In play, no out 98 X 203038 2018-02-21T20:30:38Z 77.66
## 6 In play, no out 110 X 203257 2018-02-21T20:32:57Z 133.97
## y sv_id start_speed end_speed sz_top
## 1 2018-02-21T20:11:04Z 180221_201104 93.9 84.7 3.691824
## 2 2018-02-21T20:12:33Z 180221_201233 93.6 85.1 3.777274
## 3 2018-02-21T20:13:59Z 180221_201359 91.8 82.1 3.730258
## 4 2018-02-21T20:29:49Z 180221_202949 87.3 78.4 3.774226
## 5 2018-02-21T20:30:38Z 180221_203038 79.9 72.4 3.714936
## 6 2018-02-21T20:32:57Z 180221_203257 87.1 79.2 3.779927
## sz_bot pfx_x pfx_z px pz x0 y0 z0
## 1 1.429535 -5.6703425 10.369535 -0.9877447 2.142914 -1.501145 50 5.937560
## 2 1.470100 -4.2230981 10.697804 0.6130764 3.402739 -1.303427 50 6.094880
## 3 1.657475 -0.8684963 10.447090 -0.1045714 5.299412 -1.396811 50 6.300174
## 4 1.467724 9.6957339 6.188114 1.0996425 2.613556 2.786671 50 4.543479
## 5 1.441594 11.3383615 1.209846 1.0320467 1.966430 2.737342 50 4.705749
## 6 1.472084 8.8869725 5.845162 -0.4449158 2.056990 2.595478 50 4.397702
## vx0 vy0 vz0 ax ay az break_y
## 1 3.344477 -136.4020 -7.76176468 -10.550289 33.48019 -12.88040 23.7
## 2 6.609214 -135.9674 -4.94373382 -7.884881 30.54500 -12.20034 23.8
## 3 3.689997 -133.4538 -0.06339038 -1.554547 30.72924 -13.47450 23.7
## 4 -7.319277 -126.6819 -0.32336228 15.537905 29.22415 -22.25728 23.7
## 5 -7.245731 -115.8948 0.44519255 15.309674 23.12209 -30.54045 23.8
## 6 -10.470069 -126.2775 -1.29140914 14.386231 25.10273 -22.71190 23.8
## break_angle break_length pitch_type type_confidence zone nasty spin_dir
## 1 33.5 3.9 FF 0.893 13 49 208.670
## 2 25.8 3.2 FF 0.925 3 46 201.541
## 3 3.0 3.1 FF 0.924 11 19 184.750
## 4 -31.9 6.7 FT 0.575 14 50 122.548
## 5 -24.6 10.1 CH 0.884 14 49 96.091
## 6 -28.3 6.5 CH 0.645 7 34 123.334
## spin_rate cc mt
## 1 2338.852
## 2 2291.256
## 3 2039.843
## 4 2110.954
## 5 1927.359
## 6 1978.267
## url.x
## 1 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 2 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 3 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 4 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 5 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 6 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## inning_side.x inning.x next_.x num on_1b on_2b on_3b count
## 1 top 1 Y 1 NA NA NA 3-2
## 2 top 1 Y 2 NA NA NA 2-1
## 3 top 1 Y 3 NA NA NA 3-0
## 4 top 2 Y 10 NA NA NA 1-0
## 5 top 2 Y 11 656637 NA NA 0-1
## 6 top 2 Y 12 679523 656637 NA 3-2
## gameday_link code event_num.x
## 1 gid_2018_02_21_asubbc_arimlb_1 X 8
## 2 gid_2018_02_21_asubbc_arimlb_1 X 15
## 3 gid_2018_02_21_asubbc_arimlb_1 B 22
## 4 gid_2018_02_21_asubbc_arimlb_1 D 93
## 5 gid_2018_02_21_asubbc_arimlb_1 D 98
## 6 gid_2018_02_21_asubbc_arimlb_1 D 110
## play_guid pitcher batter b s o start_tfs
## 1 622eacfe-c74e-4157-b3b0-d1fd0ca579fc 664199 675961 3 2 1 200935
## 2 c346ed75-356c-4231-a4a7-0f929573894f 664199 669398 2 1 2 201134
## 3 88dabc5d-5cef-4966-af15-ced53ce18e29 664199 666183 4 0 2 201305
## 4 f37cc792-decd-41fa-a3dc-3f1be9b06d26 608355 656637 1 0 0 202936
## 5 5cc74538-37ee-43be-828f-f0a3aca56eca 608355 679523 0 1 0 203017
## 6 a58ed393-cccc-4d90-b17e-ed95326a1026 608355 676422 3 2 0 203114
## start_tfs_zulu stand b_height p_throws
## 1 2018-02-21T20:09:35Z R 6-2 R
## 2 2018-02-21T20:11:34Z L 6-4 R
## 3 2018-02-21T20:13:05Z L 6-5 R
## 4 2018-02-21T20:29:36Z R 6-4 L
## 5 2018-02-21T20:30:17Z R 6-4 L
## 6 2018-02-21T20:31:14Z L 6-4 L
## atbat_des
## 1 Alika Williams pops out to second baseman Ildemaro Vargas.
## 2 Gage Workman grounds out softly, third baseman Jack Reinheimer to first baseman Christian Walker.
## 3 Hunter Bishop walks.
## 4 Taylor Lane singles on a line drive to right fielder Jeremy Hazelbaker.
## 5 Luke Leisenring singles on a line drive to left fielder Ramon Flores. Taylor Lane to 2nd.
## 6 Zach Hogueisson singles on a line drive to left fielder Ramon Flores. Taylor Lane to 3rd. Luke Leisenring to 2nd.
## atbat_des_es
## 1 Alika Williams batea elevadito de out a segunda base Ildemaro Vargas.
## 2 Gage Workman batea rodado de out suavemente, tercera base Jack Reinheimer a primera base Christian Walker.
## 3 Hunter Bishop recibe base por bolas.
## 4 Taylor Lane pega sencillo con línea a jardinero derecho Jeremy Hazelbaker.
## 5 Luke Leisenring pega sencillo con línea a jardinero izquierdo Ramon Flores. Taylor Lane a 2da.
## 6 Zach Hogueisson pega sencillo con línea a jardinero izquierdo Ramon Flores. Taylor Lane a 3ra. Luke Leisenring a 2da.
## event home_team_runs away_team_runs
## 1 Pop Out 0 0
## 2 Groundout 0 0
## 3 Walk 0 0
## 4 Single 2 0
## 5 Single 2 0
## 6 Single 2 0
## url.y
## 1 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 2 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 3 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 4 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 5 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## 6 http://gd2.mlb.com/components/game/mlb/year_2018/month_02/day_21/gid_2018_02_21_asubbc_arimlb_1/inning/inning_all.xml
## inning_side.y inning.y next_.y event2 event3 batter_name pitcher_name
## 1 top 1 Y <NA> <NA> <NA> <NA>
## 2 top 1 Y <NA> <NA> <NA> <NA>
## 3 top 1 Y <NA> <NA> <NA> <NA>
## 4 top 2 Y <NA> <NA> <NA> <NA>
## 5 top 2 Y <NA> <NA> <NA> <NA>
## 6 top 2 Y <NA> <NA> <NA> <NA>
## event4 date end_tfs_zulu event_num.y event_es
## 1 <NA> 2018-02-21 2018-02-21T20:11:14Z 10 Elevado de Out
## 2 <NA> 2018-02-21 2018-02-21T20:12:40Z 17 Roletazo de Out
## 3 <NA> 2018-02-21 2018-02-21T20:14:02Z 24 Base por Bolas
## 4 <NA> 2018-02-21 2018-02-21T20:29:59Z 95 Sencillo
## 5 <NA> 2018-02-21 2018-02-21T20:30:51Z 101 Sencillo
## 6 <NA> 2018-02-21 2018-02-21T20:33:15Z 114 Sencillo
## event2_es score_on_1b on_1b_end score_on_2b on_2b_end score_on_3b
## 1 <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> 2B <NA> <NA> <NA>
## 6 <NA> <NA> 2B <NA> 3B <NA>
## on_3b_end runsatend runsfromthispointtoendofinning
## 1 <NA> 0 0
## 2 <NA> 0 0
## 3 <NA> 0 0
## 4 <NA> 2 2
## 5 <NA> 2 2
## 6 <NA> 2 2
d4$runsfromthispointtoendofinning %>% table
## .
## -8 -7 -6 -5 -4 -3 -2 -1 0 1
## 2 17 29 85 209 321 536 551 205506 25044
## 2 3 4 5 6 7 8 9
## 13407 6545 3207 1162 391 153 36 11
Something is clearly wrong here, since some rows show a negative number of runs scored.
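Here is one plausible cause of the negative values, which I have not confirmed. The `summarize()` call that builds runsatend takes `max()` of away_team_runs or home_team_runs first and only then converts to numeric. If those columns are character strings (the `as.numeric()` calls suggest they are), `max()` compares them as text, so "5" beats "10". The small example below shows how that can produce a negative number of runs remaining.

```r
# Runs stored as character strings, as the as.numeric() calls above suggest
runs_chr <- c("2", "5", "10")

max(runs_chr)               # "5": strings compare character by character
max(as.numeric(runs_chr))   # 10: convert first, then take the max

# Taking the max on strings and converting afterwards
runsatend <- as.numeric(max(runs_chr))
runsatend - as.numeric(runs_chr)   # 3 0 -5: a negative "runs remaining"
```

Converting inside the call, for example `max(as.numeric(away_team_runs))`, would avoid this.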
d4 %>% filter(runsfromthispointtoendofinning>=0) %>%
mutate(someone_on_1b=!is.na(on_1b), someone_on_2b=!is.na(on_2b), someone_on_3b=!is.na(on_3b)) %>%
group_by(someone_on_1b, someone_on_2b, someone_on_3b, o) %>% summarize(meanruns=mean(runsfromthispointtoendofinning))
## # A tibble: 32 x 5
## # Groups: someone_on_1b, someone_on_2b, someone_on_3b [8]
## someone_on_1b someone_on_2b someone_on_3b o meanruns
## <lgl> <lgl> <lgl> <dbl> <dbl>
## 1 FALSE FALSE FALSE 0 0.861
## 2 FALSE FALSE FALSE 1 0.335
## 3 FALSE FALSE FALSE 2 0.137
## 4 FALSE FALSE FALSE 3 0
## 5 FALSE FALSE TRUE 0 1.24
## 6 FALSE FALSE TRUE 1 0.843
## 7 FALSE FALSE TRUE 2 0.265
## 8 FALSE FALSE TRUE 3 0
## 9 FALSE TRUE FALSE 0 1.28
## 10 FALSE TRUE FALSE 1 0.791
## # ... with 22 more rows
These numbers are wrong; you can check the true values on FanGraphs. I think the problem is with the data: the runs and outs seem to be recorded at the end of each play, when I want them at the beginning. The negative run totals are also a big concern that I didn’t fix.
Why I’m failing
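The runs and outs are recorded after each event, but the base runners are from before the event. We want the state either before each at bat or after it, but the data mixes the two. The two tables below support this: on home runs the batting team’s score is never 0, and on groundouts the number of outs is never 0.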
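One way to get a consistent state is to shift the after-play values back by one plate appearance, so each row gets the outs and score from just before it. The sketch below does this in base R on made-up data; `dplyr::lag()` inside `group_by()` would be the equivalent. It makes three assumptions: rows are plate appearances in game order, `o` and `score` are the outs and the batting team’s score after each play, and both are already numeric. The column names are hypothetical stand-ins for `o`, `away_team_runs`/`home_team_runs`, and so on. Outs reset every half-inning, but the score carries over between innings, so the two are shifted within different groups.

```r
# Made-up plate appearances for the away team (top halves) in one game.
# o and score are recorded *after* each play.
pa <- data.frame(
  game   = "g1",
  side   = "top",
  inning = c(1, 1, 1, 2, 2, 2, 2, 2),
  num    = 1:8,
  o      = c(1, 2, 3, 0, 1, 1, 2, 3),   # inning 2: HR, out, HR, out, out
  score  = c(0, 0, 0, 1, 1, 2, 2, 2)
)

# Value from the previous row, with `first` for the first row of a group
lag_by_one <- function(x, first) c(first, head(x, -1))

pa <- pa[order(pa$game, pa$side, pa$inning, pa$num), ]

# Outs reset every half-inning, so shift within game/side/inning
pa$outs_before <- ave(pa$o, pa$game, pa$side, pa$inning,
                      FUN = function(x) lag_by_one(x, 0))

# The score carries over between innings, so shift within game/side only
pa$score_before <- ave(pa$score, pa$game, pa$side,
                       FUN = function(x) lag_by_one(x, 0))

# Runs from the start of this play to the end of its half-inning
pa$score_end <- ave(pa$score, pa$game, pa$side, pa$inning, FUN = max)
pa$runs_rest <- pa$score_end - pa$score_before

pa[, c("inning", "o", "outs_before", "score", "runs_rest")]
```

With this shift, outs_before and score_before describe roughly the same moment as the on_1b, on_2b and on_3b columns.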
d4$runbattingteam <- ifelse(d4$inning_side.x=="top", d4$away_team_runs, d4$home_team_runs)
d4 %>% filter(event=="Home Run") %>% with(table(as.numeric(runbattingteam)))
##
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
## 1884 1667 1344 1157 893 643 473 342 291 162 101 58 48 27 11
## 16 17 18 19 20 21 24
## 13 3 4 2 3 3 1
d4 %>% filter(event=="Groundout") %>% with(table(as.numeric(o)))
##
## 1 2 3
## 14262 11853 14003
Below we see that there are baserunners on some home run plays. After a home run the bases are empty, so the base runners must be recorded from before the play.
d4 %>% filter(event=="Home Run") %>% with(table(!is.na(on_1b)))
##
## FALSE TRUE
## 6488 2642
Can I use on_1b_end, on_2b_end and on_3b_end instead?
d4 %>% filter(runsfromthispointtoendofinning>=0) %>%
mutate(someone_on_1b_end=!is.na(on_1b_end), someone_on_2b_end=!is.na(on_2b_end), someone_on_3b_end=!is.na(on_3b_end)) %>%
group_by(someone_on_1b_end, someone_on_2b_end, someone_on_3b_end, o) %>% summarize(meanruns=mean(runsfromthispointtoendofinning))
## # A tibble: 32 x 5
## # Groups: someone_on_1b_end, someone_on_2b_end, someone_on_3b_end [8]
## someone_on_1b_end someone_on_2b_end someone_on_3b_end o meanruns
## <lgl> <lgl> <lgl> <dbl> <dbl>
## 1 FALSE FALSE FALSE 0 0.888
## 2 FALSE FALSE FALSE 1 0.417
## 3 FALSE FALSE FALSE 2 0.207
## 4 FALSE FALSE FALSE 3 0
## 5 FALSE FALSE TRUE 0 0.840
## 6 FALSE FALSE TRUE 1 0.524
## 7 FALSE FALSE TRUE 2 0.220
## 8 FALSE FALSE TRUE 3 0
## 9 FALSE TRUE FALSE 0 1.21
## 10 FALSE TRUE FALSE 1 0.865
## # ... with 22 more rows
No, this is terribly wrong. on_1b_end does not say whether someone is on first at the end of the play; it says where the runner who started on first ended up.
d4$on_1b_end %>% table
## .
## 2B 3B
## 32445 20330 5162
It doesn’t even say “1B” if they don’t leave the base? That’s not helpful.
d4$score_on_1b %>% table
## .
## T
## 4517
Can I just calculate where they all end up?
d4 %>% filter(runsfromthispointtoendofinning>=0) %>%
mutate(someone_on_1b_end=event=="Single" | (is.na(on_1b_end) & is.na(score_on_1b)),
someone_on_2b_end=event=="Double" | (is.na(on_2b_end) & is.na(score_on_2b)) | (!is.na(on_1b_end) & on_1b_end=="2B"),
someone_on_3b_end=event=="Triple" | (is.na(on_3b_end) & is.na(score_on_3b)) | (!is.na(on_2b_end) & on_2b_end=="3B") | (!is.na(on_1b_end) & on_1b_end=="3B")
) %>%
group_by(someone_on_1b_end, someone_on_2b_end, someone_on_3b_end, o) %>% summarize(meanruns=mean(runsfromthispointtoendofinning), N=n())
## # A tibble: 32 x 6
## # Groups: someone_on_1b_end, someone_on_2b_end, someone_on_3b_end [8]
## someone_on_1b_end someone_on_2b_e~ someone_on_3b_e~ o meanruns N
## <lgl> <lgl> <lgl> <dbl> <dbl> <int>
## 1 FALSE FALSE FALSE 0 0.0526 19
## 2 FALSE FALSE FALSE 1 0.121 58
## 3 FALSE FALSE FALSE 2 0.0957 115
## 4 FALSE FALSE FALSE 3 0 2132
## 5 FALSE FALSE TRUE 0 0.833 144
## 6 FALSE FALSE TRUE 1 0.683 480
## 7 FALSE FALSE TRUE 2 0.338 1058
## 8 FALSE FALSE TRUE 3 0 5545
## 9 FALSE TRUE FALSE 0 0.908 87
## 10 FALSE TRUE FALSE 1 0.639 360
## # ... with 22 more rows
No, this is a mess and terribly wrong.
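Part of the problem may be the `is.na()` test itself. After the left joins, on_1b_end is NA when nobody started on first. Judging by the on_1b_end table above, it is an empty string when a runner started there but has no recorded end base. So `is.na(on_1b_end) & is.na(score_on_1b)` is TRUE exactly when first base was empty before the play, the opposite of what the condition is meant to capture. The made-up rows below illustrate this. An empty end might mean the runner stayed or was put out, and these columns alone may not distinguish the two.

```r
# Made-up rows after the left joins
d <- data.frame(
  event       = c("Groundout", "Groundout", "Flyout"),
  on_1b       = c(NA, 656637, 656637),   # NA: nobody started on first
  on_1b_end   = c(NA, "", "2B"),         # "": runner started on first, no end base recorded
  score_on_1b = c(NA, NA, NA)
)

# The condition used above for "someone is on first after the play"
d$flag <- d$event == "Single" | (is.na(d$on_1b_end) & is.na(d$score_on_1b))
d$flag   # TRUE FALSE FALSE: only the empty-base row is flagged

# Separate "no runner" from "runner with no recorded end base"
d$runner_started_on_1b <- !is.na(d$on_1b)
d$end_base_blank <- d$runner_started_on_1b & d$on_1b_end == "" & is.na(d$score_on_1b)
d$end_base_blank   # FALSE TRUE FALSE
```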
Conclusion
I failed to recreate expected runs based on the state (number of outs and positions of baserunners). I got numbers that were off by a significant amount, but seemingly correlated with the true values. The data has the baserunners from before the play but the outs and runs from after the play, which makes it very hard to use. It also doesn’t say where the batter ended up. Even with this data I should have been able to figure it out, but it’s not worth the effort now.