MLB Pitchers Analysis Program [BETA]
Extracting pitcher data for an MLB expert system. In this case I focused on Max Scherzer; these are the numbers from ESPN:
Max Scherzer
+---------+-------+-------+-------+------+---+----+------------+-----+
| Results | P_Fis | P_Int | P_Emo | Runs | H | BB | Date | Vs |
+---------+-------+-------+-------+------+---+----+------------+-----+
| W | 98.95 | 15.5 | 50.0 | 2 | 4 | 2 | 2017-04-07 | PHI |
| L | 69.92 | 59.46 | 4.95 | 3 | 4 | 2 | 2017-04-12 | STL |
| W | 2.89 | 98.59 | 18.83 | 0 | 2 | 3 | 2017-04-18 | ATL |
| W | 24.02 | 87.79 | 71.69 | 3 | 5 | 1 | 2017-04-23 | NYM |
| L | 86.54 | 45.25 | 100.0 | 5 | 9 | 1 | 2017-04-28 | NYM |
| W | 81.55 | 2.75 | 61.13 | 1 | 2 | 2 | 2017-05-04 | ARI |
| L | 18.45 | 9.27 | 10.91 | 2 | 4 | 2 | 2017-05-09 | BAL |
| W | 5.61 | 50.0 | 4.95 | 3 | 9 | 0 | 2017-05-14 | PHI |
| L | 75.98 | 95.48 | 61.13 | 3 | 4 | 3 | 2017-05-20 | ATL |
| W | 90.85 | 87.79 | 100.0 | 1 | 3 | 2 | 2017-05-26 | SD |
| W | 30.08 | 45.25 | 71.69 | 1 | 5 | 0 | 2017-05-31 | SF |
| W | 5.61 | 2.75 | 10.91 | 1 | 3 | 2 | 2017-06-06 | LAD |
| L | 63.49 | 9.27 | 4.95 | 3 | 3 | 1 | 2017-06-11 | TEX |
| W | 99.88 | 50.0 | 50.0 | 1 | 4 | 2 | 2017-06-16 | NYM |
| L | 56.81 | 90.73 | 95.05 | 2 | 2 | 1 | 2017-06-21 | MIA |
+---------+-------+-------+-------+------+---+----+------------+-----+
This table lists every start by Max Scherzer: the opponent, the runs he allowed, the hits and the walks. What matters here, though, is the biorhythm I added: P_Fis, P_Int and P_Emo are the physical, intellectual and emotional percentages on game day. Baseball looks like a simple game, but adding the biorhythm variable shows how complicated it becomes. We would also need a value for the opposing team to analyze this player's biorhythm for each game in more depth.
We need more data, but even that we can get by scraping, or by digging through the trash, hahaha. ESPN scraping!
A quick look shows that NYM scored 9 of the 31 runs he allowed in the games listed above, so let us isolate those games in the following table:
+---------+-------+-------+-------+------+---+----+------------+-----+
| Results | P_Fis | P_Int | P_Emo | Runs | H | BB | Date | Vs |
+---------+-------+-------+-------+------+---+----+------------+-----+
| W | 24.02 | 87.79 | 71.69 | 3 | 5 | 1 | 2017-04-23 | NYM |
| L | 86.54 | 45.25 | 100.0 | 5 | 9 | 1 | 2017-04-28 | NYM |
| W | 99.88 | 50.0 | 50.0 | 1 | 4 | 2 | 2017-06-16 | NYM |
+---------+-------+-------+-------+------+---+----+------------+-----+
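The per-opponent count can be reproduced from the table itself. The following is a minimal sketch, not part of the original script: the `starts` list is copied by hand from the Vs and Runs columns of the first table, and `runs_by_opponent` is a hypothetical helper name.

```python
from collections import defaultdict

# (opponent, runs allowed) for each start, copied from the first table
starts = [("PHI", 2), ("STL", 3), ("ATL", 0), ("NYM", 3), ("NYM", 5),
          ("ARI", 1), ("BAL", 2), ("PHI", 3), ("ATL", 3), ("SD", 1),
          ("SF", 1), ("LAD", 1), ("TEX", 3), ("NYM", 1), ("MIA", 2)]

def runs_by_opponent(games):
    """Add up the runs allowed against each opponent."""
    totals = defaultdict(int)
    for opponent, runs in games:
        totals[opponent] += runs
    return dict(totals)

totals = runs_by_opponent(starts)
# Print opponents from most to fewest runs allowed
for opponent, runs in sorted(totals.items(), key=lambda item: -item[1]):
    print(opponent, runs)
```

Sorting by runs puts the opponent that hurt the pitcher most at the top, which is the kind of filter used to build the table above.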
According to my analysis, the Euclidean distance between the intellectual and emotional values seems to be the most important factor for the pitcher, but stating only that would be superficial.
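The post does not say how this distance is computed, so the following is only a sketch under two assumed readings: per game, the distance between two single values is just the absolute difference |P_Int - P_Emo|; over a set of games, it is the distance between the two series taken as vectors. The data are the three NYM rows from the table above, and `euclidean_distance` is a hypothetical helper.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length sequences of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# (result, P_Int, P_Emo) for the three NYM games in the table above
nym_games = [("W", 87.79, 71.69), ("L", 45.25, 100.0), ("W", 99.88 and 50.0, 50.0)]

# Reading 1 (assumption): one distance per game, i.e. |P_Int - P_Emo|
per_game = [euclidean_distance([p_int], [p_emo]) for _, p_int, p_emo in nym_games]

# Reading 2 (assumption): one distance between the two series over all games
series = euclidean_distance([g[1] for g in nym_games], [g[2] for g in nym_games])

for (result, _, _), dist in zip(nym_games, per_game):
    print(result, round(dist, 2))
print("series distance:", round(series, 2))
```

Three games are far too few to draw a conclusion; the sketch only shows how the number could be attached to each row for later graphs.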
We must also analyze the wear and tear under which the runs were allowed; that is, each game needs its own analysis, which we will do, God willing, in the coming days.
We could also check whether the physical, intellectual and emotional points form a triangle, and what type of triangle it is; that might give us some interesting data. I will take it into account in the graphs.
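How the three values would form a triangle is not specified. One possible reading, assumed here purely for illustration, is to treat the three percentages as side lengths: they form a triangle only if the two smaller ones add up to more than the largest, and the largest angle then tells us whether it is acute, right or obtuse. `classify_triangle` is a hypothetical helper, not part of the original script.

```python
def classify_triangle(a, b, c, tol=1e-9):
    """Classify a triangle whose SIDE LENGTHS are a, b, c (assumed reading)."""
    x, y, z = sorted((a, b, c))
    # Triangle inequality: the two shorter sides must exceed the longest
    if x <= 0 or x + y <= z:
        return "not a triangle"
    # Sign of x^2 + y^2 - z^2 gives the type of the largest angle
    diff = x * x + y * y - z * z
    if abs(diff) <= tol:
        return "right"
    return "acute" if diff > 0 else "obtuse"

# (P_Fis, P_Int, P_Emo) for the three NYM games in the table above
nym_games = [(24.02, 87.79, 71.69), (86.54, 45.25, 100.0), (99.88, 50.0, 50.0)]
for p_fis, p_int, p_emo in nym_games:
    print(p_fis, p_int, p_emo, classify_triangle(p_fis, p_int, p_emo))
```

Other readings are just as plausible, for example plotting the three values as points on the cycle curves, so the classification above should be taken as one candidate feature for the graphs.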
The following table shows another picture, where we must take into account how the defense played that day: what they call "Fit", the help the defense gives the pitcher.
We will learn sabermetrics here, friend, one way or another; once we get into the graphs we will learn something.
+---------+-------+-------+-------+------+---+----+------------+-----+
| Results | P_Fis | P_Int | P_Emo | Runs | H | BB | Date | Vs |
+---------+-------+-------+-------+------+---+----+------------+-----+
| W | 98.95 | 15.5 | 50.0 | 2 | 4 | 2 | 2017-04-07 | PHI |
| W | 5.61 | 50.0 | 4.95 | 3 | 9 | 0 | 2017-05-14 | PHI |
+---------+-------+-------+-------+------+---+----+------------+-----+
Well, here is the code. Change the player id and you will get the table for the pitcher you want...
May God bless you. See you soon :D
Python code:
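# -*- coding: utf-8 -*-
# Scrapes a pitcher's ESPN game log and adds biorhythm percentages to each game.
try:
    from urllib2 import urlopen  # Python 2
except ImportError:
    from urllib.request import urlopen  # Python 3
from bs4 import BeautifulSoup
from prettytable import PrettyTable
from datetime import datetime
import math

url_mlb_lanzador_ficha = "http://www.espn.com/mlb/player/gamelog/_/id/"
# One URL per pitcher; 28976 is the id used for the Max Scherzer tables above
lista_lanzadores = [url_mlb_lanzador_ficha+"28976"]
for url_lanzador in lista_lanzadores:
    list_game = []
    page = urlopen(url_lanzador)
    soup = BeautifulSoup(page, "lxml")
    datos_lanzador = {}
    # The player's name is the first <h1> on the page
    name = soup.find('h1')
    nombre = name.text
    datos_lanzador["nombre"] = nombre
    # Lists holding the biographical data and the number/team
    name_box = soup.find('ul', attrs={'class': 'player-metadata floatleft'})
    name_box_general = soup.find('ul', attrs={'class': 'general-info'})
    results = []
    iterator = 0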
    # The first four items hold, in order: birth date, birthplace,
    # experience and college. Each value follows the closing </span>.
    campos = ["Birth_Date", "Birthplace", "Experience", "College"]
    for row in name_box:
        row = str(row)
        find_span = row.find("</span>")
        find_li = row.find("</li>")
        if iterator < len(campos):
            valor = row[find_span+7:find_li]
            if iterator == 0:
                # Drop the parenthesized text that follows the birth date
                valor = valor[0:valor.find("(")]
            datos_lanzador[campos[iterator]] = valor
        iterator += 1
    # Jersey number: the text of the item with class "first"
    name_box_general1 = str(name_box_general)
    find_span = name_box_general1.find('class="first">')
    find_li = name_box_general1.find("</li>")
    numero = name_box_general1[find_span+14:find_li]
    # Team: the text of the link whose URL contains "_/name/"
    find_name = name_box_general1.find('_/name/')
    equipo = name_box_general1[find_name+11:]
    buscar_mayor = equipo.find('">')
    buscar_a = equipo.find('</a>')
    equipo = equipo[buscar_mayor+2:buscar_a]
    datos_lanzador["numero"] = numero
    datos_lanzador["equipo"] = equipo
    print(datos_lanzador)
    # Game rows alternate between the "oddrow" and "evenrow" classes
    name_boxaaa = soup.find('table', attrs={'class': 'tablehead mod-player-stats'})
    results = []
    for clase in ('oddrow', 'evenrow'):
        for row1 in name_boxaaa.find_all('tr', attrs={'class': clase}):
            table_data = row1.find_all('td')
            # Skip empty rows and the "Monthly Totals" summary rows
            if not table_data or u'Monthly Totals' in table_data[0]:
                continue
            results.append([data.get_text() for data in table_data])
    # Map three-letter month abbreviations to two-digit month numbers
    meses = {'Jan': '01', 'Feb': '02', 'Mar': '03', 'Apr': '04',
             'May': '05', 'Jun': '06', 'Jul': '07', 'Aug': '08',
             'Sep': '09', 'Oct': '10', 'Nov': '11', 'Dec': '12'}
    for game in results:
        dict_G_to_G = {}
        # Game date: the first cell is month abbreviation plus day;
        # the season year is hard-coded to 2017
        day = str(game[0][3:]).strip().zfill(2)
        month = meses[game[0][0:3]]
        date_game = day + month + "2017"
        # Birth date in the form "Mon DD, YYYY", converted to DDMMYYYY
        Birth_Date = datos_lanzador["Birth_Date"]
        search_com = Birth_Date.find(",")
        day = str(Birth_Date[search_com-3:search_com]).strip().zfill(2)
        month = meses[Birth_Date[0:3]]
        birthday = day + month + Birth_Date.strip()[-4:]
        print(birthday)
        # "@" in the opponent cell marks an away game
        if "@" in game[1]:
            localia = "Guest"
        else:
            localia = "Home"
        versus_equip = game[1][-3:]
        # The result cell starts with W or L, followed by the score
        score = game[2][1:]
        resulted = game[2][0]
        # Remaining columns of the game log, in the order of the page
        IP, H, R, ER, HR, BB, SO, GB, FB, Pit, BF, GSc = game[3:15]
        # Turn both DDMMYYYY strings into date objects
        formatter_string = "%d%m%Y"
        print(date_game)
        datetime_object = datetime.strptime(date_game, formatter_string).date()
        print(datetime_object)
        datetime_birth = datetime.strptime(str(birthday), formatter_string).date()
        print(datetime_birth)
        # Days lived by the pitcher on the day of the game
        delta = datetime_object - datetime_birth
        dias_de_vida = delta.days
        # Each cycle is a sine wave over the days lived, rescaled from
        # [-1, 1] to a percentage in [0, 100].
        # Physical cycle: 23 days
        dias_de_vida_f = dias_de_vida % 23
        porcentaje_fisico = math.sin(2*math.pi*(dias_de_vida_f / 23.00))
        porcentaje_fisico = 100*((porcentaje_fisico + 1)/2)
        porc_fis = round(porcentaje_fisico, 2)
        # Emotional cycle: 28 days
        dias_de_vida_e = dias_de_vida % 28
        porcentaje_emocional = math.sin(2*math.pi*(dias_de_vida_e / 28.00))
        porcentaje_emocional = 100*((porcentaje_emocional + 1)/2)
        porc_emoc = round(porcentaje_emocional, 2)
        # Intellectual cycle: 33 days
        dias_de_vida_i = dias_de_vida % 33
        porcentaje_intelectual = math.sin(2*math.pi*(dias_de_vida_i / 33.00))
        porcentaje_intelectual = 100*((porcentaje_intelectual + 1)/2)
        porc_inte = round(porcentaje_intelectual, 2)
        # Collect everything known about this game
        dict_G_to_G = {'date_game': datetime_object,
                       'localia': localia,
                       'versus_equip': versus_equip,
                       'score': score,
                       'resulted': resulted,
                       'IP': IP,
                       'H': H,
                       'R': R,
                       'ER': ER,
                       'HR': HR,
                       'BB': BB,
                       'SO': SO,
                       'GB': GB,
                       'FB': FB,
                       'Pit': Pit,
                       'BF': BF,
                       'GSc': GSc,
                       'porc_inte': porc_inte,
                       'porc_fis': porc_fis,
                       'porc_emoc': porc_emoc
                       }
        list_game.append(dict_G_to_G)
    datos_lanzador["list_game"] = list_game
    print(datos_lanzador)
    # Build the summary table, one row per game
    table = PrettyTable(["Results", "P_Fis", "P_Int", "P_Emo", "Runs", "H", "BB", "Date", "Vs"])
    for game in list_game:
        table.add_row([game["resulted"],
                       game["porc_fis"],
                       game["porc_inte"],
                       game["porc_emoc"],
                       game["R"],
                       game["H"],
                       game["BB"],
                       game["date_game"],
                       game["versus_equip"]
                       ])
    # Print the games in chronological order
    print(table.get_string(sortby="Date", reversesort=False))
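
The biorhythm step of the script can be isolated into a small function, which makes it easy to check a single date without scraping anything. This is a minimal sketch: the cycle lengths (23, 28 and 33 days) come from the script, while the `biorhythm` function name and the birth date 1984-07-27 for Max Scherzer are assumptions, since the script reads the birth date from the ESPN page.

```python
import math
from datetime import date

# Cycle lengths in days, as used in the script
CYCLES = {"physical": 23, "emotional": 28, "intellectual": 33}

def biorhythm(birth, day):
    """Return each cycle as a percentage in [0, 100] for the given day."""
    days_lived = (day - birth).days
    result = {}
    for name, period in CYCLES.items():
        # Position inside the current cycle, as a fraction between 0 and 1
        phase = (days_lived % period) / float(period)
        # Sine wave rescaled from [-1, 1] to [0, 100]
        result[name] = round(100 * (math.sin(2 * math.pi * phase) + 1) / 2, 2)
    return result

# Assumed birth date (not stated in the post): 1984-07-27
first_start = biorhythm(date(1984, 7, 27), date(2017, 4, 7))
print(first_start)
```

With that assumed birth date, the 2017-04-07 start gives 98.95 physical, 50.0 emotional and 15.5 intellectual, which matches the first row of the table.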