chickenstats
Welcome to the technical documentation & reference materials for chickenstats, an open-source Python package for scraping & analyzing sports data.
With just a few lines of code:
- Scrape & manipulate data from various NHL endpoints, leveraging chicken_nhl, which includes an open-source xG model for shot quality metrics
- Augment play-by-play data & generate custom aggregations from raw csv files downloaded from Evolving-Hockey (subscription required) with evolving_hockey
Here you can find detailed guides & explanations for most features. The package is under active development, so download the latest version (1.8.0) for the most up-to-date features & be sure to consult the documentation that matches your installed version.
Installation
chickenstats requires Python 3.10 or greater & runs on the latest stable versions of the Linux, macOS, & Windows operating systems.(1) You can install it through PyPI:
pip install chickenstats
- Best practice is to develop in an isolated virtual environment (conda or otherwise), but who's a chicken to judge?
Usage
chickenstats is structured as two underlying modules, each used with different data sources: chickenstats.chicken_nhl and chickenstats.evolving_hockey.(1)
- The package is under active development - features will be added or modified over time, but this structure will be consistent
chicken_nhl
chickenstats.chicken_nhl allows you to scrape play-by-play data and aggregate individual, line, and team statistics, with an open-source xG model included out-of-the-box.
After importing the module, scrape the schedule for game IDs, then play-by-play data for your team of choice:
from chickenstats.chicken_nhl import Season, Scraper
season = Season(2024)
schedule = season.schedule("NSH") # (1)!
game_ids = schedule.loc[schedule.game_state == "OFF"].game_id.tolist() # (2)!
scraper = Scraper(game_ids)
play_by_play = scraper.play_by_play # (3)!
- Replace "NSH" (Nashville) with the three-letter code of the team of your choice. Leaving it blank will scrape every team's schedule for that season
- Other game states include LIVE and FUT
- Scrapes one game every three seconds
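The filtering and pacing notes above can be sketched without the package or a network connection. This is only an illustration in plain Python: the toy schedule (a list of dicts standing in for the schedule dataframe) and its game IDs are made up, and treating OFF as "completed" is an assumption. Only the three game states and the three-second pacing come from the notes above.

```python
# Toy stand-in for the schedule dataframe; the game IDs are invented for illustration.
toy_schedule = [
    {"game_id": 1001, "game_state": "OFF"},   # assumed to mean the game is completed
    {"game_id": 1002, "game_state": "OFF"},
    {"game_id": 1003, "game_state": "LIVE"},  # in progress
    {"game_id": 1004, "game_state": "FUT"},   # not yet played
]

SECONDS_PER_GAME = 3  # pacing stated in the annotation above


def finished_game_ids(schedule):
    """Return the IDs of games whose state is OFF."""
    return [row["game_id"] for row in schedule if row["game_state"] == "OFF"]


def estimated_scrape_seconds(game_ids, seconds_per_game=SECONDS_PER_GAME):
    """Rough estimate of scrape time at a fixed per-game pace."""
    return len(game_ids) * seconds_per_game


game_ids = finished_game_ids(toy_schedule)
print(game_ids)
print(estimated_scrape_seconds(game_ids))
```

At that pace, one team's full 82-game regular season works out to roughly 246 seconds, or about four minutes.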
Info
If you have already scraped or aggregated data, you'll notice slightly different behavior from what the simple guide below describes. chickenstats.chicken_nhl stores all data already scraped or aggregated, so it can be quickly provided when the relevant attribute is called (e.g., if you have already called Scraper.play_by_play and you have not added any new game IDs to the Scraper object, calling Scraper.play_by_play will return the dataframe, without having to re-scrape the data).
You can reset attributes with a matching prep_ method (e.g., Scraper.stats can be reset with Scraper.prep_stats()). See Design for more on this dynamic.
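The store-then-reset behavior described above might look something like the following. This is a hypothetical toy class written for illustration, not the package's implementation: ToyScraper, its prep_calls counter, and the teammates flag are all assumptions made so the pattern can run on its own.

```python
class ToyScraper:
    """Hypothetical stand-in illustrating cache-then-reset; not chickenstats code."""

    def __init__(self, game_ids):
        self.game_ids = list(game_ids)
        self._stats = None   # nothing aggregated yet
        self.prep_calls = 0  # counts how often the expensive step runs

    def prep_stats(self, teammates=False):
        """Recompute the aggregation and overwrite any stored result."""
        self.prep_calls += 1
        self._stats = {"games": len(self.game_ids), "teammates": teammates}

    @property
    def stats(self):
        """Return the stored result, computing it only on first access."""
        if self._stats is None:
            self.prep_stats()
        return self._stats


scraper = ToyScraper([1001, 1002])
first = scraper.stats               # triggers prep_stats() once
second = scraper.stats              # served from the stored result
scraper.prep_stats(teammates=True)  # explicit reset with different options
third = scraper.stats
```

The point of the sketch is that reading the attribute twice does the work once, while calling the prep_ method always replaces what was stored.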
You can then aggregate the play-by-play data for individual and on-ice statistics with one line of code:
stats = scraper.stats # (1)!
- This runs scraper.prep_stats() behind the scenes, if you have not already done so. By default, stats are aggregated to the game level, and the aggregation fields do not include teammates, opposition, or score state.
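To make "game level" concrete, here is a toy aggregation in plain Python. It is a conceptual sketch only: the rows, the column names, and the aggregate_by_game helper are hypothetical and do not reflect the package's schema or internals.

```python
from collections import defaultdict

# Toy play-by-play rows; the column names are hypothetical, not the package's schema.
toy_pbp = [
    {"game_id": 1001, "player": "A", "event": "SHOT"},
    {"game_id": 1001, "player": "A", "event": "GOAL"},
    {"game_id": 1001, "player": "B", "event": "SHOT"},
    {"game_id": 1002, "player": "A", "event": "SHOT"},
]


def aggregate_by_game(rows):
    """Count events per (game_id, player): one output entry per player per game."""
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        totals[(row["game_id"], row["player"])][row["event"]] += 1
    return {key: dict(counts) for key, counts in totals.items()}


game_stats = aggregate_by_game(toy_pbp)
print(game_stats)
```

Conceptually, adding teammates, opposition, or score state to the aggregation fields amounts to widening that grouping key, which splits each player-game entry into more, narrower entries.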
It's very easy to introduce additional detail to the aggregations, including for teammates on-ice, by re-running scraper.prep_stats() with the relevant options & then re-reading scraper.stats:
- The Scraper object saves the prior aggregation to the scraper.stats attribute, so it needs to be reset. Then the attribute can be re-called
There is similar functionality for line and team stats:
scraper.prep_lines(position="f") # (1)!
forward_lines = scraper.lines
team_stats = scraper.team_stats # (2)!
- This step isn't strictly necessary for forwards - they're the default line aggregation. Provide "d" instead of "f" for defensive line stats
- Similar to scraper.stats, this runs scraper.prep_team_stats() in the background
For additional information on usage and functionality, consult the relevant User guide.
evolving_hockey
The chickenstats.evolving_hockey module manipulates raw csv files downloaded from Evolving-Hockey.(1) Using their original shifts & play-by-play data, users can add additional information & aggregate for individual & on-ice statistics, including high-danger shooting events, xG & adjusted xG, faceoffs, & changes.
- An Evolving-Hockey subscription is required to make full use of the chickenstats.evolving_hockey module. If you don't have a subscription, you can sign up for one here
First, prep a play-by-play dataframe using the raw play-by-play and shifts CSV files:
import pandas as pd
from chickenstats.evolving_hockey import prep_pbp, prep_stats, prep_lines
raw_shifts = pd.read_csv('./raw_shifts.csv') # (1)!
raw_pbp = pd.read_csv('./raw_pbp.csv') # (2)!
play_by_play = prep_pbp(raw_pbp, raw_shifts) # (3)!
- Download raw shifts data from here
- Download raw play-by-play data from here
- Essentially, this returns the play-by-play dataframe with many additional columns
You can use the play_by_play dataframe in various aggregations. Passing it to prep_stats returns individual game statistics, including on-ice (e.g., GF, xGF) & usage (i.e., zone starts), & can account for teammates & opposition on-ice.
Passing it to prep_lines returns game statistics for forward-line combinations, & can account for opponents on-ice.
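As a rough picture of what aggregating by forward-line combination involves, the sketch below groups toy on-ice events by an order-independent line key. Everything in it is an assumption for illustration: the event fields, the xG values, and the aggregate_lines helper are hypothetical and are not how prep_lines is implemented.

```python
from collections import defaultdict

# Toy on-ice events; names and fields are hypothetical placeholders.
toy_events = [
    {"game_id": 1001, "forwards": ["A", "B", "C"], "xg": 0.10},
    {"game_id": 1001, "forwards": ["C", "A", "B"], "xg": 0.25},  # same line, listed in a different order
    {"game_id": 1001, "forwards": ["A", "B", "D"], "xg": 0.05},
]


def aggregate_lines(events):
    """Sum xG per (game_id, line), treating a line as an unordered set of players."""
    totals = defaultdict(float)
    for event in events:
        line = tuple(sorted(event["forwards"]))  # sorting makes the key order-independent
        totals[(event["game_id"], line)] += event["xg"]
    return dict(totals)


line_stats = aggregate_lines(toy_events)
print(line_stats)
```

Sorting the players before building the key is the design choice that matters here: without it, the same three forwards listed in a different order would be counted as a separate line.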
For additional information on usage and functionality, consult the relevant User guide.
Help
If you need help with any aspect of chickenstats, from installation to usage, please don't hesitate to reach out. You can find me on Bluesky at @chickenandstats.com or email me at chicken@chickenandstats.com.
For more information on known issues or the longer-term development roadmap, see Contribute.
Navigation
Tip
Navigate the site using the header, side-bar, or search tool. Mobile users can tap the menu icon (upper-left) to bring up the menu, then use it to see a linked table of contents for the current page or to navigate back towards the home page.
- User guide & tutorials: Learn more from module-specific user guides, as well as hands-on tutorials & examples.
- Reference materials: Consult the Reference section for in-depth explanations & debugging assistance.
- xG model: Learn about the open-source expected goals (xG) model included with chickenstats.
- Blog: Read the latest analyses leveraging the library, as well as about the newest features & releases.
- Design: Read more about chickenstats module design and [un]expected behaviors.
- Contribute: Read about known issues, future development roadmap, and/or how to contribute.
Help
Please report any bugs or issues via the chickenstats issues page, where you can also post feature requests. Before doing so, please check the roadmap, as there might already be plans to include your request.
Acknowledgements
chickenstats wouldn't be possible without the support & efforts of countless others. I am obviously extremely grateful, even if there are too many of you to thank individually. However, this chicken will do his best.
First & foremost is my wife - the lovely Mrs. Chicken has been patient, understanding, & supportive throughout the countless hours of development, sometimes to her detriment.
Sincere apologies to the friends & family that have put up with me since my entry into Python, programming, & data analysis in January 2021. Thank you for being excited for me & with me throughout all of this, especially when you've had to fake it...
Thank you to the hockey analytics community on (the artist formerly known as) Twitter. You're producing & reacting to cutting-edge statistical analyses, while providing a supportive, welcoming environment for newcomers. Thank y'all for everything that you do. This is by no means exhaustive, but there are a few people worth calling out specifically:
- Josh & Luke Younggren (@EvolvingWild)
- Bryan Bastin (@BryanBastin)
- Max Tixador (@woumaxx)
- Micah Blake McCurdy (@IneffectiveMath)
- Prashanth Iyer (@iyer_prashanth)
- The Bucketless (@the_bucketless)
- Shayna Goldman (@hayyyshayyy)
- Dom Luszczyszyn (@domluszczyszyn)
I'm also grateful to the thriving community of Python educators & open-source contributors on Twitter. Thank y'all for your knowledge & practical advice. Matt Harrison (@mharrison) deserves a special mention for his books on Pandas and XGBoost, both of which are available at his online store. Again, not exhaustive, but others worth thanking individually:
- Will McGugan (@willmcgugan)
- Rodrigo Girão Serrão (@mathsppblog)
- Mike Driscoll (@driscollis)
- Trey Hunner (@treyhunner)
- Pawel Jastrzebski (@pawjast)
Finally, this library depends on a host of other open-source packages. chickenstats is possible because of the efforts of thousands of individuals, represented below: