chickenstats

Welcome to the technical documentation & reference materials for chickenstats, an open-source Python package for scraping & analyzing sports data.

Hero image of a scatter plot with drumsticks inter-mixed

With just a few lines of code:

  • Scrape & manipulate data from various NHL endpoints, leveraging chicken_nhl, which includes an open-source xG model for shot quality metrics
  • Augment play-by-play data & generate custom aggregations from raw csv files downloaded from Evolving-Hockey (subscription required) with evolving_hockey

Here you can find detailed guides & explanations for most features. The package is under active development - download the latest version (1.8.0) for the most up-to-date features & be sure to consult the documentation that matches your installed version.

Installation

chickenstats requires Python 3.10 or greater & runs on the latest stable versions of Linux, macOS, & Windows operating systems.(1) You can install it through PyPI:

pip install chickenstats

  1. Best practice is to develop in an isolated virtual environment (conda or otherwise), but who's a chicken to judge?
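
To confirm that the interpreter meets the 3.10 minimum & to see whether the package is already installed, a minimal standard-library sketch follows. The helper names `python_is_supported` and `installed_version` are illustrative only & are not part of chickenstats:

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 10)  # minimum version stated in the installation notes


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the interpreter meets the documented minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_version(package: str = "chickenstats"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    print("chickenstats version:", installed_version())
```

If `installed_version()` returns None, the package is not installed in the active environment.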

Usage

chickenstats is structured as two underlying modules, each used with different data sources: chickenstats.chicken_nhl and chickenstats.evolving_hockey.(1)

  1. The package is under active development - features will be added or modified over time, but this structure will be consistent

chicken_nhl

chickenstats.chicken_nhl allows you to scrape play-by-play data and aggregate individual, line, and team statistics, with an open-source xG model included out-of-the-box.

After importing the module, scrape the schedule for game IDs, then play-by-play data for your team of choice:

from chickenstats.chicken_nhl import Season, Scraper

season = Season(2024)

schedule = season.schedule("NSH") # (1)!
game_ids = schedule.loc[schedule.game_state == "OFF"].game_id.tolist() # (2)!

scraper = Scraper(game_ids)

play_by_play = scraper.play_by_play # (3)!
  1. Replace Nashville with the three-letter code of the team of your choice. Leaving it blank will scrape everyone's schedule for that year
  2. Other game states include LIVE and FUT
  3. Scrapes one game every three seconds
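
The filtering step & the quoted scrape rate can be sketched without touching the network. This is a plain-Python illustration, not the library's code: the schedule rows are made-up placeholders, & `finished_game_ids` and `estimated_scrape_seconds` are hypothetical helpers. Only the `game_id` & `game_state` field names and the "OFF", "LIVE", & "FUT" states come from the example above:

```python
# Toy schedule records; the IDs are placeholders, not real NHL game IDs.
schedule_rows = [
    {"game_id": 1, "game_state": "OFF"},
    {"game_id": 2, "game_state": "LIVE"},
    {"game_id": 3, "game_state": "FUT"},
    {"game_id": 4, "game_state": "OFF"},
]

SECONDS_PER_GAME = 3  # approximate rate quoted in the annotation above


def finished_game_ids(rows):
    """Keep only the IDs of games whose state is "OFF"."""
    return [row["game_id"] for row in rows if row["game_state"] == "OFF"]


def estimated_scrape_seconds(game_ids, seconds_per_game=SECONDS_PER_GAME):
    """Rough time estimate, assuming a constant per-game scrape rate."""
    return len(game_ids) * seconds_per_game


game_ids = finished_game_ids(schedule_rows)
print(game_ids, estimated_scrape_seconds(game_ids))
```

At roughly three seconds per game, a full 82-game regular-season schedule for one team would take on the order of four minutes.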
Info

If you have already scraped or aggregated data, you'll notice slightly different behavior than this simple guide describes. chickenstats.chicken_nhl stores all data that has already been scraped or aggregated, so it can be returned quickly when the relevant attribute is called. For example, if you have already called Scraper.play_by_play & have not added any new game IDs to the Scraper object, calling Scraper.play_by_play again returns the stored dataframe without re-scraping the data.

You can reset attributes with a matching prep_ method (e.g., Scraper.stats can be reset with Scraper.prep_stats()). See Design for more on this dynamic.
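
The store-then-reset pattern described here can be sketched with a small stand-in class. `CachedAggregator`, its `as_average` option, & the `compute_calls` counter are all hypothetical & exist only to show the idea; this is not how the Scraper object is implemented:

```python
class CachedAggregator:
    """Hypothetical stand-in for an object with a cached `stats` attribute."""

    def __init__(self, rows):
        self._rows = rows
        self._stats = None      # cache, empty until first computed
        self.compute_calls = 0  # counts how often the aggregation actually ran

    def prep_stats(self, as_average=False):
        """Recompute the aggregation & overwrite the cached result."""
        self.compute_calls += 1
        total = sum(self._rows)
        self._stats = total / len(self._rows) if as_average else total

    @property
    def stats(self):
        """Return the cached result, computing it on first access only."""
        if self._stats is None:
            self.prep_stats()
        return self._stats


agg = CachedAggregator([1, 2, 3])
first = agg.stats                 # triggers the computation
second = agg.stats                # served from the cache, no recomputation
agg.prep_stats(as_average=True)   # explicit reset with different options
third = agg.stats
```

Reading the attribute twice runs the aggregation once; only the explicit prep_ call changes what the attribute returns.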

You can then aggregate the play-by-play data for individual and on-ice statistics with one line of code:

stats = scraper.stats # (1)!
  1. This runs scraper.prep_stats() behind the scenes, if you have not already done so. By default, stats are aggregated at the game level & do not include teammates, opposition, or score state in the aggregation fields.

It's very easy to introduce additional detail to the aggregations, including for teammates on-ice:

scraper.prep_stats(teammates=True) # (1)!
stats = scraper.stats
  1. The Scraper object saves the prior aggregation to the scraper.stats attribute, so it needs to be reset before the attribute is called again
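
Conceptually, adding teammates means adding another grouping field to the aggregation. The sketch below shows that effect on toy rows in plain Python. The rows, the `shots` field, & the `aggregate_shots` helper are placeholders for illustration & do not reflect the library's internals or column names:

```python
from collections import defaultdict

# Toy on-ice rows; players & values are placeholders, not real data.
events = [
    {"game_id": 1, "player": "A", "teammates": ("B", "C"), "shots": 2},
    {"game_id": 1, "player": "A", "teammates": ("B", "D"), "shots": 1},
    {"game_id": 1, "player": "A", "teammates": ("B", "C"), "shots": 3},
    {"game_id": 2, "player": "A", "teammates": ("B", "C"), "shots": 4},
]


def aggregate_shots(rows, teammates=False):
    """Sum shots per (game, player), optionally split by teammates on ice."""
    totals = defaultdict(int)
    for row in rows:
        key = (row["game_id"], row["player"])
        if teammates:
            key += (row["teammates"],)  # extra grouping field
        totals[key] += row["shots"]
    return dict(totals)


by_game = aggregate_shots(events)
by_game_and_teammates = aggregate_shots(events, teammates=True)
```

The totals are unchanged; the teammates version simply splits each game-level row into one row per teammate combination.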

There is similar functionality for line and team stats:

scraper.prep_lines(position="f") # (1)!
forward_lines = scraper.lines

team_stats = scraper.team_stats # (2)!
  1. This step isn't strictly necessary for forwards - they're the default line aggregation. Provide "d" instead of "f" for defensive line stats
  2. Similar to scraper.stats, runs scraper.prep_team_stats() in the background

For additional information on usage and functionality, consult the relevant User guide.

evolving_hockey

The chickenstats.evolving_hockey module manipulates raw csv files downloaded from Evolving-Hockey.(1) Using their original shifts & play-by-play data, users can add additional information & aggregate for individual & on-ice statistics, including high-danger shooting events, xG & adjusted xG, faceoffs, & changes.

  1. An Evolving-Hockey subscription is required to make full use of the chickenstats.evolving_hockey module. If you don't have a subscription, you can sign up for one here

First, prep a play-by-play dataframe using the raw play-by-play and shifts CSV files:

import pandas as pd
from chickenstats.evolving_hockey import prep_pbp, prep_stats, prep_lines

raw_shifts = pd.read_csv('./raw_shifts.csv') # (1)!
raw_pbp = pd.read_csv('./raw_pbp.csv') # (2)!

play_by_play = prep_pbp(raw_pbp, raw_shifts) # (3)!
  1. Download raw shifts data from here
  2. Download raw play-by-play data from here
  3. This returns a play-by-play dataframe with many additional columns
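
Because both CSV files must be downloaded by hand, it can be worth checking that they are in place before reading them. A small standard-library sketch, where `missing_files` is a hypothetical helper & the file names are the ones used in the example above:

```python
from pathlib import Path


def missing_files(*paths):
    """Return the paths (as strings) that do not exist as regular files."""
    return [str(p) for p in map(Path, paths) if not p.is_file()]


if __name__ == "__main__":
    missing = missing_files("./raw_shifts.csv", "./raw_pbp.csv")
    if missing:
        print("Download these files first:", ", ".join(missing))
    else:
        print("Both raw files found")
```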

You can use the play_by_play dataframe in various aggregations. This will return individual game statistics, including on-ice (e.g., GF, xGF) & usage (i.e., zone starts), accounting for teammates & opposition on-ice:

individual_game = prep_stats(play_by_play, level='game', teammates=True, opposition=True)

This will return game statistics for forward-line combinations, accounting for opponents on-ice:

forward_lines = prep_lines(play_by_play, level='game', position='f', opposition=True)

For additional information on usage and functionality, consult the relevant User guide.

Help

If you need help with any aspect of chickenstats, from installation to usage, please don't hesitate to reach out. You can find me on Bluesky at @chickenandstats.com or email me at chicken@chickenandstats.com.

For more information on known issues or the longer-term development roadmap, see Contribute.

Tip

Navigate the site using the header, side-bar, or search tool. Mobile users can tap the menu icon (upper-left) to bring up the menu, then use the menu's controls to see a linked table of contents for the current page or to navigate back towards the home page.

  • User guide & tutorials: Learn more from module-specific user guides, as well as hands-on tutorials & examples.
  • Reference materials: Consult the Reference section for in-depth explanations & debugging assistance.
  • xG model: Learn about the open-source expected goals (xG) model included with chickenstats.
  • Blog: Read the latest analyses leveraging the library, as well as about the newest features & releases.
  • Design: Read more about chickenstats module design and [un]expected behaviors.
  • Contribute: Read about known issues, the future development roadmap, and/or how to contribute.

Help

Please report any bugs or issues via the chickenstats issues page, where you can also post feature requests. Before doing so, please check the roadmap; there might already be plans to include your request.

Acknowledgements

chickenstats wouldn't be possible without the support & efforts of countless others. I am obviously extremely grateful, even if there are too many of you to thank individually. However, this chicken will do his best.

First & foremost is my wife - the lovely Mrs. Chicken has been patient, understanding, & supportive throughout the countless hours of development, sometimes to her detriment.

Sincere apologies to the friends & family that have put up with me since my entry into Python, programming, & data analysis in January 2021. Thank you for being excited for me & with me throughout all of this, especially when you've had to fake it...

Thank you to the hockey analytics community on (the artist formerly known as) Twitter. You're producing & reacting to cutting-edge statistical analyses, while providing a supportive, welcoming environment for newcomers. Thank y'all for everything that you do. This is by no means exhaustive, but there are a few people worth calling out specifically:

I'm also grateful to the thriving community of Python educators & open-source contributors on Twitter. Thank y'all for your knowledge & practical advice. Matt Harrison (@mharrison) deserves a special mention for his books on Pandas and XGBoost, both of which are available at his online store. Again, not exhaustive, but others worth thanking individually:

Finally, this library depends on a host of other open-source packages. chickenstats is possible because of the efforts of thousands of individuals, represented below: