Interpreting Groundhog Day Data with Pentaho 9.0

Data Storage • Data Protection  |  February 12, 2020

To celebrate the official public release of Pentaho 9.0, which fell close to Groundhog Day, we analyzed Groundhog Day data from 1898 to 2020.

For over 100 years, thousands of people have gathered annually at Gobbler’s Knob in Punxsutawney, Pennsylvania, on February 2 to witness Punxsutawney Phil the groundhog make his spring forecast. Legend says that if Phil sees his shadow and returns to his den, the United States is in store for six more weeks of winter weather. But if Phil doesn’t see his shadow, the country can expect an earlier spring and warmer temperatures overall through February.

It is common knowledge that this practice is more superstition than science, with Phil’s predictions proving to be right approximately 40% of the time. In fact, a coin flip would be a better predictor of early spring weather than our friend Phil.
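The coin-flip comparison comes down to simple arithmetic: a fair coin is right about half the time on a two-way call, so any forecaster below 50% does worse. The following is a minimal Python sketch of that check, not part of the original analysis. The function name `forecast_accuracy` and the sample records are hypothetical placeholders chosen for illustration, not real Groundhog Day data.

```python
from typing import Iterable, Tuple


def forecast_accuracy(records: Iterable[Tuple[bool, bool]]) -> float:
    """Return the fraction of years in which the forecast matched the outcome.

    Each record is (predicted_early_spring, observed_early_spring).
    """
    records = list(records)
    if not records:
        raise ValueError("need at least one record")
    hits = sum(1 for predicted, observed in records if predicted == observed)
    return hits / len(records)


# Made-up placeholder records for illustration only, NOT real data.
sample = [(False, True), (False, False), (True, False), (False, True), (True, True)]

accuracy = forecast_accuracy(sample)
COIN_FLIP = 0.5  # expected long-run accuracy of a fair coin on a two-way call

print(f"accuracy={accuracy:.0%}, beats coin flip: {accuracy > COIN_FLIP}")
```

With real records substituted for the placeholder list, the same comparison would show whether the forecasts clear the 50% baseline.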

Regardless, we found the prediction data interesting. Dr. Nayak used the newly released Pentaho 9.0 edition to compile a dashboard examining aspects of the raw data in search of a meaningful pattern. Prediction counts revealed that Phil has a pretty serious case of heliophobia and will more often than not retreat to his den. A comparison of February and March average temperatures between the years Phil saw his shadow and the years he did not revealed no significant patterns, and average temperature over the years proved to be fairly random as well. You can get a look inside the data visualizations below:
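The dashboard itself was built in Pentaho. As a rough, tool-independent sketch of the same two summaries (prediction counts and average temperatures grouped by prediction), something like the following Python could be used. The row layout, the `summarize` helper, the Fahrenheit units, and every value in `rows` are assumptions made up for illustration. They are not the data or the method behind the dashboard.

```python
from collections import defaultdict
from statistics import mean

# Made-up placeholder rows, NOT real data.
# Assumed layout: (year, saw_shadow, feb_avg_temp_f, mar_avg_temp_f)
rows = [
    (2001, True, 33.0, 41.0),
    (2002, True, 35.0, 43.0),
    (2003, False, 34.0, 42.0),
    (2004, True, 31.0, 45.0),
    (2005, False, 36.0, 40.0),
]


def summarize(rows):
    """Return {saw_shadow: (count, mean_feb_temp, mean_mar_temp)}."""
    groups = defaultdict(list)
    for _year, saw_shadow, feb, mar in rows:
        groups[saw_shadow].append((feb, mar))
    return {
        key: (len(vals), mean(f for f, _ in vals), mean(m for _, m in vals))
        for key, vals in groups.items()
    }


summary = summarize(rows)
for saw_shadow, (count, feb, mar) in sorted(summary.items(), reverse=True):
    label = "shadow (more winter)" if saw_shadow else "no shadow (early spring)"
    print(f"{label}: n={count}, Feb mean={feb:.1f}, Mar mean={mar:.1f}")
```

If the group means for February and March sit close together across the two predictions, that is consistent with the "no significant patterns" finding. A formal significance test would need more than this summary.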


Dr. Pragyansmita Nayak