Intro to the Sakura Data Project

Sakura in Japan

Cherry blossom season is a special time in Japan: families and groups of friends often plan trips to popular spots (such as city parks) for a celebratory picnic when the blossoms reach full bloom.

Cherry blossoms, known as sakura in Japanese, are among the first flowers to bloom in spring. Other flowers such as lilac, lavender, and sunflowers follow later in the spring and summer, sometimes lasting into early fall, especially in Hokkaido, the northernmost island of Japan.

For those of us living in Hokkaido, sakura season is especially welcome after our long, cold winters. Sweet shops and cafes advertise limited-edition, sakura-flavored sweets and drinks that are only available for a few weeks in spring.

This cultural phenomenon inspired this sakura data project. Like many data-related endeavors, this analysis and visualization project begins with data scraping.

Data Collection

I scraped historical sakura bloom data from a website maintained by the Japanese government.

I wrote Python scripts to import and process the raw data, which contains the first flowering and full bloom dates of sakura in different regions of Japan since 1953. The raw data files can be found by searching the main page of the website for さくら開花 (sakura first flowering) and さくら満開 (sakura full bloom).
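The cleaning scripts themselves are not shown here. As a minimal sketch, assuming (hypothetically) that the raw tables write each date as a Japanese month-day string such as 4月5日 and use - for missing values, one raw date could be converted into the month * 100 + day encoding used in the clean files. The raw format and the function name encode_raw_date are illustrative assumptions, not the project's actual code.

```python
import re
from typing import Optional

# ASSUMPTION: raw dates look like "4月5日" (month, 月, day, 日).
# \d also matches full-width digits, and int() accepts them too.
_JP_MONTH_DAY = re.compile(r"^(\d{1,2})月\s*(\d{1,2})日$")


def encode_raw_date(raw: str) -> Optional[int]:
    """Convert a raw month-day string into the month * 100 + day encoding.

    Returns None for the "-" placeholder that marks a missing value.
    """
    raw = raw.strip()
    if raw == "-":
        return None
    match = _JP_MONTH_DAY.match(raw)
    if match is None:
        raise ValueError(f"unrecognized date string: {raw!r}")
    month, day = int(match.group(1)), int(match.group(2))
    if not (1 <= month <= 12 and 1 <= day <= 31):
        raise ValueError(f"out-of-range date: {raw!r}")
    return month * 100 + day


print(encode_raw_date("4月5日"))   # 405
print(encode_raw_date("5月12日"))  # 512
print(encode_raw_date("-"))        # None
```

Raising an error on unrecognized strings, rather than silently skipping them, makes it easier to notice if the raw format differs from what this sketch assumes.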

In the GitHub repository, see the files flowering.csv and bloom.csv, which contain clean data with the following column specifications:

day: month * 100 + day (e.g. 405 for April 5), first flowering date or date of full bloom [1]
year: %Y, year of collection
l_code: categorical variable [401-945], code for the point of collection
l_name: categorical variable, name of the point of collection in kanji
rm: categorical variable [6,7,8], represents different data collection methods [1][2]
[1] NA values for day and rm are recorded using the - character.

[2] N.B.: this is my understanding after attempting to translate the dataset annotations provided in Japanese.
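Because day packs the month and day into a single integer, divmod(day, 100) recovers both parts. Below is a minimal standard-library sketch for reading a clean file, assuming its header row uses the column names listed above and that - marks missing values as described in note [1]. The helper names decode_day and load_rows are hypothetical.

```python
import csv
import datetime
from typing import Optional


def decode_day(day: str, year: str) -> Optional[datetime.date]:
    """Turn a `day` value (month * 100 + day) and a `year` value into a date.

    The "-" character marks a missing value, so it maps to None.
    """
    if day.strip() == "-":
        return None
    month, day_of_month = divmod(int(day), 100)  # e.g. 405 -> (4, 5)
    return datetime.date(int(year), month, day_of_month)


def load_rows(lines):
    """Read rows from a clean file such as bloom.csv.

    Adds a decoded `date` field and converts `rm` to an int (None if "-").
    `lines` can be an open file or any iterable of CSV lines.
    """
    rows = []
    for row in csv.DictReader(lines):
        row["date"] = decode_day(row["day"], row["year"])
        row["rm"] = None if row["rm"].strip() == "-" else int(row["rm"])
        rows.append(row)
    return rows


print(decode_day("405", "2020"))  # 2020-04-05
print(decode_day("-", "2020"))    # None
```

To load one of the repository files, something like with open("bloom.csv", encoding="utf-8", newline="") as f: rows = load_rows(f) should work, assuming the files are UTF-8 encoded (the kanji in l_name make the encoding matter).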

Future data collection possibilities

The main page of the same Japanese government website offers similar data for many other flowers. The data scraping framework developed for the sakura data may also be useful for obtaining first flowering and full bloom data for those flowers.

Next steps

Stay tuned for more visualization and analysis developed using the sakura datasets. For the reader's convenience, future related posts will also be categorized in the sakura folder on this site.