One project to learn time series forecasting

No more reading blog posts. It is time to forecast for real. 😎

Here is a project you can build.

“Let’s predict taxi demand in NYC”

Let’s create a predictive model to forecast the number of taxi rides that will happen in Manhattan (New York City)

  • per hour (e.g. tomorrow between 5 PM and 6 PM), and
  • per zone (e.g. Zone 113 “Lower Manhattan”)

in the following 3 days.

Taxi Zones in Manhattan

This model can help the operations team of the NYC Taxi & Limousine Commission optimize the distribution of the taxi fleet in real time and maximize revenue.

Here are the steps to build this project.

Step 1. Fetch historical data on taxi rides 🚕

You can get this data from the NYC Taxi & Limousine Commission website.

There you will find month-by-month raw data on historical taxi rides, in Parquet format → Link to the data
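As a starting point, here is a minimal sketch for downloading one month of raw data. The base URL and the `yellow_tripdata_YYYY-MM.parquet` file naming are assumptions based on how the yellow taxi files have been published, so check them against the links on the Commission's website. The helper names `raw_data_url` and `download_month` and the `data/raw` folder are made up for this example.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Assumption: base URL and file naming used for the yellow taxi Parquet files.
# Verify both against the links on the NYC Taxi & Limousine Commission website.
BASE_URL = "https://d37ci6vzurychx.cloudfront.net/trip-data"


def raw_data_url(year: int, month: int) -> str:
    """Build the URL of the raw Parquet file for one month."""
    return f"{BASE_URL}/yellow_tripdata_{year}-{month:02d}.parquet"


def download_month(year: int, month: int, target_dir: str = "data/raw") -> Path:
    """Download one month of raw rides, unless the file is already on disk."""
    target = Path(target_dir) / f"rides_{year}-{month:02d}.parquet"
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(raw_data_url(year, month), str(target))
    return target


# Only builds the URL; no network call happens here.
print(raw_data_url(2022, 1))
```

Calling `download_month(2022, 1)` saves the file locally, and `pandas.read_parquet` can then load it into a DataFrame (this needs `pyarrow` or `fastparquet` installed).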

Step 2. Pre-process the data into a time series format 📈

Aggregate the number of rides based on the hour and location of the pickup.

The resulting dataset has 3 columns:

1 → Pickup timestamp, rounded down to the hour 🕐

2 → Pickup location 📍

3 → Number of rides 🚕
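A rough sketch of this aggregation with pandas could look like the code below. It assumes the raw data has a pickup timestamp column called `tpep_pickup_datetime` and a pickup zone column called `PULocationID` (the names used in the yellow taxi files, so double-check them in your download). The output column names and the tiny `raw` sample are made up for illustration.

```python
import pandas as pd


def rides_per_hour_and_zone(rides: pd.DataFrame) -> pd.DataFrame:
    """Aggregate raw rides into one row per (pickup hour, pickup zone)."""
    rides = rides.copy()
    # Truncate each pickup timestamp to the start of its hour (17:40 -> 17:00).
    rides["pickup_hour"] = rides["tpep_pickup_datetime"].dt.floor("h")
    return (
        rides.groupby(["pickup_hour", "PULocationID"])
        .size()  # number of rides in each (hour, zone) group
        .reset_index(name="rides")
        .rename(columns={"PULocationID": "pickup_location_id"})
    )


# Tiny hand-made sample standing in for the raw Parquet data.
raw = pd.DataFrame(
    {
        "tpep_pickup_datetime": pd.to_datetime(
            [
                "2022-01-01 17:05:00",
                "2022-01-01 17:40:00",
                "2022-01-01 18:10:00",
                "2022-01-01 17:59:00",
            ]
        ),
        "PULocationID": [4, 4, 4, 113],
    }
)

ts = rides_per_hour_and_zone(raw)
print(ts)
```

Note that hours with no rides in a zone simply do not show up in this output. For a complete time series you would still need to add those (hour, zone) pairs with a count of 0.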

Time series data for Zone = 4 in year = 2022

Step 3. Train a predictive model 🏋️

Prophet is an open-source library by Facebook (now Meta) for time series forecasting, and it works like a charm for time series with strong seasonal patterns, like taxi demand.

This tutorial will get you up and running real quick.
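To give you a feel for it, here is a minimal sketch of fitting Prophet on one zone and forecasting the next 72 hours (3 days). It assumes an hourly dataset with columns named `pickup_hour`, `pickup_location_id` and `rides`. Those names, the helper functions and the sample data are made up for this example. Prophet itself expects a DataFrame with a `ds` (timestamp) column and a `y` (value) column.

```python
import pandas as pd


def to_prophet_frame(ts: pd.DataFrame, zone: int) -> pd.DataFrame:
    """Keep one zone and rename columns to the `ds` / `y` names Prophet expects."""
    one_zone = ts[ts["pickup_location_id"] == zone]
    return (
        one_zone.rename(columns={"pickup_hour": "ds", "rides": "y"})[["ds", "y"]]
        .sort_values("ds")
        .reset_index(drop=True)
    )


def forecast_zone(ts: pd.DataFrame, zone: int, horizon_hours: int = 72) -> pd.DataFrame:
    """Fit Prophet with default settings and forecast the next `horizon_hours`."""
    # Imported here so the data prep above runs even without Prophet installed.
    from prophet import Prophet  # pip install prophet

    model = Prophet()
    model.fit(to_prophet_frame(ts, zone))
    future = model.make_future_dataframe(periods=horizon_hours, freq="h")
    forecast = model.predict(future)
    # yhat is the point forecast; yhat_lower / yhat_upper bound the uncertainty interval.
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail(horizon_hours)


# Tiny hand-made sample, only to show the reshaping step.
ts = pd.DataFrame(
    {
        "pickup_hour": pd.to_datetime(
            ["2022-01-01 18:00", "2022-01-01 17:00", "2022-01-01 17:00"]
        ),
        "pickup_location_id": [4, 4, 113],
        "rides": [1, 2, 1],
    }
)
train = to_prophet_frame(ts, zone=4)
print(train)
```

With real data (ideally several months of hourly counts), `forecast_zone(ts, zone=4)` returns one row per forecasted hour. This fits one model per zone with default settings, which is a simple baseline rather than a tuned setup.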

Step 4. Push the code to GitHub 👩‍💻👨🏾‍💻

Make your work public to increase its visibility and help you land a job (or an even better one).

Don’t forget to add a beautiful README file to the repo, where you explain

→ WHAT the business problem is, and

→ HOW you solved it

Wanna build this system?

I am preparing a hands-on tutorial, including

  • videos 🎥
  • slides 👨‍🏫
  • source code 👨🏽‍💻

to show you, step-by-step, how to design, build and operationalize prediction systems, like this one.
