Bike Sharing Demand Prediction

Summary

This project was done as part of Udacity's Deep Learning Nanodegree. It is my first neural network, so the challenges were many. The data set contains information about a bike rental business, and the goal is to build a neural network that predicts daily bike rental ridership.

Some of the code shown was provided, but I implemented most of the core math myself: the forward pass, backpropagation, and the activation function, among other things.

The files provided are:

  • hour.csv : bike sharing counts aggregated on an hourly basis (17,379 records)
  • day.csv : bike sharing counts aggregated on a daily basis (731 records)

Challenges: ANN, backpropagation, tensors, setting hyperparameters

Data Set

The data comes from the UCI Machine Learning Database.

Bike sharing systems are a new generation of traditional bike rentals in which the whole process, from membership to rental and return, has been automated. Through these systems, a user can easily rent a bike at one location and return it at another. There are currently over 500 bike-sharing programs around the world, comprising more than 500,000 bicycles. Interest in these systems is high because of their important role in traffic, environmental, and health issues.

Apart from their interesting real-world applications, the characteristics of the data generated by bike sharing systems make them attractive for research. Unlike other transport services such as bus or subway, the duration of travel and the departure and arrival positions are explicitly recorded. This turns a bike sharing system into a virtual sensor network that can be used to sense mobility in the city, so it is expected that many important events in the city could be detected by monitoring these data.

Attribute Information

Both hour.csv and day.csv have the following fields, except hr which is not available in day.csv.

  • instant: record index
  • dteday : date
  • season : season (1: spring, 2: summer, 3: fall, 4: winter)
  • yr : year (0: 2011, 1:2012)
  • mnth : month ( 1 to 12)
  • hr : hour (0 to 23)
  • holiday : whether the day is a holiday or not (extracted from [Web Link])
  • weekday : day of the week
  • workingday : 1 if the day is neither a weekend nor a holiday, otherwise 0
  • weathersit :
    • 1: Clear, Few clouds, Partly cloudy
    • 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
    • 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
    • 4: Heavy Rain + Ice Pallets + Thunderstorm + Mist, Snow + Fog
  • temp : Normalized temperature in Celsius. The values are derived via (t-t_min)/(t_max-t_min), t_min=-8, t_max=+39 (only in hourly scale); a short worked example follows this list
  • atemp: Normalized feeling temperature in Celsius. The values are derived via (t-t_min)/(t_max-t_min), t_min=-16, t_max=+50 (only in hourly scale)
  • hum: Normalized humidity. The values are divided by 100 (max)
  • windspeed: Normalized wind speed. The values are divided by 67 (max)
  • casual: count of casual users
  • registered: count of registered users
  • cnt: count of total rental bikes including both casual and registered
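
As a quick check of the temp normalization described in the list above, the raw Celsius value can be recovered from the normalized one. This is only an illustrative sketch; the 0.24 value is simply the temp of the first record shown later in the notebook.

# Recover the raw temperature (in Celsius) from the normalized temp field,
# using t_min = -8 and t_max = +39 from the attribute description above.
t_min, t_max = -8, 39
temp_normalized = 0.24                     # temp of the first record in hour.csv
temp_celsius = temp_normalized * (t_max - t_min) + t_min
print(temp_celsius)                        # ≈ 3.3 °C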

Source

Hadi Fanaee-T

Laboratory of Artificial Intelligence and Decision Support (LIAAD), University of Porto INESC Porto, Campus da FEUP Rua Dr. Roberto Frias, 378 4200 - 465 Porto, Portugal

  • Original Source: http://capitalbikeshare.com/system-data
  • Weather Information: http://www.freemeteo.com
  • Holiday Schedule: http://dchr.dc.gov/page/holiday-schedule

Requirements

The required packages are the ones imported below: NumPy, pandas, and matplotlib.

%matplotlib inline
%config InlineBackend.figure_format='retina'   
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt

Load and prepare the data

A critical step in working with neural networks is preparing the data correctly. Variables on different scales make it difficult for the network to efficiently learn the correct weights.

data_path = 'Bike-Sharing-Dataset/hour.csv'
rides = pd.read_csv(data_path)
rides.head()

Checking out the data

This dataset has the number of riders for each hour of each day from January 1 2011 to December 31 2012. The number of riders is split between casual and registered, summed up in the cnt column.
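
Since cnt should be exactly the sum of casual and registered, a quick one-line sanity check (a sketch, not part of the original notebook) confirms it:

# Sanity check: the total count equals casual + registered for every record.
assert (rides['casual'] + rides['registered'] == rides['cnt']).all()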

Below is a plot showing the number of bike riders over roughly the first 10 days in the data set. (Some days don't have exactly 24 entries, so it isn't exactly 10 days.) The hourly rentals are also visible here. This data is pretty complicated! Weekends have lower overall ridership, and there are spikes when people are biking to and from work during the week. Looking at the data above, we also have information about temperature, humidity, and windspeed, all of which likely affect the number of riders.

rides[:24*10].plot(x='dteday', y='cnt')

Dummy variables

Here we have some categorical variables such as season, weather situation (weathersit), month, hour, and weekday. To include these in our model, we'll need to make binary dummy variables. This is simple to do with pandas using get_dummies().

dummy_fields = ['season', 'weathersit', 'mnth', 'hr', 'weekday']
for each in dummy_fields:
    dummies = pd.get_dummies(rides[each], prefix=each, drop_first=False)
    rides = pd.concat([rides, dummies], axis=1)

fields_to_drop = ['instant', 'dteday', 'season', 'weathersit',
                  'weekday', 'atemp', 'mnth', 'workingday', 'hr'] #remove original features
data = rides.drop(fields_to_drop, axis=1)
data.head()
yr holiday temp hum windspeed casual registered cnt season_1 season_2 ... hr_21 hr_22 hr_23 weekday_0 weekday_1 weekday_2 weekday_3 weekday_4 weekday_5 weekday_6
0 0 0 0.24 0.81 0.0 3 13 16 1 0 ... 0 0 0 0 0 0 0 0 0 1
1 0 0 0.22 0.80 0.0 8 32 40 1 0 ... 0 0 0 0 0 0 0 0 0 1
2 0 0 0.22 0.80 0.0 5 27 32 1 0 ... 0 0 0 0 0 0 0 0 0 1
3 0 0 0.24 0.75 0.0 3 10 13 1 0 ... 0 0 0 0 0 0 0 0 0 1
4 0 0 0.24 0.75 0.0 0 1 1 1 0 ... 0 0 0 0 0 0 0 0 0 1

Scaling target and other continuous variables

To make training the network easier, we’ll standardize each of the continuous variables. That is, we’ll shift and scale the variables such that they have zero mean and a standard deviation of 1. The scaling factors are saved so we can go backwards when we use the network for predictions.

quant_features = ['casual', 'registered', 'cnt', 'temp', 'hum', 'windspeed']
# Store scalings in a dictionary so we can convert back later
scaled_features = {}
for each in quant_features:
    mean, std = data[each].mean(), data[each].std()
    scaled_features[each] = [mean, std]
    data.loc[:, each] = (data[each] - mean)/std
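
Since the scalings are stored in scaled_features, reversing the standardization later is just a multiply and an add; this is the same inversion used for the prediction plot at the end of the notebook:

# Example of undoing the standardization for 'cnt' (or any other scaled field)
mean, std = scaled_features['cnt']
cnt_original_scale = data['cnt'] * std + mean   # back to raw ride counts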

Splitting the data into training, testing, and validation sets

We’ll save the data for the last approximately 21 days to use as a test set after we’ve trained the network. We’ll use this set to make predictions and compare them with the actual number of riders.

# Save data for approximately the last 21 days
test_data = data[-21*24:]

# Now remove the test data from the data set
data = data[:-21*24]

# Separate the data into features and targets
target_fields = ['cnt', 'casual', 'registered']
features, targets = data.drop(target_fields, axis=1), data[target_fields]
test_features, test_targets = test_data.drop(target_fields, axis=1), test_data[target_fields]

We’ll split the data into two sets, one for training and one for validating as the network is being trained. Since this is time series data, we’ll train on historical data, then try to predict on future data (the validation set).

# Hold out the last 60 days or so of the remaining data as a validation set
train_features, train_targets = features[:-60*24], targets[:-60*24]
val_features, val_targets = features[-60*24:], targets[-60*24:]

Time to build the network

Below is the network. The overall structure was provided as a starting point; I implemented the forward pass, the backpropagation step, and the activation function, and set the hyperparameters: the learning rate, the number of hidden units, and the number of training passes.

The network has two layers: a hidden layer and an output layer. The hidden layer uses the sigmoid function for its activations. The output layer has a single node and is used for the regression: the output of that node is the same as its input, i.e. the activation function is f(x) = x.
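
In equation form, with W_ih the input-to-hidden weight matrix and W_ho the hidden-to-output weight matrix (the same matrices used in the class below), the forward pass for an input row x is:

$$h = \sigma(x\,W_{ih}), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \hat{y} = h\,W_{ho}$$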

I will:

  • Implement the sigmoid function to use as the activation function.
  • Implement the forward pass in the train method.
  • Implement the backpropagation algorithm in the train method, including calculating the output error.
  • Implement the forward pass in the run method.

class NeuralNetwork(object):
    def __init__(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        # Set number of nodes in input, hidden and output layers.
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes

        # Initialize weights
        self.weights_input_to_hidden = np.random.normal(0.0, self.input_nodes**-0.5,
                                       (self.input_nodes, self.hidden_nodes))

        self.weights_hidden_to_output = np.random.normal(0.0, self.hidden_nodes**-0.5,
                                       (self.hidden_nodes, self.output_nodes))
        self.lr = learning_rate

        ####Set self.activation_function to your implemented sigmoid function ####
        #
        self.activation_function = lambda x : 1/(1 + np.exp(- x))

    def train(self, features, targets):
        ''' Train the network on batch of features and targets.

            Arguments
            ---------

            features: 2D array, each row is one data record, each column is a feature
            targets: 1D array of target values
        '''
        n_records = features.shape[0]
        delta_weights_i_h = np.zeros(self.weights_input_to_hidden.shape)
        delta_weights_h_o = np.zeros(self.weights_hidden_to_output.shape)
        for X, y in zip(features, targets):
            #### Implement the forward pass####
            ### Forward pass ###
            # Hidden layer
            hidden_inputs = np.dot(X, self.weights_input_to_hidden) # signals into hidden layer
            hidden_outputs = self.activation_function(hidden_inputs) # signals from hidden layer

            # Output layer
            final_inputs = np.dot(hidden_outputs, self.weights_hidden_to_output) # signals into final output layer
            final_outputs = final_inputs # signals from final output layer

            #### Implement the backward pass####
            ### Backward pass ###

            # Output error.
            error = y - final_outputs # Output layer error is the difference between desired target and actual output.

            # Calculate the hidden layer's contribution to the error
            hidden_error = np.dot(self.weights_hidden_to_output, error) # propagate the error back through the hidden-to-output weights

            # Backpropagated error terms
            output_error_term = error # the output activation is f(x) = x, so its derivative is 1
            hidden_error_term = hidden_error * hidden_outputs * (1 - hidden_outputs)

            # Weight step (input to hidden)
            delta_weights_i_h += hidden_error_term * X[:, None]
            # Weight step (hidden to output)
            delta_weights_h_o += output_error_term * hidden_outputs[:, None]

        # Update the weights
        self.weights_hidden_to_output += self.lr * delta_weights_h_o / n_records # update hidden-to-output weights with gradient descent step
        self.weights_input_to_hidden += self.lr * delta_weights_i_h / n_records # update input-to-hidden weights with gradient descent step

    def run(self, features):
        ''' Run a forward pass through the network with input features

            Arguments
            ---------
            features: 1D array of feature values
        '''

        #### Implement the forward pass####
        # Hidden layer
        hidden_inputs = np.dot(features, self.weights_input_to_hidden) # signals into hidden layer
        hidden_outputs = self.activation_function(hidden_inputs) # signals from hidden layer

        # Output layer
        final_inputs = np.dot(hidden_outputs, self.weights_hidden_to_output) # signals into final output layer
        final_outputs = final_inputs # signals from final output layer

        return final_outputs

def MSE(y, Y):
    ''' Mean squared error between predictions y and targets Y. '''
    return np.mean((y - Y)**2)
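
A minimal smoke test of the class above, using tiny made-up numbers (not part of the project data) just to confirm the shapes line up and the loss decreases on a single record:

# Sanity check with arbitrary made-up data: 3 inputs, 2 hidden nodes, 1 output.
tiny_net = NeuralNetwork(input_nodes=3, hidden_nodes=2, output_nodes=1, learning_rate=0.5)

X_tiny = np.array([[0.5, -1.0, 2.0]])   # one record, three features
y_tiny = np.array([0.4])                # one target value

loss_before = MSE(tiny_net.run(X_tiny), y_tiny)
for _ in range(50):
    tiny_net.train(X_tiny, y_tiny)
loss_after = MSE(tiny_net.run(X_tiny), y_tiny)
print(loss_before, loss_after)          # the loss should shrink on this single record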

Training the network

The strategy here is to find hyperparameters such that the error on the training set is low without overfitting to the data. I'll be using a method known as Stochastic Gradient Descent (SGD) to train the network. The idea is that for each training pass, you grab a random sample of the data instead of using the whole data set. This takes many more training passes than normal gradient descent, but each pass is much faster, so the network ends up training more efficiently.

Choosing the number of iterations

This is the number of batches of samples from the training data we'll use to train the network. I want to find a number where the training loss is low and the validation loss is at its minimum.

Choosing the learning rate

This scales the size of the weight updates. If it is too large, the weights tend to explode and the network fails to fit the data; if it is too small, training is very slow and may not converge within the chosen number of iterations.

Choosing the number of hidden nodes

More hidden nodes give the model more capacity, which can improve accuracy up to a point; too many hidden nodes make the network prone to overfitting, while too few leave it unable to capture the patterns in the data.

import sys

### Set the hyperparameters here ###
iterations = 3000
learning_rate = 0.2
hidden_nodes = 3
output_nodes = 1

N_i = train_features.shape[1]
network = NeuralNetwork(N_i, hidden_nodes, output_nodes, learning_rate)

losses = {'train':[], 'validation':[]}
for ii in range(iterations):
    # Go through a random batch of 128 records from the training data set
    batch = np.random.choice(train_features.index, size=128)
    X, y = train_features.loc[batch].values, train_targets.loc[batch, 'cnt']

    network.train(X, y)

    # Printing out the training progress
    train_loss = MSE(network.run(train_features).T, train_targets['cnt'].values)
    val_loss = MSE(network.run(val_features).T, val_targets['cnt'].values)
    sys.stdout.write("\rProgress: {:2.1f}".format(100 * ii/float(iterations)) \
                     + "% ... Training loss: " + str(train_loss)[:5] \
                     + " ... Validation loss: " + str(val_loss)[:5])
    sys.stdout.flush()

    losses['train'].append(train_loss)
    losses['validation'].append(val_loss)
plt.plot(losses['train'], label='Training loss')
plt.plot(losses['validation'], label='Validation loss')
plt.legend()
_ = plt.ylim()
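
The hyperparameters above were set by hand. One rough way to compare a few settings is a small grid search over hidden nodes and learning rates, reusing the NeuralNetwork class, MSE, and the train/validation split defined earlier; the candidate values below are arbitrary and this is only a sketch:

# Hypothetical grid search over a few candidate settings (arbitrary values),
# comparing validation loss after a shorter training run for each setting.
results = {}
for n_hidden in [5, 10, 20]:
    for lr in [0.1, 0.5, 1.0]:
        net = NeuralNetwork(train_features.shape[1], n_hidden, 1, lr)
        for _ in range(1000):
            batch = np.random.choice(train_features.index, size=128)
            net.train(train_features.loc[batch].values,
                      train_targets.loc[batch, 'cnt'])
        results[(n_hidden, lr)] = MSE(net.run(val_features).T,
                                      val_targets['cnt'].values)

best = min(results, key=results.get)
print(best, results[best])   # setting with the lowest validation loss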

Checking out my predictions

Let’s plot real values vs prediction

fig, ax = plt.subplots(figsize=(8,4))

mean, std = scaled_features['cnt']
predictions = network.run(test_features).T*std + mean
ax.plot(predictions[0], label='Prediction')
ax.plot((test_targets['cnt']*std + mean).values, label='Data')
ax.set_xlim(right=len(predictions[0]))
ax.legend()

dates = pd.to_datetime(rides.loc[test_data.index]['dteday'])
dates = dates.apply(lambda d: d.strftime('%b %d'))
ax.set_xticks(np.arange(len(dates))[12::24])
_ = ax.set_xticklabels(dates[12::24], rotation=45)

Not too bad! The model is somewhat inaccurate when predicting ridership around Christmas time. That is probably because the model was not trained on a representative sample of all the seasons, weather conditions, and holiday periods.