Behavioural Cloning — End to End Learning for Self-Driving Cars.

Nachiket Tanksale · The Startup · May 21, 2019

Behavioural cloning is literally cloning the behaviour of the driver: the idea is to train a Convolutional Neural Network (CNN) to mimic the driver, using training data collected from the driver's own driving. NVIDIA released a paper in which they trained a CNN to map raw pixels from a single front-facing camera directly to steering commands. Surprisingly, the results were very powerful: the car learned to drive in traffic on local roads, with or without lane markings, and on highways, with a minimal amount of training data. Here, we'll use the simulator provided by Udacity. The simulated car is equipped with three front-facing cameras that record images along with the steering angle corresponding to the centre camera. We'll train the same model as in the paper.

This exercise was done as part of the Udacity Self-Driving Car Nanodegree.

Collect The Data

The simulator has two tracks: the first is quite easy, with fewer and gentler curves, while the second is difficult, with many winding curves and steep hills.

We'll use training data from both tracks:

  • We'll drive on both tracks, keeping the car at the centre of the lane, for two laps each.
  • We'll drive one lap on each track where we deliberately drift towards the sides and then steer back to the centre of the lane. This gives the model training data for recovery corrections.
(i) Left (ii) Centre (iii) Right camera image

The captured data contains the paths to the left, centre and right images, along with the steering angle, throttle, brake and speed values.

We are only interested in the paths to the left, centre and right images, and the steering angle.

Note: we'll use all of the left, centre and right images. We'll adjust the steering angle for the left image by adding a small correction, and similarly adjust the steering angle for the right image by subtracting the same correction. In effect, this triples the training data.
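
As a rough illustration of this step, here is a minimal sketch, assuming the simulator's driving_log.csv column order (centre, left, right, steering, throttle, brake, speed) and a hypothetical correction value of 0.2:

```python
import csv

# Turn each recorded row into three training samples.
# CORRECTION = 0.2 is a hypothetical value; it would be tuned on validation data.
CORRECTION = 0.2

samples = []  # list of (image_path, steering_angle) pairs
with open('driving_log.csv') as f:
    for centre, left, right, steering, throttle, brake, speed in csv.reader(f):
        angle = float(steering)
        samples.append((centre.strip(), angle))             # centre camera, angle as recorded
        samples.append((left.strip(), angle + CORRECTION))  # left camera: steer a bit more to the right
        samples.append((right.strip(), angle - CORRECTION)) # right camera: steer a bit more to the left
```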

Data Imbalance

Histogram of steering angles

The histogram above shows the imbalance in the training data: there is more data for left turns than for right turns. We'll compensate for this by randomly flipping training images horizontally and negating the steering angle (steering_angle becomes -steering_angle).

Also, most of the steering angles are concentrated around 0–0.25, and we don't have much data for larger steering angles. We'll compensate for this by randomly shifting the images horizontally and vertically by a few pixels and adjusting the steering angles accordingly.

Data Augmentation

We'll use the following augmentations:

  • Randomly flip some of the images and adjust the steering angle to -steering_angle
  • Randomly shift the images horizontally and vertically by a few pixels and adjust the steering angle using a small adjustment factor.
  • There are shadows of trees, poles, etc. on the road, so we'll add random shadows to the training images.
  • We’ll randomly adjust the brightness of the images.

These are standard OpenCV operations, and the full code can be found in the GitHub repository; a rough sketch of some of them is shown below.
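
As a sketch (not the repository's exact code), the flip, shift, shadow and brightness augmentations could look like this; the 0.002-per-pixel steering adjustment and the simple vertical-slice shadow are illustrative assumptions:

```python
import cv2
import numpy as np

def random_flip(image, angle):
    # Flip horizontally half of the time and negate the steering angle.
    if np.random.rand() < 0.5:
        image, angle = cv2.flip(image, 1), -angle
    return image, angle

def random_shift(image, angle, range_x=100, range_y=10):
    # Translate the image and adjust the angle by a small factor per horizontal pixel.
    tx = range_x * (np.random.rand() - 0.5)
    ty = range_y * (np.random.rand() - 0.5)
    angle += tx * 0.002  # hypothetical per-pixel adjustment factor
    h, w = image.shape[:2]
    m = np.float32([[1, 0, tx], [0, 1, ty]])
    return cv2.warpAffine(image, m, (w, h)), angle

def random_shadow(image):
    # Darken a random vertical slice to mimic shadows cast by trees and poles.
    h, w = image.shape[:2]
    x1, x2 = np.sort(np.random.randint(0, w, 2))
    shadowed = image.copy()
    shadowed[:, x1:x2] = (shadowed[:, x1:x2] * 0.5).astype(image.dtype)
    return shadowed

def random_brightness(image):
    # Scale the V channel in HSV space by a random factor in [0.5, 1.5].
    hsv = cv2.cvtColor(image, cv2.COLOR_RGB2HSV).astype(np.float64)
    hsv[:, :, 2] = np.clip(hsv[:, :, 2] * (0.5 + np.random.rand()), 0, 255)
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2RGB)
```

These functions can be chained inside a data generator so that each batch sees a different random combination of augmentations.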

Below are the outputs for some of the training images after applying these augmentations.

Pre-processing

The paper expects an input image size of 66×200×3, while the images from the simulator are 160×320×3. The paper also expects the input images to be converted from RGB to YUV colour space.

Also, the mountains at the top of the image and the car bonnet at the bottom are not useful for training.

So, as part of pre-processing, we'll crop the top 40 pixel rows and the bottom 20 pixel rows from the input images, resize the cropped image to 66×200×3, and convert it to YUV colour space.
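
Put together, the pre-processing might look roughly like this, assuming an RGB input array of shape 160×320×3:

```python
import cv2

def preprocess(image):
    # Remove the sky/mountains (top 40 rows) and the car bonnet (bottom 20 rows).
    image = image[40:-20, :, :]
    # Resize to the 66x200 input expected by the network (cv2.resize takes (width, height)).
    image = cv2.resize(image, (200, 66), interpolation=cv2.INTER_AREA)
    # Convert from RGB to YUV colour space, as in the NVIDIA paper.
    return cv2.cvtColor(image, cv2.COLOR_RGB2YUV)
```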

Model

Here’s the PilotNet model described in the paper:

The model has the following layers:

  • A normalisation layer (hard-coded): divide by 127.5 and subtract 1.
  • 3 convolution layers with 24, 36 and 48 filters, 5×5 kernels and a stride of 2.
  • 2 convolution layers with 64 filters, 3×3 kernels and a stride of 1.
  • A flatten layer.
  • 3 fully connected layers with output sizes of 100, 50 and 10.
  • A final output layer with a single unit that outputs the steering angle.

We'll use Mean Squared Error (MSE) as the loss and the Adam optimizer with a starting learning rate of 1e-4 for training.
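
A minimal Keras sketch of this architecture is below. The 'relu' activations in the convolutional layers are an assumption, and the fully connected layers are left linear here, consistent with the modification described later of adding non-linearity after the dense layers:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Lambda, Conv2D, Flatten, Dense
from tensorflow.keras.optimizers import Adam

model = Sequential([
    # Hard-coded normalisation: scale pixels from [0, 255] to [-1, 1].
    Lambda(lambda x: x / 127.5 - 1.0, input_shape=(66, 200, 3)),
    Conv2D(24, (5, 5), strides=(2, 2), activation='relu'),
    Conv2D(36, (5, 5), strides=(2, 2), activation='relu'),
    Conv2D(48, (5, 5), strides=(2, 2), activation='relu'),
    Conv2D(64, (3, 3), activation='relu'),
    Conv2D(64, (3, 3), activation='relu'),
    Flatten(),
    Dense(100),
    Dense(50),
    Dense(10),
    Dense(1),  # single output: the steering angle
])
model.compile(loss='mse', optimizer=Adam(learning_rate=1e-4))
```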

I also used an EarlyStopping callback on the validation loss with a patience of 5 epochs. I set training to run for 40 epochs, but it stopped early at 36.
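
The training call could look roughly like this, assuming hypothetical X_train/y_train and X_valid/y_valid arrays of pre-processed images and steering angles (the batch size is also a guess):

```python
from tensorflow.keras.callbacks import EarlyStopping

# Stop when the validation loss hasn't improved for 5 epochs.
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

history = model.fit(X_train, y_train,
                    validation_data=(X_valid, y_valid),
                    epochs=40, batch_size=32,
                    callbacks=[early_stop])
```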

Training vs validation loss

I also tried a modified model in which:

  • I used 'elu' activation instead of 'relu'.
  • I added dropout after the Flatten layer for regularisation.
  • I added non-linearity with 'relu' after the dense layers.

With these changes, I could train the model for 60 epochs.
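
Under one plausible reading of these changes ('elu' in the convolutional layers, dropout after Flatten, 'relu' after the dense layers; the 0.5 dropout rate is a guess), the modified model might look like this:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Lambda, Conv2D, Flatten, Dropout, Dense

modified_model = Sequential([
    Lambda(lambda x: x / 127.5 - 1.0, input_shape=(66, 200, 3)),
    Conv2D(24, (5, 5), strides=(2, 2), activation='elu'),
    Conv2D(36, (5, 5), strides=(2, 2), activation='elu'),
    Conv2D(48, (5, 5), strides=(2, 2), activation='elu'),
    Conv2D(64, (3, 3), activation='elu'),
    Conv2D(64, (3, 3), activation='elu'),
    Flatten(),
    Dropout(0.5),                      # regularisation after the Flatten layer
    Dense(100, activation='relu'),
    Dense(50, activation='relu'),
    Dense(10, activation='relu'),
    Dense(1),
])
```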

Below are the results:

Prediction

Here are videos of the trained model driving on both tracks.

Track 1
Track 2

Salient Features

The follow-up paper, “Explaining How a Deep Neural Network Trained with End-to-End Learning Steers a Car”, describes a method for identifying the salient features that the neural network relies on when predicting the steering angle, and finds that these are usually lane markings, cars, bushes, etc.

Below is the process for extracting the salient features (a rough code sketch follows the list):

  1. In each layer, activations of the feature maps are averaged.
  2. The topmost averaged map is scaled up to the size of the map of the layer below. The upscaling is done using deconvolution, with the kernel size and stride chosen to be the same as in the convolutional layer used to generate the map.
  3. The up-scaled averaged map from the upper layer is then multiplied with the averaged map from the layer below.
  4. Steps 2 and 3 are repeated until the input is reached.
  5. The last mask, which has the size of the input image, is normalised to the range 0.0 to 1.0.
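
Here is a rough sketch of this procedure in Keras/TensorFlow, assuming the trained model's convolutional layers have known names (hypothetical here) and using bilinear resizing in place of the paper's deconvolution-based upscaling:

```python
import numpy as np
import tensorflow as tf

def salient_mask(model, image, conv_layer_names):
    # Build a sub-model that returns the activations of the listed conv layers.
    outputs = [model.get_layer(name).output for name in conv_layer_names]
    activation_model = tf.keras.Model(model.input, outputs)
    activations = activation_model.predict(image[np.newaxis])  # add a batch dimension

    # 1. Average each layer's feature maps across the channel axis.
    averaged = [a[0].mean(axis=-1) for a in activations]

    # 2-4. Walk from the deepest layer towards the input: upscale the running
    # mask to the size of the layer below and multiply point-wise.
    mask = averaged[-1]
    for lower in reversed(averaged[:-1]):
        mask = tf.image.resize(mask[..., np.newaxis], lower.shape,
                               method='bilinear').numpy()[..., 0]
        mask = mask * lower

    # Upscale the final mask to the input image size.
    mask = tf.image.resize(mask[..., np.newaxis], image.shape[:2],
                           method='bilinear').numpy()[..., 0]

    # 5. Normalise to the range [0.0, 1.0].
    return (mask - mask.min()) / (mask.max() - mask.min() + 1e-8)
```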

This visualisation map shows which regions of the input image contribute most to the output of the network.

Visualisation method that identifies the salient objects

Here are the results for the salient features after applying the above method.

Here the salient feature is mostly the prominent lane marking.

Conclusion

PilotNet is quite a powerful network that learns from the driver to output the correct steering angle. Examination of the salient objects shows that PilotNet learns features that “make sense” to a human, while ignoring structures in the camera images that are not relevant to driving. This capability is derived from data without the need for hand-crafted rules.

The code is available in the GitHub repository here.

References:

  1. https://images.nvidia.com/content/tegra/automotive/images/2016/solutions/pdf/end-to-end-dl-using-px.pdf
  2. https://arxiv.org/pdf/1704.07911.pdf
  3. https://medium.com/@erikshestopal/udacity-behavioral-cloning-using-keras-ff55055a64c
  4. https://github.com/naokishibuya/car-behavioral-cloning

If you liked this article, please be sure to give me a clap and follow me to get updates on my future articles.

Also, feel free to connect with me on LinkedIn or follow me on Twitter.

If you like my work, do consider sponsoring me; it'll help me put out more such work.
