import numpy as np
import matplotlib.pyplot as plt
import random
import pickle
import pandas as pd
import cv2
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
Classifying Traffic Signs with Machine Learning
In this post, I will show you how to preprocess images of traffic signs, reshape those images and use them with random forest algorithm for classification. Part of image pre-processing and visualization code comes from The Complete Self Driving Car Course github repo which has a MIT license 1.
Random Forest (RF) is generally believed to be a useful algorithm for regression and classification problems where training data is tabular. In this post though, I will be using RF to predict the class of traffic signs that are available as images. Therefore, I would first reshape the input images to tabular form as a large 2-dimensional array (matrix). In the later posts, I will use convolution filters for a neural network as well as RF again to make predictions.
Let’s begin by loading the required packages.
Load packages
Load data
I use the pickle
package to load the training and test data. These two sets come from the german traffic signs bitbucket repo. The repo also contains a validation dataset, but I won’t be using it as I’m not doing any hyperparameter tuning in this post.
with open('german-traffic-signs/train.p', 'rb') as f:
= pickle.load(f)
train_data
with open('german-traffic-signs/test.p', 'rb') as f:
= pickle.load(f) test_data
What does the data look like?
Training data is a dictionary that has the following components:
train_data.keys()
dict_keys(['coords', 'labels', 'features', 'sizes'])
I only need the features
(images) and their corresponding class labels
. There are a total of 43 traffic sign types in these data. In the following piece of code, I rename the features
and labels
as training and test set pairs:
# Split out features and labels
= train_data['features'], train_data['labels']
X_train, y_train = test_data['features'], test_data['labels']
X_test, y_test
# 4 dimensional
print("X_train shape:\n\n", X_train.shape, "\n")
print("X_test shape:\n\n", X_test.shape)
print("=========\n=========\n")
print("y_train shape:\n\n", y_train.shape, "\n")
print("y_test shape:\n\n", y_test.shape)
X_train shape:
(34799, 32, 32, 3)
X_test shape:
(12630, 32, 32, 3)
=========
=========
y_train shape:
(34799,)
y_test shape:
(12630,)
We can see that the training data and test data contain more than 34K and 12K images respectively. Moreover, these images are sized as 32 x 32
making a total of 32 x 32 = 1,024
pixels per image. The last number, 3
, in the shape
attribute indicates that these images are coloured with 3 channels of red, green and blue.
Visualizing data
Images are made of pixels that contain values from 0 (black) to 255 (white). The traffic sign images have 3 channels, therefore, each channel contains arrays of pixel intensity values. Following shows the first channel of a single image randomly picked from the training data:
42)
np.random.seed(= 5
num_images = np.random.choice(X_train.shape[0], num_images, replace=False)
random_indices = X_train[random_indices]
random_images = y_train[random_indices]
random_images_classes
= random_images[0]
first_image = random_images_classes[0]
first_image_class
=np.inf, linewidth=np.inf)
np.set_printoptions(thresholdprint(first_image[:,:,0])
[[ 89 86 86 84 84 83 74 67 75 81 75 90 87 70 75 86 103 88 81 77 69 71 64 59 65 64 60 50 47 61 74 63]
[ 86 88 83 80 82 83 81 69 79 73 70 88 83 73 75 87 103 83 73 73 65 72 73 66 71 74 67 56 47 47 59 60]
[101 104 87 83 83 79 78 71 79 73 81 88 83 74 75 87 101 80 73 71 65 63 66 68 71 75 70 64 58 56 59 61]
[101 98 87 90 85 76 68 72 79 81 96 104 94 77 86 119 122 88 78 70 64 63 62 64 65 64 65 68 71 71 62 62]
[ 90 89 96 96 84 75 68 76 90 92 93 97 93 82 126 168 164 110 84 75 62 61 57 57 59 54 57 69 76 75 65 59]
[ 82 82 96 96 82 77 77 86 104 95 77 70 72 94 170 170 195 148 94 75 63 54 52 58 56 47 59 83 91 82 67 55]
[ 85 90 103 97 81 77 78 82 93 88 82 70 86 148 174 167 187 188 108 88 71 51 49 57 56 53 67 78 88 78 72 58]
[ 77 86 97 86 79 80 79 83 86 84 82 79 111 193 180 173 172 185 140 93 78 53 48 59 65 57 63 63 64 68 77 63]
[ 70 64 75 66 69 79 84 88 94 97 88 99 155 188 170 175 166 181 187 99 76 58 54 62 69 59 54 60 58 60 66 66]
[ 73 57 57 64 63 70 79 84 101 102 83 119 182 172 172 226 194 173 191 136 83 63 61 63 70 69 53 71 69 62 62 76]
[ 75 57 59 70 69 79 85 78 85 85 81 151 196 178 211 251 228 171 174 187 128 78 66 65 70 76 52 54 65 71 67 69]
[ 70 55 58 68 85 98 91 85 85 77 115 183 186 187 239 255 247 196 173 190 162 92 68 58 57 69 58 54 64 71 68 66]
[ 74 53 56 67 87 99 92 91 85 78 148 193 182 212 253 255 255 233 182 172 195 127 67 55 51 52 56 60 68 72 67 63]
[ 76 56 60 73 74 81 85 88 76 98 178 187 186 237 253 238 247 252 218 173 183 166 87 65 64 60 65 70 75 77 75 62]
[ 70 55 70 88 75 77 96 93 95 164 193 175 204 251 229 144 191 248 244 194 178 196 140 75 67 70 67 76 77 72 79 73]
[ 76 67 77 86 76 81 95 89 116 191 190 185 232 241 140 65 94 189 249 223 181 194 180 94 61 71 68 76 80 72 78 70]
[ 81 86 87 87 83 84 91 102 175 206 184 208 250 214 77 38 37 121 242 248 197 170 189 157 87 79 77 77 80 78 80 71]
[ 76 87 91 90 97 97 92 128 210 200 191 236 255 206 70 40 38 109 237 253 231 177 179 190 130 88 83 81 80 78 84 76]
[ 64 78 81 85 108 111 97 153 206 189 212 245 231 176 66 41 40 81 162 189 242 204 170 192 183 102 88 85 82 79 87 77]
[ 65 69 88 88 98 98 106 196 194 182 232 210 135 111 58 42 39 67 123 161 240 238 183 182 205 135 94 77 75 80 88 81]
[ 65 70 91 91 93 103 143 199 185 202 250 235 201 170 66 42 41 94 210 235 249 254 214 171 182 184 121 73 77 90 92 93]
[ 71 80 88 89 93 130 192 196 188 229 255 254 253 224 77 49 48 95 233 255 255 255 240 187 172 196 148 84 81 94 98 105]
[ 84 96 95 85 97 167 211 186 206 249 255 255 255 226 90 117 111 103 232 255 255 255 253 218 173 181 187 110 73 87 90 97]
[ 84 99 101 93 121 189 195 192 235 255 255 255 255 233 159 216 217 176 238 255 255 255 255 241 194 184 190 163 82 83 83 84]
[ 81 97 104 102 171 233 209 215 248 255 255 255 255 251 242 253 255 247 252 255 255 255 255 247 208 196 166 191 135 95 86 81]
[ 88 97 103 123 199 238 241 245 254 250 244 240 236 235 232 227 224 221 221 220 215 215 225 228 203 194 160 196 183 109 88 79]
[ 83 96 109 159 195 210 244 247 229 216 208 206 201 201 199 194 189 187 187 187 180 181 196 205 211 196 172 172 189 120 79 71]
[ 68 93 115 170 184 192 230 249 237 214 202 201 199 199 194 190 186 186 188 191 191 203 213 213 216 207 197 192 174 102 75 79]
[ 76 101 114 154 196 205 212 231 219 197 185 183 189 209 182 164 161 168 173 162 145 147 147 143 147 148 142 136 112 77 75 84]
[101 123 142 165 193 179 160 165 138 105 87 84 98 139 97 79 89 107 124 95 65 64 70 75 82 88 82 74 66 66 76 85]
[114 143 183 194 178 144 144 159 131 83 61 63 89 125 86 73 88 109 142 110 73 68 72 77 79 82 76 74 73 76 84 91]
[104 127 175 187 155 142 169 183 149 85 54 56 107 174 105 75 89 106 124 98 82 73 64 57 64 74 69 73 75 82 96 89]]
Do you notice the triangle shape the numbers make? Let’s see the actual image:
0]);
plt.imshow(random_images['off'); plt.axis(
The german traffic signs repo also contains a table with the sign label and description. Following shows the first few rows:
= pd.read_csv('german-traffic-signs/signnames.csv')
data data.head()
ClassId | SignName | |
---|---|---|
0 | 0 | Speed limit (20km/h) |
1 | 1 | Speed limit (30km/h) |
2 | 2 | Speed limit (50km/h) |
3 | 3 | Speed limit (60km/h) |
4 | 4 | Speed limit (70km/h) |
Using the table above, I plot 5 images randomly picked from the training data and show each channel as follows:
= data.loc[data.ClassId.isin(random_images_classes), ['SignName']].copy()
df = df.reindex(random_images_classes)
df = np.array(df['SignName']) random_images_classes_namez
= plt.figure()
plot 10)
plot.set_figwidth(8)
plot.set_figheight(# plot each channel for the random images
for i in range(num_images):
for j in range(3):
3, i*3+j+1)
plt.subplot(num_images, if j == 0: # Red channel
='Reds')
plt.imshow(random_images[i][:,:,j], cmapelif j == 1: # Green channel
='Greens')
plt.imshow(random_images[i][:,:,j], cmapelif j == 2: # Blue channel
='Blues')
plt.imshow(random_images[i][:,:,j], cmap'off')
plt.axis('{} {}'.format(random_images_classes_namez[i], j+1))
plt.title(
plt.show()
Hopefully, for people new to image classification, it is clear what we are dealing with. Let’s also look at the top five traffic signs in the training data:
=[]
num_of_samples=[]
namez
= 5
cols = 43
num_classes
for i, row in data.iterrows():
= X_train[y_train == i]
x_selected len(x_selected))
num_of_samples.append('SignName'])
namez.append(data.loc[i,
# print(num_of_samples)
# print(namez)
# print(len(num_of_samples))
# print(len(namez))
= np.array(num_of_samples)
num_of_samples = np.array(namez)
namez
= np.argpartition(num_of_samples, -10)[-10:]
ind1 = ind1[np.argsort(num_of_samples[ind1])]
ind1 = num_of_samples[ind1]
num_of_samples_top5 = namez[ind1]
namez_top5
;
plt.barh(namez_top5, num_of_samples_top5)"Distribution of the training dataset");
plt.title("");
plt.ylabel("Number of images");
plt.xlabel(; plt.show()
Speed limit, no passing and yield signs are some of the most common signs in this training data. Having unequal number of images can affect the performance of a machine learning model as it is biased by the most frequent classes. But that’s a topic for another day.
Preprocessing data
Next, I preprocess the data as follows:
- Convert the original coloured image to a grayscale image. This would reduce the required computing resources as 3 channels are reduced to a single channel.
- Equalize the intensities of the image.
- Standardize the image by dividing each pixel intensity with 255.
def grayscale(img):
= cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img return img
def equalize(img):
= cv2.equalizeHist(img)
img return img
def preprocess(img):
= grayscale(img)
img = equalize(img)
img = img/255
img return img
= np.array(list(map(preprocess, X_train)))
X_train = np.array(list(map(preprocess, X_test))) X_test
Reshaping images
If I were to use a convolutional neural network, I would use the data as individual arrays of images, similar to how it is now. But RF needs the data to be in a tabular form, i.e., rows and columns for all data. Therefore, I reshape the data in a format where each row contains 32 x 32 = 1,024
columns of a single image:
= (X_train.shape[1] * X_train.shape[2])
num_pixels
= X_train.reshape(X_train.shape[0], num_pixels)
X_train = X_test.reshape(X_test.shape[0], num_pixels)
X_test
print(X_train.shape)
print(X_test.shape)
(34799, 1024)
(12630, 1024)
The numbers above show the rows and columns of training and test data.
Machine Learning
We are ready to fit a RF to the training data now. I use default values of all parameters but provide a random_state
to make sure the results can be replicated. I also use n_jobs=-1
to take advantage of all cores of my system.
= RandomForestClassifier(random_state=0, n_jobs=-1)
clf clf.fit(X_train, y_train)
RandomForestClassifier(n_jobs=-1, random_state=0)In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
RandomForestClassifier(n_jobs=-1, random_state=0)
The model is fit and is now ready to make predictions. I make predictions on both training and test sets.
= clf.predict(X_train)
y_pred_train = clf.predict(X_test) y_pred_test
And estimate the accuracy of predictions:
= accuracy_score(y_train, y_pred_train)
accuracy_train print("Train Accuracy:", np.round(accuracy_train))
= accuracy_score(y_test, y_pred_test)
accuracy_test print("Test Accuracy:", np.round(accuracy_test))
Train Accuracy: 1.0
Test Accuracy: 1.0
This test accuracy seems too good to be true. In practice, the model needs to be cross-validated and well-tuned.
Predicting a new image
Let’s see how our RF model does on an image downloaded from the internet. Following is the image:
import requests
from PIL import Image
= 'https://c8.alamy.com/comp/A0RX23/cars-and-automobiles-must-turn-left-ahead-sign-A0RX23.jpg'
url = requests.get(url, stream=True)
r = Image.open(r.raw)#Image.open('STOP_sign.jpg')
img =plt.get_cmap('gray')); plt.imshow(img, cmap
def preprocess2(img):
= np.asarray(img)
img = cv2.resize(img, (32, 32))
img = grayscale(img)
img = equalize(img)
img = img/255
img = img.reshape(1, 1024)
img return img
= preprocess2(img) img
print("Predicted sign: "+ str(clf.predict(img)))
Predicted sign: [12]
This prediction is incorrect as the corresponding sign for this prediction is priority road sign. The actual road sign is turn left ahead sign.
With some hyperparameter tuning, the model may show better results. However, keep in mind that this is expected as without learning anything about the specific properties of a traffic sign (that convolution layers can extract), RF is not able to learn much from the raw pixel intensities.