Pedestrian Detection - My master's thesis
Monday 15 July 2013
I haven't been able to make regular updates because of the huge amount of work I've had to take care of over the last month.
Here are two videos of the final application working; I hope you enjoy them!
https://www.youtube.com/watch?v=SrX1TBjxNq0
https://www.youtube.com/watch?v=0FcO4e9MhKo
Regards!
Monday 20 May 2013
Comparing detector performance with different numbers of features
In my last entry I hypothesized that the under-performance of my first detector, when compared to the one documented in the original paper, could be explained by the difference in the number of features used. To verify this, I trained a classifier with only 10K features and ran tests to compare both detectors. The results are illustrated in the next graph.
As the graph illustrates, the classifier trained with 10K features scores ~4 points below the one trained with 15K features at the reference value of 0.0001 FPPW. With these results in mind, it is safe to assume that if the classifier had been trained with 30K features, as in the original paper, the detector would most likely achieve results similar to the ones documented in the publication.
Regards.
Tuesday 14 May 2013
First results
I've finally finished my first implementation of the pedestrian detection algorithm. Results for the INRIA dataset are shown in the following graph.
At the reference value of 0.0001 false positives per window (FPPW), my detector correctly labels ~79.5% of pedestrian windows. This is around 10 points below the original paper, and for this I offer two explanations.
First and foremost, I used a pool of 15000 features for learning and classification, whereas the original paper used 30000.
Secondly, the original paper uses an optimized boosted cascade for decision-making. This type of classification not only speeds up the algorithm by several orders of magnitude, but also leads to slightly better detection performance, since it is designed to reject most false positives in the first stages of the cascade.
Since I wasn't able to implement this type of classification myself due to the lack of time available, I resorted to extracting fewer features to speed up the algorithm, thus sacrificing some performance.
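To illustrate the cascade idea (this is not something I implemented), the classifier accumulates the weak responses stage by stage and rejects a window as soon as its running score drops below that stage's threshold, so most negative windows are discarded after only a few cheap tests. A minimal sketch, with made-up names:
#include <vector>
// Illustrative sketch only, not my implementation: one threshold per stage.
bool cascadeClassify(const std::vector<float>& stageScores,
                     const std::vector<float>& stageThresholds)
{
    float score = 0;
    for (size_t s = 0; s < stageScores.size(); s++)
    {
        score += stageScores[s]; //accumulate this stage's weak responses
        if (score < stageThresholds[s])
            return false; //early rejection: most windows stop here
    }
    return true; //survived every stage: pedestrian
}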
Given these explanations, I think the method is validated and the next step is to test it on our own self-obtained dataset.
Best regards
Tuesday 30 April 2013
Skeleton of my thesis
I've come across some roadblocks along the way when it comes to training a classifier, so that part of the work is on standby right now.
Meanwhile I've been drawing the skeleton of my thesis, and this is my first draft.
1 – Introduction
Introducing the problem, motivations and objectives of the present work.
1.1 – Problems
Problems normally associated with visual human detection.
1.2 – State of the Art
Brief explanation of some methods already developed for human detection, possibly referring to any real application that might already be implemented (not sure if any). Also a brief overview of the evolution of visual object detection algorithms in general.
1.3 – Solution
State my approach to solving the problem and why it was chosen over any other.
2 – Experimental Setup
Detailed explanation of the experimental platform implemented in ROS for the development of the present work. Also stating and explaining the main software tools used for writing the code (openCV). Possibly pointing out that this application is to be implemented in the ATLAScar, thus illustrating the setup at run-time. This chapter will probably be divided into sub-topics.
3 – Integral Channel Features
A compact explanation of the algorithm.
3.1 – Channels
What a channel of an image is, which ones were computed, and how.
3.2 – Integral Images
What an integral image is, what it is for, how it is computed, and why it is useful for this work.
3.3 – Features
What a feature is, how they are computed, how many were used and why. Illustration of the random mechanism constructed for obtaining random parameters for feature harvesting.
3.4 – “The whole picture” (not sure of the name yet, but it seems to me an important sub-topic)
An explanation of the architecture of the code, meaning how the image is being processed; a flowchart of some sort will probably come in handy.
4 – Machine Learning Method
Brief explanation of what an ML method is and why it is absolutely necessary for these detection problems.
4.1 – Adaboost
What Adaboost is and why it is ideal for the present work.
4.2 – Training a classifier
Explain all the steps necessary for successfully training a classifier.
5 – Experiments and Results
Explain how the results were acquired, and what makes this method a valid confirmation of the results.
5.1 – Results
Show results.
6 – Conclusions and Future Work
The title explains itself.
Saturday 27 April 2013
Bootstrapping and evaluating performance
When I first looked at the results of my first classifier I wasn't too excited, since I was detecting not only pedestrians, but also trees, poles, traffic signs, etc. Bootstrapping is a critical operation for enhancing the performance of the detector: it consists of running the algorithm on negative images with the purpose of feeding the resulting false positives back into the classifier as negative examples.
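In rough terms the procedure looks like this (a minimal sketch; the helper functions are hypothetical stand-ins for my actual training and scanning code, shown only to make the idea concrete):
#include <vector>
using namespace std;

typedef vector<float> Sample; //one feature vector per detection window

void trainClassifier(const vector<Sample>& pos, const vector<Sample>& neg)
{ /* train CvBoost on the current sets (stub) */ }

vector<Sample> collectFalsePositives()
{ /* scan negative-only images: every detection there is a false positive (stub) */
  return vector<Sample>(); }

int main()
{
    vector<Sample> positives, negatives; //initial training samples
    for (int round = 0; round < 2; round++) //a couple of rounds is typical
    {
        trainClassifier(positives, negatives);
        vector<Sample> fp = collectFalsePositives();
        //the false positives enter the training set as negative examples
        negatives.insert(negatives.end(), fp.begin(), fp.end());
    }
    trainClassifier(positives, negatives); //final, bootstrapped classifier
    return 0;
}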
After that I have to find a way of evaluating the performance of my algorithm. In detection problems this is usually done with ROC curves, which plot the miss rate against the false positive rate.
The INRIA dataset has a training set and a testing set, each with positive-only images and negative-only images. So I am going to use the training set for training my classifier (bootstrapping and all), and then I will run the final classifier on the test set for different thresholds, getting results for miss rate on the positive-only images and false positive rate on the negative-only images.
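A minimal sketch of the evaluation I have in mind (the names are illustrative, not my real code): sweep a threshold over the raw classifier scores, counting misses on positive windows and false positives on negative windows.
#include <iostream>
#include <vector>
using namespace std;

void evaluate(const vector<float>& posScores, //scores on pedestrian windows
              const vector<float>& negScores, //scores on non-pedestrian windows
              const vector<float>& thresholds)
{
    for (size_t t = 0; t < thresholds.size(); t++)
    {
        size_t missed = 0, falsePos = 0;
        for (size_t i = 0; i < posScores.size(); i++)
            if (posScores[i] < thresholds[t]) missed++; //pedestrian rejected
        for (size_t i = 0; i < negScores.size(); i++)
            if (negScores[i] >= thresholds[t]) falsePos++; //background accepted
        cout << "thr=" << thresholds[t]
             << " miss rate=" << (double)missed / posScores.size()
             << " FPPW=" << (double)falsePos / negScores.size() << endl;
    }
}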
I'm hoping to be posting some results soon enough.
Regards!
Monday 15 April 2013
Bounding Boxing
A positive classification of a Detection Window [DW] outputs information about its coordinates, as well as the scale at which that detection was made. Knowing that the same object will be detected several times at different scales, careful treatment of the raw output information is mandatory.
This is a representation of the untreated data. By looking at it we immediately figure out that the same objects are being detected multiple times at different scales, so the first step was to transform all the rectangles back to the original scale:
The final step is to group the rectangles in terms of the distance between them. This is a clustering problem, and an advanced procedure; fortunately openCV already has a function that does this for us, and using the standard parameters this is the result:
This image in particular was chosen for this entry because of the good results it provided. This is not the case for the whole dataset, since my algorithm is prone to identifying bike wheels, trees, and other objects as pedestrians, so my work is far from finished.
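For reference, a minimal sketch of the two post-processing steps with openCV (the detections and scales containers are placeholders for my actual data, and groupThreshold=1, eps=0.2 are the standard parameters mentioned above, not necessarily the values I used):
#include <opencv2/opencv.hpp>
using namespace cv;
using namespace std;

int main()
{
    vector<Rect> detections; //raw rectangles, one per positively classified DW
    vector<double> scales;   //the scale at which each rectangle was detected

    //step 1: transform every rectangle back to the original image scale
    for (size_t i = 0; i < detections.size(); i++)
    {
        detections[i].x = cvRound(detections[i].x * scales[i]);
        detections[i].y = cvRound(detections[i].y * scales[i]);
        detections[i].width = cvRound(detections[i].width * scales[i]);
        detections[i].height = cvRound(detections[i].height * scales[i]);
    }

    //step 2: cluster overlapping rectangles into single detections
    groupRectangles(detections, 1, 0.2); //standard parameters
    return 0;
}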
Regards.
Wednesday 10 April 2013
To pedestrian or not to pedestrian
The final step of the algorithm is the classification of Detection Windows [DW] as Pedestrian or Not Pedestrian.
In this entry I'll explain the steps I take for training and testing a classifier. Bear in mind that I'm not yet very well acquainted with all the concepts and different parameters on this matter.
In my problem, there is a DW to be classified and thousands of features that describe it. For such problems Adaboost is normally a preferred approach, since it is a Machine Learning method that takes in a large number of weak classifiers (the features) and combines them into a strong classifier. This is exactly the approach that ChnFtrs proposes, and fortunately openCV has an implementation of it.
Step 1: Introducing the data to the algorithm.
For this I need to prepare a .csv file with the category in one column, followed by the features. For example, if I was using 3 features my file could look like this:
N,1000,1020,900
P,2000,1200,300
P,3300,1235,1000
N,1432,1587,5587
...
The file is generated by running the code on multiple images and appending the results to it.
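As a sketch, that writing step might look like this (WindowFtrs and the category flag are placeholders for my actual data structures):
#include <fstream>
#include <vector>
using namespace std;

//Sketch: append one line per window to train.csv, category first ('P' or
//'N'), followed by the feature values, matching the format shown above.
void appendSamples(const vector< vector<float> >& WindowFtrs, char category)
{
    ofstream file("train.csv", ios::app); //append across multiple images
    for (size_t n = 0; n < WindowFtrs.size(); n++)
    {
        file << category;
        for (size_t i = 0; i < WindowFtrs[n].size(); i++)
            file << "," << WindowFtrs[n][i];
        file << "\n";
    }
}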
Step 2: Opening the file
Supposing that our file is called "train.csv", the code looks like this:
CvMLData cvml;
cvml.read_csv ("train.csv");
CvMLData stands for Computer Vision Machine Learning Data and is a class made just for handling machine learning problems with openCV.
Step 3: Set response index
This is to let openCV know which column the response is in:
cvml.set_response_idx (0); //column 0
Step 4: Separating response from values
const CvMat* Resp = cvml.get_responses();
const CvMat* Values = cvml.get_values();
Mat RespM(Resp, false); // change from old CvMat to the newer Mat class
Mat ValM(Values,false);
Mat trainData = ValM.colRange(1, ValM.cols); //eliminating 1st column, which has the responses
Step 5: Training and saving a classifier
CvBoost boost;
boost.train( trainData, //data
CV_ROW_SAMPLE, //Samples in rows
RespM, //Responses
Mat(), //varIdx: use every feature
Mat(), //sampleIdx: use every sample
Mat(), //varType
Mat(), //missing data mask
CvBoostParams(CvBoost::REAL, 1000, 0, 1, false, 0),
false );
boost.save("./trained_boost.xml", "boost");
The CvBoost::train method has several parameters, some of which I'm not using yet. It is possible to select a subset of the training data to do the training, leaving the rest for testing, allowing for an immediate grasp of the performance of the classifier. It is also possible to have missing fields in the feature pool and let the algorithm fill them with approximated values.
As it is, I'm training a model of type REAL, with 1000 weak classifiers, using all the data on the file and leaving all other parameters default.
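For example, training on only the first 80% of the samples (an arbitrary split, just to illustrate) would mean filling the sampleIdx argument, the second of the empty Mat() parameters above:
int nTrain = (int)(0.8 * trainData.rows); //keep the last 20% for testing
Mat sampleIdx(1, nTrain, CV_32SC1);
for (int i = 0; i < nTrain; i++)
    sampleIdx.at<int>(0, i) = i; //indices of the rows used for training
boost.train(trainData, CV_ROW_SAMPLE, RespM,
            Mat(),     //varIdx: use every feature
            sampleIdx, //train only on these rows
            Mat(), Mat(),
            CvBoostParams(CvBoost::REAL, 1000, 0, 1, false, 0),
            false);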
Step 6: Classifying new samples
In the main code I have to load the classifier and then run the CvBoost::predict method on samples with the same number of columns as the ones used to train it.
CvBoost boost;
boost.load("trained_boost.xml");
for (uint n = 0; n < WindowFtrs.size(); n++)
{
    Mat Test = Mat::zeros(1, NRFEATURE, CV_32FC1);
    for (uint i = 0; i < NRFEATURE; i++)
    {
        Test.at<float>(0, i) = WindowFtrs[n][i]; //fill one row with the window's features
    }
    float x = boost.predict(Test, Mat(), Range::all(), false, false);
    if (x == 2) nPedcount++; //class 2: not a pedestrian
    if (x == 1) Pedcount++;  //class 1: pedestrian
}
cout << "No Ped: " << nPedcount << " Ped: " << Pedcount << endl;
In this loop, Test is filled with the feature values of each window and then passed to the predict method.
x holds the predicted class (1 for Ped and 2 for nPed), and this way I was able to do some preliminary testing of my first classifier. On 1300 windows with pedestrians, the classifier failed 6 predictions, which is pretty good for a first try. I also tested the classifier on over 14 million DWs without pedestrians, leading to around 290000 false detections, which in terms of blunt accuracy means ~98% correct predictions. However, I'll not discuss these results yet, since this is hardly the best way to evaluate a classifier.
Regards!