Pedestrian Detection - My master's thesis: April 2013

Tuesday, 30 April 2013

Skeleton of my thesis

I've run into some roadblocks along the way when it comes to training a classifier, so that part of the work is on standby right now.

Meanwhile I've been drawing the skeleton of my thesis, and this is my first draft.

1 – Introduction

Introducing the problem, motivations and objectives of the present work.

1.1 – Problems

Problems normally associated with visual human detection.

1.2 – State of the Art

Brief explanation of some methods already developed for human detection, possibly referring to any real application that might already be implemented (not sure if any). Also a brief overview of the evolution of visual object detection algorithms in general.

1.3 – Solution

State my approach to solve the problem and why it was chosen, rather than any other.

2 – Experimental setup

Detailed explanation of the experimental platform implemented in ROS for the development of the present work. Also stating and explaining the main software tools used for writing the code (OpenCV). Possibly point out that this application is to be deployed on the ATLAScar, thus illustrating the setup at run-time. This chapter will probably be divided into sub-topics.

3 – Integral Channel Features

A compact explanation of the algorithm.

3.1 – Channels

What a channel of an image is, which ones were computed, and how.

3.2 – Integral Images

What an integral image is, what they are for, how they are computed, why they are useful for this work.

3.3 – Features

What a feature is, how features are computed, how many there are and why. Illustration of the random mechanism built to obtain random parameters for feature harvesting.

3.4 – “The whole picture” (not sure of the name yet, but seems to me an important sub-topic)

An explanation of the architecture of the code, meaning how the image is processed. A flowchart of some sort will probably come in handy.

4 – Machine Learning Method

Brief explanation of what a ML method is, why it is absolutely necessary for these detection problems.

4.1 – AdaBoost

What AdaBoost is and why it is well suited to the present work.

4.2 – Training a classifier

Explain all the steps necessary for successfully training a classifier.

5 – Experiments and Results

Explain how the results were acquired, and what makes this method a valid confirmation of the results.

5.1 – Results

Show results.

6 – Conclusions and Future Work

The title explains itself.

Saturday, 27 April 2013

Bootstrapping and evaluating performance

When I first looked at the results of my first classifier I wasn't too excited, since I was detecting not only pedestrians, but also trees, poles, traffic signs, etc. Bootstrapping is a critical operation for improving the performance of the detector. It consists of running the algorithm on negative images and feeding the resulting false positives back to the classifier as negative examples.
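A minimal sketch of one bootstrapping round, written in plain C++ so it does not depend on OpenCV. The names FeatureVector, findFalsePositives and addHardNegatives are hypothetical, and the sketch assumes the classifier produces a score per window where a higher score means "more pedestrian-like". It is not the code used in this project.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Features describing one detection window (hypothetical layout).
typedef std::vector<float> FeatureVector;

// Given the scores the current classifier assigned to windows cut from
// negative-only images, return the indices of the false positives:
// windows scored at or above the detection threshold.
std::vector<std::size_t> findFalsePositives(const std::vector<float>& negativeScores,
                                            float threshold) {
    std::vector<std::size_t> hard;
    for (std::size_t i = 0; i < negativeScores.size(); ++i) {
        if (negativeScores[i] >= threshold) hard.push_back(i);
    }
    return hard;
}

// Append the false positives to the negative training set, so that the next
// training round sees them labelled as negatives. Returns how many were added.
std::size_t addHardNegatives(const std::vector<FeatureVector>& negativeWindows,
                             const std::vector<float>& negativeScores,
                             float threshold,
                             std::vector<FeatureVector>& negativeTrainingSet) {
    assert(negativeWindows.size() == negativeScores.size());
    const std::vector<std::size_t> hard = findFalsePositives(negativeScores, threshold);
    for (std::size_t idx : hard) {
        negativeTrainingSet.push_back(negativeWindows[idx]);
    }
    return hard.size();
}
```

After each round the classifier would be retrained on the enlarged negative set, and the round repeated until few new false positives turn up.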

After that I have to find a way of evaluating the performance of my algorithm. In detection problems this is usually done with ROC-style curves, which in the pedestrian detection literature plot the miss rate against the false positive rate.

The INRIA dataset has a training set and a testing set, each with positive-only images and negative-only images. So I am going to use the training set for training my classifier (bootstrapping and all), and then I will run the final classifier on the test set for different thresholds, getting results for miss rate, on the positive-only images, and false positive rate, on the negative-only images.
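The threshold sweep described above can be sketched as follows. This is only an illustration in plain C++: RocPoint, fractionAtOrAbove and sweepThresholds are hypothetical names, and it assumes each window already has a real-valued score where higher means "pedestrian". How those scores are obtained from the classifier is left open here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One operating point of the detector at a given decision threshold.
struct RocPoint {
    double threshold;
    double missRate;          // positives scored below the threshold / all positives
    double falsePositiveRate; // negatives scored at or above the threshold / all negatives
};

// Fraction of scores that are at or above the threshold.
static double fractionAtOrAbove(const std::vector<double>& scores, double threshold) {
    assert(!scores.empty());
    std::size_t count = 0;
    for (double s : scores) {
        if (s >= threshold) ++count;
    }
    return static_cast<double>(count) / static_cast<double>(scores.size());
}

// Compute one RocPoint per threshold.
std::vector<RocPoint> sweepThresholds(const std::vector<double>& positiveScores,
                                      const std::vector<double>& negativeScores,
                                      const std::vector<double>& thresholds) {
    std::vector<RocPoint> curve;
    for (double t : thresholds) {
        RocPoint p;
        p.threshold = t;
        p.missRate = 1.0 - fractionAtOrAbove(positiveScores, t);
        p.falsePositiveRate = fractionAtOrAbove(negativeScores, t);
        curve.push_back(p);
    }
    return curve;
}
```

Plotting missRate against falsePositiveRate for all the points gives the curve. A low threshold sits at the "few misses, many false positives" end and a high threshold at the opposite end.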

I'm hoping to be posting some results soon enough.

Regards!

Monday, 15 April 2013

Bounding Boxing

A positive classification of a Detection Window [DW] outputs its coordinates, as well as the scale at which the detection was made. Since the same object will be detected several times at different scales, the raw output needs careful treatment.

This is a representation of the untreated data. By looking at it we immediately figure out that the same objects are being detected multiple times at different scales, so the first step was to transform all the rectangles to the original scale:
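A minimal sketch of that rescaling step in plain C++. Box and Detection are hypothetical stand-ins for the OpenCV rectangle types, and the sketch assumes that scale is the factor by which the original image was resized before the window was slid over it. If the scale is stored the other way round, the division becomes a multiplication.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Axis-aligned rectangle in pixel coordinates (stand-in for cv::Rect).
struct Box {
    int x, y, width, height;
};

// A detection made on a resized copy of the image. scale is assumed to be the
// resize factor applied to the original image (0.5 = shrunk to half size).
struct Detection {
    Box box;
    double scale;
};

// Map one detection back into the coordinate frame of the original image.
Box toOriginalScale(const Detection& d) {
    assert(d.scale > 0.0);
    Box out;
    out.x = static_cast<int>(std::lround(d.box.x / d.scale));
    out.y = static_cast<int>(std::lround(d.box.y / d.scale));
    out.width = static_cast<int>(std::lround(d.box.width / d.scale));
    out.height = static_cast<int>(std::lround(d.box.height / d.scale));
    return out;
}

// Map a whole list of detections back to the original image.
std::vector<Box> allToOriginalScale(const std::vector<Detection>& detections) {
    std::vector<Box> boxes;
    for (const Detection& d : detections) boxes.push_back(toOriginalScale(d));
    return boxes;
}
```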

The final step is to group the rectangles according to the distance between them. This is a clustering problem, and is an advanced procedure. Fortunately OpenCV already has a function that does this for us, and using the standard parameters this is the result:

This image in particular was chosen for this entry because of the good results it provided. This is not the case for the whole dataset, since my algorithm is prone to identifying bike wheels, trees, and other objects as pedestrians, so my work is far from finished.
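To give an idea of what such a grouping step involves, here is a simplified sketch in plain C++. It is not the OpenCV implementation (the entry does not name the function; it is presumably groupRectangles), and the names Box, similar, findRoot and groupBoxes as well as the eps tolerance are assumptions made for illustration. Boxes whose four edges are close to each other are merged into one cluster with union-find, and each cluster is replaced by its average box.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

struct Box { int x, y, width, height; };

// Two boxes are "similar" when all four edges lie within a tolerance that
// grows with the box size. eps is a hypothetical tuning parameter.
bool similar(const Box& a, const Box& b, double eps) {
    const double delta =
        eps * 0.5 * (std::min(a.width, b.width) + std::min(a.height, b.height));
    return std::abs(a.x - b.x) <= delta &&
           std::abs(a.y - b.y) <= delta &&
           std::abs(a.x + a.width - b.x - b.width) <= delta &&
           std::abs(a.y + a.height - b.y - b.height) <= delta;
}

// Union-find root lookup with path compression.
int findRoot(std::vector<int>& parent, int i) {
    while (parent[i] != i) {
        parent[i] = parent[parent[i]];
        i = parent[i];
    }
    return i;
}

// Cluster similar boxes and return one averaged box per cluster.
std::vector<Box> groupBoxes(const std::vector<Box>& boxes, double eps) {
    assert(eps >= 0.0);
    const int n = static_cast<int>(boxes.size());
    std::vector<int> parent(n);
    for (int i = 0; i < n; ++i) parent[i] = i;
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j)
            if (similar(boxes[i], boxes[j], eps))
                parent[findRoot(parent, j)] = findRoot(parent, i);

    // Accumulate coordinate sums per cluster root, then average them.
    std::vector<Box> sum(n, Box{0, 0, 0, 0});
    std::vector<int> count(n, 0);
    for (int i = 0; i < n; ++i) {
        const int r = findRoot(parent, i);
        sum[r].x += boxes[i].x;
        sum[r].y += boxes[i].y;
        sum[r].width += boxes[i].width;
        sum[r].height += boxes[i].height;
        ++count[r];
    }
    std::vector<Box> grouped;
    for (int r = 0; r < n; ++r) {
        if (count[r] == 0) continue;
        grouped.push_back(Box{sum[r].x / count[r], sum[r].y / count[r],
                              sum[r].width / count[r], sum[r].height / count[r]});
    }
    return grouped;
}
```

A real implementation would typically also discard clusters with very few members, which helps suppress isolated false detections. This sketch keeps every cluster.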

Regards.

Wednesday, 10 April 2013

To pedestrian or not to pedestrian

The final step of the algorithm is the classification of Detection Windows [DW] as Pedestrian or Not Pedestrian.

In this entry I'll explain the steps I take for training and testing a classifier. Note that I'm not yet very well acquainted with all the concepts and parameters involved.

In my problem, there is a DW to be classified, and thousands of features that describe it. In such problems AdaBoost is normally a preferred approach, since it is a Machine Learning method that takes in a large number of weak classifiers (features) and combines them into a strong classifier. This is exactly the approach that ChnFtrs proposes, and fortunately OpenCV has an implementation of it.
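To make the "many weak classifiers, one strong classifier" idea concrete, here is a small sketch of how a boosted classifier votes once it has been trained. It uses the textbook discrete AdaBoost form with decision stumps, not the Real AdaBoost variant trained later in this entry, and it is not OpenCV's implementation. The Stump struct and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A decision stump: a weak classifier that thresholds a single feature.
struct Stump {
    std::size_t featureIndex; // which feature of the window it looks at
    float threshold;          // decision threshold on that feature
    int polarity;             // +1: vote pedestrian when value > threshold, -1: the opposite
    double alpha;             // weight of this stump's vote, learned during boosting
};

// Vote of one stump: +1 (pedestrian) or -1 (not pedestrian).
int weakVote(const Stump& s, const std::vector<float>& features) {
    assert(s.featureIndex < features.size());
    const int raw = features[s.featureIndex] > s.threshold ? 1 : -1;
    return s.polarity * raw;
}

// Strong classifier score: the alpha-weighted sum of all the weak votes.
double strongScore(const std::vector<Stump>& stumps, const std::vector<float>& features) {
    double score = 0.0;
    for (const Stump& s : stumps) score += s.alpha * weakVote(s, features);
    return score;
}

// The window is classified as a pedestrian when the weighted vote is positive.
bool isPedestrian(const std::vector<Stump>& stumps, const std::vector<float>& features) {
    return strongScore(stumps, features) > 0.0;
}
```

Each stump on its own is barely better than guessing. Training chooses which features to threshold and how much weight (alpha) each vote gets.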

Step 1: Introducing the data to the algorithm.

For this I need to prepare a .csv file with the category in the first column, followed by the features. For example, if I were using 3 features my file could look like this:

N,1000,1020,900
P,2000,1200,300
P,3300,1235,1000
N,1432,1587,5587
...

The file is being generated by running the code on multiple images and writing the results to a file.
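A minimal sketch of how one such row could be formatted, using only the standard library. toCsvRow is a hypothetical helper, not the code that generated the actual file. The precision setting is an added precaution so that large feature values are not rounded to six significant digits.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Format one detection window as a CSV row: label first, then the features.
// label is 'P' (pedestrian) or 'N' (not pedestrian), as in the example rows.
std::string toCsvRow(char label, const std::vector<float>& features) {
    assert(label == 'P' || label == 'N');
    std::ostringstream row;
    row.precision(9); // enough digits to write a float without losing information
    row << label;
    for (float f : features) {
        row << ',' << f;
    }
    return row.str();
}
```

Each returned string would then be written as one line of the file, for instance with std::ofstream.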

Step 2: Opening the file

Supposing that our file is called "train.csv", the code looks like this:

CvMLData cvml;
cvml.read_csv("train.csv");

CvMLData stands for Computer Vision Machine Learning Data and is a class made just for handling machine learning data with OpenCV.

Step 3: Set response index

This lets OpenCV know which column holds the response:

cvml.set_response_idx (0); //column 0

Step 4: Separating response from values

const CvMat* Resp = cvml.get_responses();
const CvMat* Values = cvml.get_values();

Mat RespM(Resp, false); // change from old CvMat to the newer Mat class
Mat ValM(Values, false);

// eliminating the 1st column, which has the responses
Mat trainData = ValM.colRange(1, ValM.cols);

Step 5: Training and saving a classifier

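CvBoost boost;

boost.train(trainData,      // data
            CV_ROW_SAMPLE,  // samples in rows
            RespM,          // responses
            Mat(),          // varIdx: use all features
            Mat(),          // sampleIdx: use all samples
            Mat(),          // varType: default
            Mat(),          // missingDataMask: no missing values
            CvBoostParams(CvBoost::REAL, 1000, 0, 1, false, 0),
            false);         // update: train from scratch

boost.save("./trained_boost.xml", "boost");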

The CvBoost::train method has several parameters, some of which I'm not using yet. It is possible to select a subset of the training data to do the training, leaving the rest for testing, which gives an immediate idea of the performance of the classifier. It is also possible to have missing fields in the feature pool and let the algorithm fill them in with approximate values.

As it is, I'm training a model of type REAL, with 1000 weak classifiers, using all the data in the file and leaving all other parameters default.

Step 6: Classifying new samples

In the main code I have to load the classifier and then run the CvBoost::predict method on samples with the same number of columns as the ones used to train it.

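CvBoost boost;
boost.load("trained_boost.xml");

Mat Test;                         // one row with the features of a single DW
int Pedcount = 0, nPedcount = 0;  // counters for each predicted class

for (uint n = 0; n < WindowFtrs.size(); n++)
{
    Test = Mat::zeros(1, NRFEATURE, CV_32FC1);

    // copy the features of DW n into the sample row
    for (uint i = 0; i < NRFEATURE; i++)
    {
        Test.at<float>(0, i) = WindowFtrs[n][i];
    }

    // predict returns the class label of the sample
    float x = boost.predict(Test, Mat(), Range::all(), false, false);
    if (x == 2) nPedcount++;
    if (x == 1) Pedcount++;
}
cout << "No Ped: " << nPedcount << " Ped: " << Pedcount << endl;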

In this cycle, Test is filled with the feature values and then it is tested with the predict method.

x holds the predicted class (1 for Ped and 2 for nPed), and this way I was able to do some preliminary testing of my first classifier. On 1300 windows with pedestrians, the classifier failed 6 predictions, which is pretty good for a first try. I also tested the classifier on over 14 million DWs without pedestrians, leading to around 290000 false detections, which in terms of raw accuracy means 98% correct predictions. However, I won't discuss these results yet, since this is hardly the best way to evaluate a classifier.
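A short worked example of why raw accuracy is a poor yardstick here, using the counts quoted above. "Over 14 million" and "around 290000" are taken as exactly 14000000 and 290000, which is an approximation, and the Counts struct and function names are hypothetical.

```cpp
#include <cassert>

// Confusion counts for a two-class detector.
struct Counts {
    double truePositives;  // pedestrian windows detected
    double falseNegatives; // pedestrian windows missed
    double trueNegatives;  // background windows correctly rejected
    double falsePositives; // background windows wrongly detected
};

// Fraction of all windows that were classified correctly.
double accuracy(const Counts& c) {
    const double total =
        c.truePositives + c.falseNegatives + c.trueNegatives + c.falsePositives;
    assert(total > 0.0);
    return (c.truePositives + c.trueNegatives) / total;
}

// Fraction of pedestrian windows that were missed.
double missRate(const Counts& c) {
    assert(c.truePositives + c.falseNegatives > 0.0);
    return c.falseNegatives / (c.truePositives + c.falseNegatives);
}

// Fraction of background windows that were wrongly detected.
double falsePositiveRate(const Counts& c) {
    assert(c.trueNegatives + c.falsePositives > 0.0);
    return c.falsePositives / (c.trueNegatives + c.falsePositives);
}
```

With Counts{1294, 6, 13710000, 290000} the accuracy comes out at roughly 98%, the miss rate at roughly 0.5% and the false positive rate at roughly 2%. A classifier that always answers "no pedestrian", Counts{0, 1300, 14000000, 0}, would score above 99.99% accuracy while missing every pedestrian, because the negatives outnumber the positives by about four orders of magnitude. That is why miss rate and false positive rate are reported separately.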

Regards!

Sunday, 7 April 2013

Randomness - Generating a large pool of features

(Note: In this entry, the term detection window [DW] refers to the image that we will classify as pedestrian or not pedestrian. It is a section of a larger image.)

There are two distinct approaches to describing a DW in terms of features. Some researchers opt to hand-craft a finely tuned pool of features, which is tested and adjusted until it achieves satisfactory results. This is the case of the HOG detector, whose features consist of local sums (histograms of gradient orientations) calculated over a dense overlapping grid in the DW. The other approach is to generate random features, on the premise that a large enough feature pool will give a good characterization of the scene. This is the case of the Integral Channel Features [ChnFtrs] algorithm.

In the ChnFtrs approach, a feature has 3 parameters that are generated randomly. Given that we compute different channels of our DW, a feature is calculated on a random channel, over a rectangle whose dimensions and position are also random.

In terms of implementation, this is how I've done it:

Create a struct that characterizes one feature

typedef struct
{
    int nFtrs;      // number of features to be calculated
    int channelidx; // index of the channel where to calculate the feature
    int width;
    int height;
    int x;          // x position of the upper-left corner of the random rectangle
    int y;          // y position of the upper-left corner of the random rectangle
} FtrParams;

Create a vector type for these structs to hold all the parameters.

typedef vector<FtrParams> FtrVecParams;

Create a function that initializes this vector with N parameters

For generating random values I use the RNG class of OpenCV. Using the same seed, it is possible to replicate results in the future.

void GetRandParams(int seed, int NrFtrs, FtrVecParams & randparams, Rect region)
{
    cv::RNG rng(seed);

    FtrParams Params;
    Params.nFtrs = NrFtrs;

    for (int n = 0; n < NrFtrs; n++)
    {
        // rng.uniform(a, b) returns an integer in [a, b): b itself is excluded
        Params.channelidx = rng.uniform(0, CHANNELNR);

        Params.width  = rng.uniform(5, region.width + 1);
        Params.height = rng.uniform(5, region.height + 1);

        // keep the rectangle inside the DW
        Params.x = rng.uniform(0, region.width - Params.width + 1);
        Params.y = rng.uniform(0, region.height - Params.height + 1);

        randparams.push_back(Params); // fills randparams (which is declared elsewhere)
    }
}

The Rect region in the input arguments holds the size of our DW.

Calculating features over a DW

For this we need two functions:

void GetChnFtrsOverWindow(Mat IntegralChannels, vector<float> & features, vector<DVector> & WindowFtrs, FtrVecParams Params, CvRect region)
{
    vector<float> aux;

    // compute every feature of this DW and append it to features
    for (int n = 0; n < Params[0].nFtrs; n++)
    {
        GetIntegralSum(IntegralChannels, features, Params[n], region);
    }

    // the code below organizes the data in a vector in which each index
    // has the features of one DW
    for (int n = 0; n < NRFEATURE; n++)
    {
        aux.push_back(features[features.size() - NRFEATURE + n]);
    }

    WindowFtrs.push_back(aux);
}

And

void GetIntegralSum(Mat IntegralChannels, vector<float> & features, FtrParams Params, CvRect region)
{
    vec10d tl, tr, bl, br;

    // start/end rows and columns of the rectangle in the integral image
    int sR = Params.y + region.y, sC = Params.x + region.x;
    int eR = sR + Params.height, eC = sC + Params.width;

    tl = IntegralChannels.at<vec10d>(sR, sC);
    tr = IntegralChannels.at<vec10d>(sR, eC);
    bl = IntegralChannels.at<vec10d>(eR, sC);
    br = IntegralChannels.at<vec10d>(eR, eC);

    // rectangle sum from the four corners; kept as double so that
    // fractional channel values are not truncated
    double sum = br[Params.channelidx] - bl[Params.channelidx]
               - tr[Params.channelidx] + tl[Params.channelidx];

    features.push_back((float)sum);
}
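For readers who want to see the four-corner lookup in isolation, here is a self-contained sketch on a single channel, without OpenCV. The Integral struct, computeIntegral and rectSum are hypothetical names. The layout follows the same convention as the code above: the integral image has one extra row and column of zeros, so the value at (r, c) is the sum of all pixels above and to the left of (r, c).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Integral image of a single channel, stored row-major.
struct Integral {
    std::size_t rows, cols;   // image size plus one in each dimension
    std::vector<double> data;
    double at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Build the integral image in one pass over the pixels.
Integral computeIntegral(const std::vector<std::vector<double> >& image) {
    assert(!image.empty());
    Integral I;
    I.rows = image.size() + 1;
    I.cols = image[0].size() + 1;
    I.data.assign(I.rows * I.cols, 0.0);
    for (std::size_t r = 1; r < I.rows; ++r) {
        for (std::size_t c = 1; c < I.cols; ++c) {
            I.data[r * I.cols + c] = image[r - 1][c - 1]
                                   + I.at(r - 1, c) + I.at(r, c - 1)
                                   - I.at(r - 1, c - 1);
        }
    }
    return I;
}

// Sum of the pixels inside the rectangle with upper-left corner (x, y),
// obtained from four lookups regardless of the rectangle size.
double rectSum(const Integral& I, std::size_t x, std::size_t y,
               std::size_t width, std::size_t height) {
    const std::size_t sR = y, sC = x, eR = y + height, eC = x + width;
    assert(eR < I.rows && eC < I.cols);
    return I.at(eR, eC) - I.at(eR, sC) - I.at(sR, eC) + I.at(sR, sC);
}
```

The cost of rectSum does not depend on the rectangle size, which is what makes it affordable to evaluate thousands of random rectangles per DW.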

I've already started playing with the OpenCV CvBoost class for classification, but I've yet to get any meaningful results out of it, so I'll leave that for the next entry.

Regards!