Monday, 15 July 2013

This is the end

I haven't been able to make regular updates because of the huge amount of work I've had to take care of over the last month.

Here are two videos of the final application working, hope you enjoy!

https://www.youtube.com/watch?v=SrX1TBjxNq0

https://www.youtube.com/watch?v=0FcO4e9MhKo

Regards!

Monday, 20 May 2013

Comparing detector performance with different numbers of features


In my last entry I hypothesized that the under-performance of my first detector, when compared to the one documented in the original paper, could be explained by the difference in the number of features used. To verify this, I trained a classifier with only 10K features and ran tests to compare both detectors. The results are illustrated in the next graph.

As the graph illustrates, the classifier trained with 10K features scores ~4 points below the one trained with 15K features at the reference value of 0.0001 FPPW. With these results in mind, it is reasonable to assume that if the classifier were trained with 30K features, as in the original paper, the detector would most likely achieve results similar to those documented in the publication.

Regards.

Tuesday, 14 May 2013

First results

I've finally finished my first implementation of the pedestrian detection algorithm. Results for the INRIA dataset are shown in the following graph.


At the reference value of 0.0001 false positives per window (FPPW), my detector correctly labels ~79.5% of pedestrian windows. This is around 10 points below the original paper, and for this I offer two explanations.

First and foremost, I used a pool of 15000 features for learning and classification, whereas the original paper used 30000.

Secondly, the original paper uses an optimized boosted cascade for decision-making. This type of classifier not only speeds up the algorithm by several orders of magnitude, but also leads to slightly better detection performance, since it is designed to reject most false positives in the first stages of the cascade.
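To illustrate the idea, here is a minimal sketch of cascade-style early rejection; the Stage struct, its fields and the thresholds are hypothetical placeholders, not the paper's actual design:

  #include <vector>

  // Hypothetical stage of a boosted cascade: a block of decision stumps
  // plus a rejection threshold on the running score.
  struct Stage {
      std::vector<int>   featIdx;  // features used by this stage's stumps
      std::vector<float> stumpThr; // stump thresholds
      std::vector<float> weight;   // stump weights
      float rejectThr;             // reject if the running score falls below this
  };

  // Returns true only if the window survives every stage. Most negative
  // windows are discarded after the first (cheapest) stages, which is
  // where the orders-of-magnitude speed-up comes from.
  bool classifyWindow(const std::vector<float>& feats, const std::vector<Stage>& cascade)
  {
      float score = 0.f;
      for (size_t s = 0; s < cascade.size(); s++)
      {
          const Stage& st = cascade[s];
          for (size_t i = 0; i < st.featIdx.size(); i++)
              score += (feats[st.featIdx[i]] > st.stumpThr[i]) ? st.weight[i] : -st.weight[i];

          if (score < st.rejectThr)
              return false; // early rejection (in a real detector the remaining
                            // features would never even be computed)
      }
      return true;
  }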

Since I was unable to implement this type of classifier myself due to lack of time, I resorted to extracting fewer features to speed up the algorithm, thus sacrificing some performance.

Given these explanations, I think the method is validated, and the next step is to test it on our own dataset.

Best regards

Tuesday, 30 April 2013

Skeleton of my thesis

I've come across some roadblocks along the way when it comes to training a classifier, so that part of the work is on standby right now.

Meanwhile I've been sketching the skeleton of my thesis, and this is my first draft.


1 – Introduction

Introducing the problem, motivations and objectives of the present work. 

1.1 – Problems

Problems normally associated with visual human detection.

1.2 – State of the Art

Brief explanation of some methods already developed for human detection, possibly referring to real applications that may already be deployed (not sure if any exist). Also a brief overview of the evolution of visual object detection algorithms in general.

1.3 – Solution

State my approach to solving the problem and why it was chosen over the alternatives.

2 – Experimental setup

Detailed explanation of the experimental platform implemented in ROS for the development of the present work, also stating and explaining the main software tools used to develop the code (openCV). Possibly point out that this application is to be deployed in the ATLAScar, thus illustrating the setup at run-time. This chapter will probably be divided into sub-topics.

3 – Integral Channel Features

A compact explanation of the algorithm.

3.1 – Channels

What a channel of an image is, which ones were computed, and how.

3.2 – Integral Images

What an integral image is, what it is for, how it is computed, and why it is useful for this work.

3.3 – Features

What a feature is, how they are computed, how many were used and why. Illustration of the random mechanism constructed for obtaining random parameters for feature harvesting.

3.4 – “The whole picture” (not sure of the name yet, but seems to me an important sub-topic)

An explanation of the architecture of the code, i.e. how the image is processed; a flowchart of some sort will probably come in handy.


4 – Machine Learning Method

Brief explanation of what a ML method is and why it is absolutely necessary for these detection problems.

4.1 – Adaboost

What Adaboost is and why it is ideal for the present work.

4.2 – Training a classifier

Explain all the steps necessary for successfully training a classifier.

5 – Experiments and Results

Explain how the results were acquired, and what makes this method a valid confirmation of the results.

5.1 – Results

Show results.

6 – Conclusions and Future Work

The title explains itself.

Saturday, 27 April 2013

Bootstrapping and evaluating performance

When I first looked at the results of my first classifier I wasn't too excited, since I was detecting not only pedestrians, but also trees, poles, traffic signs, etc. Bootstrapping is a critical operation for enhancing the performance of the detector: it consists of running the algorithm on negative images and feeding the resulting false positives back to the classifier as additional negative training examples.
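In outline, a bootstrapping round looks something like the sketch below; collectDetections, extractFeatures and appendNegative are hypothetical placeholders for my own routines:

  #include <opencv2/opencv.hpp>
  #include <string>
  #include <vector>

  // Hypothetical helpers standing in for my own code:
  std::vector<cv::Rect> collectDetections(const cv::Mat& img);                  // run the current detector
  std::vector<float>    extractFeatures(const cv::Mat& img, const cv::Rect& r); // features of one window
  void                  appendNegative(const std::vector<float>& feats);        // store as a negative sample

  // One bootstrapping round: every detection on a pedestrian-free image
  // is by definition a false positive, so it becomes a new negative example.
  void bootstrapRound(const std::vector<std::string>& negativeImages)
  {
      for (size_t i = 0; i < negativeImages.size(); i++)
      {
          cv::Mat img = cv::imread(negativeImages[i]);
          std::vector<cv::Rect> fps = collectDetections(img);

          for (size_t j = 0; j < fps.size(); j++)
              appendNegative(extractFeatures(img, fps[j]));
      }
      // ...then the boosted classifier is retrained on the augmented set.
  }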

After that I have to find a way of evaluating the performance of my algorithm. In detection problems this is usually done with ROC curves, which plot the miss rate against the false positive rate.

The INRIA dataset has a training set and a testing set, each with positive-only and negative-only images. I am going to use the training set to train my classifier (bootstrapping and all), and then run the final classifier on the test set at different thresholds, measuring the miss rate on the positive-only images and the false positive rate on the negative-only images.
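Since CvBoost::predict can return the raw sum of the weak responses (its returnSum flag) instead of a class label, one way to trace such a curve is to sweep a threshold over that sum. A minimal sketch, assuming the sums for all positive and negative test windows have already been collected (posSums, negSums and the sweep range are my own placeholders):

  #include <cstdio>
  #include <vector>

  void traceCurve(const std::vector<float>& posSums, const std::vector<float>& negSums)
  {
      for (float thr = -10.f; thr <= 10.f; thr += 0.5f) // arbitrary sweep range
      {
          size_t missed = 0, falsePos = 0;

          for (size_t i = 0; i < posSums.size(); i++)
              if (posSums[i] < thr) missed++;    // pedestrian window rejected

          for (size_t i = 0; i < negSums.size(); i++)
              if (negSums[i] >= thr) falsePos++; // background window accepted

          printf("thr=%.2f  miss rate=%.4f  FPPW=%.6f\n", thr,
                 (double)missed / posSums.size(),
                 (double)falsePos / negSums.size());
      }
  }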

I hope to post some results soon.

Regards!

Monday, 15 April 2013

Bounding Boxing

A positive classification of a Detection Window [DW] outputs its coordinates, as well as the scale at which the detection was made. Since the same object will be detected several times at different scales, careful treatment of the raw output is mandatory.


This is a representation of the untreated data. By looking at it we immediately see that the same objects are being detected multiple times at different scales, so the first step was to transform all the rectangles back to the original scale:
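A minimal sketch of that rescaling, assuming each detection keeps the resize factor that was applied to the image before its window fired (the Detection struct is my own illustration, not the exact code):

  #include <opencv2/opencv.hpp>
  #include <vector>

  // Hypothetical record of one positive window.
  struct Detection {
      cv::Rect rect;  // window in the coordinates of the resized image
      double   scale; // factor by which the image was resized before detection
  };

  // Map every rectangle back to the coordinates of the original image.
  std::vector<cv::Rect> toOriginalScale(const std::vector<Detection>& dets)
  {
      std::vector<cv::Rect> out;
      for (size_t i = 0; i < dets.size(); i++)
      {
          const Detection& d = dets[i];
          out.push_back(cv::Rect(cvRound(d.rect.x / d.scale),
                                 cvRound(d.rect.y / d.scale),
                                 cvRound(d.rect.width  / d.scale),
                                 cvRound(d.rect.height / d.scale)));
      }
      return out;
  }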

 
The final step is to group the rectangles according to their distance from one another. This is a clustering problem, and a non-trivial procedure. Fortunately openCV already has a function that does this for us, and using its standard parameters this is the result:
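For reference, the openCV function in question is presumably groupRectangles; a minimal usage sketch with its standard parameters:

  #include <opencv2/opencv.hpp>
  #include <vector>

  // rects: all detections already mapped back to the original scale.
  // groupRectangles clusters similar rectangles in place, averages each
  // cluster and discards clusters with groupThreshold or fewer members.
  void groupDetections(std::vector<cv::Rect>& rects)
  {
      int    groupThreshold = 1;   // minimum cluster size minus one
      double eps            = 0.2; // relative difference allowed when merging

      cv::groupRectangles(rects, groupThreshold, eps);
  }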



This image in particular was chosen for this entry because of the good results it provided. This is not the case for the whole dataset, since my algorithm is prone to identifying bike wheels, trees, and other objects as pedestrians, so my work is far from finished.

Regards.

Wednesday, 10 April 2013

To pedestrian or not to pedestrian

The final step of the algorithm is the classification of Detection Windows [DW] as Pedestrian or Not Pedestrian.

In this entry I'll explain the steps I take to train and test a classifier. Be aware that I'm not yet very well acquainted with all the concepts and parameters involved.

In my problem, there is a DW to be classified, and thousands of features that describe it. In such problems Adaboost is usually the preferred approach, since it is a Machine Learning method that takes in a large number of weak classifiers (features) and creates a strong classifier. This is exactly the approach that ChnFtrs proposes, and fortunately openCV has an implementation of it.

Step 1: Introducing the data to the algorithm.

For this I need to prepare a .csv file with the category in one column, followed by the features. For example, if I were using 3 features, my file could look like this:

N,1000,1020,900
P,2000,1200,300
P,3300,1235,1000
N,1432,1587,5587
...


The file is generated by running the code on multiple images and writing the results out, as sketched below.
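A minimal sketch of that generation step (writeSample and its arguments are my own illustration, not the exact code):

  #include <cstdio>
  #include <vector>

  // Append one labelled sample to the CSV training file:
  // the category ('P' or 'N') first, then one column per feature.
  void writeSample(FILE* f, char label, const std::vector<float>& feats)
  {
      fprintf(f, "%c", label);
      for (size_t i = 0; i < feats.size(); i++)
          fprintf(f, ",%.0f", feats[i]);
      fprintf(f, "\n");
  }

Called once per detection window, with the file opened in append mode, this yields one CSV row per sample like the ones above.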

Step 2: Opening the file

Supposing that our file is called "train.csv", the code looks like this:

  CvMLData cvml;
  cvml.read_csv("train.csv");

CvMLData stands for Computer Vision Machine Learning Data and is a class made just for handling machine learning data with openCV.

Step 3: Set response index

This is to let openCV know which column holds the response:

  cvml.set_response_idx(0); // the response is in column 0

Step 4: Separating response from values

  const CvMat* Resp = cvml.get_responses();
  const CvMat* Values = cvml.get_values();

  Mat RespM(Resp, false); // convert from the old CvMat to the newer Mat class
  Mat ValM(Values, false);

  Mat trainData = ValM.colRange(1, ValM.cols); // drop the 1st column, which holds the responses

Step 5: Training and saving a classifier

  CvBoost boost;

  boost.train(trainData,       // training data, one sample per row
              CV_ROW_SAMPLE,   // samples are in rows
              RespM,           // responses
              Mat(),           // varIdx: use all variables
              Mat(),           // sampleIdx: use all samples
              Mat(),           // varType: default types
              Mat(),           // missingDataMask: no missing values
              CvBoostParams(CvBoost::REAL, 1000, 0, 1, false, 0),
              false);          // update: train from scratch

  boost.save("./trained_boost.xml", "boost");

The CvBoost::train method has several parameters, some of which I'm not using yet. It is possible to select a subset of the training data for training, leaving the rest for testing, which gives an immediate grasp of the classifier's performance. It is also possible to have missing fields in the feature pool and let the algorithm fill them with approximate values.

As it is, I'm training a model of type REAL, with 1000 weak classifiers, using all the data in the file and leaving all other parameters at their defaults.

Step 6: Classifying new samples

In the main code I have to load the classifier and then run the CvBoost::predict method on samples with the same number of columns as the ones used for training.

  CvBoost boost;

  boost.load("trained_boost.xml");

  Mat Test;
  int nPedcount = 0, Pedcount = 0;

  for (uint n = 0; n < WindowFtrs.size(); n++)
  {
      // copy this window's feature values into a single sample row
      Test = Mat::zeros(1, NRFEATURE, CV_32FC1);
      for (uint i = 0; i < NRFEATURE; i++)
          Test.at<float>(0, i) = WindowFtrs[n][i];

      float x = boost.predict(Test, Mat(), Range::all(), false, false);
      if (x == 2) nPedcount++; // class 2: not a pedestrian
      if (x == 1) Pedcount++;  // class 1: pedestrian
  }
  cout << "No Ped: " << nPedcount << " Ped: " << Pedcount << endl;

In this loop, Test is filled with the feature values of each window and then classified with the predict method.

predict returns the class label in x (1 for Ped and 2 for nPed), and this way I was able to do some preliminary testing of my first classifier. On 1300 windows with pedestrians, the classifier failed 6 predictions, which is pretty good for a first try. I also tested the classifier on over 14 million DWs without pedestrians, which produced around 290000 false detections; in terms of blunt accuracy that means ~98% correct predictions. However, I won't discuss these results yet, since this is hardly the best way to evaluate a classifier.

Regards!