Monday 30 September 2013

Activity 15 - Neural Networks

A neural network is a computational model that mimics the way neurons in the human brain process information. Such a network has an advantage over Linear Discriminant Analysis in that it does not require predefined boundaries or rules in order to perform classification; it 'learns' these rules for itself from a given set of weighted inputs.

Fig. 1. Artificial Neural Network [1]


Consider the figure above, and let the input be a vector where each input node represents one dimension of that vector. Each input is multiplied by a weight, and the weighted inputs are summed to give a variable a. This variable is fed to the activation function g, which gives the output z. This model describes the basic mechanics of a feedforward neural network. For this activity the Artificial Neural Network toolbox for Scilab was downloaded, and the feature data from Activity 14 was used.
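
As a quick illustration, below is a minimal Scilab sketch of a single artificial neuron under this model; the input values, weights, bias, and the choice of a sigmoid activation are all my own assumptions for the example, not values from the activity.

// Minimal single-neuron sketch (illustration only; values are arbitrary)
x = [0.2; 0.7; 0.5];                          // example input vector, one entry per input node
w = [0.4; -0.1; 0.9];                         // one weight per input (assumed)
b = 0.1;                                      // bias term (assumed)
a = w' * x + b;                               // weighted sum of the inputs
deff('z = g(a)', 'z = 1 ./ (1 + exp(-a))');   // sigmoid activation function (assumed)
z = g(a);                                     // neuron output
disp(z)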

A feedforward neural network was created: the three features served as the input, were passed through one hidden layer, and the resulting classification was represented in the form of three binary digits.

0 1 0 - green rectangle
1 0 0 - red rectangle
1 1 0 - yellow rectangle
0 1 1 - green circle
1 0 1 - red circle

The same training population was used, and the same number of instances was used for classification. A learning rate of 2.5 was used, with the network trained for 400 iterations to update the weights. Using this process, a 100% classification rate was achieved over 50 repetitions with random sampling of the training population.
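
For reference, a rough sketch of how such a network might be set up and trained with the ANN toolbox is given below, assuming its standard ann_FF_init, ann_FF_Std_online, and ann_FF_run functions; the hidden-layer size, the zero error threshold, and the variable names are my own assumptions rather than the exact code used.

// Sketch only: 3 inputs, one hidden layer (8 nodes assumed), 3 binary outputs
N = [3, 8, 3];                                   // network architecture
lp = [2.5, 0];                                   // learning rate 2.5, error threshold 0 (assumed)
T = 400;                                         // training cycles for the weights
W = ann_FF_init(N);                              // random initial weights
// train_x: 3 x M feature matrix, one column per training sample (placeholder name)
// train_t: 3 x M matrix of the binary class codes listed above (placeholder name)
W = ann_FF_Std_online(train_x, train_t, N, W, lp, T);
out = ann_FF_run(test_x, N, W);                  // run the test instances through the network
class_code = round(out);                         // threshold the outputs to three binary digits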

I would like to give myself a grade of 10 for this activity for understanding and implementing the needed procedure.

Reference:
[1] M. Soriano, 'Neural Networks', Applied Physics 186 Activity Manual, 2013.

Wednesday 11 September 2013

Activity 14 - Pattern Recognition

Human vision exhibits color constancy, which allows an object to be perceived as having the same color even when illuminated by different sources. Color constancy alone, however, is not responsible for object identification; humans also rely heavily on pattern recognition, which can be as simple as knowing that most apples are red or that coins are circular with a metallic texture, or as complex as recognizing leaf patterns and handwriting. For this to be done digitally through code, feature extraction must be performed first, followed by categorization.

In deciding which image features to extract, it is more efficient to consider only features that distinguish the objects one wishes to classify. For this activity I will be classifying the objects shown in Figure 1. The images are composed of generally circular and rectangular cutouts of varying colors, dominated by red and green hues.

Figure 1. Cut outs taken into consideration for classification

To do this I will be extracting three features from each sample: the mean red channel value, the mean green channel value, and an estimate of the area. Each sample in the images first has to be isolated, so good contrast between the object and the background is a good starting point for segmenting the cutouts.

Figure 2. Original data subjects to be categorized, but the background provided poor contrast for segmentation. (Left) Shotgun shells. (Middle and Right) Pistol bullet cartridges.

After segmentation, blob analysis is performed using the SearchBlobs() function so that the blobs can be called one by one for feature extraction. Area calculation was performed in the same way as in Activity 5, namely by pixel counting, while the red and green channel values were taken as the mean of the corresponding color channels over each detected blob. With feature extraction complete, we can now do pattern recognition. First, a sample set is used to train the code, while the remaining samples are used to test whether the classification is successful. If not, then the set of features used needs to be changed, or additional features must be added to achieve a higher success rate. Note that one should refrain from using the entire data set for training, to avoid 'memorization' of the data rather than pattern recognition. For this activity I will be using 75% of the data set as the training set and the remaining 25% as test samples.
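
A rough sketch of the per-blob feature extraction described above is shown below, assuming the IPD toolbox's SearchBlobs() and a binary mask bw already segmented from the color image im (bw and im are placeholder names).

blobs = SearchBlobs(bw);                       // label every connected blob in the mask
nblobs = double(max(blobs));
features = zeros(nblobs, 3);
r = im(:, :, 1); g = im(:, :, 2);              // red and green channels
for i = 1:nblobs
    idx = find(blobs == i);                    // pixels belonging to blob i
    features(i, 1) = length(idx);              // area by pixel counting (as in Activity 5)
    features(i, 2) = mean(r(idx));             // mean red channel value of the blob
    features(i, 3) = mean(g(idx));             // mean green channel value of the blob
end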

Figure 3. 3D scatter plot of the feature values

Shown in the figure above is the 3D scatter plot of the three chosen features; as can be seen, the groups are neatly separated. To classify the remaining data set into their corresponding groups, several methods can be implemented, including metric distance methods and linear discriminant analysis (LDA). For now I will only be performing a metric distance method, though I may include LDA later on. For the metric distance method, one can perform either the k-nearest neighbor (k-NN) rule using the Euclidean distance or the Mahalanobis distance algorithm. For simplicity, I will be using the k-NN rule: the Euclidean distance is calculated between the feature values of the object to be classified, known as the instance, and those of all the training data. The nearest k neighbors cast votes, and the instance is assigned to the group with the most votes. k is preferably an odd integer so as to avoid a tie in the voting. For this activity 16 samples represent each group, with 12 used as the training data set and 4 to be classified per group; k = 11.
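
A minimal Scilab sketch of this k-NN rule is given below; the variable names (train, labels, inst) are my own, with train holding one training sample per row and its group number in labels.

function cls = knn_classify(inst, train, labels, k)
    // Euclidean distance from the instance to every training sample
    d = sqrt(sum((train - repmat(inst, size(train, 1), 1)).^2, 2));
    [ds, idx] = gsort(d, 'g', 'i');            // sort distances in increasing order
    votes = labels(idx(1:k));                  // the k nearest neighbors vote
    groups = unique(votes);
    counts = zeros(groups);
    for j = 1:size(groups, '*')
        counts(j) = length(find(votes == groups(j)));
    end
    [mx, w] = max(counts);                     // majority vote wins
    cls = groups(w);
endfunction

cls = knn_classify(inst, train, labels, 11);   // k = 11, as used above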

Applying the method mentioned above, 18 out of the 20 test samples were correctly classified, giving an accuracy of 90%. But where could the error have come from? Looking back at the features taken into account, it can be seen that the distance measurement is heavily dependent on the pixel area because of the magnitude of its values. Thus, to distribute the weight of each feature equally, the values have to be normalized. Doing this corrects the error observed earlier, and a perfect classification is achieved.
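
One simple way to do this is a per-feature min-max scaling of the feature matrix before computing the distances, sketched below (feats is a placeholder name; the activity may have used a different normalization).

fmin = repmat(min(feats, 'r'), size(feats, 1), 1);   // column-wise minima
fmax = repmat(max(feats, 'r'), size(feats, 1), 1);   // column-wise maxima
feats_n = (feats - fmin) ./ (fmax - fmin);           // each feature now lies in [0, 1], so area no longer dominates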


Figure 4. Normalized feature values


For this activity I give myself a grade of 10 for completing the activity and performing the necessary applications.

Wednesday 4 September 2013

Activity 13 - Image Compression

In this activity we were to use Principal Component Analysis (PCA) in image reconstruction. PCA is a mathematical procedure that uses an orthogonal transformation to convert a set of possibly correlated variables into a set of linearly uncorrelated variables called principal components. The principal components are ordered such that the first PC accounts for the greatest variability in the data, and each succeeding component accounts for less of the variability than the one before it.

In Scilab 5.4.1, PCA can be performed by calling the pca(x) function. The output of the function is separated into three matrices depending upon the variable names given; following the defaults given by the demo, we shall call them lambda, facpr, and comprinc. The input x is an n x p matrix containing n observations with p components each. The output lambda is a p x 2 matrix whose first column contains the eigenvalues of the correlation matrix and whose second column contains the fraction of the variability accounted for by each PC. facpr contains the principal components, i.e., the eigenvectors, while comprinc contains the projections of each observation onto the PCs. In this activity, the components used for reconstruction are contained in the facpr matrix.

In order to perform the image compression, a template has to be set up from which the image can be reconstructed. The images shown below were used for this activity. Both are already scaled images, the first 500 x 375 pixels and the second 510 x 248 pixels. Each image is partitioned into individual 10 x 10 blocks that act as the individual observations. Since the observations must be in vector form, each 10 x 10 block was reshaped into a 1 x 100 row, and all observations were stacked into an (r*c)/100 x 100 matrix x, where r and c are the dimensions of the image. The principal components were then calculated using the pca() function.
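
A sketch of how this blocking might be done in Scilab is shown below; img is a placeholder for a grayscale image (or a single color channel), and its dimensions are assumed to be exact multiples of 10.

[r, c] = size(img);
x = zeros(r * c / 100, 100);
k = 1;
for i = 1:10:r
    for j = 1:10:c
        blk = img(i:i+9, j:j+9);               // one 10 x 10 block of the image
        x(k, :) = matrix(blk, 1, 100);         // flatten the block into a 1 x 100 row
        k = k + 1;
    end
end
[lambda, facpr, comprinc] = pca(x);            // principal components of the block ensemble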



Using the cumsum() function on the second column of lambda, I can tell the degree of reconstruction achieved by a given number of combined weighted eigenimages from facpr. The weights themselves are computed as the dot product of the flattened 10 x 10 image block and the corresponding eigenimage. Since the method as described works on a single 2D array, it was first tested on the grayscale image of the kite.
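
Below is a rough sketch of reconstructing one block as a weighted sum of the first m eigenimages following this scheme; the 92% cutoff and the variable names are my own choices for illustration, and any centering done internally by pca() is ignored here.

cum = cumsum(lambda(:, 2));                    // cumulative variability of the PCs
m = find(cum >= 0.92, 1);                      // number of PCs for roughly 92% reconstruction
blk_vec = x(1, :);                             // one flattened 10 x 10 block (here, the first)
recon = zeros(1, 100);
for j = 1:m
    wj = blk_vec * facpr(:, j);                // weight = dot product with the j-th eigenimage
    recon = recon + wj * facpr(:, j)';         // add the weighted eigenimage
end
blk_recon = matrix(recon, 10, 10);             // reshape back into a 10 x 10 block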


The eigenimages and their corresponding weights were calculated as discussed earlier. The image was reconstructed from the least (72%) to the greatest (100%) degree of reconstruction in steps of 4%. The minimum possible reconstruction inevitably depends on the variability accounted for by the first eigenimage, as can be seen in the images below.


For a size comparison, refer to the table shown below; the original image, cropped to the same size as the reconstructed images, is 92.1 kB.


As can be seen, the image file size decreases between the 92% and 96% reconstructions; my take on this is that the reduced pixelation of the image at that point could have decreased the data load in the matrix. With that, image reconstruction has been performed on a grayscale image. For a colored image, the same can be done by performing the reconstruction per color channel and combining the results to form the RGB image.
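
As a sketch of that last step, the per-channel procedure might look like the following, where reconstruct_channel is a hypothetical helper wrapping the blocking and reconstruction steps above and img_color is a placeholder for the RGB image.

RGB = zeros(img_color);                        // container the same size as the color image
for ch = 1:3
    // reconstruct_channel is a hypothetical wrapper for the steps shown earlier
    RGB(:, :, ch) = reconstruct_channel(img_color(:, :, ch));
end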


Due to the compression, the 96% reconstruction looks similar to the 100% reconstruction. Again, as shown in the table below, the image file size decreases along with the percentage of reconstruction. The size of the original image cropped to the same dimensions is 456 kB.



In this activity I give myself a grade of 11 for being able to complete the activity and for being able to apply the method to a colored image.