Friday, September 26, 2008

Activity 20: Neural Networks

In this activity, we use neural networks as another method to classify our objects into groups depending on the values of the extracted features. Similar to activity 19, we use 2 classes: kwekkwek and pillows. However, only 2 features are used to differentiate the 2 groups for simplicity. These are the length to height ratio and the red component of color.

A neural network is a computational model of how neurons in the brain work. It is a preferred alternative to linear discriminant analysis in the pattern recognition because one does not need heuristics and recognition rules to perform classification. Instead neural networks “learn” the rules of a mapping by example. Although it may take a long time for a network to train, its recognition processing speed is faster once it has learned.

source: M. Soriano, A20 - Neural Networks.pdf
____________________________________________________________________

A code was already prepared by J. Tugaff and it only needs to be modified depending on the number and values of the input features. For our purposes, we have:

2 input features
4 training objects from each class (kwekkwek and pillows)
4 test objects from each class


We first train our neural network using the training set.

Length/Height, Red component --(kwek kwek; pillows)
0.97, 0.74
1.00, 0.77
0.96, 0.73
0.91, 0.74
0.96, 0.30
1.00, 0.30
1.02, 0.33
1.02, 0.33
____________________________________________________________________

//original code c/o Jeric Tugaff

//ensure the same starting point each time
rand('seed',0);

//network def.
//neurons per layer, including input
//2 neurons in the input layer, 2 in the hidden layer and 1 in the output layer
N = [2,2,1];

//inputs, 1st column = length/height, 2nd column = red component
//training set
x = [0.97, 0.74;
1.00, 0.77;
0.96, 0.73;
0.91, 0.74;
0.96, 0.30;
1.00, 0.30;
1.02, 0.33;
1.02, 0.33]';

//targets, 0 if kwekkwek and 1 if pillows
t = [0 0 0 0 1 1 1 1];
//learning rate is 0.1 and 0 is the threshold for the error tolerated by the network
lp = [0.1,0];

W = ann_FF_init(N);

//training cycles
T = 1000;
W = ann_FF_Std_online(x,t,N,W,lp,T);
//x is the training, t is the output, W is the initialized weights,
//N is the NN architecture, lp is the learning rate and T is the number of iterations

//full run
ann_FF_run(x,N,W)
//the network N was tested using x as the test set, and W as the weights of the connections
____________________________________________________________________

We expect that this code will output values close to (0, 0, 0, 0, 1, 1, 1, 1), meaning that the first 4 objects will be classified as members of the 'kwekkwek' class while the last 4 objects will be classified as members of the 'pillows' class. The actual results are (0.1448030, 0.1291128, 0.1506565, 0.1342528, 0.8876302, 0.8956291, 0.8738883, 0.8738883). We can now say that our neural network has "learned" the features of each class and has sorted the objects into the correct groups. We now use the test set to see whether the network can classify the other objects accurately as well.

Length/Height, Red component --(kwek kwek; pillows)
0.67, 0.77
0.81, 0.79
1.08, 0.78
1.00, 0.81
1.09, 0.35
1.12, 0.35
1.05, 0.35
0.94, 0.32

Using the training parameters, the resulting values are (0.0900170, 0.0954682, 0.1355222, 0.1071037, 0.8711712, 0.8782258, 0.8610966, 0.8642343). A 100% correct classification was achieved. Next, we try to tweak the parameters (learning rate and training cycles) to see whether the values will go closer to 0 and 1 or whether the clasification will still remain 100% correct.

learning rate = from 0.1 --> change to 1.0
training cycles = from 1000 --> change to 400
result: (0.0201587, 0.0220962, 0.0384002, 0.0265042, 0.9631881, 0.9668789, 0.9575001, 0.9582095)

The values are closer to 0 and 1 using the new training parameters and the classification is still 100% correct. Now, let us try to change the order of the test set to make sure that the 100% correct classification does not depend on the order.

Length/Height, Red component --(kwek kwek; pillows)
0.67, 0.77
1.05, 0.35
1.08, 0.78
1.09, 0.35
0.81, 0.79
1.12, 0.35
0.94, 0.32
1.00, 0.81

learning rate = 1.0
training cycles = 400
result: (0.0201587, 0.9575001, 0.0384002, 0.9631881, 0.0220962, 0.9668789, 0.9582095, 0.0265042)

The classification is still 100% correct and we have shown that the order of the objects in the input matrix is not important.
____________________________________________________________________

I give myself 10 points for this activity because of the 100% correct classification of the objects into their respective classes. Thanks to Jeric Tugaff for the ANN toolbox and the code and Mark Leo Bejemino for helping me to apply the code to my objects and extracted features.

Thursday, September 18, 2008

Activity 19: Probabilistic Classification

In this activity, we apply a different method to classify objects into groups. In the previous activity, mean distance classification was used. Here, we employ probabilistic classification via Linear Discriminant Analysis (LDA). A detailed discussion on the method and equations used can be viewed from Pattern_Recognition_2.pdf by Dr. S. Marcos. We use two classes of objects from the previous activity and see whether this method classifies them correctly or not.

Kwekkwek and Pillows











Also shown below are the values for length/height, R, G, B for each object.Again, we separate the 8 objects per class into a test set and a training set. The training set is given below. From here the method is implemented and the code and results are also shown below.//act 19 code
//this code just follows the procedure described in Pattern_Recognition_2.pdf
x = [0.97 0.74 0.37 0.03;
1.00 0.77 0.42 0.03;
0.96 0.73 0.38 0.03;
0.91 0.74 0.38 0.06;
0.96 0.30 0.16 0.07;
1.00 0.30 0.16 0.07;
1.02 0.33 0.18 0.08;
1.02 0.33 0.19 0.09];

y = [1 1 1 1 2 2 2 2]';
g = 2;
x1 = x(1:4,:);
x2 = x(5:8,:);

u1 = [mean(x1(:,1)) mean(x1(:,2)) mean(x1(:,3)) mean(x1(:,4))];
u2 = [mean(x2(:,1)) mean(x2(:,2)) mean(x2(:,3)) mean(x2(:,4))];
u = [mean(x(:,1)) mean(x(:,2)) mean(x(:,3)) mean(x(:,4))];

x1_o = [x1(1,:)-u; x1(2,:)-u; x1(3,:)-u; x1(4,:)-u];
x2_o = [x2(1,:)-u; x2(2,:)-u; x2(3,:)-u; x2(4,:)-u];

n1 = 4; n2 = n1;
c1 = (x1_o'*x1_o)/n1;
c2 = (x2_o'*x2_o)/n2;

for r = 1:4
for s = 1:4
C(r,s) = (4/8)*(c1(r,s)+c2(r,s));
end
end

Cinv = inv(C);
p = [4/8; 4/8];

for k = 1:8
f1(k) = u1*Cinv*x(k,:)' - 0.5*u1*Cinv*u1' + log(p(1));
f2(k) = u2*Cinv*x(k,:)' - 0.5*u2*Cinv*u2' + log(p(2));
end
____________________________________________________________________

RESULTS:
From the table above, we can see that this method resulted in a 100% correct classification for the training set. The higher value of f will determine which group the object is classified under. We implement this on the test set and see what happens.Again, a 100% correct classification is obtained. We have illustrated the success of this method in classifying objects into groups.
____________________________________________________________________

I give myself 10 points for this activity since the method was implemented correctly and there is 0% error.

Tuesday, September 16, 2008

Activity 18: Pattern Recognition

In this activity, we try to test if a random object can be classified using pattern recognition. First, we obtain 8-10 pieces each of similar objects like leaves, fishballs, potato chips, etc. One category of objects is called a class. We assemble 8-10 objects that can be classified into 2-5 classes. Half of this will be used as a test set and the other as training set. In this case, I have assembled 8 pieces each of:

1. pillows (a chocolate-coated snack)
2. piatos potato chips
3. squidballs
4. kwek kwek (orange, flour-coated quail egg)



















We extract certain features of these objects such as size, shape, and color and arrange these numbers into an organized set called a feature vector. For each individual object, the feature vector that I will use will have four dimensions namely:

1. ratio of length to height of object (x-axis/y-axis)
2. red component of color
3. green component of color
4. blue component of color

After finding these values for our 32 objects in total, we tabulate them in order to try to visualize clustering in 4-D feature space. We expect clustering to happen if objects are similar to each other. Thus, piatos chips should be clustered together with other piatos chips since they are similar in both size and color.

To calculate for the size, I employed the following algorithm. Assuming that the each object under one class is depicted in one image, we open it in Scilab, convert to grayscale, look at its histogram to find the proper threshold value in order to convert it to B&W. After that, we do morphological operations to clean the image (close holes and connect/remove stray pixels). Then we use the follow function to measure the contour of the object, get the maximum and minimum values in the x and y axes, and find their ratio. To calculate for the color, we need to first separate the region of interest from the background. Once that is finished, we get the mean for each color channel. These numbers constitute our feature vector for each object. There are 8 objects each for 4 classes used so there are 32 feature vectors.
____________________________________________________________________
Shown below are the values for the feature vectors. For each class, half will be used as a test set and the other half as a training set.____________________________________________________________________

If we have any other random object, we can say whether it belongs to the class of squidballs or pillows through a method called minimum distance classification.source: M. Soriano, A18- Pattern Recognition.pdf

We use the training feature vectors to find the class representatives mj. Then, we classify the test feature vectors using this method.
____________________________________________________________________
class representatives or mean feature vector, mj:
1. kwek kwek - [0.96, 0.75, 0.39, 0.04]
2. squidballs - [1.03, 0.55, 0.35, 0.09]
3. pillows - [1.00, 0.32, 0.17, 0.08]
4. piatos - [1.35, 0.59, 0.45, 0.16]From this table it can be seen that only 2 out of the 16 objects were classified incorrectly. Thus, the success rate of this method is 87.5%. Errors may have come from the calculation of the length to height ratio because the images might not have been thresholded and/or morphologically enhanced correctly.
____________________________________________________________________

I give myself 10 points for this activity since the method was implemented resulting in only a few errors. The success rate of this method in pattern recognition was found to be 87.5%, which is a fairly acceptable number. Thanks to JC Nadora, Benj Palmares, Raf Jaculbia, Mer Camba, Ed David, Billy Narag, and Jorge Presto for buying the objects/food tested (kwek kwek, squidballs, piatos, pillows).

Tuesday, September 9, 2008

Activity 17: Basic Video Processing

Video, whether analog or digital, is a string of still images shown in rapid succession such that an impression of movement is perceived by an observer. Therefore, all image processing techniques we've learned so far may be applied on image frames of a video. The advantage of video is that it allows us access to another physical dimension, time. Thus, the dynamics and kinematics of a system can be extracted from video.

In this activity, we aim to extract some kinematic constants and variables like gravitational constant, position, and velocity upon taking a video of an event like dropping a ball or watching a spring reach equilibrium when initially stretched. These extracted variables will be plotted against time and the kinematic constants will be calculated. The results will be compared against analytical values.

In order to do this with Scilab, we need to install a video processing toolbox. However, the SIVP toolbox cannot read more than two frames from an avi video so we have to use other software to achieve our goal. These are Windows Movie Maker - for capturing the video and saving it directly on the PC, Stoik Video Converter - for converting the .wmv file generated by Movie Maker into an .avi file, and Virtual Dub - to parse the .avi file into image sequences in either .jpg or .bmp formats. Once stored as individual images, the position, shape or area can be extracted using image processing techniques. We loop through the image sequence to get the time variation of the variables of interest.

source: M. Soriano, A17 - Basic Video Processing.pdf

The kinematic event that I used was using a steel ruler as a physical pendulum. From this, I can calculate for some quantities like the center of oscillation, its position after some time, the period of oscillation, etc.
____________________________________________________________________

Physical Pendulum
http://en.wikipedia.org/wiki/Pendulum#Physical_pendulum

The simple pendulum assumes that the rod is massless and that the bob is small, so has negligible angular momentum in itself. A physical pendulum has significant size and mass, and hence a significant moment of inertia.

A physical pendulum behaves like a simple pendulum but the expression for the period is modified.

T = 2 \pi \sqrt{\frac{I} {mgL}}

where:

I is the moment of inertia of the pendulum about the pivot point
L is the distance from the center of mass to the pivot point
m is the mass of the pendulum

For a physical pendulum, since this has the same form as the simple pendulum, it is possible to define the center of oscillation. This is the distance from the pivot point, at which, if all the mass of the pendulum was concentrated (and thus forming a simple pendulum), would give the pendulum the same period.

\ell=\frac{I}{mL}
The moment of inertia in this case is given by

I_e = \frac {m(h^2)}{3}+\frac {m(w^2)}{12}

with height h, width w, and mass m (Axis of rotation at the end of the plate)
http://en.wikipedia.org/wiki/List_of_moments_of_inertia
____________________________________________________________________

Before calculating the desired quantities, we first need to process our images. This activity utilizes most of the techniques we have learned so far. Step by step, these are:

1. Separate the video per frame using other software.
2. White balance the images.
3. Normalize the color.
4. Do color segmentation on the object (in our case, the ruler that acted as a physical pendulum).
5. Convert to a B&W image with the proper threshold value.
6. Do morphological operations to close holes and enhance the image.

After these, we can calculate for the centroid proceed in solving for the desired quantities numerically.
____________________________________________________________________

RESULTS
ORIGINAL, WHITE BALANCED, NORMALIZED, SEGMENTED, B&W, MORPHED IMAGES














measured quantities: m = 175 g, length = 0.63 m, width = 0.03 m

Analytic solutions:
Moment of inertia = 175g(0.63m*0.63m)/3 + 175g(0.03m*0.03m)/12 = 23.165625 g m^2
Center of oscillation = I/mL = 23.165625 m/(175*0.315) = 0.4202381 m
Period of oscillation = 2*pi*sqrt(23.165625/(175*9.8*0.315)) = 1.3011116 s

We now proceed in solving these quantities numerically from the data.

Length of ruler = 105 pixels --> 0.63 m
Width of ruler = 6 pixels
Moment of inertia = 175g(0.63m*0.63m)/3 + 175g((6*0.63m/105)*(6*0.63m/105))/12
= 23.1714 g m^2

%error = 0.02493

Center of oscillation = 23.1714 m/(175*(52.5*0.63/105)) = 0.4203429 m

%error = 0.02494

Period of oscillation:
Our video was captured with a fps of 15. The graph of the position vs time of the centroid of the ruler is shown below.
The wave is not a perfect sinusoid due to some air resistance present. The peaks and troughs are relatively equally spaced between each other so the period can be calculated. Peak to peak, the heights for one frame is given from the data:
94, 93, 90, 87, 83, 78, 73.5, 68.5, 65, 64.5, 64.5, 65, 68.5, 73, 77.5, 83.5, 87.5, 90, 92.5, 92.5
Thus, one period was captured in 20 frames. If 15 frames = 1 second, then using ratio and proportion, 20 frames = 1.3333333 s.

%error = 2.47647
____________________________________________________________________

chdir('act17images\');
img = 216; //total no. of frames in video

//open cropped image for color segmentation
im2 = imread('act17crop1.jpg');
wid = zeros(img,1); leng = zeros(1,1);

for i = 1:img
im = imread('images'+string(i)+'.jpg'); //open images
white = im(1:120,120:160,:);

//white balance
r = mean(white(:,:,1));
g = mean(white(:,:,2));
b = mean(white(:,:,3));
im(:,:,1) = im(:,:,1)/r;
im(:,:,2) = im(:,:,2)/g;
im(:,:,3) = im(:,:,3)/b;
im = im/max(im);
m = find(im>1);
im(m) = 1;

//color normalization
I = im(:,:,1)+im(:,:,2)+im(:,:,3);
im(:,:,1) = im(:,:,1)./I;
im(:,:,2) = im(:,:,2)./I;
im(:,:,3) = im(:,:,3)./I;

//histogram backprojection
R = linspace(0,1,32); G = R;
P = zeros(32,32);
[x2,y2] = size(im2);
for j = 1:x2
for k = 1:y2
xr = find(R <= im2(j,k,1));
xg = find(G <= im2(j,k,2));
P(xr(length(xr)),xg(length(xg))) = P(xr(length(xr)),xg(length(xg))) + 1;
end
end

P = P/sum(P);
[x1,y1] = size(im);
bpj = zeros(x1,y1);
for j = 1:x1
for k = 1:y1
xr = find(R <= im(j,k,1));
xg = find(G <= im(j,k,2));
bpj(j,k) = P(xr(length(xr)), xg(length(xg)));
end
end
bpj = bpj/max(bpj);

//convert to B&W

bpj = im2bw(bpj, 0.02);

//morphological operations
strel = ones(4,1);
bpj = erode(bpj,strel); //opening operation
bpj = dilate(bpj,strel);
bpj = dilate(bpj,strel); //closing operation
bpj = erode(bpj,strel);

//get centroid
[p,q] = find(bpj == 1);
wid(i) = (max(q)+min(q))/2;
leng(i) = (max(p)+min(p))/2;
end

//plot position vs time
plot([1:img]/15,wid); //consider fps of video
title('physical pendulum');
xlabel('time (seconds)');
ylabel('relative amplitude');
____________________________________________________________________

I give myself 10 points for this activity since the calculated quantities were close to their analytic values. My collaborators were Jeric Tugaff and Mark Leo Bejemino.

Tuesday, September 2, 2008

Activity 16: Color Image Segmentation

In image segmentation, a region of interest (ROI) is picked out from the rest of the image such that further processing can be done on it. Selection rules are based on features unique to the ROI. One such feature is color. Color has been used to segment images of skin regions in face and hand recognition, land cover in remote sensing, and cells in microscopy.

In this activity, we aim to do image segmentation using 2 different methods: parametric and non-parametric probability distribution estimation. First, we convert image RGB to normalized chromaticity coordinates (NCC) and then apply the 2 techniques.

We can transform the image RGB values to NCC by,
Since r + b + g = 1, then b = 1 - r - g and it is enough to represent chromaticity by just two coordinates, r and g. When segmenting 3D objects it is better to first transform RGB into rgI. The figure below shows the r-g color space.When r and g are both zero, b = 1. Therefore, the origin corresponds to blue while points corresponding to r=1 and g=1 are points of pure red and green, respectively. Any color therefore can be represented as a point in NCC space. Thus the pixel chromaticities of a red ball will only occupy a small blob in the red region regardless of its shading variation.

source: M.Soriano, A16 - Color Image Segmentation.pdf

The histogram of a green chair is shown below.
____________________________________________________________________

Parametric method:
Original & Probability image

Here, the original object is a green chair. We cropped a region of interest (center part of the backrest) as our color model and tried to see where it matched that of the original image. The pixel values of the resulting image is the probability that a pixel will have the same color as the model. This method segmented the image well and we can infer that the bright parts of the seat are of the same color as that of the backrest.
____________________________________________________________________

Histogram backprojection (non-parametric) method:
Original & Probability image

Histogram backprojection is one technique where based on the color histogram, a pixel location is given a value equal to its histogram value in chromaticity space. We implemented this and the resulting image was generated. Upon comparison with the result from the parametric method, there are more white pixels that correspond to the cropped color model. We can really see where the similar colors lie. The segmentation is better using the 2nd method. This also has the advantage of faster processing because no more computations are needed, just a look-up of histogram values. In the parametric method, the mean and standard deviation of r and g must be found in order to calculate a Gaussian distribution that tests the probability of pixel membership to the ROI.
____________________________________________________________________

stacksize(4e7);

im1 = imread('green chair.jpg');
im2 = imread('green chair crop.jpg');
//imshow(im1);

R1 = im1(:,:,1); G1 = im1(:,:,2); B1 = im1(:,:,3);
I1 = R1 + G1 + B1;
R2 = im2(:,:,1); G2 = im2(:,:,2); B2 = im2(:,:,3);
I2 = R2 + G2 + B2;

r1 = R1./I1; g1 = G1./I1; b1 = B1./I1;
r2 = R2./I2; g2 = G2./I2; b2 = B2./I2;

//parametric estimation
ur = mean(r2); sr = stdev(r2);
ug = mean(g2); sg = stdev(g2);

pr = 1.0*exp(-((r1-ur).^2)/(2*sr^2))/(sr*sqrt(2*%pi));
pg = 1.0*exp(-((g1-ug).^2)/(2*sg^2))/(sg*sqrt(2*%pi));

prob = pr.*pg;
prob = prob/max(prob);
//imshow(prob,[]);

//2d histogram
r = linspace(0,1,32); r = g;
P = zeros(32,32);
[x,y] = size(im2);
for i = 1:x
for j = 1:y
xr = find(r <= im2(i,j,1));
xg = find(g <= im2(i,j,2));
P(xr(length(xr)), xg(length(xg))) = P(xr(length(xr)), xg(length(xg)))+1;
end
end
P = P/sum(P);
//surf(P);


//backprojection
[x,y] = size(im1);
recons = zeros(x,y);
for i = 1:x
for j = 1:y
xr = find(r <= im1(i,j,1));
xg = find(g <= im1(i,j,2));
recons(i,j) = P(xr(length(xr)), xg(length(xg)));
end
end
imshow(recons, []);
____________________________________________________________________

I give myself 10 points for this activity since the colored image was properly segmented using 2 different methods. Histogram backprojection is more efficient since less calculations are needed, only a look-up table. It also produced a better probability image. Thanks to Jeric Tugaff for his help in the histogram backprojection code.

Thursday, August 28, 2008

Activity 15: Color Image Processing

A colored digital image is an array of pixels each having red, green and blue light overlaid in various proportions. Per pixel, the color captured by a digital color camera is an integral of the product of the spectral power distribution of the incident light source S(l), the surface reflectance r(l), and the spectral sensitivity of the camera h(l). That is, if the color camera has spectral sensitivity hR(l), hG(l) , hB(l) for the red, green, and blue channels respectively, then the color of the pixel is given by
is a balancing constant equal to the inverse of the camera output when shown a white object (r(l) = 1.0). Equations (1) to (4) explain why sometimes the colors captured by a digital camera appear unsatisfactory.

In this activity, we aim to observe the differences between images obtained using several white balancing settings in a camera and compare two white balancing algorithms: Reference White and Gray World. We capture an obviously wrongly white balanced image, apply the two algorithms, and comment on their white balancing quality.

In the Reference White Algorithm, you capture an image using an unbalanced camera and use the RGB values of a known white object as the divisor. In the Gray World Algorithm, it is assumed that the average color of the world is gray. Gray is part of the family of white, as is black. Therefore, if you know the RGB of a gray object, it is essentially the RGB of white up to a constant factor. Thus, to get the balancing constants, you take the average red, green and blue value of the captured image and utilize them as the balancing constants.

source: M. Soriano, A15 - Color Image Processing.pdf
____________________________________________________________________

The following images were taken using an Olympus Stylus 770SW digital camera with an exposure of -2 under a fluorescent light source. I utilized a Macbeth chart since the major hues were represented and white was also present.

(order of images from left to right, top to bottom: auto WB, daylight, cloudy, tungsten, fluorescent1, fluorescent2, fluorescent3)










































Upon comparison, we can observe that the colors are represented in high quality in image 1 and 5, which are the auto white balance and fluorescent1 settings, respectively. The sunlight and cloudy settings made the colors look dimmer, flourescent2 and 3 had a larger blue component, and the tungsten setting was just plain bad. What it did was since a tungsten light source has a low blue component upon looking at its spectra, the camera increased the blue to compensate. I will use this obviously wrongly white balanced image in the next part of the activity.
____________________________________________________________________

Results:
ORIGINAL, REFERENCE WHITE, GRAY WORLD









We can see from the images that the Reference White Algorithm produced a better result. The colors are more vibrant. However, some patches became white, like yellow and light orange. But in comparison with the Gray World algorithm, this is still acceptable. Majority of the colors are now green or white in the last image, showing poor quality. Next, we investigate the effects of these two methods for objects with similar hues.

Again, I use the tungsten white balance setting because it is really not suited for the illumination condition (fluorescent). We implement the two algorithms and the results are shown below.

Results:
ORIGINAL, REFERENCE WHITE, GRAY WORLD









Again, the Reference White Algorithm produced the better result. This is because the red component of white is really red and we use this as the divisor in equations 1-3. In comparison with the Gray World algorithm, the red component of the whole image is not necessarily also red and thus the error is produced. From both the Macbeth chart and the blue objects, we observe that the blue parts become green and those with some red component turns white.
____________________________________________________________________

I give myself 10 points for this activity since both algorithms were implemented and the results were very clear. Thanks to Marge Maallo for her help in this activity.

Tuesday, August 26, 2008

Activity 14: Stereometry

Multiple 2D views allow the computation of depth information. In this activity, we use two identical cameras to capture and reconstruct a 3D object. Images are essentially 3D objects projected onto a 2D frame. From object points at (x,y,z), an image is reduced to (x,y) with z projected as a function of x and y and camera object geometry. In applications such as microscopy or terrain imaging, it is sometimes more informative to preserve the depth information z. In this way, the 3D image may be inspected at different view angles. Stereo imaging is the technique we will use in this experiment and it is inspired by how our two eyes allow us to discriminate depth.

From similar triangles, the following equations are generated. If done on several points on the image, we can reconstruct the object's 3D shape using the z's calculated.

source: M. Soriano, A14 - Stereometry.pdf

Shown below are two images of a simple 3D object that I captured using an Olympus Stylus 770SW digital camera. The vertices of the Rubik's cube will serve as reference points. From figure 1, b was measured to be 30mm using a ruler.
____________________________________________________________________

Next, we calibrate our camera using the algorithm in activity 11 to find the focal length f and the distances from the camera center, x1 and x2. The matrix A was calculated to be,
A = [
-12.892329
8.19365
-1.2825517
121.45776
-5.9037883
-6.0874536
12.008296
80.296042
-0.0214988
-0.0197136
-0.0140380
1]

We now get A(1:3,1:3) and factorize using RQ factorization in order to convert the matrix into the upper diagonal matrix K. We also need to ensure that the factorized A(3,3) element is 1 by dividing the whole matrix by K(3,3).RQ factorization code by Michael Overton
(source: http://www.cs.nyu.edu/faculty/overton/software/uncontrol/rq.m)

function [R,Q]= rq(A)

m = size(A,1);
n = size(A,2);
if n < m
error('RQ requires m<=n')
end

p = mtlb_fliplr(mtlb_eye(m));
Atp = A’*P;
[Q2,R2] = qr(Atp);
bigperm = [P zeros(m,n-m; zeros(n-m,m) mtlb_eye(n-m)];
q = (Q2*bigperm)’;
r = (bigperm*R2*P)’;


K =
[-14.756525, 0.0554842, 121.46413;
0, 14.493566, 80.290264;
0, 0, 1]

We can see that the camera center is located at xo = 121.46413 and yo = 80.290264. The actual size of the image is 256x217. Thus, we can say that the camera center is relatively close to the image center (considering only x since we only need x1 and x2, i.e. 121.5*121.5 = 243, which is close to 256). From this value of xo, we can now solve for x1 and x2 for different points on the cube. Then the height z will follow from the formula. We plot our results in 3D. (see Scilab help on splin2d, example 2)
____________________________________________________________________
Reconstructed images of the Rubik's cube:
Here we have 3 views of the reconstructed image. Although the edges are curved, the shape of the cube is clearly observed. A better reconstruction may be possible by locating more points. For the resulting images, only the 7 visible corner points in the cube were used in finding the correspondence between the 1st and 2nd images.

____________________________________________________________________

I give myself 9 points for this activity since the reconstruction was properly done but the result was not very convincing. The result looked like a cube with smoothened corners. Thanks to JC Nadora, Julie Dado, and Jeric Tugaff for their help in the use of splin2d and interp2d.