Google Professional Data Engineer – TensorFlow and Machine Learning Part 10

August 5, 2023

22. Lab: K-Nearest-Neighbors

How can a single array of size 784 represent a two-dimensional image in the MNIST data set? If you remember, each image in the MNIST data set is 28 pixels by 28 pixels. In this lecture, we’ll see an implementation of the K-Nearest Neighbors algorithm in TensorFlow. The MNIST data set has a large number of images of handwritten digits. These images have all been normalized to 28 by 28 pixels, and each one has an associated label indicating which digit the image represents. If you’re interested in machine learning and pattern recognition algorithms, the MNIST data set is a great place to start: all of the normalization and other grungy cleanup of the images has already been done for you, so you can focus on the pattern recognition itself.

The code for this demo was originally written by Aymeric Damien and is available on GitHub at the URL that you see on screen. It’s a great example of a simple machine learning technique, and Aymeric deserves our thanks for making it available to all of us. The objective of this algorithm is to find which digit a particular image represents. When we use K-Nearest Neighbors, we compute the distance from the image whose digit we want to identify (our test image) to every image in our training data set. The training image that is closest to our test image is its nearest neighbor. Look up the label associated with this nearest neighbor: that label is what we assign to our test image as well.

That gives us the digit that our test image represents. This algorithm will be implemented in three steps. In the first step, we get the MNIST data. Once it is downloaded onto our local machine where we can access it, the second step is to calculate the L1 distance between the test image and every image in the training data set. The third step, after we’ve set these up as a computation graph, is to run the algorithm. Let’s get started with accessing the MNIST data. At the very top, we import the libraries that we need to run this code: NumPy, a numerical computation package, and TensorFlow. If you want to view the computation graph on TensorBoard, import the ML library from Google Datalab as well.

It turns out that accessing the MNIST data set from within TensorFlow is very easy indeed. It’s a very common operation, and there are helper libraries to do exactly that. The package tensorflow.examples.tutorials.mnist contains input_data, and input_data.read_data_sets allows you to read the MNIST data directly. We store a reference to this data set in a variable named mnist, and the downloaded files are stored in a directory named MNIST_data. Hopefully you remember the structure of the handwritten digit images in the MNIST data set. Let’s say you want to represent the digit four. Every image is standardized to a size of 28 by 28, covering a total of 784 pixels.

The handwritten digit images in the MNIST data set are single-channel grayscale images, so a single number is enough to represent the value of each pixel. Hit Shift+Enter on this code cell and it will download the MNIST data; the output on screen shows where it’s being stored. There is one little bit that we haven’t explained yet, though. When we downloaded the MNIST data set, we set one_hot=True. What does that mean? The one-hot notation refers to how the labels associated with every MNIST handwritten image are represented. This image represents a four, and the label associated with it will indicate four.

This label can be represented in what is called one-hot notation. Consider a ten-element vector of all zeros. The only position where it holds the number one is the index of the digit associated with this image. If you want to represent the digit five, the one-hot vector has a one at index five and zeros at every other index. This is one-hot notation. We have the MNIST labels here in one-hot notation, so we can move on to get the training and test batches. mnist.train.next_batch retrieves a batch of MNIST training data: here, 5000 images and their 5000 associated labels in one-hot notation.
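One-hot encoding and decoding fit in a few lines of plain Python. The helper names one_hot and from_one_hot below are hypothetical illustrations, not part of the MNIST helper library, which produces one-hot labels for you when you pass one_hot=True.

```python
def one_hot(digit, num_classes=10):
    """Return a list of num_classes zeros with a single 1 at index `digit`."""
    if not 0 <= digit < num_classes:
        raise ValueError("digit out of range")
    vector = [0] * num_classes
    vector[digit] = 1
    return vector


def from_one_hot(vector):
    """Recover the digit: the index holding the largest value (the single 1)."""
    return max(range(len(vector)), key=lambda i: vector[i])


print(one_hot(5))                # [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
print(from_one_hot(one_hot(5)))  # 5
```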

The return value is a tuple, which is stored in training digits and training labels. Since we are just looking at this KNN algorithm as an example, we chose to use only 5000 images in our training data set; in a real-world problem you would typically use far more, possibly millions of images. Similarly, mnist.test.next_batch gets 200 images, which form our test data set. These are stored in test digits and test labels respectively. In the next step, we set up a placeholder using tf.placeholder. This placeholder holds all the images in our training data set. The placeholder tensor holds data of type float, and its shape is [None, 784]. This is the shape used to represent the list of images in our training data set, all 5000 of them in one batch.

It’s pretty clear that the first dimension, None, represents the number of images that we read into this placeholder. But the second dimension is 784. Where does that come from? Similarly, in the placeholder for the test digit, a single test digit is a vector of length 784. Again, where does that 784 come from, and how can it represent an image? Let’s see how these grayscale images can be represented by a single vector of length 784. Every image in the MNIST data set is 28 pixels by 28 pixels, and 28 multiplied by 28 gives us 784. That means there are 784 pixels whose values have to be represented for each handwritten digit. The pixels in any image can be thought of as a grid laid on top of the image.

Every box within this grid is one pixel of a grayscale image, and every pixel can be represented by exactly one value: a floating-point number between zero and one that represents the intensity of that pixel. Zero is fully black; one is fully white. In this grid, the whiter parts of the image have higher intensity values. This two-dimensional grid, which represents a handwritten digit, can be unraveled to form a single vector of 784 pixels by appending every row of the matrix to the end of the previous row, placing the rows end to end. Since 28 multiplied by 28 is 784, the result has exactly 784 entries. The first row of the matrix is at the very beginning of the one-dimensional array.

All the remaining rows come in between, and the last row is at the very end of the vector. This is the one-dimensional array representation of a 2D handwritten digit image, and it explains why the placeholders are set up the way they are. All our computations will be on one-dimensional arrays, which makes things easier to understand. The next step is to set up the computation of the L1 distance. The L1 distance is the measure we’ll use to find the image in the training data set that is the nearest neighbor of our test image. Let’s look at the computation of this L1 distance using arrays in some detail.
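The row-by-row unraveling described above can be sketched in plain Python. This is a minimal illustration with a made-up image, and flatten_row_major is a hypothetical helper name; it shows that pixel (r, c) of a 28 by 28 image ends up at index r * 28 + c of the 784-element vector.

```python
def flatten_row_major(image):
    """Unravel a 2D list of pixel rows into one list, row after row."""
    flat = []
    for row in image:
        flat.extend(row)  # append this row to the end of the previous one
    return flat


SIZE = 28
# A made-up 28 x 28 image of zeros with two "ink" pixels set to 1.0.
image = [[0.0] * SIZE for _ in range(SIZE)]
image[0][3] = 1.0    # a pixel in the first row
image[27][27] = 1.0  # the last pixel of the last row

vector = flatten_row_major(image)
print(len(vector))               # 784
print(vector[3], vector[783])    # 1.0 1.0
# Pixel (row r, column c) lands at index r * 28 + c.
print(vector[27 * SIZE + 27] == image[27][27])  # True
```

With NumPy, calling reshape(784) on a 28 by 28 array performs the same unraveling, because NumPy’s default order for reshape is row-major (C order).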

The innermost operation that we perform is tf.negative on our test digit. To visualize this, consider two arrays: one representing a training digit on the left and one representing the test digit on the right. We don’t perform any operation on the training digit, but we calculate the negative of the test digit. tf.negative, when applied to an array, simply flips the sign of every element in that array. The original elements representing each digit were pixel intensity values; these are now negated, so every element of our test digit is either zero or negative. Once we have the negative, the next step is to call tf.add on the training digit and the negative of the test digit. We have the array representing the training digit on the left.

We add it to the negated array representing the test digit on the right, which gives us the element-wise difference between the two digits. Some of these numbers will be positive; the other elements in the resulting array will be zero or negative. Once we have the vector that results from the add operation, we apply tf.abs to find the absolute value of every element. Here are the results of the add for different images in the training data set against our single test image. tf.abs converts any negative number in the array to its positive equivalent, while positive numbers and zeros remain as they are.

If you look on screen, all the numbers in orange have had their sign flipped by tf.abs. At this point, we have a vector representation of the L1 distance between the test digit and each of the training digits. We can reduce each of these vectors to a single number by calling tf.reduce_sum along each row, which adds up all the elements of the vector into one value. For example, on screen tf.reduce_sum turns each training digit’s vector of absolute differences into a single distance. Once we have the L1 distances down to one number per training digit, we can call tf.arg_min on the resulting list of distances. This finds the index of the nearest neighbor of the test digit: the training digit whose L1 distance from the test digit is the smallest.
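The chain of operations described above (negate, add, absolute value, sum, then arg min) can be mirrored step by step in plain Python. This is a sketch for illustration only: the vectors are made-up four-pixel "images" rather than 784-pixel MNIST digits, and l1_distances and arg_min are hypothetical helper names standing in for the TensorFlow ops named in the comments.

```python
def l1_distances(train_digits, test_digit):
    """Return the L1 distance from test_digit to each vector in train_digits."""
    negated = [-p for p in test_digit]                  # like tf.negative
    distances = []
    for train in train_digits:
        diff = [t + n for t, n in zip(train, negated)]  # like tf.add
        absolute = [abs(d) for d in diff]               # like tf.abs
        distances.append(sum(absolute))                 # like tf.reduce_sum over one row
    return distances


def arg_min(values):
    """Return the index of the smallest value (the first one if there are ties)."""
    return min(range(len(values)), key=lambda i: values[i])


# Toy 4-pixel "images" (made-up data, not MNIST).
train_digits = [
    [0.0, 0.5, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.4, 0.9, 0.1],
]
test_digit = [0.0, 0.5, 0.9, 0.0]

distances = l1_distances(train_digits, test_digit)
print([round(d, 2) for d in distances])  # [0.1, 2.4, 0.2]
print(arg_min(distances))                # 0: training digit 0 is the nearest neighbor
```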

In our example here, tf.arg_min gives index zero, so the training digit at index zero is the nearest neighbor of our test digit. Now that we have all the computation set up, we are ready to run our graph. The first step is to initialize all the variables that we use with tf.global_variables_initializer. Use a with statement to create the session, and call session.run on the initializer. Then set up a for loop that iterates over all the test digits: we want to find the nearest neighbor of each test digit in turn. Inside the loop, run our prediction computation for each test digit against the entire training data set. Each result gives you an NN index, the index of the nearest neighbor for that particular test digit. Use this NN index to index into the training labels.

To find which handwritten digit the nearest neighbor represents, convert its label from one-hot notation to an actual digit using np.argmax. Let’s look at how np.argmax works. Say this is the one-hot vector that represents the digit four. np.argmax finds the index in this vector that holds the maximum value, which in one-hot notation is index four. Applied to the training labels, it gives you the label we predicted for a particular test image; applied to the test labels, it gives you the true value of that label. You can compare them to see how accurate this algorithm is: if the digit identified by the KNN algorithm equals the test label for that digit, the prediction counts toward the algorithm’s accuracy.
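Putting the pieces together, the whole evaluation loop can be sketched in plain Python: for each test digit, compute the L1 distance to every training digit, take the arg min as the NN index, decode both one-hot labels with an arg max (the role np.argmax plays in the lecture), and count matches. The data below is a tiny made-up set that includes one deliberately mislabeled test digit so the accuracy is not 100%, and all function names are hypothetical.

```python
def l1_distance(a, b):
    """Sum of absolute element-wise differences between two vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def arg_min(values):
    return min(range(len(values)), key=lambda i: values[i])


def arg_max(values):
    """Index of the largest value; for a one-hot vector, the encoded digit."""
    return max(range(len(values)), key=lambda i: values[i])


def one_hot(digit):
    vector = [0] * 10
    vector[digit] = 1
    return vector


def nearest_neighbor_accuracy(train_digits, train_labels, test_digits, test_labels):
    """Fraction of test digits whose nearest training neighbor has the right label."""
    correct = 0
    for test_digit, test_label in zip(test_digits, test_labels):
        distances = [l1_distance(train, test_digit) for train in train_digits]
        nn_index = arg_min(distances)
        predicted = arg_max(train_labels[nn_index])
        actual = arg_max(test_label)
        if predicted == actual:
            correct += 1
    return correct / len(test_digits)


# Made-up 4-pixel data: two training digits and three test digits.
train_digits = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
train_labels = [one_hot(7), one_hot(1)]
test_digits = [[0.9, 1.0, 0.1, 0.0], [0.0, 0.2, 0.8, 1.0], [1.0, 0.9, 0.0, 0.1]]
test_labels = [one_hot(7), one_hot(1), one_hot(4)]  # the last label is deliberately wrong

accuracy = nearest_neighbor_accuracy(train_digits, train_labels, test_digits, test_labels)
print(round(accuracy, 3))  # 0.667
```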

Go ahead and run this computation graph. You’ll get a set of test digits, figure out the predictions, and compare them with the true labels, and you’ll see that our accuracy is pretty good: 92%. To recap: an image of a handwritten digit in the MNIST data set is 28 pixels by 28 pixels. The image is grayscale, which means that every pixel within it can be represented by a single floating-point number between zero and one, the intensity of that pixel. If you unravel the 28 by 28 matrix row-wise, so that every row is tacked on to the end of the previous row, you get a single vector that is 784 pixels long. This vector of pixel values is our image representation.

23. Learning Algorithm

Here is a question that I’d like us to keep in mind as we go through this video. When we use the terms machine learning, deep learning and representation learning, what exactly does learning mean? Learning a function is the same as reverse engineering a function: when a system or an algorithm learns a function, it knows how to go from the input to the output without having access to the underlying code. True or false? We now have a decent sense of TensorFlow’s mechanics, of how programs are set up and implemented in TensorFlow. Let’s turn our attention back to the neural network and to its most basic building block, the neuron.

Let’s now understand the role that neurons play as learning units. The neural-network-based classifier that we used to classify a corpus of images as fish or mammals is, in a sense, learning a function. This function tells us how those images map to their output labels, and it is learned by the layers of the neural network. This is what the feature selection and classification algorithm is all about. Some layers in the neural network learn what the pixels represent; other layers learn what other features of the image represent, such as edges and contours. But the intent of all of these layers in the neural network is the same.

It is to learn, or to reverse engineer, the relationship between the input and the output. We shall see shortly how this reverse engineering is carried out using the interconnections between individual neurons. Let’s now be clear in our heads about what the learning process is all about. A machine learning algorithm is one that is able to learn from data. This is a pretty standard definition of machine learning. The question that now arises is: what exactly is learning? How do we quantify this term? How do we tell whether an algorithm is actually learning from data or not? This brings us to the definition of learning algorithms. A famous textbook, Tom Mitchell’s Machine Learning, defines it like this: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”

We will parse this definition in a moment, but the bottom line is that a learning algorithm learns, or improves, with experience. It reverse engineers the output, and it improves the quality of that reverse engineering by taking advantage of the training data. So let’s parse this. The first term in this definition is tasks. There are a number of standard tasks in machine learning: classification, regression, clustering and rule detection. We have already had some exposure to three of these: classification, regression and clustering. Clearly, in order to tell whether an algorithm is getting better or not, we also need a performance measure.

Performance measures are specific to the individual tasks. In classification, it might be accuracy. In regression, it might be the residual variance, or a metric known as cross-entropy if you’re using logistic rather than linear regression. All of these are performance measures which tell us how well our learning algorithm is doing. Much like human experts, a learning algorithm improves with experience, and that experience comes from the training process: exposure to a number of labeled instances, that is, a corpus. Putting all of this together, a learning algorithm learns from experience E, which means that it improves its performance at tasks T, as measured by some performance measure P.
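The three performance measures mentioned above are easy to compute by hand. The sketch below uses one common definition of each: accuracy as the fraction of correct predictions, residual variance as the variance of (target minus prediction), and mean binary cross-entropy for predicted probabilities of the positive class, assuming those probabilities lie strictly between 0 and 1. The function names and numbers are made up for illustration.

```python
import math


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def residual_variance(predictions, targets):
    """Population variance of the residuals (target - prediction)."""
    residuals = [t - p for p, t in zip(predictions, targets)]
    mean = sum(residuals) / len(residuals)
    return sum((r - mean) ** 2 for r in residuals) / len(residuals)


def binary_cross_entropy(probabilities, labels):
    """Mean cross-entropy of predicted P(label = 1) against 0/1 labels.

    Assumes every probability is strictly between 0 and 1.
    """
    total = 0.0
    for p, y in zip(probabilities, labels):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))                           # 0.75
print(round(residual_variance([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]), 4))  # 0.6667
print(round(binary_cross_entropy([0.9, 0.1], [1, 0]), 4))             # 0.1054
```

Note the direction of each measure: accuracy should go up as the algorithm learns, while residual variance and cross-entropy should go down.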

This is the heart of a learning algorithm, and it really has to do with tweaking the insides of a model during the training process. It’s relatively straightforward to understand how an algorithm like linear regression learns from a large number of data points. These points constitute its experience. It tweaks the values of its constants, the slope and the intercept of the regression line, in ways that decrease the loss function, i.e., that increase its performance. This process of learning is easy to quantify and understand for simple algorithms, but it can be a little challenging to understand how deep learning algorithms actually learn.
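One common way to do that tweaking is gradient descent on the mean squared error. The lecture doesn’t specify the optimizer, and linear regression also has a closed-form least-squares solution, so treat this as a sketch. fit_line and its default learning rate and step count are illustrative choices, and the made-up data points lie exactly on y = 2x + 1 so that the learned slope and intercept are easy to check.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Learn slope and intercept by gradient descent on mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction errors for the current slope and intercept.
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to each constant.
        grad_slope = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2 / n * sum(errors)
        # Nudge both constants in the direction that lowers the loss.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # exactly y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(round(slope, 3), round(intercept, 3))  # 2.0 1.0
```

Each pass over the data is a bit more experience, and each update lowers the loss, which is the “improves with experience” idea described above in miniature.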

In a nutshell, deep learning algorithms learn by tweaking their variables, which are the weights and biases of their neurons. We’ll say more about this using a couple of examples: the simplest possible example, which is regression using a single neuron, and then a slightly more involved example, learning the XOR function. Learning XOR requires three neurons arranged in two layers, and we shall see how such a neural network can learn the XOR function via a training process. Before we plunge into that, let’s once again reiterate the exact relationship between neurons and neural networks. Layers in a computation graph represent groups of neurons which perform similar functions.

Each layer consists of neurons which are interconnected with neurons in other layers in possibly very complex ways. And as we’ve discussed, the term deep learning arises because there are many layers of neurons arranged in depth. The directed computation graphs which we discussed in the context of TensorFlow learn the relationships within the data. The more complex the graph, the more relationships it can learn. For instance, a simple function such as linear regression can be learned using a really simple graph: just one layer consisting of one neuron will suffice, as we shall see. To learn even a slightly more complex function, such as the XOR function, we will need two layers and three neurons.

Clearly, we can learn more and more complex functions by adding more layers and more neurons, connected in increasingly complex ways. The interconnections between these neurons, and how they learn a complex function, very quickly become opaque to us: neural networks become black boxes. But that doesn’t really matter. As long as they learn the complex function, that’s really all that we care about. And again, the term deep learning refers to the fact that to learn a very complex function, we stack many layers one after the other. For instance, to learn a linear relationship between a y and an x variable, one straight line is enough.

This is the whole point of linear regression, and to learn a linear regression, all we need is one single neuron. A single neuron is enough to reverse engineer a linear regression. Here is how this would play out. We start, as usual, with a set of data points. These are fed into the simplest possible neural network, which consists of just one neuron. That neuron then undergoes a training process in which it tweaks, or optimizes, its own parameters based on the training data, and it outputs a regression line. As we shall see, the reason that regression can be learned using a single neuron is that one neuron performs two operations.
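A neuron is commonly modeled as an affine transformation of its inputs (a weighted sum plus a bias) followed by an activation function. The sketch below assumes that standard model; the function names and weights are illustrative. With an identity activation and a single input, the neuron is exactly a regression line, which is why one neuron is enough to represent linear regression.

```python
import math


def neuron(inputs, weights, bias, activation=lambda z: z):
    """One neuron: an affine transformation followed by an activation function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias  # affine step: w . x + b
    return activation(z)                                    # activation step


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Identity activation, one input, weight 2 and bias 1: the line y = 2x + 1.
print(neuron([3.0], [2.0], 1.0))                      # 7.0
# The same affine step followed by a sigmoid squashes the output into (0, 1).
print(neuron([0.0], [2.0], 0.0, activation=sigmoid))  # 0.5
```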

And one of those operations is a lot like linear regression: it is an affine transformation of the input. But now let’s say that we wish to learn a nonlinear function like the XOR function. Here, what we need to do is reverse engineer a function which can be written in pseudocode, as shown on screen. This function has two inputs, x1 and x2. If the values of x1 and x2 are equal to each other, it returns zero; otherwise it returns one. As an aside, let’s assume that x1 and x2 are both binary variables: they are bits, so they can only be zero or one. This is a nonlinear function, as we shall see, and so reverse engineering it requires a more complicated network consisting of three neurons arranged in two layers.
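The XOR pseudocode described above translates directly into Python. The second half of this sketch is a small demonstration, not a proof, of why XOR is nonlinear: it searches a grid of weights and biases for a single linear threshold unit (output 1 when w1*x1 + w2*x2 + b > 0) and finds none that reproduces the XOR truth table. In fact no such weights exist at all, because no straight line separates the inputs (0, 1) and (1, 0) from (0, 0) and (1, 1). The names xor and matches_xor are illustrative.

```python
def xor(x1, x2):
    """XOR of two bits: return 0 if the inputs are equal, else 1."""
    if x1 == x2:
        return 0
    return 1


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor(x1, x2))
# 0 0 0
# 0 1 1
# 1 0 1
# 1 1 0


def matches_xor(w1, w2, b):
    """True if the unit (w1*x1 + w2*x2 + b > 0) reproduces XOR on all four inputs."""
    return all(
        (w1 * x1 + w2 * x2 + b > 0) == (xor(x1, x2) == 1)
        for x1 in (0, 1)
        for x2 in (0, 1)
    )


grid = [i / 2 for i in range(-8, 9)]  # -4.0, -3.5, ..., 4.0
found = any(matches_xor(w1, w2, b) for w1 in grid for w2 in grid for b in grid)
print(found)  # False
```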

Training such a network is already much more complex than training a single neuron, and we won’t work through its mechanics by hand: the training will be handled for us behind the scenes by a framework like TensorFlow. All that we know is that at the end of this process, we will have three neurons which accept a pair of inputs. Those neurons will have certain specific weights and biases (more on these in a moment), and the net effect will be that our neurons successfully reverse engineer the XOR function. This means that if we pass in different values of x1 and x2, the outputs will exactly match the XOR truth table.
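For a concrete picture of what specific weights and biases can look like, here is a hand-picked (not trained) solution with three neurons in two layers, taken from the XOR example in Goodfellow, Bengio and Courville’s Deep Learning textbook (section 6.1): two ReLU hidden neurons and one linear output neuron. A network trained in TensorFlow would usually arrive at different weights, and may use other activations such as sigmoid; the function names here are illustrative.

```python
def relu(z):
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)


def xor_network(x1, x2):
    # Hidden layer: two ReLU neurons, both with weights [1, 1];
    # their biases are 0 and -1.
    h1 = relu(1.0 * x1 + 1.0 * x2 + 0.0)
    h2 = relu(1.0 * x1 + 1.0 * x2 - 1.0)
    # Output layer: one linear neuron with weights [1, -2] and bias 0.
    return 1.0 * h1 - 2.0 * h2 + 0.0


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_network(x1, x2))
# 0 0 0.0
# 0 1 1.0
# 1 0 1.0
# 1 1 0.0
```

The second hidden neuron only fires when both inputs are 1, and its weight of -2 in the output layer cancels the first neuron’s output in exactly that case.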

We shall see how this training process plays out in more detail. Extending this analogy, it is possible for neural networks to learn arbitrarily complex functions: if we can reverse engineer a bit of pseudocode which calculates XOR, then by adding enough layers to a neural network, we can learn just about any piece of code. This characteristic makes neural networks extremely powerful. We may not understand exactly what the resulting neural network does or how it goes about reverse engineering the function, but we know that it works. This makes neural networks much more versatile than other methods. Consider, for example, a Naive Bayes classifier.

That is a very specific method, which works brilliantly for classifying data when we have probabilities available, but it doesn’t generalize to use cases like classifying images. One last observation about neural networks: there are, effectively, processing units, which are the neurons, and data flows between those processing units as data items, or tensors. This is an exact correspondence with the computation graph in TensorFlow, and this parallel explains why TensorFlow is such a natural fit for building neural networks. Let’s turn back to the question we posed at the start of this video: the statement is actually true.

Learning a function basically involves reverse engineering it. When we as human beings learn a function of the form y = f(x), we learn exactly what transformations the function applies to the x variables. When a machine learning algorithm learns a function, it does exactly the same thing: it tries to match the output of the function without knowing, or having access to, the exact code or logic that went into the function in the first place. That satisfies the definition of reverse engineering, and that’s why learning a function is indeed synonymous with reverse engineering it.
