Google Professional Data Engineer – Regression in TensorFlow Part 7
March 29, 2023

15. Lab: Linear Regression using Estimators

When you use an Estimator to perform linear regression, how does the Estimator object know what the training data set is, what the labels are, and the other properties of the regression, such as the batch size or even the number of epochs? How does it get this information? In this lecture, we’ll see how we can implement linear regression in TensorFlow using something called Estimators. Estimators are a high-level API which abstracts away all the little details of regression that we implemented manually in our previous lectures. At this point, we are quite comfortable with all the steps involved in setting up a linear or logistic regression model in TensorFlow. We set up the computation graph. In both cases, it’s a neural network of one neuron.

Only the activation function differs. In the case of logistic regression, the activation function is softmax. In the case of linear regression, it’s the identity function. Based on the kind of regression that we were going to implement, linear or logistic, we determined what the cost function would be. We set up an optimizer, made a choice as to whether it’s gradient descent, FTRL, et cetera, and then trained our model. While training our model, we made decisions as to how many epochs we wanted to run, the batch size for each epoch, and also the optimizer we wanted to use. At the end, we got a converged model. Now, linear and logistic regression are very standard, basic examples of machine learning; they’re just one neuron, after all.
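The one difference between the two models, the activation function, can be sketched in a few lines of NumPy. This is an illustrative sketch, not the notebook's code; `identity` and `softmax` are ordinary helper functions defined here for the example.

```python
import numpy as np

def identity(z):
    # Linear regression: the activation is the identity function,
    # so the neuron's weighted sum passes through unchanged.
    return z

def softmax(z):
    # Logistic regression: softmax turns raw scores into probabilities.
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
print(identity(scores))   # the scores, unchanged
print(softmax(scores))    # non-negative values that sum to 1
```

Everything else in the one-neuron network (weights, bias, the weighted sum) is shared between the two models.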

What we just did in the last few lectures seems like a lot of work. That’s because we were setting up all the steps of linear and logistic regression manually, by hand. If you were actually implementing this in the real world, you’d use the higher-level APIs that TensorFlow provides: you would use Estimators. The code for implementing linear regression with Estimators is present in the Python notebook called Linear Regression with Estimators. The first part of the file has all the helper methods that you’ll need in order to read in your stock data. Then comes the baseline implementation of our model, which is linear regression. Our baseline implementation uses the scikit-learn library, and after that comes our implementation using Estimators.

As we’ve discussed, Estimators are a high-level API that TensorFlow offers for standard, cookie-cutter machine learning models. Estimators abstract away all the details that we worked with while performing linear regression manually. All of linear regression using Estimators can fit in four or five lines of code. The very first step is to set up what features our linear regression will actually look at. In our case, we only look at the S&P 500 returns, so our feature vector will have just one column. The features in any regression model are specified as a list. As you can see by the square brackets here, our list contains just one feature. What you see within the square brackets is metadata about that feature: tf.contrib.layers.real_valued_column indicates that it is a column with real values.

X is the name of the feature, and the feature has one dimension. The fact that real_valued_column is in the tf.contrib.layers namespace indicates to us that it can be used even within the intermediate or hidden layers of a neural network. The next step is to instantiate our estimator object. Because we are performing linear regression, we instantiate the tf.contrib.learn.LinearRegressor class. The LinearRegressor estimator is used for building linear regression models. There are no surprises there: because it’s so closely tied to the regression model that we’re going to use, it knows what cost function to use, and the right optimizer as well. It takes in just one argument, and that is the feature columns: the type of features that we have to regress on and how many there are.

In our case, there is just one feature. The next step is to set up the input function. The input function is the bridge between the estimator and the features that we’re going to feed into the model. Here we use the NumPy input function, which is a standard input function available in the TensorFlow library. It is used to feed a dictionary of NumPy arrays into the model. Our x data, or training data, is available in the form of NumPy arrays, so this is perfect for us. The first argument to the NumPy input function is which features from our feature vector we want to pass in. In our case, we have exactly one feature, that is x, and that is what we specify in the form of a dictionary. The x feature is our training data set, which is present in the variable x_data.

The next argument to the input function is the labels, our y values, which are present in y_data. We can use this input function to specify the properties of our training: the batch size as well as the number of epochs. Here we choose the batch size to be the entire data set size, and num_epochs is 10,000. Once the input function has been specified, we are ready to train our model. This we do by calling the estimator’s fit method. The input arguments to this method are the input function that we just set up and, strangely enough, the number of steps that we want to use to run the regression. If you notice, the input function also allowed you to specify the number of epochs for which you wanted to run this training.
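Conceptually, an input function just hands the estimator a dictionary of feature arrays plus a label array, batch by batch, for the requested number of epochs. Here is a minimal pure-Python sketch of that idea; the real numpy_input_fn produces tensors inside the TensorFlow graph rather than plain arrays, and the names `make_input_fn` and the toy data are illustrative only.

```python
import numpy as np

def make_input_fn(x_data, y_data, batch_size, num_epochs):
    """Illustrative stand-in for a NumPy input function: yields
    ({'x': feature_batch}, label_batch) tuples, epoch by epoch."""
    def input_fn():
        n = len(x_data)
        for _ in range(num_epochs):
            for start in range(0, n, batch_size):
                features = {'x': x_data[start:start + batch_size]}
                labels = y_data[start:start + batch_size]
                yield features, labels
    return input_fn

# Toy data; batch size equal to the data set size, as in the lecture.
x_data = np.arange(10, dtype=np.float64)
y_data = 2.0 * x_data + 1.0
input_fn = make_input_fn(x_data, y_data, batch_size=10, num_epochs=3)
batches = list(input_fn())
print(len(batches))   # 3 epochs, one full-dataset batch per epoch
```

The estimator never touches x_data or y_data directly; it only ever sees what the input function feeds it.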

The steps and the number of epochs may seem to be redundant, but when you actually run this model, you’ll find that the estimator uses the lower of these two values. This is what the linear regression estimator does, but it’s entirely possible that there are other, more complex models which require this additional specification of steps. This is the step that runs the training on our model, and you’ll find that the estimator prints out a bunch of messages on screen so you can see how the model converges to its final value. In order to access the values of the variables within this estimator, you can call fit.get_variable_names(), iterate over all the variables that this estimator has, and then print them out to screen using fit.get_variable_value(), passing in the variable name.
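The "lower of the two values" rule can be made concrete with a little arithmetic. This sketch assumes the simple case where the input function yields one batch per pass over the data; `effective_steps` is a hypothetical helper, not a TensorFlow function.

```python
import math

def effective_steps(num_examples, batch_size, num_epochs, steps):
    # One training step per batch; the input function can supply at most
    # (batches per epoch) x (number of epochs) steps.
    steps_per_epoch = math.ceil(num_examples / batch_size)
    steps_from_epochs = steps_per_epoch * num_epochs
    # Training stops at whichever limit is reached first.
    return min(steps_from_epochs, steps)

# Full-batch training on 100 examples for 10,000 epochs,
# but fit() asked for only 1,000 steps: the steps argument wins.
print(effective_steps(100, 100, 10_000, 1_000))   # 1000
```

If the input function runs out of data first (fewer epochs than steps), training ends early instead.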

The output here prints out a bunch of stuff. Our final bias value is in linear/bias_weight, which is 0.9, and our final weight value is in linear/x/weight, which is 1.67. If you remember our baseline implementation, these values are almost identical to those. There are a number of other variables that are printed out, and these have to do with the internal workings of our estimator. For example, Ftrl should give you a clue that the optimizer this estimator uses is the FTRL optimizer. Now that you’ve implemented linear regression with Estimators, you know that it is the input function which acts as a bridge between the training data set, the training parameters such as num_epochs, batch size, et cetera, and the estimator object. The TensorFlow library offers a number of built-in input functions. One amongst them is the NumPy input function, which allows you to pass in data in the form of NumPy arrays.
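The converged weight and bias can always be sanity-checked against a closed-form least-squares fit. Here is one way to do that on synthetic data generated with the same slope and intercept the lecture reports (1.67 and 0.9); this is not the notebook's actual stock-returns data.

```python
import numpy as np

# Synthetic stand-in for the returns data, with known true parameters.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.67 * x + 0.9 + rng.normal(scale=0.01, size=200)

# Ordinary least squares via np.linalg.lstsq: stack a column of ones
# alongside x so the intercept (bias) is fitted too.
A = np.column_stack([x, np.ones_like(x)])
weight, bias = np.linalg.lstsq(A, y, rcond=None)[0]
print(round(weight, 2), round(bias, 2))
```

A well-converged estimator should land very close to these closed-form values, just as the lecture's estimator landed close to the scikit-learn baseline.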

16. Lab: Logistic Regression using Estimators

When you use an estimator to perform logistic regression, why do you not need to specify your training labels, which represent categorical values, in one-hot notation? In this lecture, we’ll see how we can implement logistic regression with Estimators. Once you’ve done linear regression, though, logistic regression is very, very easy. There’s really just a little bit of tweaking that you need to do to get this working. The code for this implementation is in a Python notebook named Logistic Regression with Estimators. The code cells at the very beginning have all the helper methods and the baseline implementation for logistic regression. Setting up the feature specification for logistic regression is identical to how we did it in linear regression.

We set up a real-valued column; the main, crucial difference in logistic regression is the estimator that we use. Logistic regression is a way of classifying our input data into categorical values, which is why we use a linear classifier. We do need to set up our training data set and its associated labels in the way that the estimator wants them. We’ve seen this code before, though. We use the np.expand_dims command in order to set up our returns data in the form of a two-dimensional array. The returns for our S&P 500 should be an array of arrays. We want to ignore the intercept column in our x data, which is why we select just column zero.
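The reshaping step looks like this in isolation; the toy returns values here are made up for the example, not the notebook's real S&P 500 data.

```python
import numpy as np

returns = np.array([0.01, -0.02, 0.005])   # toy 1-D array of daily returns
x_data = np.expand_dims(returns, axis=1)   # add a column axis: (3,) -> (3, 1)
print(x_data.shape)    # (3, 1): an array of arrays, one row per example
```

Each training example becomes its own inner array, which is the two-dimensional shape the classifier expects for its features.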

We use np.expand_dims on our y data as well to get a two-dimensional array of our training labels. Our training labels in this case can just be True/False values; they need not be in one-hot notation. That’s because the logistic regression estimator will take care of converting them to one-hot notation, once again saving you a lot of detailed work. The input function used by this regressor is exactly the same as before. We use the NumPy input function so that we specify our feed dictionaries in the form of NumPy arrays. It takes in the x feature vector and the y labels, and we can specify the batch size as well as the number of epochs that we want the training to run. You can now train this logistic regression model by calling the estimator’s fit method.
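What the classifier does for you under the hood can be sketched in two lines of NumPy. This is an illustration of the one-hot conversion, not the estimator's actual internal code.

```python
import numpy as np

labels = np.array([True, False, True, True])   # raw True/False labels
indices = labels.astype(int)     # class indices: True -> 1, False -> 0
one_hot = np.eye(2)[indices]     # each index picks a row of the identity
print(one_hot)                   # shape (4, 2), one row per label
```

Row i has a 1 in the column for example i's class and a 0 everywhere else, which is exactly the one-hot representation we built by hand in the earlier manual implementation.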

This takes in the input function that you just set up and, redundantly, the number of steps or epochs you want to run this training for. In case you happen to be tweaking your model parameters, you can set up a different input function as well. The one-shot input function sets up another NumPy input function where the batch size includes your entire training data set and the number of epochs for your training is exactly one. You can call the estimator’s fit method to run your regression model on any of these input functions; I’ve just chosen this input function at random. Try it with the one-shot input function and see how the result changes. You might remember our logistic regression implementation from earlier.

We had to jump through a lot of hoops to calculate the percentage accuracy that our model generated. We used np.argmax, did a whole bunch of comparisons, and so on. Using Estimators, though, all of this is abstracted away from you. You can simply call the evaluate method on the fitted estimator, specify the input function that you want to evaluate, and store the output in a results variable. I was playing around with this code for a little bit; at this point, I’m choosing to evaluate the one-shot input function, but once again the choice of input function is up to you. You can run the estimator’s fit method with one input function, hold a reference to the fit object, call fit.evaluate with an input function, and it’ll tell you how good your training process was.
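For reference, the accuracy calculation that evaluate now hides looked roughly like this; the probabilities and labels below are made-up toy values, not the notebook's actual output.

```python
import numpy as np

# Toy per-example probabilities for two classes, and the true labels.
probs = np.array([[0.2, 0.8],
                  [0.9, 0.1],
                  [0.4, 0.6],
                  [0.7, 0.3]])
y_true = np.array([1, 0, 0, 0])

y_pred = np.argmax(probs, axis=1)      # pick the likelier class per example
accuracy = np.mean(y_pred == y_true)   # fraction of correct predictions
print(accuracy)                        # 0.75
```

The estimator's evaluate method performs this argmax-and-compare dance internally and reports the accuracy as one entry of its results dictionary.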

Print out both the results and the variables. The variables, as you know, will print out the bias value and the weight of our regression parameters. Notice within the result that the accuracy of your logistic regression is printed out for you. Here the accuracy is 0.6959, or about 69%. At this point, you know that Estimators abstract away a lot of the details of our regression model from us. An estimator can accept your training labels in raw form: we just pass in a bunch of True/False values in the form of a two-dimensional array, and under the hood it will convert them to one-hot notation and run the regression training.