Professional Data Engineer: Professional Data Engineer on Google Cloud Platform Certification Video Training Course Outline
You, This Course and Us
Cloud SQL, Cloud Spanner ~ OLTP ...
BigTable ~ HBase = Columnar Store
Datastore ~ Document Database
BigQuery ~ Hive ~ OLAP
Dataflow ~ Apache Beam
Dataproc ~ Managed Hadoop
Pub/Sub for Streaming
Datalab ~ Jupyter
TensorFlow and Machine Learning
Regression in TensorFlow
Vision, Translate, NLP and Speec...
Virtual Machines and Images
VPCs and Interconnecting Networks
Managed Instance Groups and Load...
Ops and Security
Appendix: Hadoop Ecosystem
You, This Course and Us
Professional Data Engineer: Professional Data Engineer on Google Cloud Platform Certification Video Training Course Info
3. Lab: Creating a VM Instance
In this video, you'll think about the factors you need to take into account when you instantiate a virtual machine instance on Google's Compute Engine. We'll set up a Compute Engine virtual machine using the web console. We start off at the project dashboard and navigate to Compute Engine using the navigation sidebar. Choose VM instances and you'll get to the page where you can create and manage them. I have no instances at this point, which is why I don't see a list of instances here. I simply see an option to create a new instance. Click on that, and here is a form where I fill in the details of the kind of instance that I want. The first thing is the name of this instance. I'm going to call it my test instance. There are some rules that this name needs to follow. Don't worry, the web interface will help you with those rules. I then need to choose the zone I want my instance to live in. There are a number of zones available, spread across the US, Europe and Asia. Australia is available as well. Each zone has a different price per month, so make sure you take that into account. Some zones are more expensive than others, probably based on resource availability. A number of factors go into figuring out which zone is right for you. Where do you expect your traffic to come from? You don't want traffic from Asia to go all the way to the US, so maybe you want your instance to be located in Asia. Or if you're a government organization, you might have some constraints on where your data can be located. That also figures into where you would situate your VM instance. Within the US, you can choose between the west and the east of the country. Notice how the estimated bill that you'll receive per month changes based on your zone. Australia, as you can see, is pretty expensive. The next choice that you need to make is the kind of machine you need for this Compute Engine instance.
If you're not going to run something very computationally heavy, you can choose a micro instance, which has one shared virtual CPU and very little memory. It's also not that expensive. Play around with the choices and see what is available to you, then choose the one that is right for the application that you plan to host. I'm going to choose a pretty basic instance with one dedicated virtual CPU and 3.75 GB of memory. As you can see in the details on the right, it has a 10 GB standard persistent disk. You can then choose what your instance is allowed to access. My instance has full access to all Cloud APIs, and I want my instance to be able to serve HTTP traffic, so that's the choice that I make here. There are advanced options that you can choose to set up. Notice the link that says Management, disks, networking, SSH keys, etc. We can ignore those for now and just go ahead and create this instance. Creating the instance might take a while, but once it's complete, you can SSH directly into the instance in order to set up the software you want on it. This can be done directly via the browser. Just click on SSH, and you can choose to open it within a browser window. You can also use Cloud Shell and the gcloud command within it to SSH to this instance, or you can use a terminal window on your local machine or another SSH client. I'm going to choose Open in browser window, and if you have pop-up blocking enabled on your machine, you'll find that the pop-up gets blocked. Go ahead and unblock it and allow it to open in a new window. This opens a new browser window with an SSH connection to the VM instance that we just set up. You can tell it's our VM instance because the prompt says my test instance. What you now have is a Linux machine in the cloud, somewhere in Google's data center.
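If you'd rather script what this form does, here is a minimal sketch of roughly equivalent gcloud commands. It builds them as Python argument lists and prints them rather than running anything. The hyphenated name test-instance and the zone us-central1-a are assumptions for illustration only; n1-standard-1 is the machine type with 1 vCPU and 3.75 GB of memory. Note that the http-server tag only marks the instance, so a firewall rule that allows HTTP for that tag still has to exist in the network.

```python
import shlex


def create_instance_cmd(name, zone, machine_type="n1-standard-1"):
    """Build (without running) a gcloud command that mirrors the console form."""
    return [
        "gcloud", "compute", "instances", "create", name,
        f"--zone={zone}",
        f"--machine-type={machine_type}",  # n1-standard-1: 1 vCPU, 3.75 GB
        "--scopes=cloud-platform",         # full access to all Cloud APIs
        "--tags=http-server",              # tag matched by an allow-HTTP firewall rule
    ]


def ssh_cmd(name, zone):
    """Build the gcloud command that opens an SSH session to the instance."""
    return ["gcloud", "compute", "ssh", name, f"--zone={zone}"]


# Assumed name and zone, for illustration only.
print(shlex.join(create_instance_cmd("test-instance", "us-central1-a")))
print(shlex.join(ssh_cmd("test-instance", "us-central1-a")))
```

To actually run one of these, you could pass the list to subprocess.run(cmd, check=True) from Cloud Shell or any machine where gcloud is installed and authenticated.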
This Linux machine is kind of cool because it has a bunch of software packages already preinstalled. For example, Python 2.7 is readily available on this VM instance. To refresh the package lists on this instance, I can run "sudo apt-get update". Let's say you wanted to install git because you wanted to perform some code commits, and git isn't installed by default. You can simply install the git package, as you can see on screen. Once the install is complete, simply run "git --version" to see what version you have on your machine. Because Python is already installed, running sudo apt-get install for python changes nothing. You can check the version, and I have Python 2.7.9. Python 3 is also installed here by default. It's easy enough to verify this as well: apt-get again finds nothing to install, and "python3 --version" reports that the version is 3.4.2. This lecture should have given you an idea of the thought that goes into determining what kind of VM instance you need and where it should be located. The VM hardware (the CPU, the RAM, and the hard disk) should support the kind of computation that you want to do. The instance should be located in a region where your data is allowed to live, and it should be close to where your traffic comes from. These are some of the factors that you will take into consideration.
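The package steps from this lab can be sketched as a small plan, again built in Python and printed rather than executed. This is only an illustration: it assumes a Debian-style image where apt-get is available, it assumes each tool's package has the same name as its executable (true for git), and the -y flag is my addition so the install would not stop to ask for confirmation.

```python
import shlex
import shutil


def install_plan(tools):
    """Return apt-get commands: refresh the index, then install tools missing from PATH.

    Assumes the apt package name equals the executable name (true for git).
    """
    missing = [tool for tool in tools if shutil.which(tool) is None]
    plan = [["sudo", "apt-get", "update"]]
    if missing:
        plan.append(["sudo", "apt-get", "install", "-y", *missing])
    return plan


# Version checks like the ones run in the lecture.
VERSION_CHECKS = [
    ["git", "--version"],
    ["python", "--version"],
    ["python3", "--version"],
]

for cmd in install_plan(["git"]) + VERSION_CHECKS:
    print(shlex.join(cmd))
```

On a machine where git is already on the PATH, the plan contains only the update step, which matches what the lecture shows for Python: there is nothing left to install.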
4. More GCE
Here's a question that I'd like you to try and answer, or at least keep in the back of your mind as you go through the remainder of this video. If you do decide to use preemptible VM instances in your Google Container Engine clusters, what precautions should you take? If you don't know what a preemptible VM instance is, well, here's what it is: a type of VM instance that the Google Cloud Platform can take back from you at any point with just 30 seconds' notice. Before we move on to discussing containers in all their glory, let's understand some of the important design choices that you are responsible for if you're using Compute Engine. The first choice that you have to make is that of operating system. There are public images available for Linux and Windows Server. These come from Google. In addition, if you decide that you require some exotic operating system, you can use private images that you create or that you import into Compute Engine. There is a pretty rich set of operating system images available for your use. When you create a virtual machine instance, you have a lot of power. You have full root privileges as well as SSH access. This is a capability that can be shared with other users, which is something to consider carefully, as we will see later when discussing security. Remember that you will be charged for each virtual machine instance, and so you've got to specify all of the billing information. You've got to specify the zone, the operating system, which we already discussed, as well as the machine type. There is a rather complicated set of machine-type decisions that you have to make as well. More details on this are available in the GCP docs. But the basic idea is that there are standard machine types, then there are machine types optimized for high memory usage or high CPU usage, and then there is also something known as shared-core machine types.
These are small and are used for jobs that do not require a lot of resources. These days, with the advent of machine learning, GPUs are becoming increasingly important. Machine type choices include the ability to attach GPUs to most of these types. There's some fine print around this that isn't really important for us here. We should also know a little bit about the networking that goes on inside each project. Remember that each virtual machine instance has to belong to a project. That's by definition, because of course you've got to pay for each VM, and everything that you pay for has to belong to some project or the other. A project can have any number of instances. You'll never be discouraged from adding virtual machine instances to your project. And projects can have up to five virtual private clouds. These are internal network separations inside the project. Each virtual machine instance is going to belong to exactly one of these VPCs. We will study VPCs in a lot more detail. But for now, just keep in mind that instances within a particular VPC are going to communicate over a LAN, while instances in different VPCs are going to have to resort to the public Internet or a VPN. This is also a great place for us to discuss something known as preemptible instances. A preemptible instance is a VM instance type which is much cheaper than the regular Compute Engine machine types that we just discussed. Of course, the reason for this is that a preemptible instance may be terminated, that is, preempted, at any time if Google Compute Engine requires the resources held by this VM. Preemptible instances cost only a fraction as much as other VM instances. And so if you have a fault-tolerant application, for instance a processing-only node in a Hadoop cluster, a preemptible instance might make a lot of sense, particularly if you are rather budget conscious. Now, if you do decide to make use of preemptible instances, be sure that you are aware of the fine print.
Preemptible instances are definitely going to be terminated after running for 24 hours. So don't even think about using a preemptible instance for a really long-running job. If you're using your preemptible instance for relatively short jobs, well, then the probability of termination is typically quite low. This probability of termination will vary based on the day, the zone, network conditions, and other factors such as migrations and maintenance. Preemptible instances, unlike other VM types, cannot live-migrate. That is, they cannot stay alive during software updates, and they will be forcibly restarted during maintenance. Next, let's understand exactly how the preemption process works. The first step is for the Google Cloud Platform, or rather Compute Engine, to send a soft-off signal to your instance. Your machine then has 30 seconds in which to perform cleanup via a shutdown script and give up control. If it does not do so, Compute Engine will forcibly take control by sending a mechanical-off signal. And so, this is really important: if you are going to use preemptible instances, make sure that you have a well-written shutdown script and that you have associated that shutdown script with the instance using the console or whatever other administrative mechanism you are making use of. Once those 30 seconds are over and the mechanical-off signal comes in, that's it. Your virtual machine instance is going to be sent into the terminated state. Another important choice that you've got to make has to do with the storage options that you would like associated with your VM. We are going to talk about these storage options in a great deal of detail a little later, in the next module in fact. But let's really quickly understand them now. Clearly, every virtual machine instance is going to come with a small disk. This has to contain the operating system. After all, this is a root disk.
This root disk is persistent. Additionally, you will be able to select additional storage options from a menu that contains four items: persistent disks, which could be either standard or SSD; local SSD disks, which are not persistent and are physically attached to your machine instance; and lastly, Cloud Storage. We'll discuss each of these four in quite a bit of detail. Do keep in mind that persistent disks, both standard and SSD, are abstractions. Under the hood, these are going to be wired up to some kind of redundancy mechanism (striping and so on). They are persistent and redundant. Local SSD disks, which are actually attached to the instance, are not redundant, but they are extremely fast. In terms of cost, Cloud Storage is the cheapest. This is blob storage, after all. Standard persistent disks are the next cheapest option. SSD disks, whether local or persistent, are going to be the most expensive type of storage that you can adopt. As you can see from this conversation, when you commission a virtual machine instance using Google Compute Engine, you are in charge of every detail, including the machine type, operating system, storage options, and so on. Compute Engine will make use of Stackdriver, which is a suite of GCP tools for logging and monitoring. In summary, with the Compute Engine option, which is an IaaS (infrastructure as a service) option, you are taking control of the environment that you want your code to run in and are responsible for making it work just right. As a result, your web application has suddenly become quite complicated. This is orders of magnitude more technically involved than just using Google Cloud Storage with or without Firebase hosting. Let's turn our attention back to the question with which we began this video.
As we've seen, preemptible VM instances can be an economical option, and it makes sense to use them in certain use cases. But if you do decide to use preemptible VMs, be careful that you specify whatever cleanup operations you'd like in the shutdown script. Remember that a VM instance can be preempted by the Google Cloud Platform at 30 seconds' notice, and in that case, it's the contents of the shutdown script that are going to be executed. So put all of your clean-up operations in that shutdown script and ensure that it finishes successfully and gracefully inside 30 seconds.
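To make the 30-second rule concrete, here is a minimal sketch in Python. The first function builds (but does not run) a gcloud command that creates a preemptible instance and associates a shutdown script with it through instance metadata. The second shows one way cleanup logic could respect a time budget: it stops starting new steps once the budget is used up, so the most important step should go first. The instance name, zone, script filename, and cleanup steps are all hypothetical, and a real shutdown script could be written quite differently.

```python
import shlex
import time


def create_preemptible_cmd(name, zone, shutdown_script):
    """Build a gcloud command for a preemptible VM with a shutdown script attached."""
    return [
        "gcloud", "compute", "instances", "create", name,
        f"--zone={zone}",
        "--preemptible",
        f"--metadata-from-file=shutdown-script={shutdown_script}",
    ]


def run_cleanup(steps, budget_seconds=30.0, clock=time.monotonic):
    """Run (name, action) steps in order; start no new step after the budget expires.

    A step that is already running is not interrupted, so keep each step short.
    Returns the names of the steps that completed.
    """
    deadline = clock() + budget_seconds
    completed = []
    for name, action in steps:
        if clock() >= deadline:
            break
        action()
        completed.append(name)
    return completed


# Hypothetical names and steps, for illustration only.
print(shlex.join(create_preemptible_cmd("worker-1", "us-central1-a", "cleanup.sh")))
steps = [
    ("save checkpoint", lambda: print("saving checkpoint")),
    ("flush logs", lambda: print("flushing logs")),
]
print(run_cleanup(steps))
```

The injectable clock is only there so the budget logic can be exercised without actually waiting 30 seconds.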
5. Lab: Editing a VM Instance
Your VM instance is just a machine, and Google might periodically take it down for maintenance. What options do you have then, and how can you configure these? You'll learn this in this video. In this clip, we'll see how you can edit the settings of the virtual machine that you just set up. You are on the Compute Engine page, and there you can see the test instance that you just set up. If you click through, you'll notice that there has been a spike in activity, as seen in the graph. You SSH'd to that machine and installed git, and that showed up in our API calls. This page displays the configuration parameters that you set up for this machine: the machine type, the hardware that you have, the disk that you've connected to this instance, and so on. There are some other interesting details here as well. We want to delete the boot disk when the instance is deleted. We don't want to keep the boot disk around. This is the default. You can also see that the virtual machine instance that we created is not preemptible. That means it cannot be arbitrarily shut down by Google while your job is running in order to free up resources that Google needs for other purposes. This is your dedicated hardware, and it is for you, and you'll be billed accordingly. Preemptible instances have a lower cost than dedicated hardware, but they definitely shut down at least once every 24 hours and can shut down at any point in time. The automatic restart setting is enabled by default because if your machine happens to shut down, you want it to restart automatically. You want your jobs to keep running. If your machine is taken down for maintenance by Google, then you have two choices. You can choose to migrate this VM instance to another machine so that your processes can continue uninterrupted, or you can have it shut down altogether. Migration to another machine is what is recommended for uninterrupted processing of your programs. This VM instance can be accessed using this service account.
This is a common account for everyone on this project, and it is a robot account that allows the processes running on this instance to access other resources in the project. You'll study service accounts in much more detail later on in this course. You can choose to edit your virtual machine settings by clicking on Edit right here on this page. However, if you want to change the kind of machine that your VM instance runs on, you'll need to stop this instance, change the machine type, and then restart it. If the scope of the program that you're running on this instance has expanded and it needs more resources, you'll have to stop the instance and then configure your machine type. Notice that stopping the VM instance can have unintended side effects. You should be prepared to accept those. Hit Edit on the settings page and let's now configure your VM instance. If you go to Machine type, you can choose from the standard instances available in the drop-down list, or you can customize it even further. If you click on Customize, you'll see sliders that allow you to configure the number of cores on your machine, the amount of memory, which you can extend further, and so on. I'll choose four cores and 8.75 GB of memory. That's sufficient for my needs, let's assume. Notice that you can't change the zone once this instance has been created. If the original zone was in Australia, it will continue to be in Australia. You can also add more disks to your instance under Additional disks. This requires that you have set up additional disks earlier. We haven't done so, which is why you can't see any additional disks there. You can create a disk using the link right there, though. Instead of allowing your instance access to all Cloud APIs, you can also set these permissions at a more granular level by choosing to set access for each API. It will then list every API that has been enabled, and you can individually configure permissions for these.
If you choose not to make any changes, you can simply cancel out of this settings dialog and then go ahead and delete your instance. As soon as you're done with an instance, it's better to delete it so it doesn't take up resources and add to your bill. So you now know that if Google takes down your machine for maintenance, you can enable the setting that says "Migrate my instance when it goes down for maintenance." That is the recommended setting.
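The edit flow from this lab (stop, change the machine type, start again, and keep the migrate-on-maintenance setting) can be sketched as gcloud commands built in Python and printed rather than executed. The instance name and zone are assumptions for illustration. The custom type name follows the custom-CPUS-MEMORY_IN_MB pattern, so 4 cores with 8.75 GB becomes custom-4-8960, and custom memory has to be a multiple of 256 MB.

```python
import shlex


def custom_machine_type(vcpus, memory_gb):
    """Name a custom machine type, e.g. 4 vCPUs and 8.75 GB -> custom-4-8960."""
    memory_mb = round(memory_gb * 1024)
    if memory_mb % 256 != 0:
        raise ValueError("custom memory must be a multiple of 256 MB")
    return f"custom-{vcpus}-{memory_mb}"


def resize_plan(name, zone, vcpus, memory_gb):
    """The machine type can only be changed while the instance is stopped."""
    base = ["gcloud", "compute", "instances"]
    target = [name, f"--zone={zone}"]
    machine_type = custom_machine_type(vcpus, memory_gb)
    return [
        base + ["stop"] + target,
        base + ["set-machine-type"] + target + [f"--machine-type={machine_type}"],
        base + ["start"] + target,
    ]


def migrate_on_maintenance_cmd(name, zone):
    """Migrate during maintenance and restart automatically after a failure."""
    return [
        "gcloud", "compute", "instances", "set-scheduling", name,
        f"--zone={zone}",
        "--maintenance-policy=MIGRATE",
        "--restart-on-failure",
    ]


# Assumed name and zone, for illustration only.
for cmd in resize_plan("test-instance", "us-central1-a", 4, 8.75):
    print(shlex.join(cmd))
print(shlex.join(migrate_on_maintenance_cmd("test-instance", "us-central1-a")))
```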
6. Lab: Creating a VM Instance Using The Command Line
Let's say you're in charge of administering the Google Cloud for some organization, and you want to repeatedly spin up some specialized VM instances with a lot of special cases in terms of CPUs, hardware, and so on. How would you do this? In this lecture, we'll create and edit VM instances using the command line. Let's create a brand new VM instance. We deleted the one that we had created previously, so let's look at some of the advanced configurations that we can set up. The link that we hadn't clicked before has something called "Labels." You may have some logical grouping of resources within your Cloud Platform and want to see the billing associated with those specific resources. By labeling the resources that you're interested in, you can see usage patterns, billing details, and a whole bunch of other information for the labeled group. Let's say, using this UI, you've set up the configuration for this virtual machine the way you want it to be. At the very bottom, you can find the equivalent REST or command-line command to set up the VM with exactly the settings that you've specified. If you click on Command Line, it will pop up a dialog that shows you the gcloud command that you have to run to set up the VM instance with your custom settings. This command is what you can use within scripts in order to create multiple VM instances with the same configuration. We'd done it earlier using the Web Console. In this example, let's create an instance using Cloud Shell. We'll start off with some simple gcloud commands first. The command for this is as you can see on screen: gcloud compute instances create creates a new VM instance. The name of that instance is another-instance, and we want it to be in the us-central1-a zone. Once this instance has been created, you can see the resulting status in Cloud Shell as well. You can also refresh your browser window and see the instance on your Web Console.
Running help on the gcloud compute instances create command should show you all the options that you have available. This is great for quick lookup, but if you want to really customize your instance and have a whole bunch of parameters, it's better to specify it on the Web Console, generate the corresponding command line, and then use that. You can also set up some default values in your configuration so that any new instances that you create will use these default values. For example, if you set the default zone to us-central1-c, all new instances will be created in this zone. Try this out with another instance, another-instance-2: do not specify the zone explicitly, and you'll find that this instance is created in us-central1-c. If you refresh your Web Console, you'll see that another-instance-2 is in us-central1-c. No zone was explicitly specified on the command line. It simply picked up the default from the configuration. Now, let's say I wanted to SSH into the first instance that we set up, which is called another-instance. Notice that my SSH command has failed. That's because this instance was not found in the zone that is currently set as the default, which is us-central1-c. another-instance is in a different zone, so you need to explicitly specify the zone parameter when you want to SSH into it. But if you want to SSH into another-instance-2, which is in the default zone specified in the configuration, you do not need to specify the zone parameter. The zone is automatically picked up from your configuration, and this SSH command will succeed. Now that I've got all these instances set up, I'm going to go ahead and delete them. We don't need them anymore. They were just examples to show you how instance creation can be carried out using the command line. Save your compute resources and delete instances when you no longer need them.
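Here is a sketch of that whole Cloud Shell session, written as Python argument lists and printed rather than executed. The hyphenated instance names and the zone IDs are my reading of what is shown on screen, and the --quiet flag on the delete commands is my addition to skip the confirmation prompt. The small helper shows the rule at work: pass --zone only when the instance lives outside the default zone set in the configuration.

```python
import shlex


def ssh_cmd(name, zone=None):
    """Pass --zone only when given; otherwise gcloud uses the configured compute/zone."""
    cmd = ["gcloud", "compute", "ssh", name]
    if zone is not None:
        cmd.append(f"--zone={zone}")
    return cmd


session = [
    # First instance: zone given explicitly.
    ["gcloud", "compute", "instances", "create", "another-instance",
     "--zone=us-central1-a"],
    # Set a default zone, then create a second instance without --zone.
    ["gcloud", "config", "set", "compute/zone", "us-central1-c"],
    ["gcloud", "compute", "instances", "create", "another-instance-2"],
    # The first instance is outside the default zone, so SSH needs --zone.
    ssh_cmd("another-instance", zone="us-central1-a"),
    ssh_cmd("another-instance-2"),
    # Clean up when done.
    ["gcloud", "compute", "instances", "delete", "another-instance",
     "--zone=us-central1-a", "--quiet"],
    ["gcloud", "compute", "instances", "delete", "another-instance-2", "--quiet"],
]

for cmd in session:
    print(shlex.join(cmd))
```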
If you want to create a number of virtual machines with a specialized configuration in an automated way, you'll script it using the gcloud command-line utility. You can get the parameters for this gcloud command by simply setting up the configuration on your Web Console and clicking on Command Line at the very bottom.
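As a rough illustration of that kind of scripting, the sketch below generates one create command per instance, all with the same machine type and labels. Everything specific here (the worker prefix, the zone, the machine type, and the label keys and values) is hypothetical; in practice you would paste in the flags that the Command Line link generates for your own configuration.

```python
import shlex


def fleet_cmds(prefix, count, zone, machine_type, labels):
    """Build one 'instances create' command per VM, all sharing one configuration."""
    label_arg = ",".join(f"{key}={value}" for key, value in sorted(labels.items()))
    return [
        [
            "gcloud", "compute", "instances", "create", f"{prefix}-{index}",
            f"--zone={zone}",
            f"--machine-type={machine_type}",
            f"--labels={label_arg}",  # labels group resources for billing and usage views
        ]
        for index in range(1, count + 1)
    ]


# Hypothetical values, for illustration only.
for cmd in fleet_cmds("worker", 3, "us-central1-a", "n1-standard-1",
                      {"team": "data", "env": "test"}):
    print(shlex.join(cmd))
```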
7. Lab: Creating And Attaching A Persistent Disk
In this lecture, we'll answer the question: what condition does a persistent disk have to satisfy when it is created so that it can be attached to a VM instance? We'll see how we can use the command line to create a new persistent disk of a certain size and then attach it to a VM instance that we've already created. Let's say we've already created a VM instance called test-instance. It can be found in the zone us-central1-f. We now want to extend the disk space that is available to this VM instance, which we will do by creating a new persistent disk. We'll use the command line for this: gcloud compute disks create. test-disk is the name of our disk, the size is 100 GB, and it's in us-central1-f. The persistent disk that you set up has to be in the same zone as your instance. You can't create a persistent disk in one zone and attach it to an instance that lives in a completely different zone. Persistent disks have to be connected to the instance with high-speed connections, which is only possible in the same zone. Ideally, when you create persistent disks, they should be at least 200 GB in size for optimal read and write performance. Here, I'm creating a test disk as an example to show you how it's done, which is why I've chosen 100 GB, and I won't really worry about it. Once the status of your disk is READY, the disk is ready to be attached. However, it can't be used until you format it. For instructions on how to format this disk, you can simply follow the link that you see here on screen. We won't really worry about that right now. The formatting operation is pretty straightforward. Click through to your test-instance VM, edit your settings, and let's go ahead and attach this particular disk to your VM instance. Just click on Add Item under Additional disks, and in the name dropdown you'll find test-disk, the standard persistent disk that we set up. Things on the Web Console are easy but tedious.
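The disk creation step can be sketched like this, with the command built in Python and printed rather than executed. The names test-disk and us-central1-f are my reading of what is shown on screen, and the size note simply restates the 200 GB guidance from the lecture as a check you could run before creating a disk.

```python
import shlex


def create_disk_cmd(disk, size_gb, zone):
    """Build the gcloud command that creates a persistent disk in the given zone."""
    return [
        "gcloud", "compute", "disks", "create", disk,
        f"--size={size_gb}GB",
        f"--zone={zone}",  # must match the zone of the instance it will attach to
    ]


def size_note(size_gb, recommended_gb=200):
    """Return a reminder when the disk is smaller than the size suggested in the lecture."""
    if size_gb < recommended_gb:
        return (f"{size_gb} GB is below the {recommended_gb} GB suggested "
                "for best read and write performance")
    return None


print(shlex.join(create_disk_cmd("test-disk", 100, "us-central1-f")))
print(size_note(100))
```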
What we really want to do is to use the command line to attach this disk to our instance. We need the command line, especially if this is a repeated operation. Use the gcloud compute instances attach-disk command, and specify the instance, the disk you want attached to that instance, and the zone where they both live. That's pretty straightforward. Once the command has returned, hit refresh on your Web Console, and let's go into test-instance to see whether the disk has indeed been attached, right there under Additional disks. Notice that test-disk is now attached to this VM instance. You can also SSH into your test instance and confirm that this disk is indeed part of that instance. Use Cloud Shell to do your SSH, and notice that your prompt changes: my prompt now shows my username at test-instance, indicating that I'm now SSH'd in. You can view all the disks that are associated with this instance by running ls -l /dev/disk/by-id. This will give you the list of disks attached. Notice that the one at the very bottom is a SCSI entry for Google persistent disk 1. That's the new disk that we've just attached. In this lecture, we learned that if you are attaching a persistent disk to your VM, they have to be in the same zone. Your VM instance and the persistent disk cannot be in different zones.
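Here is a minimal sketch of the attach step that also enforces the same-zone rule before building the command. As before, the commands are built in Python and printed rather than executed, and the instance, disk, and zone names are my reading of what is shown on screen. The last command is the one you would run inside the VM after SSHing in.

```python
import shlex


def attach_disk_cmd(instance, instance_zone, disk, disk_zone):
    """Build the attach-disk command, refusing a disk that lives in another zone."""
    if instance_zone != disk_zone:
        raise ValueError(
            f"disk is in {disk_zone} but instance is in {instance_zone}; "
            "a persistent disk must be in the same zone as the instance"
        )
    return [
        "gcloud", "compute", "instances", "attach-disk", instance,
        f"--disk={disk}",
        f"--zone={instance_zone}",
    ]


# Run inside the VM to list the attached disks.
LIST_DISKS_IN_VM = ["ls", "-l", "/dev/disk/by-id"]

print(shlex.join(attach_disk_cmd("test-instance", "us-central1-f",
                                 "test-disk", "us-central1-f")))
print(shlex.join(LIST_DISKS_IN_VM))
```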