Google Cloud virtual machine tutorial

Willem de Beijer and Daan Kolkman

This tutorial will take you through the steps for creating a data science virtual machine on Google Cloud. It is part of our Cloud Computing for Data Science series.

1. Creating a Google Cloud account

Regular

A Google Cloud account can be created at https://cloud.google.com/free/ and is linked to a regular Google account, such as Gmail. You can create a free account, which comes with $300 of credit. Although you have to provide credit card details, you can’t be charged until you manually upgrade to a paid account.

Student

While Google Cloud does have a student program, it is meant for teaching purposes and research projects only. You can have a look at the program at https://edu.google.com/programs/students/?modal_active=none. At the time of writing, the free student credits cannot be used for your own projects unless they are designated research projects. We therefore recommend sticking with a regular account for now.

Once you’ve signed up, you should be redirected to the Google Cloud console.

2. Setting up the VM

Data science containers

Google offers data science images for VMs, which should make our lives as data scientists a lot easier. However, at the time of writing this project is still in beta and does not yet work satisfactorily. If you are reading this article at a later time, it might be worth trying one of these images, but for now we’ll stick with a plain vanilla setup.

Plain vanilla setup

Click the “Compute Engine” pane in the “Top Products” section, or search for it in the top bar. Click the “Create” button to create a new VM.

Fortunately, the setup is very easy. Google Cloud Platform is transparent about the pricing and options for the different machine sizes. For this tutorial we will stick with the standard “n1-standard-1” machine type.

Under the Firewall section, make sure to check both “Allow HTTP traffic” and “Allow HTTPS traffic”.

Creating an SSH key

Mac/Linux

To make an easy connection to our VM we will set up an SSH key. Open Terminal and execute the following command:

ssh-keygen -t rsa

When asked where you want to save the SSH key you can just press ENTER to leave it at the default location. Since I have multiple SSH keys already, I chose a different filename:

/Users/willemdebeijer/.ssh/id_google_cloud

When it finishes, ssh-keygen reports where the private and public keys were saved and prints the key fingerprint.

To get the public key the VM requires, execute:

cat ~/.ssh/id_google_cloud.pub

Note: “id_google_cloud” is the name I used but if you chose a different one, then use that.

The result should be a long string of random characters starting with “ssh-rsa” and ending with your username. Copy this whole key, including “ssh-rsa” and your username at the end.
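If you prefer to skip the interactive prompts, the key can also be generated in one go. The following is a minimal sketch, not the exact command used in this tutorial: the 2048-bit length, the empty passphrase, the scratch directory and the username are all assumptions for illustration, so adjust them to your own setup.

```shell
# Hypothetical values for illustration; replace them with your own.
KEY_DIR="$(mktemp -d)"              # scratch directory, so no existing key is overwritten
KEY_FILE="$KEY_DIR/id_google_cloud"
VM_USER="willemdebeijer"            # stored as the comment at the end of the public key

# -t rsa   key type, as used in this tutorial
# -b 2048  key length in bits (assumption)
# -f       output file, so ssh-keygen does not ask for a location
# -C       comment appended to the public key
# -N ""    empty passphrase (assumption; set one for extra safety)
# -q       do not print the fingerprint and randomart
ssh-keygen -t rsa -b 2048 -f "$KEY_FILE" -C "$VM_USER" -N "" -q

# Print the public key that the VM setup page asks for.
cat "$KEY_FILE.pub"
```

To create the key in its usual place, point KEY_FILE at ~/.ssh/id_google_cloud instead of the scratch directory.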

Windows

Install PuTTY

The easiest way to manage SSH connections on Windows is with a tool called PuTTY. This tool can be downloaded at https://www.chiark.greenend.org.uk/~sgtatham/putty/. Once you’ve finished installing, open the app called “PuTTYgen”.

Make sure that in the “Parameters” box, an RSA key with 2048 bits is selected. Then click “Generate”. Now change the “Key comment” to something recognizable and easy to type (this will be your username later!). If you want some additional safety, enter a passphrase in the corresponding fields. Proceed to save both the public and private keys.

Now copy the entire public key from the text field at the top.

Back to the VM setup

Now go back to the Google Cloud Virtual Machine setup page. Expand the “Management, security, disks, networking, sole tenancy” section and go to “Security”. In the large text field under SSH keys, paste the key that you just copied.

Now that the setup is done, click the blue “Create” button at the bottom of the page. It might take a few minutes for your VM to initialize.
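For readers who prefer the command line, roughly the same VM can be requested with the gcloud CLI. This is only a sketch under assumptions: the instance name and zone are made-up placeholders, it presumes gcloud is installed and authenticated, and it does not add the SSH key, which would still have to be added separately. The script only prints the command so that you can review it before running it yourself.

```shell
# Placeholders (assumptions); only the machine type comes from this tutorial.
INSTANCE_NAME="data-science-vm"
ZONE="europe-west4-a"
MACHINE_TYPE="n1-standard-1"

# The http-server and https-server network tags are the ones the console attaches
# when "Allow HTTP traffic" and "Allow HTTPS traffic" are checked. The matching
# firewall rules may need to exist in the project already.
CREATE_CMD="gcloud compute instances create $INSTANCE_NAME --zone=$ZONE --machine-type=$MACHINE_TYPE --tags=http-server,https-server"

# Print the command for review instead of executing it.
echo "$CREATE_CMD"
```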

3. Using your Virtual Machine

Mac/Linux

Now that our VM is ready to go, let’s use it to run a Jupyter Notebook. Copy the “External IP” shown on the Google Cloud VM console.

Now open up a Terminal window on your local machine and execute:

ssh -i ~/.ssh/id_google_cloud <external-ip-here> 
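The command above logs in with your local username. If the username attached to your key is different, it has to be passed explicitly as user@host. The sketch below derives the username from the comment at the end of a public key and builds the matching command. The key string and IP address are made-up examples, and taking the part of the comment before any “@” as the username is an assumption about how the key was created.

```shell
# Made-up example values; use the contents of your own .pub file and your VM's external IP.
PUB_KEY_LINE='ssh-rsa AAAAB3NzaC1yc2EXAMPLEONLY willemdebeijer@macbook'
EXTERNAL_IP='203.0.113.10'

# The comment is the last whitespace-separated field of the public key.
KEY_COMMENT="$(printf '%s\n' "$PUB_KEY_LINE" | awk '{print $NF}')"

# Keep only the part before the first "@" (assumption: comment has the form user or user@host).
VM_USER="${KEY_COMMENT%%@*}"

# Build and print the login command for review.
SSH_COMMAND="ssh -i ~/.ssh/id_google_cloud ${VM_USER}@${EXTERNAL_IP}"
echo "$SSH_COMMAND"
```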

Windows

Copy the “External IP” shown on the Google Cloud VM console.

Now open the PuTTY app and paste the IP address into the “Host Name (or IP address)” field.

In the left pane go to “Connection” -> “SSH” -> “Auth” and click “Browse”. Now select the private key file and click “Open”. Then click “Open” at the bottom of the PuTTY window to start the SSH connection. A prompt will ask you to confirm that you trust this instance; click “Yes”.

The resulting terminal will show “login as”. Enter the username that we set when creating our public and private key. Once logged in, you should see a shell prompt on your instance.

Setting up Anaconda

Our VM runs Debian, which is the default operating system for a Google Cloud VM. On Debian we first need to install a small tool, bzip2, to be able to install Anaconda. In the Terminal/PuTTY window in which we connected to the VM, run the following commands one by one:

sudo apt-get update

And then:

sudo apt-get install bzip2

Now download Anaconda with:

wget https://repo.anaconda.com/archive/Anaconda3-2019.03-Linux-x86_64.sh

If you’re using this tutorial a long time after it was written, you might want to get the latest Anaconda download link (but remember to get the Linux version!).

Install Anaconda with:

bash Anaconda3-2019.03-Linux-x86_64.sh

When shown the terms of agreement, hold down the ENTER key to scroll down. Continue to install in the default location.

At the end of the installation you will be asked if you want to prepend your Anaconda installation to the .bashrc PATH. When this shows up type “yes”.

If you accidentally pressed ENTER without typing “yes”, correct this with the following steps. If not, skip ahead to the “source .bashrc” step below.

vim .bashrc

Your prompt should change to show vim mode. Press the “i” key to be able to type and add the following to the bottom of the file:

export PATH="$HOME/anaconda3/bin:$PATH"

Hit the ESC key to get out of edit mode, then type “:wq” and press ENTER to go back to the regular command line.
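If you would rather not edit the file in vim, the same line can be appended from the shell. The sketch below assumes Anaconda was installed in the default location (anaconda3 in your home directory). The helper name add_anaconda_to_path is made up for this example, and it only adds the line if it is not there yet.

```shell
# Append the Anaconda PATH line to a shell startup file, but only once.
# Assumption: Anaconda lives in the default location, $HOME/anaconda3.
add_anaconda_to_path() {
    rc_file="$1"
    # Single quotes keep $HOME and $PATH unexpanded, so they are resolved at login time.
    path_line='export PATH="$HOME/anaconda3/bin:$PATH"'
    touch "$rc_file"
    # -F fixed string, -x match the whole line, -q no output
    if ! grep -Fxq "$path_line" "$rc_file"; then
        printf '%s\n' "$path_line" >> "$rc_file"
    fi
}

# Demonstrate on a scratch file; on the VM you would pass ~/.bashrc instead.
DEMO_RC="$(mktemp)"
add_anaconda_to_path "$DEMO_RC"
add_anaconda_to_path "$DEMO_RC"   # the second call adds nothing
cat "$DEMO_RC"
```

On the VM, calling add_anaconda_to_path ~/.bashrc should leave the file in the same state as the vim steps above.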

Now execute the following command to reload your shell configuration, so that Anaconda’s Python 3 becomes the default:

source .bashrc

Check what version of Python the system is running with the following command:

python --version

(Should be version 3.x.x)

4. Accessing and running Jupyter Notebooks

To make our lives easier in the future, we will set a static IP address for the instance that we can always use to connect. In the VM console, go to the network details of your instance.

In the left pane click “External IP addresses”. Change the type of the existing IP address from “Ephemeral” to “Static”. 

In the resulting prompt, enter a name and click “Reserve”.

SSH configuration

Mac/Linux

To set up an easy-to-use SSH configuration, open a new Terminal window on your local machine and execute:

vim .ssh/config

This will open an SSH configuration file in which we can enter our configuration details. Press the “i” key to start typing and paste the following text:

Host google-cloud
   Hostname 34.90.159.202
   User willemdebeijer
   IdentityFile ~/.ssh/id_google_cloud

Make sure the IP address after “Hostname” matches the external IP address of your VM instance, that “User” is set to the username of your SSH key (by default your username in Terminal), and that the path after “IdentityFile” matches the name and location of your private key file (the one without the .pub extension). The “Host” name at the top can be any name you would like to use for this SSH connection.

Again press ESC to stop editing and “:wq” to save and exit vim mode.
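As an alternative to typing the block in vim, it can be appended from the shell. This is a sketch with a made-up helper name (add_ssh_host); the host alias, IP address, username and key path are the example values from above and need to be replaced with your own.

```shell
# Append a Host block to an SSH config file.
# Arguments: config file, host alias, IP address, username, private key path.
add_ssh_host() {
    config_file="$1"
    cat >> "$config_file" <<EOF
Host $2
   Hostname $3
   User $4
   IdentityFile $5
EOF
    # Keep the config file private to the current user.
    chmod 600 "$config_file"
}

# Demonstrate on a scratch file; for real use, pass ~/.ssh/config instead.
# The tilde is quoted on purpose: ssh expands it when it reads the config.
DEMO_CONFIG="$(mktemp)"
add_ssh_host "$DEMO_CONFIG" google-cloud 34.90.159.202 willemdebeijer '~/.ssh/id_google_cloud'
cat "$DEMO_CONFIG"
```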

Now to SSH into the instance all we need to do is open a Terminal and execute:

ssh google-cloud

Where “google-cloud” should be replaced with whatever Host you used in the SSH configuration file.

If your private key can still be accessed by other users on your device, SSH may refuse to use it and show an “UNPROTECTED PRIVATE KEY FILE” warning in the Terminal.

To solve this, restrict the permissions on the private key (not the .pub file) by executing:

chmod 400 ~/.ssh/id_google_cloud
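To see what this permission change does, the sketch below applies it to a scratch file that stands in for the private key and prints the permission string before and after. The scratch file is an assumption for the demonstration only; nothing in ~/.ssh is touched.

```shell
# Scratch file standing in for the private key.
DEMO_KEY="$(mktemp)"

chmod 644 "$DEMO_KEY"                          # readable by everyone: too open for a private key
BEFORE="$(ls -l "$DEMO_KEY" | cut -c1-10)"     # first ten characters are the permission string

chmod 400 "$DEMO_KEY"                          # read-only, and only for the owner
AFTER="$(ls -l "$DEMO_KEY" | cut -c1-10)"

echo "before: $BEFORE"   # -rw-r--r--
echo "after:  $AFTER"    # -r--------
```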

Running a Jupyter notebook

Every time you want to run a notebook on your VM, SSH into your instance like we just did. Then in the Terminal window with SSH connection execute the regular:

jupyter notebook --no-browser

The terminal will respond with a URL that contains a token.

Now open a new Terminal window and execute:

ssh -NfL 9999:localhost:8888 google-cloud 

(Replace “google-cloud” with your host)

This forwards port 9999 on your local machine to port 8888 on the VM, which is the port Jupyter listens on by default.
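The pieces of the tunnel command are easier to see when they are spelled out. The sketch below builds the same command from named variables and prints it; the variable names are made up for this example, and the host alias is the one from the SSH configuration above.

```shell
LOCAL_PORT=9999          # port you will open in your local browser
REMOTE_PORT=8888         # Jupyter's default port on the VM
SSH_HOST="google-cloud"  # Host alias from the SSH configuration

# -N  do not run a remote command, only forward ports
# -f  send ssh to the background once connected
# -L  forward LOCAL_PORT on this machine to localhost:REMOTE_PORT as seen from the VM
TUNNEL_CMD="ssh -N -f -L ${LOCAL_PORT}:localhost:${REMOTE_PORT} ${SSH_HOST}"

echo "$TUNNEL_CMD"
echo "Then browse to http://localhost:${LOCAL_PORT}"
```

Because of -f, the tunnel keeps running in the background until that ssh process is stopped.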

Windows

To set up an easy-to-use SSH configuration, open a new PuTTY window.

Again, enter the IP address and private key just like we did before. 

In the left pane go to “Connection” -> “SSH” -> “Tunnels”. In the “Source port” text field enter “9999” and in the “Destination” field enter “localhost:8888”. Now click “Add”.

Now in the left pane go to “Session”. In the text field under “Saved Sessions” enter a session name and click “Save” on the right. The next time we want to connect to our instance, we can simply load this session.

Running a Jupyter notebook

Every time you want to run a notebook on your VM, SSH into your instance using PuTTY and the session that we just saved. Then in the terminal window with the SSH connection execute the regular:

jupyter notebook --no-browser

The terminal will respond with a URL that contains a token.

Now, on either platform, go to your browser and navigate to: localhost:9999

You will be asked for a “Password or token”. Enter the token that was shown by the Terminal in which we started the Jupyter Notebook.

Congratulations, you are now running Jupyter Notebooks on your very own virtual machine! Please remember to shut your VM down after you’re done, since you will be billed for every minute it’s active!
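Stopping the VM can also be done from the command line with the gcloud CLI. The sketch below only prints the command; the instance name and zone are made-up placeholders, and it assumes gcloud is installed and authenticated. Keep in mind that a stopped instance may still incur charges for its disk and for a reserved static IP address.

```shell
# Placeholders (assumptions); use the name and zone of your own instance.
INSTANCE_NAME="data-science-vm"
ZONE="europe-west4-a"

STOP_CMD="gcloud compute instances stop $INSTANCE_NAME --zone=$ZONE"

# Print the command for review instead of executing it.
echo "$STOP_CMD"
```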


Tom

2 months ago

How to utilize the processing power of several cores in Google Cloud VM?

I have been working with my VM for quite a while now and it can be rather fast; however, I can’t seem to get it to utilize the number of cores that are available in the VM. I have configured it to have 8 cores. When I run the ‘lscpu’ (cpu-info) command it does show the configured cores, but when I run a job through my Jupyter notebook, e.g. a scikit-learn algorithm with the parameter n_jobs=-1, it takes the same amount of time as when I run it without the n_jobs parameter. In the overview for the VM, it shows that the CPU usage remains very low.

When I do this on my own machine, computing time is significantly lowered and I notice that I can’t really use my PC for other things during the computing since it is using all my cores.

I am, however, able to create a Kubernetes engine with a couple of VMs running from the terminal; in this case I can manage to utilise more cores.

I presume that this problem arises because of some configuration issue.

By the way, I’m SSHing into my VM through PuTTY on Windows.

Tom

Willem de Beijer

2 months ago

Hi Tom,

That sounds quite odd; I haven’t seen this particular problem. You could try a terminal tool like htop to see exactly which processes are using the CPU and how much each core is utilised.

Are you running the exact same algorithm on your local machine and VM, but not seeing a speed increase? It might be that the particular algorithm you’re executing just doesn’t benefit from multi-threading that much. I’ve personally not had any problems where the available compute from the VM wasn’t utilised when possible.

Hope this helps

Best regards,
Willem
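Alongside htop, a quick first check on the VM is to compare how many cores the current process is allowed to use with how many the machine has. This is only a sketch of one possible check, not a diagnosis of the problem above. It relies on nproc from GNU coreutils, which is assumed to be available on the Debian VM.

```shell
AVAILABLE_CORES="$(nproc)"        # cores the current process is allowed to use
TOTAL_CORES="$(nproc --all)"      # cores installed in the machine

echo "available: $AVAILABLE_CORES, total: $TOTAL_CORES"

# A lower available count means this shell is restricted to a subset of the cores.
if [ "$AVAILABLE_CORES" -lt "$TOTAL_CORES" ]; then
    echo "This process is restricted to a subset of the cores."
fi
```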
