Data science on Google Colab

Willem de Beijer and Daan Kolkman

This tutorial will take you through the steps of using Google Colab for data science. It is part of our Cloud Computing for Data Science series.

1. About Google Colab

Google Colaboratory (Colab for short) is a service that allows you to run Jupyter notebooks in the cloud for free. While it is more limited than a virtual machine, it’s much easier to set up and get going. Additionally, you can use your existing Google account to log in to the service. A good introduction to Colab can be found at https://colab.research.google.com/notebooks/welcome.ipynb#

2. Getting started

To get started, go to “File” in the top menu and choose either “New Python 3 notebook” or “Upload notebook…” to start with one of your existing notebooks. 

Getting data into Colab can be a bit of a hassle. Colab can be synchronized with Google Drive, but the connection is not always seamless. The easiest way to upload a dataset is to run the following in a notebook cell:

from google.colab import files
uploaded = files.upload()

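This will prompt you to select and upload one or more files. The result, uploaded, is a dictionary that maps each file name to the contents of that file as bytes.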
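Once the upload finishes, the file contents still have to be parsed. The sketch below assumes the uploaded file is a UTF-8 encoded CSV with a header row. The helper name `rows_from_upload` and the file name `example.csv` are hypothetical, and the `uploaded` dictionary is simulated here so that the snippet also runs outside Colab.

```python
import csv
import io


def rows_from_upload(uploaded, filename):
    """Parse one uploaded CSV file into a list of dicts (one per row).

    `uploaded` is the dictionary returned by files.upload(), which maps
    each file name to that file's raw bytes.
    """
    text = uploaded[filename].decode("utf-8")  # assumption: UTF-8 encoded file
    return list(csv.DictReader(io.StringIO(text)))


# Stand-in for the result of files.upload(), so this runs outside Colab too.
uploaded = {"example.csv": b"name,score\nada,9\nbob,7\n"}

rows = rows_from_upload(uploaded, "example.csv")
print(rows[0]["name"], len(rows))
```

In a notebook you would more likely hand the bytes straight to pandas, for example with `pd.read_csv(io.BytesIO(uploaded["example.csv"]))`.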

For other ways to load data into Google Colab, I would recommend the following blog post: https://towardsdatascience.com/3-ways-to-load-csv-files-into-colab-7c14fcbdcb92
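One alternative is the Google Drive connection mentioned earlier. The sketch below mounts your Drive inside the notebook using `drive.mount` from the `google.colab` package. The wrapper function `mount_drive` is hypothetical; it only exists so that the snippet degrades gracefully when run outside Colab, where `google.colab` cannot be imported.

```python
import os


def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive when running in Colab.

    Returns the mount point, or None when google.colab is unavailable
    (that is, when the code is not running inside Colab).
    """
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return None
    drive.mount(mount_point)  # asks you to authorize access to your Drive
    return mount_point


root = mount_drive()
if root is None:
    print("Not running in Colab; skipping the Drive mount.")
else:
    print(os.listdir(root))  # top-level folders of the mounted Drive
```

After a successful mount, your Drive files can be read like ordinary files below the mount point.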

3. What you get

Packages

Most packages you will need for data science come pre-installed on Google Colab. This is especially true for Google-made packages such as TensorFlow. Google has also introduced Swift for TensorFlow, which allows you to use the Swift programming language with TensorFlow directly in a Colab notebook. At the time of writing the project is still in beta, but it may be worth a look for those who are interested.
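To see which packages are actually available in your session, you can ask Python whether each module can be found. This is a minimal sketch: the helper name `missing_packages` and the list of packages are hypothetical, and the check works on import names (such as `sklearn`) rather than pip package names (such as `scikit-learn`).

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names from `names` that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical wish list; these are import names, not pip package names.
wanted = ["numpy", "pandas", "sklearn", "tensorflow"]
print("Not installed:", missing_packages(wanted))
```

Anything reported as missing can be installed from a notebook cell with `!pip install <package>`. Expect to repeat such installs when the runtime is reset.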

Computing resources

Just like Kaggle, Google Colab provides you with free computing resources. Colab also offers TPU support; a TPU is an accelerator that Google designed specifically for deep learning, and it can be faster than a GPU for those workloads. Keep in mind, though, that while TensorFlow supports TPUs out of the box, PyTorch does not: it needs the separate PyTorch/XLA package.
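Before starting a long training run, it is worth checking which accelerator your session actually received. The sketch below uses TensorFlow's `tf.config.list_physical_devices` to list visible GPUs. The helper name `visible_gpus` is hypothetical, and the function returns an empty list when TensorFlow is not installed, so it can also be run outside Colab.

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow can see, or [] if there are none.

    Also returns [] when TensorFlow is not installed, so the snippet is
    safe to run outside Colab.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
print(gpus if gpus else "No GPU visible; check the runtime type.")
```

If the list is empty in Colab, the notebook is probably running on a CPU-only runtime; the accelerator can be changed under “Runtime” > “Change runtime type”.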

4. When to use

Collaboration

Google Colab is especially useful for group projects, since Colab notebooks can easily be shared through Google Drive.

Personal

Just like Kaggle, Google Colab can also be used to extend the computing resources of your own device. Whether you use Google Colab or Kaggle ultimately comes down to personal preference; I found Colab a bit of a pain to work with, so I prefer Kaggle for this purpose.

For a good comparison between Google Colab and Kaggle I would suggest:
https://towardsdatascience.com/kaggle-vs-colab-faceoff-which-free-gpu-provider-is-tops-d4f0cd625029
