Welcome to the ninth installment of the Learn to Build Better series. This multi-part series showcases tools and techniques for rapidly building, testing, and prototyping energy-focused applications, analytics, and use cases on the Awesense Energy Transition Platform, using the AI Data Engine to process data, and TGI or APIs to access and visualize the data structured according to the Awesense open Energy Data Model (EDM).
At Awesense, we love data science notebooks. As we wrote in our previous Build Better article, Jupyter notebooks are in the spotlight. That is why the use cases, tutorials, and code snippets in our GitHub repository are written as .ipynb notebooks.
Not every user will find it easy to run and explore these notebooks. Jupyter needs supporting infrastructure, and a number of libraries or add-ons must be installed and configured before the notebooks will work. On corporate computers, where users are often restricted from installing software, this can be almost impossible. For this reason, in this article we look at web-based notebooks that run directly in a web browser, with nothing to install or set up on a local computer.
Web-Based Data Science Notebooks
In this article, we compare three popular tools: Google Colab, Amazon SageMaker, and Kaggle Notebooks. There are, of course, many similar tools, such as Visual Studio Code for the Web and Databricks Notebooks, but the aim of this article is not to compare them all, merely to give you a taste of what’s possible.
These tools represent a diverse ecosystem of web-based development environments built for modern programmers, data scientists, and researchers. They make cloud computing resources widely accessible and provide a ready-to-use environment for data science. Each platform supports collaboration and a broad range of programming languages and tools.
For running the Awesense GitHub notebooks with Energy Data Model (EDM) examples, the choice between them depends mainly on:
- Accessibility – Do users need a dedicated account or registration?
- Cost – Is the tool free to use, and is that without limits or only on a trial basis?
- Ease of use – Do libraries or packages need to be installed before the Awesense notebooks will run?
Google Colab
https://colab.research.google.com/
Google Colab is a free cloud-based data science notebook that lets you run code without installing any software. It’s particularly useful for machine learning, data science, and education because it provides access to powerful computing resources like GPUs and TPUs. You can think of it as a Jupyter Notebook environment that’s ready to use as soon as you sign in. Why we like Google Colab:
- Very simple import of already created Awesense notebooks.
- Upon launching the notebooks, everything works; no need to manually install any modules or packages.
- Users need a Google account, and everything is free.
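Whichever platform you choose, it can help to check up front which packages a notebook needs are already preinstalled. The sketch below uses the standard-library importlib.metadata module to report installed versions. The package list is a hypothetical example, not the confirmed dependency list of the Awesense notebooks; check the imports at the top of the notebook you are running.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Hypothetical dependency list, for illustration only.
REQUIRED = ["pandas", "numpy", "matplotlib"]

for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Run as the first cell, this shows at a glance whether anything still has to be installed on the current platform.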
Amazon SageMaker
https://aws.amazon.com/sagemaker
Amazon SageMaker is a cloud-based machine learning platform offered by Amazon Web Services (AWS). It simplifies the process of building, training, and deploying machine learning models by offering a suite of tools in one place, which lets developers focus on the machine learning tasks themselves rather than on managing the underlying infrastructure. This is our take on Amazon SageMaker:
- An AWS account is required – it takes some time to set up, but after that, it is very easy to import already created notebooks.
- Installation of additional modules/libraries is required. Modules can be installed using the “pip” command directly in the notebook (!pip install package_name).
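Rather than running !pip install for every package, a notebook can install only what is missing. The sketch below calls pip through sys.executable, so packages go into the same interpreter the notebook kernel is using (IPython’s %pip magic serves the same purpose). It assumes the import name matches the pip package name, which is not always true: for example, import sklearn comes from the scikit-learn package.

```python
import importlib.util
import subprocess
import sys


def missing_modules(modules):
    """Return the top-level import names that cannot currently be found."""
    return [m for m in modules if importlib.util.find_spec(m) is None]


def install_missing(modules, dry_run=False):
    """pip-install any missing modules into the kernel's own interpreter.

    Assumes import name == pip package name (not always the case).
    Returns the list of modules that were (or would be) installed.
    """
    missing = missing_modules(modules)
    if missing:
        cmd = [sys.executable, "-m", "pip", "install", *missing]
        if dry_run:
            print("Would run:", " ".join(cmd))
        else:
            subprocess.check_call(cmd)  # raises CalledProcessError on failure
    return missing
```

Calling install_missing(["pandas"], dry_run=True) first prints the pip command without running it, which is a cheap way to see what a fresh SageMaker or Kaggle session still lacks.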
Kaggle Notebooks
https://www.kaggle.com/code
Kaggle Notebooks is a cloud-based environment for working on data science and machine learning projects. Like Google Colab, it eliminates the need for local software setup and provides access to powerful computing resources, which makes it ideal for running code without worrying about hardware limitations. An additional benefit is the ability to easily share your notebooks and collaborate on them with others. Here is what we think about Kaggle Notebooks:
- Very simple import of already created notebooks, without registration. If you register, your created or modified notebooks are stored for future use and can be downloaded, shared, and so on.
- Some modules need to be installed for proper functionality. Modules can be installed using the “pip” command directly in the notebook (!pip install package_name).
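Because the same notebook may be opened on different platforms, it can be useful to detect where it is running, for example to skip install cells on Colab. The sketch below is a best-effort guess under two assumptions: that the google.colab module is importable only inside Colab, and that Kaggle kernels set the KAGGLE_KERNEL_RUN_TYPE environment variable. Anything else, including SageMaker and local Jupyter, is reported as "other".

```python
import os


def detect_platform():
    """Best-effort guess at the hosted notebook environment."""
    try:
        import google.colab  # noqa: F401  (assumed importable only inside Colab)
        return "colab"
    except ImportError:
        pass
    # Assumption: Kaggle sets this variable inside its notebook kernels.
    if "KAGGLE_KERNEL_RUN_TYPE" in os.environ:
        return "kaggle"
    return "other"  # SageMaker, local Jupyter, etc.


print("Running on:", detect_platform())
```

A notebook could, for instance, run its pip install cell only when detect_platform() does not return "colab".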
Step-by-Step Guide
Since Google Colab works without any additional setup, we dedicate the following section to getting started with this tool step by step.
Instructions for working with Google Colab and Awesense notebooks:
- Download the Awesense notebook(s) from the Awesense GitHub repository. Ask us if you would like access to these notebooks.
- Open colab.research.google.com.
- Sign in to your Google account.
- Upload a notebook using “File – Upload notebook” or select “Upload” on the welcome screen, then drag the .ipynb file into the upload field. Run the individual code cells with the “Play” button or the key combination “Shift+Enter”. Please note that some parts of the code may take several minutes to execute.
[Screenshot: Sign in to your Google account]
[Screenshot: Upload and run the notebooks]
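If Colab refuses an upload, a frequent cause is a file that is not actually a notebook, such as GitHub’s HTML page saved instead of the raw .ipynb file. An .ipynb file is JSON, and in the current (version 4) notebook format it has a top-level "cells" list whose entries carry a "cell_type". The sketch below checks this locally before uploading; the file name in the usage line is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_notebook(path):
    """Count cell types in an .ipynb file; raise ValueError if it is not a notebook."""
    text = Path(path).read_text(encoding="utf-8")
    try:
        nb = json.loads(text)
    except json.JSONDecodeError as exc:
        # Typical cause: an HTML page saved instead of the raw notebook file.
        raise ValueError(f"{path} is not valid JSON, so not an .ipynb notebook") from exc
    if not isinstance(nb, dict) or "cells" not in nb:
        raise ValueError(f"{path} has no 'cells' list; is it an nbformat 4 notebook?")
    return Counter(cell.get("cell_type", "unknown") for cell in nb["cells"])


# Hypothetical file name:
# print(summarize_notebook("awesense_example.ipynb"))
```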
Important Note About Data Security and Privacy
Security and privacy are paramount when using cloud-based data science tools. Working with anonymized or synthetic Awesense data (sandbox/sandpit data) poses minimal risk, but real client data (e.g. from your own Awesense instance populated with real client data) raises significant concerns, because the notebook and any data it loads are processed on a third-party cloud service. A thorough evaluation is therefore necessary to determine whether these web-based tools are suitable for projects involving sensitive information.
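One simple habit that helps on any of these platforms is keeping credentials out of the notebook file itself, so they are not uploaded or shared along with it. The sketch below reads a secret from an environment variable and otherwise prompts for it with the standard-library getpass, which does not echo the input. The variable name in the usage line is hypothetical; use whatever your connection code expects.

```python
import os
from getpass import getpass


def get_secret(name):
    """Read a credential from the environment, else prompt for it without echoing.

    The value never has to be typed into a code cell, so it is not saved
    in the .ipynb file.
    """
    value = os.environ.get(name)
    if value:
        return value
    return getpass(f"{name}: ")


# Hypothetical variable name, for illustration only:
# password = get_secret("AWESENSE_DB_PASSWORD")
```

This does not make a platform suitable for sensitive data on its own; it only avoids leaking credentials through shared notebooks.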
Summary
In this Build Better blog series post, we have summarized how to overcome potential obstacles when using selected web-based tools to work with notebooks, especially notebooks created for our Sandbox/Sandpit environment. Google Colab worked best for us: there was nothing to set up or install, we simply uploaded the notebook from GitHub, and everything worked right away.
With these tools, you can start working with notebooks almost immediately, which makes the first steps in data analysis very accessible.
Free For a Chat?
We would love to connect with our wide audience, and we would be glad if you shared our content! Follow along with this series and let us know what you would like us to write about, whether that is more on topics we have already covered or a specific use case or tool you would like to know more about. If you or your team are interested in building a custom application or use case on the Awesense Energy Transition Platform, or you have an analytical tool you would like us to demonstrate with our platform, please feel free to reach out at tools@awesense.com.