
Using Jupyter Notebook

You can use different Insights Hub APIs, such as Data Exchange, Model Management, and IoT Time Series, from Jupyter Notebook, the model development workspace for Predictive Learning and PrL Essentials. Your tenant must have valid access to these APIs to use them in the workspace environment.

Jupyter Notebook can be accessed from the Manage Environments page; the following procedures will help you launch it.

Note

PrL Essentials does not retain the Jupyter Notebooks you create. Please save them to your local machine before stopping the environment.

For in-depth information on Insights Hub APIs, refer to the Insights Hub API documentation.

Checking your Configuration

When you begin working with your scripts, your environment requires a certain set of libraries.

Some libraries required to run the minimal services within the cluster come preinstalled. We have included links to the libraries in the Open Source Software topic in this Help.

Run this command from Jupyter to examine the installed packages:

!pip freeze --user
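
For example, to check whether a particular package is already installed (pandas is used here purely as an illustration, and a Linux shell with grep is assumed), you can filter the output:

!pip freeze --user | grep pandas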

We recommend that you install the required packages in a cell at the beginning of the notebook and execute it each time the cluster is started, because custom changes are not stored between stops and starts.

Installing Your Own Python Libraries

Please use the following whenever you need to install your libraries:

#upgrade pip and install required libraries
%pip install --upgrade pip --user
%pip install requests --force-reinstall --upgrade --user
%pip install pandas --force-reinstall --upgrade --user
%pip install pyarrow --force-reinstall --upgrade --user
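
After the installation finishes, you may need to restart the kernel before upgraded packages take effect. As a quick sanity check, you can import the packages and print their versions:

# Confirm that the freshly installed packages import correctly
import requests
import pandas
import pyarrow
print(requests.__version__, pandas.__version__, pyarrow.__version__)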

The following commands request an access token for the Integrated Data Lake and export the returned credentials as environment variables, so that AWS-based tools and libraries can access the data lake from your notebook:

import os
import requests
import json

# Request a temporary access token for the Integrated Data Lake via the gateway
dlpath = '/datalake/v3/generateAccessToken'
gw = os.environ['GATEWAY_ENDPOINT'] + '/gateway/'
headers = {
    'Content-Type': 'application/json'
}
payload = "{ \"subtenantId\":\"\" }"
dl_url = gw + dlpath
response = requests.post(dl_url, data=payload, headers=headers)
# print(response.status_code)  # uncomment to inspect the HTTP status code
dl = json.loads(response.text)

# Export the returned credentials so AWS tools and libraries can pick them up
os.environ["AWS_ACCESS_KEY_ID"] = dl['credentials']['accessKeyId']
os.environ["AWS_SECRET_ACCESS_KEY"] = dl['credentials']['secretAccessKey']
os.environ["AWS_SESSION_TOKEN"] = dl['credentials']['sessionToken']

Not all external repositories are allowed. If you require additional external sources for your project, please contact your organization's Predictive Learning Essentials administrator.

When an instance stops, all libraries and modifications made on the instance are lost, so you must run the installation cells again each time the environment starts. Running the installation cells ensures that your environment is up to date.

Using Inputs from Job Execution

All job executions require inputs, and you can use any of the following input types:

  • Data Exchange
  • IoT
  • Integrated Data Lake
  • Predictive Learning Storage (PrL Storage)

Using any of the first three input types from the list above ensures that Job Manager copies the input to a temporary location available to your code. In Jupyter notebooks, three variables are available:

  • inputFolder
  • outputFolder
  • datasetName

You can employ the Jupyter magic command %store, as follows:

%store -r inputFolder    # -r reads the stored value back into the notebook
%store -r outputFolder
%store -r datasetName

The datasetName variable only contains a value when the IoT input type is used. The inputFolder variable is prefilled by the job execution engine with a value pointing to the temporary location that holds the input files or data; this is an S3 path on AWS or a blob storage path on Azure, without the associated prefix such as s3://. You can then use the outputFolder variable in a Jupyter notebook as in:

!aws s3 cp ./mylocalfile.txt s3://$outputFolder/myfile.txt

For Jupyter notebooks we do not provide a built-in library for loading the dataset; however, there are various ways to achieve this using Python. If you encounter any issues loading your dataset, feel free to contact us for guidance.
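
One possible approach is to read a file from the input location with pandas. This is only a sketch: it assumes an AWS-based environment with pandas and s3fs available, and mydata.csv is a placeholder file name:

%store -r inputFolder
import pandas as pd

# inputFolder does not include the s3:// prefix, so it is added here;
# 'mydata.csv' is a hypothetical file name
df = pd.read_csv('s3://' + inputFolder + '/mydata.csv')
print(df.head())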

Note

Both the inputFolder and outputFolder variables are remote storage paths, not local folders; therefore most commonly used file functions do not work against them. CLI and shell commands, however, can use them as long as they add the correct prefix. For Python or Scala libraries that can work with remote storage services, we recommend checking the documentation of each library; for example, the pandas Python library can save and read files from AWS S3 storage.
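
For instance, once the prefix is added, shell commands work against these paths (a sketch assuming an AWS-based environment with the AWS CLI available):

%store -r inputFolder
!aws s3 ls s3://$inputFolder/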

More About Jupyter Notebook

Jupyter is a powerful tool that supports many customizations and languages. The official Jupyter documentation can help you explore further.




Last update: August 3, 2023