
Making API Calls from Zeppelin Notebook

The model development workspace in Predictive Learning is Zeppelin Notebook, which allows use of Public APIs, such as Data Exchange, Model Management, and IoT Time Series.

You can use APIs that your organization has access to in the Predictive Learning workspace. Access Zeppelin Notebook from these pages:

  • Manage Environments
  • Manage Analytics Workspaces

Zeppelin Notebook includes both Python 2.7 and Python 3.6 versions.

Working with Zeppelin Notebook

This page covers the processes for using Zeppelin Notebook, including:

  • Checking the Packages Installed in Zeppelin
  • Best Practices
  • Installing Your own Python Libraries
  • Installing Your own R Libraries
  • Updating Python Interpreters
  • Changing the Python Interpreter version
  • Calling Public APIs from Zeppelin Notebook
  • Copying Data from Integrated Data Lake (IDL)

Refer to the Public APIs documentation for in-depth information on the individual APIs.

Checking Your Configuration

When you start working with scripts, the environment requires a certain set of libraries. Some libraries required to run minimal services within the cluster come preinstalled.

Use the following command to examine the packages installed in Zeppelin:

%sh
pip freeze --user
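
If you prefer to stay in a %python paragraph, the same information can be read programmatically. This is a minimal sketch that assumes setuptools (which provides pkg_resources) is available in the environment:

%python
# list the installed packages and their versions from Python
import pkg_resources
for pkg in sorted(pkg_resources.working_set, key=lambda p: p.project_name.lower()):
    print(pkg.project_name + " " + pkg.version)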

Best Practices for Zeppelin Notebook Performance

Because the custom actions you perform on the cluster are not stored between restarts, we recommend the following to keep the notebook working reliably and to optimize performance:

  • Install the required packages as the first step in using the notebook.
  • Execute the installation paragraphs each time you start a cluster (see the sketch below).
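
The following is a minimal sketch of such an installation paragraph. The package name requests-toolbelt is only an illustration, not a platform requirement; calling pip through the interpreter's own Python executable installs into the user site of whichever Python version the paragraph runs on:

%python
# install a package into the user site; the package name is only an example
import subprocess
import sys
subprocess.check_call([sys.executable, "-m", "pip", "install", "--user", "requests-toolbelt"])

If a newly installed package is not picked up right away, restart the Python interpreter as described further below.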

Installing Your own Python Libraries

Users are free to install any required libraries, as shown in the sketch above. The %python paragraph below additionally shows how to obtain a temporary Integrated Data Lake access token through the gateway and export the returned AWS credentials as environment variables:

%python
import os
import requests
import json

# request a temporary access token for the Integrated Data Lake via the gateway
dlpath = '/datalake/v3/generateAccessToken'
gw = os.environ['GATEWAY_ENDPOINT'] + '/gateway/'

headers = {
    'Content-Type': 'application/json'
}
payload = json.dumps({"subtenantId": ""})
dl_url = gw + dlpath
response = requests.post(dl_url, data=payload, headers=headers)
#print(response.status_code)
dl = json.loads(response.text)

# export the returned temporary AWS credentials for use by other tools in this session
os.environ["AWS_ACCESS_KEY_ID"] = dl['credentials']['accessKeyId']
os.environ["AWS_SECRET_ACCESS_KEY"] = dl['credentials']['secretAccessKey']
os.environ["AWS_SESSION_TOKEN"] = dl['credentials']['sessionToken']

If you require additional external sources for your project, please contact your organization's PrL administrator.

Once the instance is stopped, all libraries and modifications performed on the instance are lost. Run the installation paragraphs every time the note is imported into Zeppelin, so that everything is up to date on the machine when you start working. If newly installed or updated packages do not appear in the list as expected, restart the Python interpreter.
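
As a quick check after an installation, try importing the package directly; requests is used here only because it already appears in the examples above:

%python
# confirm that an installed package can be imported and report its version
import importlib
mod = importlib.import_module("requests")
print(mod.__version__)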

Installing R Libraries

Installing R packages is done in a %spark.r paragraph using R commands. The package is installed together with all of its dependencies.

%spark.r
install.packages('ggvis', repos='https://ftp.fau.de/cran/')
libraries <- as.data.frame(installed.packages()[,c(1,3:4)])
libraries

%spark.r
library(ggvis)
head(mtcars)

%spark.r
remove.packages('ggvis')

Make sure to select a nearby CRAN mirror in order to minimize the time required to install the package. The full list of mirrors is published on CRAN.

Updating the Python Interpreters

A colored circle at the top of the Zeppelin Notebook screen indicates the connectivity status to the server or interpreter. Green indicates connectivity. Red indicates one of the following:

  • The session has expired
  • The cluster is stopped
  • The interpreter is not responding

Click to the right of the circle to view a drop-down menu for customizing Zeppelin options, as shown here:

Zeppelin options

Changing the interpreter to Python 3 does not affect the Python version used by the %pyspark interpreter. The default Python version used by %pyspark is Python 2; to change it, you must change the Spark interpreter's zeppelin.pyspark.python setting from 'python' to 'python3'.
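
To confirm which Python version the %pyspark interpreter is currently running, print it from a paragraph; this is only a quick check, not part of the configuration itself:

%pyspark
# print the Python version used by the %pyspark interpreter
import sys
print(sys.version)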

How to Change the Interpreter to Python 3

Predictive Learning provisions the cluster with both the Python 2 and Python 3 interpreter variants; however, switching between them requires updating the interpreter settings that control which executable runs your paragraphs. Python 2 is the default interpreter setting.

Follow these steps to change Python Interpreter to Python 3:

  1. Navigate to Zeppelin Notebook.
  2. Select Interpreter from the About Zeppelin drop-down list. The Interpreters page opens.
  3. Enter "Python" in the search field and click Edit. The Edit section opens.
  4. Enter python3 in the zeppelin.python field and click Save. A message asks you to confirm the change.
  5. Click OK. The Python 3 interpreter variant is set.

Zeppelin Notebook
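
To verify that the change took effect, run a quick check in a %python paragraph:

%python
# confirm that the %python interpreter now runs Python 3
import sys
print(sys.version)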

How to Call a Data Exchange API

This shows a sample API call from a Zeppelin Notebook using the Python interpreter and our internal gateway that handles the authentication procedures in a seamless manner:

%python
import os
import requests
import json
# get the proxy URL - we call it the Gateway
gw = os.environ['GATEWAY_ENDPOINT'] + '/gateway/'
# some paths to remember
DEpath = 'dataexchange/v3/'
dirs = 'directories/'
pub = '_PUBLIC_ROOT_ID'
# this lists the Public directory from Data Exchange
response = requests.get(gw + DEpath + dirs + pub)
# parse the response
allpub = json.loads(response.content)

# we only read the 'files' entries; the response also contains a 'directories' child, which is iterable as well
for file in allpub['files']:
    print("File id: " + str(file['id']))
# uploading files works in a similar fashion, as does working with other MDSP services

This example calls the Predictive Learning Developer Data Exchange API and lists the contents of the public root folder.
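
The 'directories' child mentioned in the comment above can be walked in the same way. This sketch assumes the directory entries expose an 'id' field just as the file entries do:

%python
# iterate the sub-directories of the public root in the same fashion
for directory in allpub['directories']:
    print("Directory id: " + str(directory['id']))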

How to Load Existing Datasets

When using Zeppelin notebooks, the variables passed to your environment, such as inputFolder, outputFolder, and datasetName, can be read using the Zeppelin session instance variable named 'z', which is available under the Spark and PySpark interpreters:

//make sure you have the proper context set up at the beginning of your paragraph, such as %spark
var inpf = z.get("inputFolder")
var outf = z.get("outputFolder")
var dsname = z.get("datasetName")

When you want to load the dataset you need in your Zeppelin notebook, you can use the built-in library functions, like this:

%spark
val names = com.siemens.mindsphere.prl.tools.dsaccess.DatasetUtil.getDatasetNames()
//make sure you also pass in the Spark session (spark)
var ds = com.siemens.mindsphere.prl.tools.dsaccess.DatasetUtil.loadDatasetByName("my dataset", spark)
ds.createOrReplaceTempView("my_awesome_data")
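
Once the temp view is registered, you can query it from a Spark-backed paragraph. This sketch uses %pyspark and assumes the view name registered above:

%pyspark
# query the temp view registered in the previous paragraph
spark.sql("SELECT * FROM my_awesome_data LIMIT 10").show()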

Copying Data From IDL

Use the code below to obtain a temporary token (via the PrL gateway) and send a read request to the Integrated Data Lake (IDL) API. The IDL API permits only read operations with a temporary access token. The path part /data/ten=mytenant/ is fixed and cannot be changed.

Once the AWS temporary keys have been set up, you can use AWS CLI commands from a command line to perform read operations against the IDL bucket. The bucket name is returned in JSON format in the IDL response.

%%bash
# request a temporary access token for the data lake through the gateway
content=$(curl --location --request POST "$GATEWAY_ENDPOINT/datalake/v3/generateAccessToken" --header 'Content-Type: application/json' --data-raw '{ "subtenantId":"" }')
#echo $content

# extract the temporary credentials from the JSON response
secret=$(jq -r '.credentials.secretAccessKey' <<< "${content}")
session=$(jq -r '.credentials.sessionToken' <<< "${content}")
accesskey=$(jq -r '.credentials.accessKeyId' <<< "${content}")

# export the credentials so that the AWS CLI can use them, then list the tenant prefix in the IDL bucket
export AWS_ACCESS_KEY_ID="${accesskey}"
export AWS_SECRET_ACCESS_KEY="${secret}"
export AWS_SESSION_TOKEN="${session}"
aws s3 ls s3://datalake-integ-aaad/data/ten=tenantname/
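
If you prefer to stay in Python, the same read-only listing can be done with boto3. This sketch assumes boto3 is installed in the environment and that the temporary credentials have already been exported into the Python process (for example by the %python token paragraph shown earlier); the bucket name and tenant prefix mirror the sample above and must be replaced with the values from your own token response:

%python
import boto3

# boto3 reads AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN from the environment
s3 = boto3.client('s3')
resp = s3.list_objects_v2(Bucket='datalake-integ-aaad', Prefix='data/ten=tenantname/')
for obj in resp.get('Contents', []):
    print(obj['Key'])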

Last update: January 22, 2024