Metadata Management from Integrated Data Lake

Note

Integrated Data Lake Service API version 4.*.* is available only for Virtual Private Cloud.

Metadata management is a crucial aspect of data management within any Data Lake application. It involves the systematic organization, storage, retrieval, and maintenance of metadata, which is essentially data about data. This includes information about the characteristics, origin, usage, and relationships of the actual data stored in the Data Lake.

Metadata collection operations

Metadata collection is the systematic process of collecting metadata keys that manage all the relevant metadata information associated with the specific dataset or system.

Metadata collections are classified into the following categories:

  • Global collection
  • Custom collection
  • Reserved collection

Creating a Metadata collection

This API can be used to create a metadata collection. Global collections are available by default in the system and cannot be created.

Create a collection using the following endpoint:

POST /metadataCollections

Sample Request:

{
  "metadataCollectionId": "document_review",
  "label": "Document Review",
  "description": "usage and purpose of custom Metadata Collection"
}

Note

  • Users cannot create a collection named "global" or "system", as these names are reserved for system use.

Sample Response:

{
  "metadataCollectionId": "document_review",
  "label": "Document Review",
  "description": "usage and purpose of custom Metadata Collection",
  "status": "DRAFT",
  "etag": 1
}
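
As a rough illustration of the request above, the following Python sketch builds (but does not send) the POST request and rejects the reserved names before any call is made. The host name, the helper name, the case-insensitive name check, and the absence of authentication headers are assumptions made for this example; they are not part of the documented API.

```python
import json
import urllib.request

# Names the note above lists as reserved for system use.
RESERVED_NAMES = {"global", "system"}


def build_create_collection_request(base_url, collection_id, label, description):
    """Build (but do not send) a POST /metadataCollections request."""
    # Assumption: the reserved-name check is treated as case-insensitive here.
    if collection_id.lower() in RESERVED_NAMES:
        raise ValueError(f"'{collection_id}' is reserved for system use")
    body = {
        "metadataCollectionId": collection_id,
        "label": label,
        "description": description,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/metadataCollections",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host; replace with the gateway URL of your environment.
request = build_create_collection_request(
    "https://example.invalid/api/datalake/v4",
    "document_review",
    "Document Review",
    "usage and purpose of custom Metadata Collection",
)
print(request.get_method(), request.full_url)
```

Sending the request (for example with urllib.request.urlopen) would additionally require whatever authorization your environment expects, which is left out of this sketch.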

Retrieving the list of Metadata collections

This API can be used to get the list of all the metadata collections. The Global metadata collection, although available by default, is listed only after a metadata key has been added to it.

Retrieve collection list using the below endpoint:

GET /metadataCollections

Sample Response:

{
  "metadataCollections": [
    {
      "metadataCollectionId": "document_review",
      "label": "Document Review",
      "description": "usage and purpose of custom Metadata Collection",
      "status": "DRAFT",
      "etag": 1
    }
  ],
  "page": {
    "size": 0,
    "totalElements": 0,
    "totalPages": 0,
    "number": 0
  }
}

Retrieving the Metadata collection details

This API can be used to get the details of the collection for the given id.

Note

With this endpoint, only collection details are fetched and not the details related to the keys inside the collection.

Retrieve collection details using the below endpoint:

GET /metadataCollections/{metadataCollectionId}

Sample Response:

{
  "metadataCollectionId": "document_review",
  "label": "Document Review",
  "description": "usage and purpose of custom Metadata Collection",
  "status": "DRAFT",
  "etag": 1
}

Update the Metadata collection details

This API can be used to update the details of the collection for the given id.

Note

It is only possible to update the collections created by the user.

Update collection details using the below endpoint:

PATCH /metadataCollections/{metadataCollectionId}

Sample Request:

{
  "label": "Document Review",
  "description": "usage and purpose of custom Metadata Collection",
  "status": "DRAFT"
}

Note

Collection status can be updated only from DRAFT to PUBLISHED.

Sample Response:

{
  "metadataCollectionId": "document_review",
  "label": "Document Review",
  "description": "usage and purpose of custom Metadata Collection",
  "status": "DRAFT",
  "etag": 1
}

Deleting the Collection

This API can be used to delete the collection with the given id.

Delete the collection using the following endpoint:

DELETE /metadataCollections/{metadataCollectionId}

Note

  • It is not possible to delete the Global collection.
  • Only a custom collection in DRAFT status can be deleted.
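
The status and deletion rules for collections can be summarized in a small client-side check. This is only a sketch of the rules as stated in the notes above; the function names and the category strings ("global", "custom", "reserved") are hypothetical, and treating an unchanged status as acceptable is an assumption.

```python
def can_change_status(current, requested):
    """DRAFT -> PUBLISHED is the only documented status transition."""
    if current == requested:
        # Assumption: re-sending the current status is treated as a no-op.
        return True
    return current == "DRAFT" and requested == "PUBLISHED"


def can_delete_collection(category, status):
    """Only a custom collection in DRAFT status can be deleted."""
    return category == "custom" and status == "DRAFT"


print(can_change_status("DRAFT", "PUBLISHED"))     # True
print(can_change_status("PUBLISHED", "DRAFT"))     # False
print(can_delete_collection("custom", "DRAFT"))    # True
print(can_delete_collection("global", "DRAFT"))    # False
```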

Metadata Keys Operations

Metadata keys are the unique identifiers used to represent specific attributes or characteristics associated with the data in a key-value pair structure.

Creating a Metadata Key

This API can be used to create a metadata key. It allows admin users to add metadata keys to a metadata collection. By default, admin users can add keys directly to the Global metadata collection. Various configuration parameters are available while defining a metadata key. These parameters help admin users define keys that accept only specific metadata values.

The configuration parameters are defined below:

  • Value Type: The data type of the metadata values accepted for this key. The available value types are described below:

    Value Type | Use Case
    Enum | Metadata values are pre-defined and the user can select one of them. The available values are listed while defining the key, which supports a maximum of 300 values. Supported characters in Enum values: alphanumeric characters (A-Z, a-z, 0-9), hyphen ("-"), forward slash ("/"), parentheses ("(" and ")"), period ("."), colon (":"), comma (","), underscore ("_"), and spaces (" ") that are neither leading nor trailing.
    EnumList | Metadata values are pre-defined and the user can select one or more of them. The available values are listed while defining the key, which supports a maximum of 300 values. Additionally, the minimum and maximum number of values that can be selected can be configured. Supported characters in EnumList values: alphanumeric characters (A-Z, a-z, 0-9), hyphen ("-"), forward slash ("/"), parentheses ("(" and ")"), period ("."), colon (":"), comma (","), underscore ("_") that is neither leading nor trailing, and spaces (" ") that are neither leading nor trailing.
    String | The user has to input one metadata value as a string. Additionally, the minimum and maximum length can be configured.
    StringList | The user has to input one or more metadata values as strings. Additionally, the minimum and maximum number of accepted values can be configured.
    Long Text | The user has to input a long string value, e.g. "notes or information as metadata for data lake resources". Long Text supports a maximum of 2048 characters.
  • Field name and Field values: Additional details that admin users can use to put constraints on metadata values, as described below:

    Value Type | Field Name | Field Value
    Enum | options | List of comma separated acceptable values for Enum in capital letters
    EnumList | options | List of comma separated acceptable values for Enum in capital letters
    EnumList | minSize | Minimum number of values that can be selected from the enum list. It has to be greater than 0 and less than the number of options.
    EnumList | maxSize | Maximum number of values that can be selected from the enum list. It has to be greater than minSize and less than or equal to the number of options.
    String | options | List of comma separated acceptable values for Enum in capital letters
    String | minLength | Minimum length of character set in the list that can be accepted as value. It has to be greater than 0.
    String | maxLength | Maximum length of character set in the list that can be accepted as value. It has to be greater than or equal to minLength.
    StringList | options | List of comma separated acceptable values for Enum in capital letters
    StringList | minSize | Minimum number of values that can be selected from the enum list. It has to be greater than 0 and less than the number of options.
    StringList | maxSize | Maximum number of values that can be selected from the enum list. It has to be greater than minSize and less than or equal to the number of options.
  • applyOn: Defines where the key is applied, i.e. to objects (files), to folders, or to both, depending on the requirement.

    Options | Use Case
    OBJECTS_ONLY | Can be used for any metadata that is applicable only at object level.
    FOLDERS_ONLY | Can be used to set the metadata in the entire hierarchy which should not be changed at any level, e.g. Business Sensitivity or Classification Levels. To be avoided for metadata which is updated regularly.
    FOLDERS_AND_OBJECTS | Can be used where metadata is applicable on the entire hierarchy but can be changed if required.
  • Rule Key: Indicates whether this key can be used to create rules.
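
The character and size constraints described above for Enum and EnumList keys can be pre-checked on the client before a key is submitted. The sketch below is one possible reading of those rules, not the service's own validation; the function names are hypothetical, and the capital-letter requirement mentioned for options is deliberately not enforced because the character list also permits a-z.

```python
import re

# Characters the text lists as supported in Enum and EnumList values.
_ALLOWED = re.compile(r"[A-Za-z0-9\-/().:,_ ]+")
MAX_OPTIONS = 300  # documented maximum number of values per key


def is_valid_enum_value(value, enum_list=False):
    """Check a single option against the documented character rules."""
    if not _ALLOWED.fullmatch(value):
        return False
    if value != value.strip(" "):
        return False  # spaces must not be leading or trailing
    if enum_list and (value.startswith("_") or value.endswith("_")):
        return False  # EnumList also forbids leading/trailing underscores
    return True


def check_enum_list(options, min_size, max_size):
    """Return a list of problems found in an EnumList definition."""
    problems = []
    if len(options) > MAX_OPTIONS:
        problems.append("more than 300 options")
    if any(not is_valid_enum_value(o, enum_list=True) for o in options):
        problems.append("unsupported characters in options")
    if not 0 < min_size < len(options):
        problems.append("minSize must be > 0 and < number of options")
    if not min_size < max_size <= len(options):
        problems.append("maxSize must be > minSize and <= number of options")
    return problems


print(is_valid_enum_value("STRICTLY_PRIVATE"))   # True
print(is_valid_enum_value(" LEADING SPACE"))     # False
print(check_enum_list(["A", "B", "C"], 1, 2))    # []
```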

Note

  • A key must be at least 2 characters long.
  • The key consists of alphabetic characters or a combination of alphabetic and numeric characters. It must not consist solely of numerical digits.
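
A minimal client-side check for these naming rules might look as follows. The function name is hypothetical. Accepting underscores is an assumption drawn from the sample key "country_of_origin" used in the requests on this page; the note itself mentions only alphabetic and numeric characters.

```python
import re

# Assumption: underscores are accepted, as in the sample key
# "country_of_origin"; the note itself names only letters and digits.
_KEY_CHARS = re.compile(r"[A-Za-z0-9_]+")


def is_valid_key_name(key):
    """Apply the documented key naming rules to a candidate key."""
    if len(key) < 2:
        return False  # at least 2 characters
    if not _KEY_CHARS.fullmatch(key):
        return False
    # Must not consist solely of digits: require at least one letter.
    return any(ch.isalpha() for ch in key)


print(is_valid_key_name("country_of_origin"))  # True
print(is_valid_key_name("12345"))              # False
print(is_valid_key_name("k"))                  # False
```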

Recommendations:

  • It is advised to keep the key in DRAFT status when it is created. Many details of the key cannot be updated once the key is PUBLISHED.

Create a metadata key using the following endpoint:

POST /metadataCollections/{metadataCollectionId}/keys

Sample Request:

{
  "key": "country_of_origin",
  "label": "Country of Origin",
  "description": "description of metadata key and its usage",
  "valueType": "String",
  "additionalDetails": [
    {
      "fieldName": "StringType",
      "fieldValues": [
        "ALPHABETIC"
      ]
    }
  ],
  "applyOn": "OBJECTS_ONLY",
  "isMandatory": true,
  "isRuleKey": false,
  "isSearchable": false,
  "status": "DRAFT"
}

Sample Response:

{
  "key": "country_of_origin",
  "label": "Country of Origin",
  "description": "description of metadata key and its usage",
  "valueType": "String",
  "additionalDetails": [
    {
      "fieldName": "StringType",
      "fieldValues": [
        "ALPHABETIC"
      ]
    }
  ],
  "applyOn": "OBJECTS_ONLY",
  "isMandatory": true,
  "isRuleKey": false,
  "isSearchable": false,
  "status": "DRAFT",
  "etag": 1
}
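
The request body above can also be assembled programmatically. The following sketch builds the same payload as a Python dictionary and checks applyOn against the three documented options. The helper name and its parameters are hypothetical; the field names are taken from the sample request, and the status defaults to DRAFT in line with the recommendation above.

```python
import json

# The three applyOn options described for metadata keys.
APPLY_ON_OPTIONS = {"OBJECTS_ONLY", "FOLDERS_ONLY", "FOLDERS_AND_OBJECTS"}


def build_key_payload(key, label, description, value_type, apply_on,
                      additional_details=(), is_mandatory=False,
                      is_rule_key=False, is_searchable=False):
    """Assemble a request body shaped like the sample request."""
    if apply_on not in APPLY_ON_OPTIONS:
        raise ValueError(f"unknown applyOn option: {apply_on}")
    return {
        "key": key,
        "label": label,
        "description": description,
        "valueType": value_type,
        "additionalDetails": [
            {"fieldName": name, "fieldValues": list(values)}
            for name, values in additional_details
        ],
        "applyOn": apply_on,
        "isMandatory": is_mandatory,
        "isRuleKey": is_rule_key,
        "isSearchable": is_searchable,
        "status": "DRAFT",  # keys are best created in DRAFT status
    }


payload = build_key_payload(
    "country_of_origin",
    "Country of Origin",
    "description of metadata key and its usage",
    "String",
    "OBJECTS_ONLY",
    additional_details=[("StringType", ["ALPHABETIC"])],
    is_mandatory=True,
)
print(json.dumps(payload, indent=2))
```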

Retrieving the list of Metadata keys

This API can be used to get the list of all the metadata keys in the provided metadata collection.

Retrieve list of keys using the below endpoint:

GET /metadataCollections/{metadataCollectionId}/keys

Sample Response:

{
  "metadataKeys": [
    {
      "key": "country_of_origin",
      "label": "Country of Origin",
      "description": "description of metadata key and its usage",
      "valueType": "String",
      "additionalDetails": [
        {
          "fieldName": "StringType",
          "fieldValues": [
            "ALPHABETIC"
          ]
        }
      ],
      "applyOn": "OBJECTS_ONLY",
      "isMandatory": true,
      "isRuleKey": false,
      "isSearchable": false,
      "status": "DRAFT",
      "etag": 1
    }
  ],
  "page": {
    "size": 0,
    "totalElements": 0,
    "totalPages": 0,
    "number": 0
  }
}

Retrieving the key details

This API can be used to get the details of the defined key in the collection.

Retrieve key details using the below endpoint:

GET /metadataCollections/{metadataCollectionId}/keys/{metadataKey}

Sample Response:

{
  "key": "country_of_origin",
  "label": "Country of Origin",
  "description": "description of metadata key and its usage",
  "valueType": "String",
  "additionalDetails": [
    {
      "fieldName": "StringType",
      "fieldValues": [
        "ALPHABETIC"
      ]
    }
  ],
  "applyOn": "OBJECTS_ONLY",
  "isMandatory": true,
  "isRuleKey": false,
  "isSearchable": false,
  "status": "DRAFT",
  "etag": 1
}

Update the Metadata key details

This API allows admin users to update the metadata key configuration. Which details can be updated depends on the status of the key.

The following parameters can be updated based on the key status:

Parameter | Draft Status | Published Status
label & description | Can be updated | Can be updated
Additional Details | Can modify or remove existing constraints, or add new constraints | Can change existing fieldValues to extend the already defined range, e.g. minValue can be reduced, maxValue can be increased, or a new Enum value can be added. Cannot remove any existing constraints.
isMandatory | Can be updated to true or false | Can be updated to true or false
isRuleKey | Can be updated to true or false | Can be updated only if the metadata collection in which this rule key is used is in DRAFT status
applyOn | Can be updated to any of the available Enum values | Cannot be updated
isSearchable | Can be updated to true or false | Cannot be updated
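
Before sending a PATCH request, a client could check which fields of the intended update conflict with the table above. The sketch below covers only the parameters the table marks as not updatable, plus the isRuleKey condition. It does not check whether a change to additionalDetails extends the existing range. The function and parameter names are hypothetical.

```python
# Parameters the table marks "Cannot be updated" for a PUBLISHED key.
LOCKED_WHEN_PUBLISHED = {"applyOn", "isSearchable"}


def rejected_fields(key_status, patch, collection_status="DRAFT"):
    """Return the patch fields that the table says cannot be changed.

    collection_status is the status of the collection in which the rule
    key is used; it matters only when the patch touches isRuleKey.
    """
    if key_status != "PUBLISHED":
        return []  # every listed parameter can be updated in DRAFT status
    rejected = LOCKED_WHEN_PUBLISHED.intersection(patch)
    if "isRuleKey" in patch and collection_status != "DRAFT":
        rejected.add("isRuleKey")
    return sorted(rejected)


patch = {"label": "Business Sensitivity", "applyOn": "FOLDERS_ONLY"}
print(rejected_fields("DRAFT", patch))      # []
print(rejected_fields("PUBLISHED", patch))  # ['applyOn']
```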

Update key details using the below endpoint:

PATCH /metadataCollections/{metadataCollectionId}/keys/{metadataKey}

Sample Request:

{
  "label": "Business Sensitivity",
  "description": "description of metadata key and its usage",
  "additionalDetails": [
    {
      "fieldName": "StringType",
      "fieldValues": [
        "ALPHABETIC"
      ]
    }
  ],
  "isMandatory": true,
  "isRuleKey": false,
  "status": "DRAFT"
}

Sample Response:

{
  "key": "country_of_origin",
  "label": "Country of Origin",
  "description": "description of metadata key and its usage",
  "valueType": "String",
  "additionalDetails": [
    {
      "fieldName": "StringType",
      "fieldValues": [
        "ALPHABETIC"
      ]
    }
  ],
  "applyOn": "OBJECTS_ONLY",
  "isMandatory": true,
  "isRuleKey": false,
  "isSearchable": false,
  "status": "DRAFT",
  "etag": 1
}

Deleting the Keys

This API allows a tenant admin to delete a metadata key from the metadata collection provided in the request.

Note

Only a key in DRAFT status can be deleted.

Delete the key using the following endpoint:

DELETE /metadataCollections/{metadataCollectionId}/keys/{metadataKey}

Metadata Rules Operations

Metadata Rules are a set of predefined conditions that determine the metadata tied to a custom collection that is applicable to Integrated Data Lake resources.

Create a Metadata Rule

This API creates a metadata rule for a metadata collection, corresponding to a metadata key.

Note

  • The status of the rule is governed by the collection status.
  • A rule can be created for a collection in DRAFT as well as PUBLISHED status.
  • If the collection is in DRAFT status, the rule is not applied and the collection attributes will not be available for entering metadata values.
  • If the collection is in PUBLISHED status, the rule is applied immediately.

Recommendations:

  • It is advised to keep the collection in DRAFT status while creating a rule. Do not publish the collection unless you are certain about the rule key and the value used in the rule.

Create a metadata rule using the following endpoint:

POST /metadataCollections/{metadataCollectionId}/metadataRules

Sample Request:

{
  "name": "Document Review Rule",
  "key": "business_sensitivity",
  "value": "STRICTLY_PRIVATE"
}

Sample Response:

{
  "id": "0860f696-af41-4d8d-a104-c1dd508de97a",
  "name": "Document Review Rule",
  "key": "business_sensitivity",
  "value": "STRICTLY_PRIVATE",
  "etag": 1
}
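
As an illustration, the sketch below builds (without sending) the rule creation request shown above and expresses the note about collection status as a small helper. The host name and function names are assumptions for this example, and authentication is omitted.

```python
import json
import urllib.request
from urllib.parse import quote


def build_create_rule_request(base_url, collection_id, name, key, value):
    """Build (but do not send) the POST .../metadataRules request."""
    url = (f"{base_url.rstrip('/')}/metadataCollections/"
           f"{quote(collection_id, safe='')}/metadataRules")
    body = {"name": name, "key": key, "value": value}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rule_applies_immediately(collection_status):
    """A rule takes effect at once only if the collection is PUBLISHED."""
    return collection_status == "PUBLISHED"


# Hypothetical host; replace with the gateway URL of your environment.
rule_request = build_create_rule_request(
    "https://example.invalid/api/datalake/v4",
    "document_review",
    "Document Review Rule",
    "business_sensitivity",
    "STRICTLY_PRIVATE",
)
print(rule_request.get_method(), rule_request.full_url)
print(rule_applies_immediately("DRAFT"))  # False
```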

Retrieving the list of Metadata rules

This API lists all the metadata rules associated with a collection.

Retrieve the list of all the metadata rules using the following endpoint:

GET /metadataCollections/{metadataCollectionId}/metadataRules

Sample Response:

{
  "metadataRules": [
    {
      "id": "0860f696-af41-4d8d-a104-c1dd508de97a",
      "name": "Document Review Rule",
      "key": "business_sensitivity",
      "value": "STRICTLY_PRIVATE",
      "etag": 1
    }
  ],
  "page": {
    "size": 0,
    "totalElements": 0,
    "totalPages": 0,
    "number": 0
  }
}

Retrieving the metadata rule details

This API provides the metadata rule details for the given id.

Retrieve the metadata rule details using the following endpoint:

GET /metadataCollections/{metadataCollectionId}/metadataRules/{id}

Sample Response:

{
  "id": "0860f696-af41-4d8d-a104-c1dd508de97a",
  "name": "Document Review Rule",
  "key": "business_sensitivity",
  "value": "STRICTLY_PRIVATE",
  "etag": 1
}

Updating the metadata rules

This API updates the metadata rule details for the given id. A rule can be updated only in DRAFT status.

Update the metadata rule details using the below endpoint:

PATCH /metadataCollections/{metadataCollectionId}/metadataRules/{id}

Sample Request:

{
  "name": "Document Review Rule",
  "key": "business_sensitivity",
  "value": "STRICTLY_PRIVATE"
}

Sample Response:

{
  "id": "0860f696-af41-4d8d-a104-c1dd508de97a",
  "name": "Document Review Rule",
  "key": "business_sensitivity",
  "value": "STRICTLY_PRIVATE",
  "etag": 1
}

Delete the metadata rule

This API deletes the metadata rule provided in the request.

Note

Only a rule in DRAFT status can be deleted.

Delete the metadata rule using the following endpoint:

DELETE /metadataCollections/{metadataCollectionId}/metadataRules/{id}

Attribute Based Access Control

Attribute Based Access Control (ABAC) is an authorization model that evaluates attributes to determine access. For more information on ABAC, refer to Attribute Based Access Control (ABAC) configuration.

Attribute Based Access Control applies to Integrated Data Lake operations as follows:

  • Upload an Object via App Gateway with Metadata

    This PUT method uploads an object to a path that is permitted as per the policy created for the user by the Tenant Admin.

    - Access to upload a file is evaluated at the prefix (parent folder) level for the user.
    - For evaluating the access control, refer to Object and folder operation.

    Endpoint:

    PUT /api/datalake/v4/objects/{path}
    
  • Download Object via App Gateway

    This GET method downloads an object from a path that is permitted as per the policy created for the user by the Tenant Admin.

    - Access to download a file is evaluated at the object (file) level for the user.

    Endpoint:

    GET /objects/{path}
    
  • Create empty folder with Metadata

    This POST method creates a folder in a path that is permitted as per the policy created for the user by the Tenant Admin.

    - Access to create a folder is evaluated at the prefix (parent folder) level for the user.
    - For evaluating the access control, refer to Object and folder operation.

    Endpoint:

    POST /folders
    
  • Delete an empty Folder

    This DELETE method deletes a folder in a path that is permitted as per the policy created for the user by the Tenant Admin.

    - Access to delete a folder is evaluated at the prefix (folder) level for the user.
    - If permitted, the metadata associated with the folder is deleted.

    Endpoint:

    DELETE /folders/{path}
    
  • Delete Object

    This DELETE method deletes an object on a path that is permitted as per the policy created for the user by the Tenant Admin.

    - Access to delete a file is evaluated at the object (file) level for the user.
    - If permitted, the metadata associated with the object is deleted.

    Endpoint:

    DELETE /{path}
    
  • List Objects

    With this GET method, access to list objects at a given path for the tenant is evaluated as per the policy created by the Tenant Admin.

    - The user can see the folder hierarchy if access to the parent folder is provided with a propagation depth of -1. For more information on propagation depth, refer to Object and folder operation.
    - If the user does not have access to any resources, the list response is empty and no error is returned.

    Endpoint:

    GET /listObjects
    
  • Add Metadata

    Users with "WRITE" action can add or update the metadata on the resources.

    Endpoint:

    /objectMetadata/{path}
    
  • Get Metadata

    Users with "LIST" access can view (in the UI) or fetch (via the API) the metadata on the resources.

    Endpoint:

    /objectMetadata/{path}
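
    The descriptions above state the level at which access is evaluated for each operation: the parent folder for uploads and folder creation, and the resource itself for downloads and deletions. The following sketch only illustrates that distinction. The operation labels and function names are hypothetical, and representing the root folder as "/" is an assumption.

```python
# Level at which access is evaluated, per the operation descriptions above.
# The operation labels are hypothetical names chosen for this sketch.
EVALUATION_LEVEL = {
    "upload_object": "parent",
    "download_object": "self",
    "create_folder": "parent",
    "delete_folder": "self",
    "delete_object": "self",
}


def parent_prefix(path):
    """Return the parent folder prefix of an object or folder path."""
    parent, _, _ = path.strip("/").rpartition("/")
    # Assumption: the root of the data lake is written as "/".
    return parent + "/" if parent else "/"


def evaluated_resource(operation, path):
    """Return the path against which the policy would be evaluated."""
    if EVALUATION_LEVEL[operation] == "parent":
        return parent_prefix(path)
    return path


print(evaluated_resource("upload_object", "reports/2024/q1.csv"))    # reports/2024/
print(evaluated_resource("download_object", "reports/2024/q1.csv"))  # reports/2024/q1.csv
```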
    

Access Control Actions

The access control actions for Integrated Data Lake operations are described below:

Operation | List | Read + List | Create + List | Delete + List | Create + Read + List | Delete + Read + List | Delete + Create + Read + List
View resources (files and folders) and System metadata | Yes | Yes | Yes | Yes | Yes | Yes | Yes
View Metadata (user defined metadata) | No | Yes | No | No | Yes | Yes | Yes
Upload file with metadata | No | No | Yes | No | Yes | Yes | Yes
Download the files | No | Yes | No | No | Yes | Yes | Yes
Create folder with metadata | No | No | Yes | No | Yes | No | Yes
Update Metadata (independent operation) | No | No | No | No | Yes | Yes | Yes
Delete file(s) or empty folder | No | No | No | Yes | No | Yes | Yes
Search objects - File name + Metadata (system + user defined metadata) | No | Yes | No | No | Yes | Yes | Yes
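
For client-side checks, the table above can be transcribed into a lookup structure. The sketch below copies the table row by row rather than deriving rules from it, so it answers only for the seven action combinations that the table lists. The operation labels, the upper-case action names, and the function name are hypothetical.

```python
# Action combinations, in the column order of the table above.
COLUMNS = [
    frozenset({"LIST"}),
    frozenset({"READ", "LIST"}),
    frozenset({"CREATE", "LIST"}),
    frozenset({"DELETE", "LIST"}),
    frozenset({"CREATE", "READ", "LIST"}),
    frozenset({"DELETE", "READ", "LIST"}),
    frozenset({"DELETE", "CREATE", "READ", "LIST"}),
]

# One string per table row: "Y" where the operation is permitted.
MATRIX = {
    "view_resources":  "YYYYYYY",
    "view_metadata":   "NYNNYYY",
    "upload_file":     "NNYNYYY",
    "download_file":   "NYNNYYY",
    "create_folder":   "NNYNYNY",
    "update_metadata": "NNNNYYY",
    "delete":          "NNNYNYY",
    "search_objects":  "NYNNYYY",
}


def is_allowed(operation, granted_actions):
    """Look up the table cell for an operation and an action combination.

    Raises ValueError if the combination is not a column of the table.
    """
    column = COLUMNS.index(frozenset(granted_actions))
    return MATRIX[operation][column] == "Y"


print(is_allowed("download_file", {"READ", "LIST"}))  # True
print(is_allowed("delete", {"CREATE", "LIST"}))       # False
```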

Last update: March 21, 2024

Except where otherwise noted, content on this site is licensed under the Development License Agreement.