Metadata Management from Integrated Data Lake
Note
Integrated Data Lake Service API version 4.*.* is available only for Virtual Private Cloud.
Metadata management is a crucial aspect of data management within any Data Lake application. It involves the systematic organization, storage, retrieval, and maintenance of metadata, which is essentially data about data. This includes information about the characteristics, origin, usage, and relationships of the actual data stored in the Data Lake.
Metadata collection operations
Metadata collection is the systematic process of collecting metadata keys that manage all the relevant metadata information associated with the specific dataset or system.
Metadata collections are classified into the following categories:
- Global collection
- Custom collection
- Reserved collection
Creating a Metadata collection
This API can be used to create a metadata collection.
Note
Global collections are available by default in the system and cannot be created.
Create a collection using the following endpoint:
POST /metadataCollections
Sample Request:
{
"metadataCollectionId": "document_review",
"label": "Document Review",
"description": "usage and purpose of custom Metadata Collection"
}
Note
Users cannot create a collection named "global" or "system", as these names are reserved for system use.
Sample Response:
{
"metadataCollectionId": "document_review",
"label": "Document Review",
"description": "usage and purpose of custom Metadata Collection",
"status": "DRAFT",
"etag": 1
}
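The request above can be assembled with Python's standard library. This is a minimal sketch, not a definitive client: the base URL host, the bearer-token header, and the helper name `build_create_collection_request` are assumptions that do not come from this document, and treating the reserved names case-insensitively is also an assumption. The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical placeholders: neither value is taken from this document.
BASE_URL = "https://example.invalid/api/datalake/v4"
TOKEN = "<access-token>"


def build_create_collection_request(collection_id, label, description):
    """Build (but do not send) a POST /metadataCollections request."""
    # "global" and "system" are reserved for system use (case handling is assumed).
    if collection_id.lower() in {"global", "system"}:
        raise ValueError(f"'{collection_id}' is reserved for system use")
    body = {
        "metadataCollectionId": collection_id,
        "label": label,
        "description": description,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/metadataCollections",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {TOKEN}",  # assumed auth scheme
        },
    )


request = build_create_collection_request(
    "document_review",
    "Document Review",
    "usage and purpose of custom Metadata Collection",
)
print(request.get_method(), request.full_url)
```

With a real host and token, the request could be sent with `urllib.request.urlopen(request)`.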
Retrieving the list of Metadata collections
This API can be used to get the list of all the metadata collections. The Global metadata collection, although available by default, is listed only after a metadata key has been added to it.
Retrieve the collection list using the below endpoint:
GET /metadataCollections
Sample Response:
{
"metadataCollections": [
{
"metadataCollectionId": "document_review",
"label": "Document Review",
"description": "usage and purpose of custom Metadata Collection",
"status": "DRAFT",
"etag": 1
}
],
"page": {
"size": 0,
"totalElements": 0,
"totalPages": 0,
"number": 0
}
}
Retrieving the Metadata collection details
This API can be used to get the details of the collection for the given ID.
Note
This endpoint returns only the collection details, not the details of the keys inside the collection.
Retrieve collection details using the below endpoint:
GET /metadataCollections/{metadataCollectionId}
Sample Response:
{
"metadataCollectionId": "document_review",
"label": "Document Review",
"description": "usage and purpose of custom Metadata Collection",
"status": "DRAFT",
"etag": 1
}
Update the Metadata collection details
This API can be used to update the details of the collection for the given ID.
Note
Only collections created by the user can be updated.
Update collection details using the below endpoint:
PATCH /metadataCollections/{metadataCollectionId}
Sample Request:
{
"label": "Document Review",
"description": "usage and purpose of custom Metadata Collection",
"status": "DRAFT"
}
Note
Collection status can be updated only from DRAFT to PUBLISHED.
Sample Response:
{
"metadataCollectionId": "document_review",
"label": "Document Review",
"description": "usage and purpose of custom Metadata Collection",
"status": "DRAFT",
"etag": 1
}
Deleting the Collection
This API can be used to delete the collection with the given ID.
Delete the collection using the following endpoint:
DELETE /metadataCollections/{metadataCollectionId}
Note
- The Global collection cannot be deleted.
- Only a custom collection in DRAFT status can be deleted.
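The status and deletion notes for collections can be expressed as two small local checks, which may be useful before sending a PATCH or DELETE request. This is a sketch under assumptions: the function names and the category labels `"global"`, `"custom"`, and `"reserved"` are hypothetical, and resending the current status is assumed to be accepted because the sample update request does so.

```python
def is_allowed_status_update(current, target):
    """Allow DRAFT -> PUBLISHED; resending the current status is assumed to be a no-op."""
    if target == current:
        return True
    return current == "DRAFT" and target == "PUBLISHED"


def can_delete_collection(category, status):
    """Only a custom collection in DRAFT status can be deleted."""
    # `category` uses hypothetical labels: "global", "custom", "reserved".
    return category == "custom" and status == "DRAFT"


print(is_allowed_status_update("DRAFT", "PUBLISHED"))
print(is_allowed_status_update("PUBLISHED", "DRAFT"))
print(can_delete_collection("custom", "PUBLISHED"))
```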
Metadata Keys Operations
Metadata keys are the unique identifiers used to represent specific attributes or characteristics associated with the data in a key-value pair structure.
Creating a Metadata Key
This API can be used to create a metadata key. It allows admin users to add metadata keys to a metadata collection. By default, admin users can add keys directly to the Global metadata collection. Various configuration parameters are available while defining a metadata key. These parameters help admin users define keys that accept only specific metadata values.
The configuration parameters are defined below:
- Value Type: This is the data type that defines which metadata values are accepted for this key. The available value types are described below:
Value Type | Use Case |
---|---|
Enum | Metadata values are pre-defined and the user can select one of the values. The available values are visible while defining a key, which supports a maximum of 300 values. The characters supported in Enum values are alphanumeric characters (A-Z, a-z, 0-9), hyphen ("-"), forward slash ("/"), parentheses ("(" and ")"), period ("."), colon (":"), comma (","), underscore ("_"), and spaces (" ") but not leading or trailing. |
EnumList | Metadata values are pre-defined and the user can select one or more values. The available values are visible while defining a key, which supports a maximum of 300 values. Additionally, the user can set the maximum and minimum number of values. The characters supported in EnumList values are alphanumeric characters (A-Z, a-z, 0-9), hyphen ("-"), forward slash ("/"), parentheses ("(" and ")"), period ("."), colon (":"), comma (","), underscore ("_") but not leading or trailing, and spaces (" ") but not leading or trailing. |
String | The user has to input one metadata value as a string. Additionally, the maximum and minimum length can be configured. |
StringList | The user has to input one or more metadata values as strings. Additionally, the maximum and minimum number of accepted values can be configured. |
Long Text | The user has to input a long string value, e.g. "notes or information as metadata for data lake resources". Long Text supports a maximum of 2048 characters. |
- Field name and Field values: These are the additional details that admin users can use to put constraints on metadata values as described below:
Value Type | Field Name | Field Value |
---|---|---|
Enum | options | List of comma separated acceptable values for Enum in capital letters |
EnumList | options | List of comma separated acceptable values for Enum in capital letters |
EnumList | minSize | Minimum number of values that can be selected from the enum list. It has to be greater than 0 and less than the number of options. |
EnumList | maxSize | Maximum number of values that can be selected from the enum list. It has to be greater than minSize and less than or equal to the number of options. |
String | options | List of comma separated acceptable values for Enum in capital letters |
String | minLength | Minimum length of character set in the list that can be accepted as value. It has to be greater than 0. |
String | maxLength | Maximum length of character set in the list that can be accepted as value. It has to be greater than or equal to minLength. |
StringList | options | List of comma separated acceptable values for Enum in capital letters |
StringList | minSize | Minimum number of values that can be selected from the enum list. It has to be greater than 0 and less than the number of options. |
StringList | maxSize | Maximum number of values that can be selected from the enum list. It has to be greater than minSize and less than or equal to the number of options. |
- applyOn: This parameter defines where the key is applied, i.e. on objects (files) or folders, based on various requirements.
Options | Use Case |
---|---|
OBJECTS_ONLY | Can be used for any metadata that is applicable only at object level. |
FOLDERS_ONLY | Can be used to set the metadata in the entire hierarchy which should not be changed at any level, e.g. Business Sensitivity or Classification Levels. To be avoided for metadata which is updated regularly. |
FOLDERS_AND_OBJECTS | Can be used where metadata is applicable on the entire hierarchy but can be changed if required. |
- Rule Key: This indicates whether this key can be used to create rules.
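The Enum and EnumList constraints described above lend themselves to a local pre-check before a key is submitted. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the regular expression is derived only from the supported character list, and the extra EnumList restriction on leading or trailing underscores is not modelled.

```python
import re

MAX_ENUM_VALUES = 300
# Supported characters: letters, digits, - / ( ) . : , _ and spaces.
_ALLOWED = re.compile(r"[A-Za-z0-9\-/().:,_ ]+")


def is_valid_enum_value(value):
    """Check one option against the supported character list."""
    if value != value.strip(" "):
        return False  # leading or trailing spaces are not allowed
    return _ALLOWED.fullmatch(value) is not None


def check_enum_list(options, min_size, max_size):
    """Return a list of problems found in an EnumList definition (empty if none)."""
    problems = []
    if len(options) > MAX_ENUM_VALUES:
        problems.append("more than 300 options")
    problems += [f"invalid option: {o!r}" for o in options if not is_valid_enum_value(o)]
    if not 0 < min_size < len(options):
        problems.append("minSize must be > 0 and < number of options")
    if not min_size < max_size <= len(options):
        problems.append("maxSize must be > minSize and <= number of options")
    return problems


print(check_enum_list(["PUBLIC", "INTERNAL", "STRICTLY_PRIVATE"], 1, 2))
```

Returning a list of problems rather than raising on the first one lets a caller report every constraint violation at once.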
Note
- The key must be at least 2 characters long.
- The key consists of alphabetic characters or a combination of alphabetic and numeric characters. It must not consist solely of numeric digits.
Recommendations:
- It is advised to keep the key in DRAFT status when created. Many details of the key cannot be updated once the key is PUBLISHED.
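The key naming notes can be checked locally before calling the API. This sketch covers only what the notes state (minimum length and at least one alphabetic character, so digits-only keys are rejected). The function name is hypothetical, and because the sample key `country_of_origin` contains underscores, the sketch deliberately does not reject separator characters; which separators the service accepts is not specified here.

```python
import re


def is_plausible_key(key):
    """Check the documented minimum: length >= 2 and at least one letter."""
    if len(key) < 2:
        return False
    # A key made only of digits contains no letter and is rejected here.
    return re.search(r"[A-Za-z]", key) is not None


for candidate in ("country_of_origin", "k1", "a", "12345"):
    print(candidate, is_plausible_key(candidate))
```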
Create a metadata key using the following endpoint:
POST /metadataCollections/{metadataCollectionId}/keys
Sample Request:
{
"key": "country_of_origin",
"label": "Country of Origin",
"description": "description of metadata key and its usage",
"valueType": "String",
"additionalDetails": [
{
"fieldName": "StringType",
"fieldValues": [
"ALPHABETIC"
]
}
],
"applyOn": "OBJECTS_ONLY",
"isMandatory": true,
"isRuleKey": false,
"isSearchable": false,
"status": "DRAFT"
}
Sample Response:
{
"key": "country_of_origin",
"label": "Country of Origin",
"description": "description of metadata key and its usage",
"valueType": "String",
"additionalDetails": [
{
"fieldName": "StringType",
"fieldValues": [
"ALPHABETIC"
]
}
],
"applyOn": "OBJECTS_ONLY",
"isMandatory": true,
"isRuleKey": false,
"isSearchable": false,
"status": "DRAFT",
"etag": 1
}
Retrieving the list of Metadata keys
This API can be used to get the list of all the metadata keys in the provided metadata collection.
Retrieve the list of keys using the below endpoint:
GET /metadataCollections/{metadataCollectionId}/keys
Sample Response:
{
"metadataKeys": [
{
"key": "country_of_origin",
"label": "Country of Origin",
"description": "description of metadata key and its usage",
"valueType": "String",
"additionalDetails": [
{
"fieldName": "StringType",
"fieldValues": [
"ALPHABETIC"
]
}
],
"applyOn": "OBJECTS_ONLY",
"isMandatory": true,
"isRuleKey": false,
"isSearchable": false,
"status": "DRAFT",
"etag": 1
}
],
"page": {
"size": 0,
"totalElements": 0,
"totalPages": 0,
"number": 0
}
}
Retrieving the key details
This API can be used to get the details of the defined key in the collection.
Retrieve key details using the below endpoint:
GET /metadataCollections/{metadataCollectionId}/keys/{metadataKey}
Sample Response:
{
"key": "country_of_origin",
"label": "Country of Origin",
"description": "description of metadata key and its usage",
"valueType": "String",
"additionalDetails": [
{
"fieldName": "StringType",
"fieldValues": [
"ALPHABETIC"
]
}
],
"applyOn": "OBJECTS_ONLY",
"isMandatory": true,
"isRuleKey": false,
"isSearchable": false,
"status": "DRAFT",
"etag": 1
}
Update the Metadata key details
This API allows admin users to update the metadata key configuration. Which details can be updated depends on the status of the key.
The following parameters can be updated based on key status:
Parameter | Draft Status | Published Status |
---|---|---|
label & description | Can be updated | Can be updated |
Additional Details | Can modify or remove existing constraints, or add new constraints | Can change existing fieldValues to extend the already defined range e.g. minValue can be reduced or maxValue can be increased or new Enum value can be added. Cannot remove any existing constraints |
isMandatory | Can be updated to true or false | Can be updated to true or false |
isRuleKey | Can be updated to true or false | Can be updated only if the metadata collection in which this rule key is used is in DRAFT status |
applyOn | Can be updated to any of the available Enum values | Cannot be updated |
isSearchable | Can be updated to true or false | Cannot be updated |
Update key details using the below endpoint:
PATCH /metadataCollections/{metadataCollectionId}/keys/{metadataKey}
Sample Request:
{
"label": "Business Sensitivity",
"description": "description of metadata key and its usage",
"additionalDetails": [
{
"fieldName": "StringType",
"fieldValues": [
"ALPHABETIC"
]
}
],
"isMandatory": true,
"isRuleKey": false,
"status": "DRAFT"
}
Sample Response:
{
"key": "country_of_origin",
"label": "Country of Origin",
"description": "description of metadata key and its usage",
"valueType": "String",
"additionalDetails": [
{
"fieldName": "StringType",
"fieldValues": [
"ALPHABETIC"
]
}
],
"applyOn": "OBJECTS_ONLY",
"isMandatory": true,
"isRuleKey": false,
"isSearchable": false,
"status": "DRAFT",
"etag": 1
}
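The parameter table for key updates can be turned into a simple pre-flight check on a PATCH body. This is a sketch that models only the rows marked "Cannot be updated" for PUBLISHED keys; the function name is hypothetical, and the conditional rules for `additionalDetails` and `isRuleKey` are not covered.

```python
# Fields that the table marks as "Cannot be updated" once a key is PUBLISHED.
LOCKED_WHEN_PUBLISHED = {"applyOn", "isSearchable"}


def locked_fields(key_status, patch):
    """Return the fields in `patch` that cannot be changed for this key status."""
    if key_status != "PUBLISHED":
        return set()
    return LOCKED_WHEN_PUBLISHED.intersection(patch)


patch = {"label": "Country of Origin", "applyOn": "FOLDERS_ONLY"}
print(locked_fields("DRAFT", patch))
print(locked_fields("PUBLISHED", patch))
```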
Deleting the Keys
This API allows the tenant admin to delete a metadata key from the metadata collection specified in the request.
Note
Only a key in DRAFT status can be deleted.
Delete the key using the following endpoint:
DELETE /metadataCollections/{metadataCollectionId}/keys/{metadataKey}
Metadata Rules Operations
Metadata Rules are a set of predefined conditions that determine when the metadata tied to a custom collection is applicable to Integrated Data Lake resources.
Create a Metadata Rule
This API creates a metadata rule for a metadata collection, based on a metadata key.
Note
- Status of the rule is governed by collection status
- Rule can be created for collection in DRAFT as well as PUBLISHED status
- If collection is in DRAFT status, rule is not applied and collection attributes will not be available to enter metadata values
- If collection is in PUBLISHED status, rule gets applied immediately
Recommendations:
- It is advised to keep the collection in DRAFT status while creating a rule. Do not publish the collection unless you are certain about the rule key and the value used in the rule.
Create a metadata rule using the following endpoint:
POST /metadataCollections/{metadataCollectionId}/metadataRules
Sample Request:
{
"name": "Document Review Rule",
"key": "business_sensitivity",
"value": "STRICTLY_PRIVATE"
}
Sample Response:
{
"id": "0860f696-af41-4d8d-a104-c1dd508de97a",
"name": "Document Review Rule",
"key": "business_sensitivity",
"value": "STRICTLY_PRIVATE",
"etag": 1
}
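The relationship between collection status and rule behaviour can be sketched locally. The first function follows the notes above directly. The second encodes one reading of how a rule is matched (the resource carries the rule key with the rule value); that matching semantics, like both function names, is an assumption and is not confirmed by this document.

```python
def rule_is_applied(collection_status):
    """A rule takes effect only once its collection is PUBLISHED."""
    return collection_status == "PUBLISHED"


def rule_matches(rule, resource_metadata):
    """Assumed semantics: a rule matches when the resource has key == value."""
    return resource_metadata.get(rule["key"]) == rule["value"]


rule = {
    "name": "Document Review Rule",
    "key": "business_sensitivity",
    "value": "STRICTLY_PRIVATE",
}
resource = {"business_sensitivity": "STRICTLY_PRIVATE"}

for status in ("DRAFT", "PUBLISHED"):
    active = rule_is_applied(status) and rule_matches(rule, resource)
    print(status, active)
```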
Retrieving the list of Metadata rules
This API lists all the metadata rules associated with a collection.
Retrieve the list of all the metadata rules using the following endpoint:
GET /metadataCollections/{metadataCollectionId}/metadataRules
Sample Response:
{
"metadataRules": [
{
"id": "0860f696-af41-4d8d-a104-c1dd508de97a",
"name": "Document Review Rule",
"key": "business_sensitivity",
"value": "STRICTLY_PRIVATE",
"etag": 1
}
],
"page": {
"size": 0,
"totalElements": 0,
"totalPages": 0,
"number": 0
}
}
Retrieving the metadata rule details
This API provides the metadata rule details for the given ID.
Retrieve the metadata rule details using the following endpoint:
GET /metadataCollections/{metadataCollectionId}/metadataRules/{id}
Sample Response:
{
"id": "0860f696-af41-4d8d-a104-c1dd508de97a",
"name": "Document Review Rule",
"key": "business_sensitivity",
"value": "STRICTLY_PRIVATE",
"etag": 1
}
Updating the metadata rules
This API updates the metadata rule details for the given ID. A rule can be updated only in DRAFT status.
Update the metadata rule details using the below endpoint:
PATCH /metadataCollections/{metadataCollectionId}/metadataRules/{id}
Sample Request:
{
"name": "Document Review Rule",
"key": "business_sensitivity",
"value": "STRICTLY_PRIVATE"
}
Sample Response:
{
"id": "0860f696-af41-4d8d-a104-c1dd508de97a",
"name": "Document Review Rule",
"key": "business_sensitivity",
"value": "STRICTLY_PRIVATE",
"etag": 1
}
Delete the metadata rule
This API deletes the metadata rule specified in the request.
Note
Only a rule in DRAFT status can be deleted.
Delete the metadata rule using the following endpoint:
DELETE /metadataCollections/{metadataCollectionId}/metadataRules/{id}
Attribute Based Access Control
Attribute Based Access Control (ABAC) is an authorization model that evaluates attributes to determine access. For more information on ABAC, refer to Attribute Based Access Control configuration.
Attribute Based Access Control applies to Integrated Data Lake operations as follows:
- Upload an Object via App Gateway with Metadata
  This PUT method uploads an object to a path that is permitted by the policy created for the user by the Tenant Admin.
  - Access to upload a file is evaluated at the prefix (parent folder) level for the user.
  - For evaluating the access control, refer to Object and folder operation.
  Endpoint:
  PUT /api/datalake/v4/objects/{path}
- Download Object via App Gateway
  This GET method downloads an object from a path that is permitted by the policy created for the user by the Tenant Admin.
  - Access to download a file is evaluated at the object or file level for the user.
  Endpoint:
  GET /objects/{path}
- Create empty folder with Metadata
  This POST method creates a folder in a path that is permitted by the policy created for the user by the Tenant Admin.
  - Access to create a folder is evaluated at the prefix (parent folder) level for the user.
  - For evaluating the access control, refer to Object and folder operation.
  Endpoint:
  POST /folders
- Delete an empty Folder
  This DELETE method deletes a folder in a path that is permitted by the policy created for the user by the Tenant Admin.
  - Access to delete a folder is evaluated at the prefix (folder) level for the user.
  - If permitted, the metadata associated with the folder is deleted.
  Endpoint:
  DELETE /folders/{path}
- Delete Object
  This DELETE method deletes an object on a path that is permitted by the policy created for the user by the Tenant Admin.
  - Access to delete a file is evaluated at the object (file) level for the user.
  - If permitted, the metadata associated with the object is deleted.
  Endpoint:
  DELETE /{path}
- List Objects
  This GET method lists objects at a given path. Access for the tenant is evaluated as per the policy created by the Tenant Admin.
  - The user can see the folder hierarchy if access on the parent folder is provided with propagation depth as -1. For more information on propagation depth, refer to Object and folder operation.
  - If the user does not have access to any resources, the list response is empty and no error is returned.
  Endpoint:
  GET /listObjects
- Add Metadata
  Users with the "WRITE" action can add or update the metadata on the resources.
  Endpoint:
  /objectMetadata/{path}
- Get Metadata
  Users with "LIST" access can view the metadata on the resources in the UI or fetch it through the API.
  Endpoint:
  /objectMetadata/{path}
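The evaluation levels listed above differ per operation: uploads and folder creation are checked against the parent folder, while downloads and deletions are checked against the resource itself. The following sketch makes that mapping explicit. The operation labels and the function name are hypothetical, and the path handling is a simplification that assumes "/"-separated paths.

```python
import posixpath

# Level at which access is evaluated, as described for each operation above.
EVALUATED_AT = {
    "upload_object": "parent",
    "create_folder": "parent",
    "download_object": "self",
    "delete_object": "self",
    "delete_folder": "self",
}


def evaluation_path(operation, path):
    """Return the path whose policy decides whether `operation` is allowed."""
    clean = path.strip("/")
    if EVALUATED_AT[operation] == "parent":
        return posixpath.dirname(clean)  # empty string means the root
    return clean


print(evaluation_path("upload_object", "reports/2024/q1.csv"))
print(evaluation_path("download_object", "reports/2024/q1.csv"))
```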
Access Control Actions
The access control actions for Integrated Data Lake operations are described below:
Operation | List | Read + List | Create + List | Delete + List | Create + Read + List | Delete + Read + List | Delete + Create + Read + List |
---|---|---|---|---|---|---|---|
View resources (files and folders) and System metadata | |||||||
View Metadata (user defined metadata) | |||||||
Upload file with metadata | |||||||
Download the files | |||||||
Create folder with metadata | |||||||
Update Metadata (independent operation) | |||||||
Delete file(s) or empty folder | |||||||
Search objects - File name + Metadata (system + user defined metadata) | |||||||
Action Dependency Table
The dependencies for Integrated Data Lake actions are described below:
Action | Description |
---|---|
List Resources mdsp:core:idl:prefix:list | Listing of resources within the storage. |
Download Files mdsp:core:idl:prefix:read | Downloading files from the storage. This action depends on listing resources. |
Upload files and create folders mdsp:core:idl:prefix:write | Uploading files and creating empty folders within the storage. This action depends on listing resources and downloading files. |
Delete files and folder mdsp:core:idl:prefix:delete | Deleting files and folders from the storage. This action depends on listing resources and downloading files. |
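The dependencies in the table can be resolved programmatically when deciding which actions a policy needs to include. This is a minimal sketch that encodes the table as a dictionary; the constant and function names are hypothetical, while the action strings are the ones listed above.

```python
LIST = "mdsp:core:idl:prefix:list"
READ = "mdsp:core:idl:prefix:read"
WRITE = "mdsp:core:idl:prefix:write"
DELETE = "mdsp:core:idl:prefix:delete"

# Direct dependencies as stated in the Action Dependency Table.
DEPENDS_ON = {
    LIST: set(),
    READ: {LIST},
    WRITE: {LIST, READ},
    DELETE: {LIST, READ},
}


def actions_to_grant(action):
    """Return `action` together with every action it depends on."""
    return {action} | DEPENDS_ON[action]


print(sorted(actions_to_grant(WRITE)))
```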