Creating a query with IDL data source¶
This section explains the procedure to create a query by using the Integrated Data Lake (IDL) data source.
Prerequisites¶
- Data files (csv or parquet) must be uploaded to IDL through the IDL manager before creating a dashboard or data source.
The uploaded data files should fulfill the following conditions:
- The files should be of CSV or Parquet formats
- The files must have uniform column names
- The CSV Files must not contain delimiters (like commas) in the values of the columns
- Timestamp/Date formats must be uniform through the data in a column
Procedure¶
- Click "Dashboards" in the left navigation.
- Click "New" and select "New dashboard" from the drop-down list.
- Click "+ Add visualization".
- Select "Integrated Data Lake" from the data source list.
- Click "Select a folder from Data Lake", or click the large "Select" button on the right side of the panel.
- In the "IDL Query Selections" list, select a folder that contains CSV or parquet files that have uniform column names and types.
If the user selects a folder that contains folders rather than files, a partition strategy is required. A strategy like v6 and visualizer is used here. For example, if the user has folder and file structures like ParentFolder/MonthFolder/data.csv (with any number of month folders and other files), a partition strategy string like‘/Month=String’
would be required. Furthermore, if their structure is like ParentFolder/MonthFolder/DayFolder/data.csv, a partition strategy string like ‘/Month=String/Day=String’ would be required. - Filter the folders in the panel by using the search box on the left side of the modal pop-up. Additionally, page numbers are displayed at the bottom of the modal to view additional pages of files that are stored in IDL.
- Select a folder containing data to view the schema on the right side of the modal. Click the Select All/None buttons on the top right to change the selection. Additionally, the check boxes are used to select specific attributes from files to visualize.
- Click "Select".
The selected variables will be displayed in the Variables field. You can remove items from selection by clicking the "x" next to the variable. Click on the expand drop-down icon to view the unselected options.
- In the "IDL Query Selections" list, select a folder that contains CSV or parquet files that have uniform column names and types.
- Click the "Schema" button to edit the data types of the columns in the schema.
- Select the required type from the drop-down for each variable. For the "Timestamp" type, in the "Format" drop-down, select the time stamp that matches the existing values in the files that is uploaded. For example, if the CSV file contains a date like MM/DD/YYYY, then choose the same option from the drop-down.
- Click "Save".
- Select the required option from the "Time Column" drop-down. This filed only displays columns that are date or timestamp type, as selected during the schema selection. You can also change the time range using the normal selection at the top right of the page.
After selecting a time column, the query will run and display data if the time range at the top contains data. - Select the required option for "Time Breakdown" to make more advanced queries. This selection allows you to aggregate the information (OR GROUPBY from SQL query) by the day, hour, minute, month, year. If the time format you have selected does not include hour/min information, you will see a shorter list to select what can be inferred from the information available in the timestamp format provided. For example, if only "MM/DD/YYYY" is selected, hour or minute cannot be further provided.
- Expand the "Aggregate Options" section.
- The type of the column determines the kind of aggregate function that should be applied. Number types like Integer, Decimal, and Big Int can be Averaged,Summed, count, min, max or value while String fields can only display the value
- The plus at the bottom of the aggregate panel allows for multiple functions over the same column to be added.
- The function options will unlock based upon the type of the selected column
Last update: June 25, 2024