Context SQL Sync
Introduction
The Context SQL Sync REST service allows context data to be imported, either once or via a scheduled continuous job, from an SQL database (PostgreSQL, MySQL, or MS SQL) into TrendMiner.
To conduct a synchronization, the service requires two components: a datasource and a query. The datasource defines the connection to the SQL database, from which the data will be fetched. The query provides the SQL query (along with other details as explained later in this document) that will be used to select rows based on a specified time interval. The Context SQL Sync service will create a new context item in TrendMiner (or update an existing item) from each row selected by the query.
Finally, when doing a live sync, a third component is required: a sync job. The sync job encapsulates details such as the datasource and query to use and the sync frequency.
The steps to sync context data are:
Request an access token for authentication
Define a datasource
Define a query
Perform a one-time sync
or
Define and execute a continuous sync job
Authentication
Prior to version 2021R2.3, only OAuth2 authentication was supported.
As of TrendMiner version 2021R2.3, OpenID Connect/OAuth2 authorization using a Keycloak server is introduced. To be authorized to use the Context SQL Sync Swagger UI for service configuration, a user should authenticate and obtain an access token, which can then be used from the Swagger UI. To get a token:
Open a browser and navigate to <TM-URL>/context-sql-sync/swagger-ui.html#/
Click the Authorize button.
In the pop-up screen, scroll to the openIdConnect (OAuth2, authorization_code) flow.
Enter tm-context-sql-sync-swagger as the client_id and select all scopes.
Click Authorize and then log in to TrendMiner with the provided user credentials.
Note
The access token has a lifespan of 5 minutes; after that, a new token should be requested by repeating the steps above.
Context Sync API
A Swagger interface is shipped with your appliance and is available at:
<TM-URL>/context-sql-sync/swagger-ui.html#/
When using the Swagger interface, authentication can be done through the browser by logging in to the TrendHub where the Swagger UI is hosted, i.e.:
<TM-URL>
Note
When executing requests in Swagger, the values in the request body should not contain newlines or tabs. This is especially important when defining the SQL query (see below), because that query is usually very long.
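For example, when preparing a request body in a script, a long SQL query can be collapsed to a single line before it is pasted or sent. A minimal Python sketch (the query text is just the example used later in this document):

import re

# A long, formatted SQL query, written with newlines and indentation for readability.
sql = """
SELECT id, componentId, type, description, start_event, start_time, stop_event, stop_time
FROM LOGBOOK
WHERE :startDate <= start_time AND start_time < :endDate;
"""

# Collapse every run of whitespace (newlines, tabs, multiple spaces) to a single space,
# so the value can safely be used inside a request body in Swagger.
sql_one_line = re.sub(r"\s+", " ", sql).strip()
print(sql_one_line)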
Define a datasource
Note
See the Kerberos and Integrated Authentication section when a datasource with integrated security is to be used.
The datasource contains the details necessary to connect to the SQL database, such as the type of the database, the address and port, and the credentials that should be used to authenticate. New datasources can be defined via the API:
Method: POST
Path: /context-sql-sync/datasource
Authorization header: Bearer [access_token]
Body
{
  "name": "localdb",
  "jdbcUrl": "jdbc:postgresql://localhost:5432/tm_context_sql_sync",
  "dbType": "POSTGRES",
  "username": "postgres",
  "password": "postgres"
}
Parameter | Description |
---|---|
name | An arbitrary string. |
jdbcUrl | A JDBC URL specifying the connection to the database, e.g.: jdbc:mysql://192.0.2.1:3306/database, jdbc:postgresql://localhost:5432/database, jdbc:sqlserver://localhost:1433;databaseName=database, jdbc:redshift://localhost:5439/database |
dbType | Specifies the type of the database. The supported values are MYSQL, POSTGRES, MSSQL, and REDSHIFT. Contact your TrendMiner CSM in case of a different database to see how we could help out. |
username/password | The credentials used to access the database; can be omitted if the database does not require authentication. |
The result will contain the id of the newly created datasource. This id will be used later when executing a historical sync or when defining sync jobs.
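The same request can also be issued outside of Swagger. A sketch using Python and the requests library (the appliance URL and token are placeholders, and the name of the id field in the response is an assumption - inspect the actual response in your environment):

import requests

TM_URL = "https://trendminer.example.com"  # placeholder appliance URL
ACCESS_TOKEN = "..."                       # access token obtained as described above

datasource = {
    "name": "localdb",
    "jdbcUrl": "jdbc:postgresql://localhost:5432/tm_context_sql_sync",
    "dbType": "POSTGRES",
    "username": "postgres",
    "password": "postgres",
}

# POST the datasource definition and keep the returned id for later steps.
resp = requests.post(
    f"{TM_URL}/context-sql-sync/datasource",
    json=datasource,
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
)
resp.raise_for_status()
datasource_id = resp.json().get("id")  # field name assumed
print(datasource_id)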
Define a Query
New queries can be defined by the API:
Method: POST
Path: /context-sql-sync/query
Authorization header: Bearer [access_token]
Body
{
  "name": "queryAlias",
  "supportsParameters": true,
  "supportsNamedParameters": true,
  "componentLookupStrategy": "ASSETS|TAGS|ASSETS_THEN_TAGS",
  "fieldNames": "customField1, customField2",
  "validateFieldNames": true,
  "timeZone": "string",
  "sqlString": "Select id, componentId, type, description, start_event, start_time, stop_event, stop_time, customField1, customField2 from LOGBOOK where :startDate <= start_time and start_time < :endDate;",
  "typeMapping": [ { "PRODUCTION": "MAPPED_TYPE" }, { "SOURCE_TYPE": "TM_TYPE" } ]
}
Parameter | Description |
---|---|
name | An arbitrary string. |
supportsParameters | A boolean value (true/false) indicating whether the SQL query contains positional parameters. The default is true. Positional parameters are specified with question marks ('?') in the SQL query. The query should contain exactly two of them; they will be replaced with the start and end time of the requested time interval, respectively. |
supportsNamedParameters | A boolean value (true/false) indicating whether the SQL query contains named parameters. The default is true. If this property is set to true, it takes precedence over supportsParameters. Named parameters are specified with the :startDate and :endDate values in the SQL query; they will be replaced with the start and end time of the requested time interval, respectively. |
componentLookupStrategy | A value specifying how to interpret the componentId returned by the query. The allowed values are ASSETS (componentId is interpreted as the external id of an asset), TAGS (componentId is interpreted as a tag name), and ASSETS_THEN_TAGS (componentId is interpreted as an asset and, if no asset is found, as a tag name). The default is ASSETS. |
fieldNames | A comma-separated list of values, each representing the name of a column in the SQL query (see sqlString below). These columns represent custom fields to be associated with the newly created or updated context items. This property is optional and defaults to null. Bear in mind that the specified values are validated against the SQL query, and the request will fail if the field names are not found in the SELECT clause. If the fields exist on the context type associated with the item, the values of the columns specified by this property will be imported as values of the fields with the respective technical names. If they don't exist, the column values will be imported as general properties of the context item. |
validateFieldNames | Optional. If missing or set to true, the field names specified in the query will be validated. If set to false, the validation will be skipped. |
timeZone | Specifies the timezone of the dates in the SQL database. The default is UTC. The list of supported timezones can be found via <TM-URL>/hps/api/monitoring/options/timezone/available or at https://en.wikipedia.org/wiki/List_of_tz_database_time_zones |
sqlString | The string of the SQL query that will be executed against the datasource to fetch the items. The query is required to return (at least) the following columns, in order: id, componentId, type, description, startEvent, startTime, endEvent, endTime. The query can contain additional columns, but they should come after these first eight; such extra columns are typically needed when using custom fields. An example SQL query: SELECT id, componentId, type, description, start_event, start_time, stop_event, stop_time FROM some_table WHERE :startDate <= start_time and start_time < :endDate; In addition to the required columns, the SQL query can also select extra columns, for example a last-modified timestamp (see the example below). |
typeMapping | A table used to map the values in the type column (see sqlString above) to actual context types in TrendMiner. Even if the type column contains a type that already exists, there should be an entry in the typeMapping table that maps the type to itself, e.g.: "typeMapping": [ { "PRODUCTION": "PRODUCTION" } ] |
Example: if the datasource contains a lastModifiedDate column, the query can be modified to select records based on that column:
{ ... "sqlString": "Select id, componentId, type, description, start_event, start_time, stop_event, stop_time, last_modified_date from LOGBOOK where :startDate <= last_modified_date and last_modified_date < :endDate;"...} ... }
This way, if an item is modified, it will be selected for sync, even if its start time is untouched.
The result will contain the id of the newly created query. This id will be used later.
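A query can be created the same way as a datasource. The sketch below (Python/requests, same placeholder URL and token as before; the id field name in the response is again an assumption) posts a named-parameter query:

import requests

TM_URL = "https://trendminer.example.com"
ACCESS_TOKEN = "..."

query = {
    "name": "logbookQuery",
    "supportsNamedParameters": True,
    "componentLookupStrategy": "ASSETS",
    "timeZone": "UTC",
    "sqlString": (
        "SELECT id, componentId, type, description, start_event, start_time, "
        "stop_event, stop_time FROM LOGBOOK "
        "WHERE :startDate <= start_time AND start_time < :endDate;"
    ),
    "typeMapping": [{"PRODUCTION": "PRODUCTION"}],
}

resp = requests.post(
    f"{TM_URL}/context-sql-sync/query",
    json=query,
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
)
resp.raise_for_status()
query_id = resp.json().get("id")  # field name assumed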
Define a Historical Sync
This endpoint can be used to run a one-time synchronization of context items for a specified time interval.
Method: POST
Path: /context-sql-sync/sync
Authorization header: Bearer [access_token]
Body
{
  "datasourceId": "1",
  "queryId": "2",
  "externalIdPrefix": "prefix",
  "startDate": "2018-10-16 12:29:01",
  "endDate": "2018-10-18 20:29:01",
  "chunkSize": 1440
}
Parameter | Description |
---|---|
datasourceId | The id of the datasource against which to run the sync. |
queryId | The id of the query to be used for the sync. |
externalIdPrefix | The prefix that will be prepended to the id obtained from the database in order to produce the external id of the context item. This prefix allows supporting the same external IDs for different data sources. |
startDate | The start of the sync interval. This is optional and can be omitted if the respective SQL query does not have parameters. |
endDate | The end of the sync interval. This is optional and can be omitted if the respective SQL query does not have parameters. |
chunkSize | The sync interval (endDate - startDate) will be split into subintervals of the length specified by this value (expressed in minutes). This value is optional and defaults to null, meaning the entire interval is synced in one go. The value is also ignored if startDate and endDate are not specified. |
The startDate and endDate can be omitted. In that case the associated query is run only once, without parameters.
If startDate and endDate are specified, the resulting interval is split into subintervals based on chunkSize. The SQL query is then executed consecutively for each subinterval, with the start and end of the subinterval passed to the query as parameters.
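The chunking behaviour can be pictured with a short sketch: the requested interval is cut into chunkSize-minute windows, and each window's bounds are what end up as the query parameters. This is only an illustration of the description above, not the service's actual implementation:

from datetime import datetime, timedelta

start = datetime(2018, 10, 16, 12, 29, 1)
end = datetime(2018, 10, 18, 20, 29, 1)
chunk = timedelta(minutes=1440)  # chunkSize of one day

sub_start = start
while sub_start < end:
    sub_end = min(sub_start + chunk, end)
    # Each (sub_start, sub_end) pair replaces :startDate/:endDate
    # (or the two '?' placeholders) for one execution of the query.
    print(sub_start, "->", sub_end)
    sub_start = sub_end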
The historical sync provides a way to perform a "dry" run, where the data is fetched from the database and returned to the caller, but no actual context items are created. To use the dry run:
Path: /context-sql-sync/sync/test
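A dry run is triggered exactly like a normal historical sync, only against the /sync/test path. A Python/requests sketch (placeholder URL, token, and ids):

import requests

TM_URL = "https://trendminer.example.com"
ACCESS_TOKEN = "..."

body = {
    "datasourceId": "1",
    "queryId": "2",
    "externalIdPrefix": "prefix",
    "startDate": "2018-10-16 12:29:01",
    "endDate": "2018-10-18 20:29:01",
    "chunkSize": 1440,
}

# Dry run: data is fetched and returned, but no context items are created.
resp = requests.post(
    f"{TM_URL}/context-sql-sync/sync/test",
    json=body,
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
)
resp.raise_for_status()
print(resp.json())
# Use the /context-sql-sync/sync path instead to perform the actual import.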
Define a Live Sync
To use live sync, a sync job must be defined first:
Method: POST
Path: /context-sql-sync/syncjob
Authorization header: Bearer [access_token]
Body
{
  "datasourceId": "1",
  "queryId": "2",
  "externalIdPrefix": "prefix",
  "syncJobInterval": "10",
  "chunkSize": 1440
}
Parameter | Description |
---|---|
datasourceId | The id of the datasource against which to run the sync. |
queryId | The id of the query to be used for the sync. |
externalIdPrefix | The prefix that will be prepended to the id obtained from the database in order to produce the external id of the context item. This prefix allows supporting the same external IDs for different data sources. |
syncJobInterval | The interval (expressed in minutes) between subsequent job executions. |
chunkSize | The sync interval will be split into subintervals with length specified by this value (expressed in minutes). |
The result will contain the id of the newly created sync job. This id can be used to start and stop the job. A sync job run covers the entire interval from last_sync_date until the current time. Sync jobs should be used only with queries that support parameters, either positional or named.
Note
Currently, last_sync_date can be determined only by checking the job's settings in the Postgres database.
To start the job:
Method: POST Path: /context-sql-sync/syncjob/{id}/start Authorization header: Bearer [access_token]
To stop a job:
Method: POST Path: /context-sql-sync/syncjob/{id}/stop Authorization header: Bearer [access_token]
After the job is started, the first time it is executed is calculated based on the syncJobInterval value. For example, if the job has syncJobInterval=10 and the job is started at 15:12, the initial job execution will happen at 15:20.
To get a list of all currently defined sync jobs:
Method: GET Path: /context-sql-sync/syncjob Authorization header: Bearer [access_token]
A sync job is started if its “scheduled” field is true.
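Putting the sync job endpoints together, a sketch of starting a job and verifying that it is scheduled (Python/requests; the URL, token, and job id are placeholders, and the exact shape of the list response may differ - check the Swagger model):

import requests

TM_URL = "https://trendminer.example.com"
ACCESS_TOKEN = "..."
HEADERS = {"Authorization": f"Bearer {ACCESS_TOKEN}"}
job_id = "14"  # id returned when the sync job was created

# Start the job.
requests.post(f"{TM_URL}/context-sql-sync/syncjob/{job_id}/start", headers=HEADERS).raise_for_status()

# List all sync jobs; a started job should report "scheduled": true.
jobs = requests.get(f"{TM_URL}/context-sql-sync/syncjob", headers=HEADERS).json()
for job in jobs:
    print(job)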
Kerberos and Integrated Authentication
When integrated security is to be used to connect to the context item datasource, Kerberos must be configured and used by tm-context-sql-sync.
Create a Kerberos configuration file named krb5.conf.
Example:
[libdefaults]
default_realm = YYYY.CORP.CONTOSO.COM
dns_lookup_realm = false
dns_lookup_kdc = true
ticket_lifetime = 24h
forwardable = yes
[domain_realm]
.yyyy.corp.contoso.com = YYYY.CORP.CONTOSO.COM
.zzzz.corp.contoso.com = ZZZZ.CORP.CONTOSO.COM
[realms]
YYYY.CORP.CONTOSO.COM = {
kdc = krbtgt/YYYY.CORP.CONTOSO.COM@YYYY.CORP.CONTOSO.COM
default_domain = YYYY.CORP.CONTOSO.COM
}
ZZZZ.CORP.CONTOSO.COM = {
kdc = krbtgt/ZZZZ.CORP.CONTOSO.COM@ZZZZ.CORP.CONTOSO.COM
default_domain = ZZZZ.CORP.CONTOSO.COM
}
Copy krb5.conf to the TrendMiner appliance folder, i.e. /mnt/data/fileserve.
In Consul's key/value store, define the property "java.security.krb5.conf" for the tm-context-sql-sync service and set its value to "./krb5.conf".
Configure the datasource as follows:
{ "name": "MSSQL_IntegratedSecurity", "jdbcUrl": "jdbc:sqlserver://tm- mssql21.yyyy.corp.contoso.com:1433;Integratedsecurity=true;authenticationScheme=JavaKerberos;databaseName=tm_context", "dbType": "MSSQL", "username": "user_name@YYYY.CORP.CONTOSO.COM", "password": "******" }
Important Notes
Important
The access token will expire after 12 hours. Best practice is to request an access token before every API call.
Only SQL Server 2008 and later are supported. Earlier versions of MS SQL may give the following error when executing the SQL query: “Cannot find data type datetimeoffset.”
If a query is used by an active sync job, the query cannot be updated.
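One way to follow the token recommendation in scripts is to fetch a fresh token right before each call. The sketch below deliberately leaves the token retrieval as a stub, because the token endpoint and client settings depend on your Keycloak/OAuth2 configuration; everything here is illustrative:

import requests

TM_URL = "https://trendminer.example.com"  # placeholder appliance URL

def get_access_token() -> str:
    # Stub: obtain a token from your identity provider here,
    # e.g. via the token endpoint configured for your appliance.
    raise NotImplementedError

def api_post(path: str, body: dict) -> dict:
    # Request a fresh access token for every call, as recommended above.
    token = get_access_token()
    resp = requests.post(
        f"{TM_URL}{path}",
        json=body,
        headers={"Authorization": f"Bearer {token}"},
    )
    resp.raise_for_status()
    return resp.json()

# Example usage (hypothetical ids):
# api_post("/context-sql-sync/sync", {"datasourceId": "1", "queryId": "2"})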
Examples
In the examples below, a tm_contextitems table is used as the context item data source; its columns can be seen in the example queries.

Set up a historical sync with positional parameters:
Query definition - here a positional-parameter criterion is applied to a datetime field:
{ "name": "histSyncQuery1", "supportsParameters": true, "supportsNamedParameters": false, "componentLookupStrategy": "TAGS", "timeZone": "UTC", "fieldNames": " V_DLM_AVG, V_OPA_AVG", "sqlString": "SELECT id, componentId, product_code as type, 'Context item description' as description, 'START' as start_event, startDateTime, 'END' as stop_event, endDatetime, V_DLM_AVG, V_OPA_AVG FROM tm_contextitems where ? > last_modified_date and last_modified_date <= ?;", "typeMapping": [ { "B100610": "PRODUCTION" } ] }
The historical sync can then be defined as follows:
{ "datasourceId": "1", "queryId": "1", "externalIdPrefix": "sync1", "chunkSize": 10, "startDate": "2020-06-24 00:00:00", "endDate": "2020-06-25 00:00:00" }
Set up a historical sync with named parameters:
Query definition - here a named-parameter criterion is applied to a datetime field:
{ "name": "histSyncQuery2", "supportsParameters": false, "supportsNamedParameters": true, "componentLookupStrategy": "TAGS", "timeZone": "UTC", "fieldNames": " V_DLM_AVG, V_OPA_AVG", "sqlString": "SELECT id, componentId, product_code as type, 'Context item description' as description, 'START' as start_event, startDateTime, 'END' as stop_event, endDatetime, V_DLM_AVG, V_OPA_AVG FROM tm_contextitems where :startDate > last_modified_date and last_modified_date <= :endDate;", "typeMapping": [ { "B100610": "PRODUCTION" } ] }
The historical sync can then be defined as follows:
{ "datasourceId": "1", "queryId": "1", "externalIdPrefix": "sync2", "chunkSize": 10, "startDate": "2020-06-24 00:00:00", "endDate": "2020-06-25 00:00:00" }
Set up a historical sync without parameters:
Query definition - in this case no parameters are used as query criteria:
{ "name": "histSyncQuery3", "componentLookupStrategy": "TAGS", "timeZone": "UTC", "fieldNames": " V_DLM_AVG, V_OPA_AVG", "sqlString": "SELECT id, componentId, product_code as type, 'Context item description' as description, 'START' as start_event, startDateTime, 'END' as stop_event, endDatetime, V_DLM_AVG, V_OPA_AVG FROM tm_contextitems where last_modified_date > '2020-06-24 00:00:00'", "typeMapping": [ { "B100610": "PRODUCTION" } ] }
In this case no parameters are needed in the historical sync definition:
{ "datasourceId": "1", "queryId": "1", "externalIdPrefix": "sync3", "chunkSize": 10 }
Set up a sync job
Query definition - a sync job always provides a time interval, so the query can use either positional or named parameters:
{ "name": "syncJobQuery1", "supportsParameters": false, "supportsNamedParameters": true, "componentLookupStrategy": "TAGS", "timeZone": "UTC", "fieldNames": " V_DLM_AVG, V_OPA_AVG", "sqlString": "SELECT id, componentId, product_code as type, 'Context item desrtiption' as description, 'START' as start_event, startDateTime, 'END' as stop_event, endDatetime, V_DLM_AVG, V_OPA_AVG FROM tm_contextitems where :startDate > last_modified_date and last_modified_date <= :endDate;", "typeMapping": [ { "B100610": "PRODUCTION" } ] }
In some cases, however, the query used by the sync job may not require parameters, even though an interval is provided by the sync job. In such a case the query can be defined without parameters and different criteria can be used in the SQL:
{ "name": "syncJobQuery2", "componentLookupStrategy": "TAGS", "timeZone": "UTC", "fieldNames": " V_DLM_AVG, V_OPA_AVG", "sqlString": "SELECT id, componentId, product_code as type, 'Context item description' as description, 'START' as start_event, startDateTime, 'END' as stop_event, endDatetime, V_DLM_AVG, V_OPA_AVG FROM tm_contextitems where last_modified_date < GETDATE();", "typeMapping": [ { "B100610": "PRODUCTION" } ] }
In both cases, the sync job definition is similar:
{ "datasourceId": "1", "queryId": "1", "externalIdPrefix": "prefix", "syncJobInterval": "10", "chunkSize": 10 }
Example
Step | Detail |
---|---|
Configure custom fields for context item types | Configure in ConfigHub on an admin account. |
Create the context item types (including the custom fields) | Configure in ConfigHub on an admin account. |
Create the SQL table | The table must include id, componentId, type, description, startEvent, startTime, endEvent, and endTime. After that, custom fields can be added as additional columns. Note that in this case the "componentid", "componenttype", and "type" columns will likely all hold the same values for batch sheets: componentid is the tag the context item will be associated with, componenttype will likely be TAG, and type will be the name of the context item type you created (e.g. Batch Sheet). |
Define the datasource | Use the TrendMiner API console to obtain the datasource id used for the sync in later steps. |
Define the query | The columns from id to endTime are required; additional custom fields can be added after them. startDate and endDate will be defined in the syncs. Test the query within your environment to verify it works before posting it via the API. |
Historical sync | datasourceId and queryId are found using the GET methods shown in previous steps; locate the number of the desired datasource and query. The prefix is not required, but can be used to collect context items with the same name from different data sources (e.g. using the batch id to label different context item types). Note: we recommend a chunkSize of 1440. |
Define the sync job | This sync job will sync with the SQL database every 10 minutes. |
Live sync job | This will schedule ongoing syncs of the defined database at the defined syncJobInterval. This example assumes that the sync job id is 14. |