Configuring the stand-alone ETL process
After you install the Interact stand-alone ETL process, you need to configure the process by modifying files on the ETL process server, and in the Marketing Platform configuration pages.
To configure the ETL process, there are files in the Interact home directory on the ETL process server, to indicate where the necessary Java runtime files are found, as well as other environment variables. Then you need to connect to the IBM® Marketing Platform server associated with this installation and use the configuration pages there to set up the necessary properties to run the ETL process.
1.
<Interact_home>\PatternStateETL\bin\setenv.bat on Microsoft Windows, or <Interact_home>\PatternStateETL\bin\setenv.sh on a UNIX-like operating system.
a.
Complete the line that reads set JAVA_HOME=[CHANGE ME], changing [CHANGE ME] to the actual path to the 64-bit Java runtime you want to use.
*
Although the IBM® EMM installer provides a Java runtime in <Interact_home>\..\jre, such as C:\IBM\EMM\jre, this is a 32-bit Java runtime used only for installation. This runtime is not suitable for running the ETL process. Install a supported 64-bit Java runtime if one is not already installed, and update the setenv file to use that runtime.
b.
Complete the line that reads set JDBCDRIVER_CP= with the actual location of the JDBC driver for the connection to the database containing the system tables.
For example, if you were connecting to an Oracle database, you might specify the path to a local copy of ojdbc6.jar.
2.
In a supported web browser, connect to the IBM® Marketing Platform server associated with this installation and log in using administer-level credentials.
3.
Open the Configuration page by clicking Settings > Configuration in the toolbar.
The Configuration page shows the Configuration Categories tree.
4.
Navigate to interact | ETL in the Configuration Categories tree.
5.
Click PatternStateETLConfig in the tree to create a new Pattern State ETL configuration.
In the right pane, complete the following information:
*
New category name. Provide a name that uniquely identifies this configuration. Note that you must provide this exact name when you run the stand-alone ETL process. For convenience in specifying this name on the command line, you may want to avoid a name containing spaces or punctuation, such as ETLProfile1.
*
runOnceADay. Determines whether the stand-alone ETL process in this configuration should run once each day. Valid answers are Yes or No. If you answer No here, the processSleepIntervalInMinutes determines the run schedule for the process.
*
preferredStartTime. The preferred time at which the stand-alone ETL process should start. Specify the time in the format HH:MM:SS AM/PM, as in 01:00:00 AM.
*
preferredEndTime. The preferred time at which the stand-alone ETL process should stop. Specify the time in the format HH:MM:SS AM/PM, as in 08:00:00 AM.
*
processSleepIntervalInMinutes. If you have not configured the stand-alone ETL process to run once a day (as specified in the runOnceADay property), this property specifies the interval between ETL process runs. For example, if you specify 15 here, the stand-alone ETL process will wait for 15 minutes after it stops running before starting the process again.
*
maxJDBCInsertBatchSize. The maximum number of records of a JDBC batch before committing the query. By default, this is set to 5000. Note that this is not the maximum number of records that the ETL processes in one iteration. During each iteration, the ETL processes all available records from the UACI_EVENTPATTERNSTATE table. However, all those records are broken into maxJDBCInsertSize chunks.
*
maxJDBCFetchBatchSize. The maximum number of records of a JDBC batch to fetch from the staging database.
You may need to increase this value to tune the performance of the ETL.
*
communicationPort. The network port on which the standalone ETL process listens for a stop request. Under normal circumstances, there should be no reason to change this from the default value.
*
queueLength. A value used for performance tuning. Collections of pattern state data are fetched and transformed into objects that are added to a queue to be processed and written to the database. This property controls the size of the queue.
*
completionNotificationScript. Specifies the absolute path to a script to run when the ETL process is completed. If you specify a script, three arguments are passed to the completion notification script: start time, end time, and total number of event pattern records processed. The start time and end time are numeric values representing number of milliseconds elapsed since 1970.
6.
When you save the configuration, two additional categories are automatically created in the tree underneath the new configuration: RuntimeDS, and TargetDS. Use these categories to specify the data source where the stand-alone ETL process should retrieve the data it will process (the database containing the Interact runtime tables), and the data source where the results will be stored.
7.
Configure the Interact | ETL | patternStateETL | <patternStateETLName> | RuntimeDS and Interact | ETL | patternStateETL | <patternStateETLName> | TargetDS categories for the ETL configuration.
The two categories determine the data sources for retrieving and storing the event pattern data used by the ETL process.
*
The data source that you specify for the TargetDS configuration may be the same data source in which the Interact runtime tables are stored, or it may be a different data source for performance reasons.
a.
Click the category (RuntimeDS or TargetDS) that you want to configure.
b.
In the right pane, click Edit Settings and complete the following fields:
*
type. A list of the supported database types for the data source you are defining.
*
dsname. The JNDI name of the data source. This name must also be used in the user's data source configuration to ensure that the user has access to the target and runtime data sources.
*
driver. The name of the JDBC driver to us, such as any of the following:
Oracle: oracle.jdbc.OracleDriver
Microsoft SQL Server: com.microsoft.sqlserver.jdbc.SQLServerDriver
IBM DB2: com.ibm.db2.jcc.DB2Driver
*
serverUrl. The data source URL, such as any of the following:
Oracle: jdbc:oracle:thin:@ <your_db_host>:<your_db_port>:<your_db_service_name>
Microsoft SQL Server: jdbc:sqlserver:// <your_db_host>:<your_db_port> ;databaseName= <your_db_name>
IBM DB2: jdbc:db2:// <your_db_host>:<your_db_port>/<your_db_name>
*
connectionpoolSize. A value indicating the size of the connection pool, provided for performance tuning. Pattern state data is read and transformed concurrently depending upon the available database connections. Increasing the connection pool size allows for more concurrent database connections, subject to limitations of memory and database read/write capabilities. For example, if this value is set to 4, four jobs will run concurrently. If you have a large amount of data, you might need to increase this value to a number such as 10 or 20, as long as sufficient memory and database performance is available.
*
schema. The name of the database schema to which this configuration is connecting.
*
connectionRetryPeriod. The ConnectionRetryPeriod property specifies the amount of time in seconds Interact automatically retries the database connection request on failure. Interact automatically tries to reconnect to the database for this length of time before reporting a database error or failure. If the value is set to 0, Interact retries indefinitely; if the value is set to -1, no retry is attempted.
*
connectionRetryDelay. The ConnectionRetryDelay property specifies the amount of time in seconds Interact waits before it tries to reconnect to the database after a failure. If the value is set to -1, no retry is attempted.
Save the changes when you are done specifying both the runtime and target data sources.
8.
Still in IBM® Marketing Platform server, click Settings > Users in the toolbar.
9.
10.
Define data sources for the user to match the TargetDS and RuntimeDS categories you just defined for the ETL category. The Data Source name you specify for the user data source must match the value of the dsname property for TargetDS or RuntimeDS configuration.
The event pattern state ETL read the user name and password you specify here to connect to the database during processing.
You have now configured the Marketing Platform for use with the event pattern ETL process. Be aware that any changes you make to the ETL configuration other than the communication port are automatically implemented in the next run of the ETL process. There is no need to restart the ETL process after you change the configuration, unless you specify a new communication port.
With the installation and configuration of the event pattern ETL process completed, you are now ready to run the process.