Optimize the Platform Analytics server

Modify the Platform Analytics server to enhance performance.

What you need to do

The following tasks are optional:

  1. Change the data retention period
  2. Split the data transformer tasks to disperse workload

Change the data retention period

A long data retention period can significantly increase data volume and affect performance. You can tailor the data retention period to your business requirements to maximize the performance of your Platform Analytics server.

The data purger consists of multiple scheduled tasks (PartitionMaintenanceGroup*), which are enabled by default.

  1. Launch the vsql command line.
    1. Navigate to the bin subdirectory of the Vertica installation directory.

      By default, this is /opt/vertica/bin.

    2. Run vsql to connect to the database.

      ./vsql -d database_name -p port -U username -w password

      where

      • database_name is the name of the database

      • port is the TCP port number or the local socket file extension on which the server listens for connections. The default port number is 5433.

      • username is the name of the user with which to connect to the database, instead of the default user (the database administrator).

      • password is the password for the database user.

      Alternatively, you can run vsql with no options to accept the defaults and specify the database administrator password at the prompt.
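
      For example, the following command connects to a database named analytics on the default port as the dbadmin user. The database name, user name, and password shown here are placeholders; substitute the values for your installation.

      ./vsql -d analytics -p 5433 -U dbadmin -w mypassword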

  2. Examine the current data retention periods of the database tables for Platform Analytics.
    • To examine the retention periods for all Platform Analytics database tables, run the following from the vsql command line:

      SELECT TABLE_NAME, DATA_DAYS_RANGE
      FROM SYS_TABLES_TO_PARTITION;

    • To examine the retention periods for a specific database table, run the following from the vsql command line:

      SELECT TABLE_NAME, DATA_DAYS_RANGE
      FROM SYS_TABLES_TO_PARTITION
      WHERE TABLE_NAME='table_name';

      where table_name is the name of the table you want to examine.

    The output displays the name of the table and the corresponding data retention period in days.

    For example, to view the data retention period for the RESOURCE_METRICS_BUILTIN table, run the following from the vsql command line:

    SELECT TABLE_NAME, DATA_DAYS_RANGE
    FROM SYS_TABLES_TO_PARTITION
    WHERE TABLE_NAME='RESOURCE_METRICS_BUILTIN';
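
    The query returns output similar to the following. The retention value shown here is only illustrative; your installation might report a different number of days.

            TABLE_NAME         | DATA_DAYS_RANGE
    ---------------------------+-----------------
     RESOURCE_METRICS_BUILTIN  | 1096
    (1 row)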

  3. Change the data retention period for the appropriate database tables.
    1. For each database table to change, run the following from the vsql command line:

      UPDATE SYS_TABLES_TO_PARTITION
      SET DATA_DAYS_RANGE='retention_period'
      WHERE TABLE_NAME='table_name';

      where

      • retention_period is the new retention period, in days

      • table_name is the name of the table you are changing

    2. Commit the changes to the database.

      Run the following from the vsql command line:

      COMMIT;

    For example, to change the data retention period of the RESOURCE_METRICS_BUILTIN table to 2192 days, run the following from the vsql command line:

    UPDATE SYS_TABLES_TO_PARTITION
    SET DATA_DAYS_RANGE='2192'
    WHERE TABLE_NAME='RESOURCE_METRICS_BUILTIN';

    COMMIT;
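
    To confirm that the new retention period is in effect, you can re-run the query from step 2:

    SELECT TABLE_NAME, DATA_DAYS_RANGE
    FROM SYS_TABLES_TO_PARTITION
    WHERE TABLE_NAME='RESOURCE_METRICS_BUILTIN';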

Split the data transformer tasks to disperse workload

By default, there are two scheduled tasks that control the data transformers. These default tasks might not be enough to run all the data transformers within one hour, so to enhance performance, you can split the data transformers across more tasks.

The following table shows a recommended format for splitting your data transformers into five tasks. The examples in this procedure use this table (specifically, Task 1).


Task   Data transformer name   Data flow entry
1      ClusterCapacity         main_cluster_capacity.xml
2      WorkloadStatistics      main_workload_statistics.xml
3      FlexlmLicenseUsage      main_flexlm_licusage.xml
4      Hardware                main_hardware.xml
5      Jobmart                 main_jobmart.xml


  1. Log in to the Platform Analytics server host.
  2. Create and enable a new scheduled task in the Platform Analytics Console.
    1. Launch the Platform Analytics Console.
      • UNIX: ANALYTICS_TOP/bin/runconsole.sh

      • Windows: Start > Programs > Platform Analytics Server > Platform Analytics Console

    2. Click Scheduled Tasks in the navigation tree.
    3. Right-click on the main window and select Add Scheduled Task.
    4. Complete the required fields for the new task.
      • Scheduled Task: Specify the name of this task.

      • Script File: Specify bin/dataagghourly.js for hourly tasks or bin/dataaggdaily.js for daily tasks.

      • Script Function: Specify doit.

      For example, if you are creating Task 1 from the table with the recommended format for splitting data transformers, specify the following:

      • Scheduled Task: Specify Task1 as the name of the scheduled task.

      • Script File: Specify bin/dataagghourly.js as the path to the script file.

      • Script Function: Specify doit as the script function.

    5. Enable the new scheduled task that you created.
  3. In the tasks subdirectory of ANALYTICS_TOP, create a new directory with the same name as the new scheduled task, then navigate to the new directory.

    For example, for Task 1 on a UNIX host, run the following commands:

    cd ANALYTICS_TOP/tasks

    mkdir Task1

    cd Task1

  4. In the new directory, create a text file with any name and the .tsk file extension.

    For example, create task1.tsk.
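
    On a UNIX host, you can create the empty file with a standard command such as touch, or use any text editor:

    touch task1.tsk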

  5. In the new .tsk text file, for each data transformer that you would like the scheduled task to control, add the corresponding data flow entry as a new line in the file.

    You can also add a comment with the name of the data transformer if you start the line with the # character.

    For example, for Task 1, the task1.tsk file should contain the following lines:

    # Cluster Capacity
    datatransformer/flow/clustercapacity/main_cluster_capacity.xml
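
    If one scheduled task should control several data transformers, add one data flow entry per line in the same .tsk file. For example, a combined file might look like the following; the second entry is hypothetical and only illustrates the format, so use the actual data flow entries listed for your data transformers:

    # Cluster Capacity
    datatransformer/flow/clustercapacity/main_cluster_capacity.xml
    # Workload Statistics (hypothetical path, shown only to illustrate the format)
    datatransformer/flow/workloadstatistics/main_workload_statistics.xml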