Prepare for the Platform Analytics installation

Select the hosts that meet the detailed system requirements, prepare the Platform LSF cluster, and download the installation packages for Platform Analytics.

What you need to do

  1. Select the database hosts
  2. Select the Platform Analytics server host
  3. Select the Platform Analytics reporting server host
  4. Select the Platform Analytics node hosts
  5. Prepare the Platform LSF clusters (Platform LSF 7.x, 8.0, and 8.0.1 only)
  6. Obtain the necessary installation files

Select the database hosts

If you already have a database host (running Oracle) that you plan to use with Platform Analytics, ensure that Platform Analytics supports your version of the Oracle database and that enough space is allocated in the database for Platform Analytics. Refer to the Release Notes for Platform Analytics for the latest list of supported Oracle database versions.

For optimal performance of your production database, the Vertica database cluster should consist of at least three dedicated multi-core hosts running on a high-bandwidth network. Since the Vertica database needs to share a large volume of data among the database nodes in the database cluster during data loading or data querying, network bandwidth is an important performance bottleneck for a production database. Therefore, the Vertica database cluster should have a Gigabit Ethernet connection with the Platform Analytics reporting server and the Platform Analytics node hosts.

Vertica recommends a 1-10 GB full duplex switch for the private network interface and a VLAN or separate switch for the public network. The switch used for the private network should have sufficient bandwidth to enable 1 GB transfer speeds between any pair of nodes.

The hardware requirements are the same for all the intended database hosts. Refer to the Vertica documentation, or to the Release Notes for Platform Analytics, for the latest list of system requirements and supported operating systems for the Vertica database hosts.

Input/output (I/O) performance is important for the operation of a database while fault tolerance is important to safeguard your data. Using a RAID 01 or 10 system enables the database host to take advantage of data striping and data mirroring. Data striping allows data to be transferred to multiple hard disks concurrently, which improves input/output (I/O) performance. Data mirroring means that your database does not lose data even if a hard disk fails.

The following table describes the optimal configuration of the database depending on the size of your cluster. The specific hardware recommendations for each database host are the same:


Cluster size                  Number of hosts  RAM    CPU          Local hard disk                                     Network
Medium (100 - 1000 hosts)     3                16 GB  4 × 2.4 GHz  10000 RPM SATA/SCSI/SAS/SSD, RAID 01 or 10, 300 GB  Gigabit Ethernet
Large (more than 1000 hosts)  more than 3      32 GB  8 × 2.4 GHz  10000 RPM SATA/SCSI/SAS/SSD, RAID 01 or 10, 1 TB    Gigabit Ethernet


Data striping

Data striping is the technique of segmenting logically sequential data, such as a single file, so that the segments can be assigned to multiple physical devices (usually disk drives for RAID storage, or network interfaces for grid-oriented storage) in a round-robin fashion and then read or written concurrently.

Automatic data striping is available in certain RAID devices under software or hardware control. Oracle Automatic Storage Management (ASM) allows ASM files to be either coarse- or fine-striped. You can also achieve data striping with Logical Volume Management (LVM) in Linux.

Automatic data striping is also available in the file systems of clusters. The following parameters are important when improving I/O performance:

Stripe width

The number of parallel stripes that can be written to or read from simultaneously. This is the number of disks in the RAID system, and as it increases, the read/write performance of striped data also increases.

Stripe size

The size of the stripes written to each disk. This may also be referred to as block size, chunk size, stripe length, or granularity.

You should use a large stripe size of at least 1 MB.

If you are using RAID devices, you should use RAID 10 or RAID 01, because these levels offer the best performance of all RAID systems together with good fault tolerance.
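The round-robin placement described above can be illustrated with a small sketch. This is not how any particular RAID controller or volume manager is implemented; it only shows how stripe size and stripe width determine where a logical byte offset lands. The function name `locate_stripe` is hypothetical, and `stripe_width` is assumed to mean the number of striped devices (for RAID 10, the number of mirrored pairs rather than the raw disk count).

```python
from typing import Tuple

MB = 1024 * 1024


def locate_stripe(offset: int, stripe_size: int, stripe_width: int) -> Tuple[int, int, int]:
    """Map a logical byte offset to (device index, stripe row, offset inside the stripe).

    Hypothetical helper for illustration only: stripes are assigned to
    devices in simple round-robin order, starting at device 0.
    """
    if offset < 0 or stripe_size <= 0 or stripe_width <= 0:
        raise ValueError("offset must be >= 0; stripe_size and stripe_width must be > 0")
    stripe_number = offset // stripe_size    # index of the stripe unit in the logical stream
    device = stripe_number % stripe_width    # round-robin choice of device
    row = stripe_number // stripe_width      # number of complete rows written before this one
    within = offset % stripe_size            # position inside the stripe unit
    return device, row, within


if __name__ == "__main__":
    # Four striped devices with the 1 MB stripe size suggested in the text.
    for off in (0, MB, 3 * MB + 512, 4 * MB):
        print(off, locate_stripe(off, MB, 4))
```

With four devices, four consecutive 1 MB segments land on four different devices and can be transferred concurrently, which is why a larger stripe width improves throughput for sequential data.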

Select the Platform Analytics server host

When selecting a host to be the Platform Analytics server, ensure that the host is running a supported operating system. Refer to the Release Notes for Platform Analytics for the latest list of supported operating systems for the Platform Analytics server host.

Tip:

If the host that you select also meets the Tableau Server system requirements, you can use the Platform Analytics server host as the Platform Analytics reporting server host as well.

For optimal performance, the Platform Analytics server host should be a dedicated multi-core host with sufficient memory and input/output performance. The network bandwidth between the Platform Analytics server, the database hosts, and the Platform Analytics nodes is a key performance factor for the Platform Analytics server host.

If you are not using asynchronous data loading mode, the following hardware configuration should be sufficient:


RAM    CPU          Local hard disk                Network
4 GB   4 × 2.4 GHz  7200 RPM SATA/SCSI/SAS, 50 GB  Gigabit Ethernet


If you are using asynchronous data loading mode, memory is a key performance factor for the Platform Analytics server host. If the Platform Analytics server is running on a Windows host, you should use the 64-bit version of Windows because Java cannot use more than 1638 MB of memory on 32-bit platforms.

Use the asynchronous data loading mode only for sending data from the Platform Analytics node to the database over a slow or unstable network.

The following table describes the optimal hardware configuration of the Platform Analytics server if you are using asynchronous data loading, depending on the size of your cluster:


Cluster size                  RAM    CPU          Local hard disk                Network
Medium (100 - 1000 hosts)     4 GB   4 × 2.4 GHz  7200 RPM SATA/SCSI/SAS, 50 GB  Gigabit Ethernet
Large (more than 1000 hosts)  8 GB   4 × 2.4 GHz  7200 RPM SATA/SCSI/SAS, 50 GB  Gigabit Ethernet


Note:

Using an NFS disk mount instead of a local hard disk is not recommended.

Select the Platform Analytics reporting server host

When selecting a host to be the Platform Analytics reporting server host, ensure that the host meets the detailed system requirements for Tableau Server:


Operating system  Windows Server 2003 (SP2 or higher), Windows Server 2008, or Windows Server 2008 R2
RAM               2 GB
CPU               Dual-core
Services          To avoid conflicts on the web server port 80, do not run Internet Information Services (IIS).
User accounts     Access to an administrator account to install software and services.
                  Access to a user account that the service can use (optional).
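Before installing, it can help to confirm that nothing (such as IIS) is already listening on port 80 of the intended reporting server host. The following is a minimal sketch using only the Python standard library; the function name `port_in_use` is hypothetical, and the check only detects a listener on the address you probe (the loopback address by default), so treat a negative result as a hint rather than proof.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port is accepted.

    Hypothetical helper: a successful connect means some service is
    already listening there and would conflict with a web server on
    the same port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    if port_in_use(80):
        print("Port 80 is already in use; stop the conflicting service first.")
    else:
        print("No listener detected on port 80 of the probed address.")
```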


Tip:

If the Platform Analytics server host also meets the Tableau Server system requirements, you can select the Platform Analytics server host to also be the Platform Analytics reporting server host.

Refer to the Tableau Server documentation, or to the Release Notes for Platform Analytics, for the latest list of system requirements and supported operating systems for the Tableau Server.

The network bandwidth between the Platform Analytics reporting host and the database cluster may be an important performance bottleneck. Therefore, the Platform Analytics reporting host should have a Gigabit Ethernet connection with the database hosts.

The following table describes the optimal hardware configuration of the Platform Analytics reporting server, depending on the size of your cluster:


Cluster size                  RAM    CPU          Local hard disk                Network
Medium (100 - 1000 hosts)     4 GB   4 × 2.4 GHz  7200 RPM SATA/SCSI/SAS, 50 GB  Gigabit Ethernet
Large (more than 1000 hosts)  8 GB   4 × 2.4 GHz  7200 RPM SATA/SCSI/SAS, 50 GB  Gigabit Ethernet


Select the Platform Analytics node hosts

When selecting a host in the LSF clusters to be a Platform Analytics node, ensure that the host is running a supported operating system and that it meets the minimum hardware requirements. Refer to the Release Notes for Platform Analytics for the latest system requirements for the Platform Analytics node host.

For optimal performance of your Platform Analytics node, the host should be running on a high-bandwidth network. Since network bandwidth is an important performance bottleneck for the Platform Analytics nodes, the Platform Analytics node host should have a Gigabit Ethernet connection with the database host. If the Platform Analytics node is running on a Windows host, you should use the 64-bit version of Windows because Java cannot use more than 1638 MB of memory on 32-bit platforms.

The following table describes the optimal hardware configuration of the Platform Analytics node depending on the size of the clusters in which the node resides:


Cluster size                  RAM    CPU          Local hard disk                Network
Medium (100 - 1000 hosts)     4 GB   2 × 2.4 GHz  7200 RPM SATA/SCSI/SAS, 50 GB  Gigabit Ethernet
Large (more than 1000 hosts)  8 GB   4 × 2.4 GHz  7200 RPM SATA/SCSI/SAS, 50 GB  Gigabit Ethernet


Prepare the Platform LSF clusters (Platform LSF 7.x, 8.0, and 8.0.1 only)

Your clusters must be running one of the following:

  • Platform LSF 7.x

  • Platform LSF 8.0

  • Platform LSF 8.0.1

You can skip this step for any clusters that are not running any of these versions of Platform LSF.

By default, Platform LSF does not enable the lsb.stream file for exporting LSF job event data.

If you want the Platform Analytics node to collect LSF cluster data from your Platform LSF cluster, you need to enable the lsb.stream file because Platform Analytics requires this file for the data loaders to obtain job data.

  1. Log in to a host in the Platform LSF cluster.
  2. Edit the lsb.params file.
    • UNIX: $LSF_ENVDIR/lsbatch/cluster_name/configdir/lsb.params

    • Windows: %LSF_ENVDIR%\lsbatch\cluster_name\configdir\lsb.params

  3. In the lsb.params file, edit the Parameters section to enable the exporting of LSF job event data to the lsb.stream file.

    Add the following lines to the Parameters section:

    # Enable streaming of lsbatch system events
    ENABLE_EVENT_STREAM=y
    # Determines the location of the lsb.stream file. This parameter is optional.
    # The default location is: $LSB_SHAREDIR/{clustername}/logdir/stream.
    # EVENT_STREAM_FILE=/tmp/lsb.mystream
    # Determines the maximum size of the lsb.stream file. This parameter is optional.
    # The default size is 100MB.
    # MAX_EVENT_STREAM_SIZE=10000
  4. Reconfigure mbatchd to apply these changes.

    badmin mbdrestart

  5. To verify that these changes are in effect, check that the lsb.stream file exists.

    By default, lsb.stream is located in the following directories:

    • UNIX: $LSB_SHAREDIR/cluster_name/logdir/stream

    • Windows: %LSB_SHAREDIR%\cluster_name\logdir\stream

    If you defined the EVENT_STREAM_FILE parameter in lsb.params, check the specified file path for the lsb.stream file.
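The configuration and verification steps above can be sketched as a small script. This is an illustration under stated assumptions, not a Platform LSF tool: the helper names (`read_parameters`, `stream_enabled`, `expected_stream_path`) are hypothetical, the parser ignores section markers and simply collects uncommented `KEY=VALUE` lines, and the default location is built from the directory given in the last step (`LSB_SHAREDIR/cluster_name/logdir/stream`).

```python
import os
from typing import Dict


def read_parameters(text: str) -> Dict[str, str]:
    """Collect uncommented KEY=VALUE pairs from the text of lsb.params."""
    params: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if "=" in line:
            key, value = line.split("=", 1)
            params[key.strip()] = value.strip()
    return params


def stream_enabled(params: Dict[str, str]) -> bool:
    """True if ENABLE_EVENT_STREAM is set to y (case-insensitive)."""
    return params.get("ENABLE_EVENT_STREAM", "n").lower() == "y"


def expected_stream_path(params: Dict[str, str], sharedir: str, cluster_name: str) -> str:
    """Return EVENT_STREAM_FILE if defined, else the default lsb.stream location."""
    custom = params.get("EVENT_STREAM_FILE")
    if custom:
        return custom
    return os.path.join(sharedir, cluster_name, "logdir", "stream", "lsb.stream")


if __name__ == "__main__":
    # Inline sample; in practice, read the real lsb.params file instead.
    sample = """Begin Parameters
ENABLE_EVENT_STREAM=y
# EVENT_STREAM_FILE=/tmp/lsb.mystream
End Parameters
"""
    params = read_parameters(sample)
    # "/lsf/work" and "cluster1" are placeholder values for this demo.
    path = expected_stream_path(params, "/lsf/work", "cluster1")
    print("streaming enabled:", stream_enabled(params))
    print("expected file:", path, "exists:", os.path.exists(path))
```

To run this against a real cluster, pass the contents of your lsb.params file to `read_parameters` and supply your own LSB_SHAREDIR value and cluster name; the file appears only after mbatchd has been restarted with the new settings.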

Obtain the necessary installation files

  1. Obtain the necessary files for installing Platform Analytics.

    You need the following files to install Platform Analytics:

    • Platform Analytics server installation package

    • Platform Analytics node installation package

    • Platform Analytics documentation package

    • Platform Analytics license file

    • Platform 

  2. Obtain the necessary files for installing the Vertica Analytic Database.

    You need the following files to install the Vertica Analytic Database:

    • Vertica Analytic Database installation package

    • Platform Analytics database schema package

  3. Obtain the necessary files for installing the Tableau Server.

    You need the following files to install the Tableau Server:

    • Tableau Server installation package

    • Platform Analytics report package

  4. Obtain the necessary file for integrating Platform Analytics into Platform Application Center.

    README for Integrating Platform Analytics 8.0.2 into Platform Application Center 8.0.2.