Select the hosts that meet the detailed system requirements, prepare the Platform LSF cluster, and download the installation packages for Platform Analytics.
If you already have a database host (running Oracle) that you plan to use with Platform Analytics, ensure that Platform Analytics supports your version of the Oracle database and that sufficient space is allocated in the database for Platform Analytics. Refer to the Release Notes for Platform Analytics for the latest list of supported Oracle database versions.
For optimal performance of your production database, the Vertica database cluster should consist of at least three dedicated multi-core hosts running on a high-bandwidth network. Since the Vertica database needs to share a large volume of data among the database nodes in the database cluster during data loading or data querying, network bandwidth is an important performance bottleneck for a production database. Therefore, the Vertica database cluster should have a Gigabit Ethernet connection with the Platform Analytics reporting server and the Platform Analytics node hosts.
Vertica recommends a 1-10 Gb full duplex switch for the private network interface and a VLAN or separate switch for the public network. The switch used for the private network should have sufficient bandwidth to enable 1 Gb/s transfer speeds between any pair of nodes.
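Link speeds are normally quoted in gigabits per second, while data volumes are usually quoted in gigabytes, so it can help to work out what a given link means for data loading. The following is a minimal sketch that estimates the ideal transfer time for a data volume over a link. The function name and the 100 GB figure are illustrative assumptions, and the estimate ignores protocol overhead and contention, so real transfers take longer.

```python
def transfer_seconds(data_gigabytes: float, link_gigabits_per_second: float) -> float:
    """Return the ideal time, in seconds, to move data over a network link.

    Ignores protocol overhead, latency, and contention, so treat the
    result as a lower bound rather than a prediction.
    """
    if link_gigabits_per_second <= 0:
        raise ValueError("link speed must be positive")
    data_gigabits = data_gigabytes * 8  # 1 byte = 8 bits
    return data_gigabits / link_gigabits_per_second


if __name__ == "__main__":
    # Hypothetical 100 GB data load over 1 Gb/s and 10 Gb/s links.
    for speed in (1, 10):
        seconds = transfer_seconds(100, speed)
        print(f"{speed} Gb/s link: at least {seconds:.0f} s for 100 GB")
```

Under these assumptions, moving 100 GB takes at least 800 seconds on a 1 Gb/s link and at least 80 seconds on a 10 Gb/s link.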
The hardware requirements are the same for all the intended database hosts. Refer to the Vertica documentation or the Release Notes for Platform Analytics for the latest list of system requirements and supported operating systems for the Vertica database hosts.
Input/output (I/O) performance is important for the operation of a database while fault tolerance is important to safeguard your data. Using a RAID 01 or 10 system enables the database host to take advantage of data striping and data mirroring. Data striping allows data to be transferred to multiple hard disks concurrently, which improves input/output (I/O) performance. Data mirroring means that your database does not lose data even if a hard disk fails.
The following table describes the optimal configuration of the database depending on the size of your cluster. The specific hardware recommendations for each database host are the same:
Data striping is the technique of segmenting logically-sequential data, such as a single file, so that the database can assign segments to multiple physical devices (usually disk drives for RAID storage, or network interfaces for grid-oriented storage) in a round-robin fashion and thus be read or written concurrently.
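The round-robin assignment described above can be sketched in a few lines. This is an illustration only, not how any particular RAID controller or database implements striping. The function names are hypothetical, and in-memory lists stand in for physical devices.

```python
def stripe_round_robin(data: bytes, num_devices: int, stripe_size: int) -> list[list[bytes]]:
    """Split data into fixed-size segments and deal them out to devices in turn."""
    if num_devices < 1 or stripe_size < 1:
        raise ValueError("num_devices and stripe_size must be at least 1")
    devices: list[list[bytes]] = [[] for _ in range(num_devices)]
    for index, start in enumerate(range(0, len(data), stripe_size)):
        segment = data[start:start + stripe_size]
        devices[index % num_devices].append(segment)  # round-robin choice of device
    return devices


def reassemble(devices: list[list[bytes]]) -> bytes:
    """Rebuild the original data by reading one segment from each device in turn."""
    segments = []
    longest = max((len(device) for device in devices), default=0)
    for row in range(longest):
        for device in devices:
            if row < len(device):
                segments.append(device[row])
    return b"".join(segments)


if __name__ == "__main__":
    striped = stripe_round_robin(b"ABCDEFGHIJ", num_devices=3, stripe_size=2)
    print(striped)
    print(reassemble(striped))
```

Because consecutive segments land on different devices, a sequential read or write can keep several devices busy at the same time.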
Automatic data striping is available in certain RAID devices under software or hardware control. Oracle Automatic Storage Management (ASM) allows ASM files to be either coarse- or fine-striped. You can also achieve data striping with Logical Volume Management (LVM) in Linux.
Automatic data striping is also available in cluster file systems. The following parameters are important when improving I/O performance:
The number of parallel stripes that can be written to or read from simultaneously. This is the number of disks in the RAID system, and as it increases, the read/write performance of striped data also increases.
The size of the stripes written to each disk. This may also be referred to as block size, chunk size, stripe length, or granularity.
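To see how these two parameters interact, the sketch below maps a logical byte offset to a disk and an offset on that disk, and counts how many disks a single request can use in parallel. The names `stripe_count` and `stripe_size` are labels chosen for this example, and the layout assumes simple round-robin striping with no parity or mirroring.

```python
def locate(offset: int, stripe_count: int, stripe_size: int) -> tuple[int, int]:
    """Map a logical byte offset to (disk index, byte offset on that disk)."""
    if offset < 0 or stripe_count < 1 or stripe_size < 1:
        raise ValueError("offset must be >= 0; stripe_count and stripe_size must be >= 1")
    stripe_index = offset // stripe_size   # which stripe holds the byte
    disk = stripe_index % stripe_count     # round-robin disk choice
    row = stripe_index // stripe_count     # earlier stripes already on that disk
    return disk, row * stripe_size + offset % stripe_size


def disks_touched(offset: int, length: int, stripe_count: int, stripe_size: int) -> int:
    """Count the distinct disks that a request of `length` bytes spans."""
    if length < 1:
        raise ValueError("length must be at least 1")
    first = offset // stripe_size
    last = (offset + length - 1) // stripe_size
    # Consecutive stripes sit on consecutive disks, so the count is capped
    # by the number of disks.
    return min(stripe_count, last - first + 1)


if __name__ == "__main__":
    KIB = 1024
    print(locate(4 * 64 * KIB + 10, stripe_count=4, stripe_size=64 * KIB))
    print(disks_touched(0, 1024 * KIB, stripe_count=4, stripe_size=64 * KIB))
    print(disks_touched(0, 1024 * KIB, stripe_count=8, stripe_size=64 * KIB))
    print(disks_touched(0, 1024 * KIB, stripe_count=8, stripe_size=1024 * KIB))
```

In this model, a 1 MiB read with 64 KiB stripes spans every disk in a 4-disk or 8-disk set, but spans only one disk when the stripe size is itself 1 MiB. A stripe size that is large relative to typical requests therefore limits the parallelism gained from adding disks.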
If you are using RAID devices, you should use RAID 10 or RAID 01, because these levels offer the best performance of all RAID systems together with good fault tolerance.
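When sizing a RAID 10 array, keep in mind that mirroring halves the raw capacity. The sketch below is a rough planning aid under stated assumptions: two-way mirroring, identical disks, and an even number of at least four disks. The function name and the example disk sizes are hypothetical.

```python
def raid10_summary(disk_count: int, disk_size_gb: float) -> dict[str, float]:
    """Summarize a RAID 10 array built from identical disks with two-way mirrors."""
    if disk_count < 4 or disk_count % 2 != 0:
        raise ValueError("this sketch assumes an even number of disks, at least 4")
    mirror_pairs = disk_count // 2
    return {
        "usable_gb": mirror_pairs * disk_size_gb,  # each pair stores one copy of its data
        "stripe_width": mirror_pairs,              # pairs that can be accessed in parallel
        "guaranteed_disk_failures_survived": 1,    # more only if failures hit different pairs
    }


if __name__ == "__main__":
    # Hypothetical array of eight 500 GB disks.
    print(raid10_summary(8, 500))
```

With eight 500 GB disks, this model gives 2000 GB of usable space striped across four mirrored pairs.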
When selecting a host to be the Platform Analytics server, you need to ensure that the host is running a supported operating system. Refer to the Release Notes for Platform Analytics for the latest list of supported operating systems for the Platform Analytics server host.
For optimal performance, the Platform Analytics server host should be a dedicated multi-core host with sufficient memory and input/output performance. The network bandwidth between the Platform Analytics server, the database hosts, and the Platform Analytics nodes is a key performance factor for the Platform Analytics server host.
If you are not using asynchronous data loading mode, the following hardware configuration should be sufficient:
If you are using asynchronous data loading mode, memory is a key performance factor for the Platform Analytics server host. If the Platform Analytics server is running on a Windows host, you should use the 64-bit version of Windows because Java cannot use more than 1638 MB of memory on 32-bit platforms.
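One way to apply the 32-bit limit is to compare a planned Java maximum heap setting against it. The sketch below parses a standard JVM `-Xmx` option and reports whether it exceeds the 1638 MB figure quoted above. The function names are hypothetical, and where the heap size is configured for Platform Analytics is not covered here, so treat this as a planning aid only.

```python
import re

JAVA_32BIT_LIMIT_MB = 1638  # limit quoted in the text above

# Multipliers for the optional -Xmx unit suffix; no suffix means bytes.
_UNIT_BYTES = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def xmx_to_megabytes(option: str) -> float:
    """Convert a JVM option such as '-Xmx2g' or '-Xmx1024m' to megabytes."""
    match = re.fullmatch(r"-Xmx(\d+)([kKmMgG]?)", option)
    if match is None:
        raise ValueError(f"not a recognized -Xmx option: {option!r}")
    size, unit = match.groups()
    return int(size) * _UNIT_BYTES[unit.lower()] / 1024 ** 2


def needs_64bit(option: str) -> bool:
    """Return True if the requested heap exceeds the 32-bit limit."""
    return xmx_to_megabytes(option) > JAVA_32BIT_LIMIT_MB


if __name__ == "__main__":
    for option in ("-Xmx1024m", "-Xmx2g"):
        print(option, "needs a 64-bit platform:", needs_64bit(option))
```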
You should only use the asynchronous data loading mode for sending data from the Platform Analytics node to the database over a slow or unstable network.
The following table describes the optimal hardware configuration of the Platform Analytics server if you are using asynchronous data loading, depending on the size of your cluster:
When selecting a host to be the Platform Analytics reporting server host, you need to ensure that the host meets the detailed system requirements for Tableau Server:
Refer to the Tableau Server documentation or the Release Notes for Platform Analytics for the latest list of system requirements and supported operating systems for Tableau Server.
The network bandwidth between the Platform Analytics reporting host and the database cluster may be an important performance bottleneck. Therefore, the Platform Analytics reporting host should have a Gigabit Ethernet connection with the database hosts.
The following table describes the optimal hardware configuration of the Platform Analytics reporting server, depending on the size of your cluster:
When selecting a host in the LSF clusters to be a Platform Analytics node, you need to ensure that the host is running a supported operating system and that it meets the minimum hardware requirements. Refer to the Release Notes for Platform Analytics for the latest system requirements for the Platform Analytics node host.
For optimal performance of your Platform Analytics node, the host should be running on a high-bandwidth network. Since network bandwidth is an important performance bottleneck for the Platform Analytics nodes, the Platform Analytics node host should have a Gigabit Ethernet connection with the database host. If the Platform Analytics node is running on a Windows host, you should use the 64-bit version of Windows because Java cannot use more than 1638 MB of memory on 32-bit platforms.
The following table describes the optimal hardware configuration of the Platform Analytics node depending on the size of the clusters in which the node resides:
Your clusters must be running one of the following:
You can skip this step for any clusters that are not running any of these versions of Platform LSF.
By default, Platform LSF does not enable the lsb.stream file for exporting LSF job event data.
If you want the Platform Analytics node to collect LSF cluster data from your Platform LSF cluster, you need to enable the lsb.stream file because Platform Analytics requires this file for the data loaders to obtain job data.
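Before installing, you may want to confirm whether event streaming is already switched on in a cluster. The sketch below scans the text of an LSF parameters file for the setting. It assumes the setting is `ENABLE_EVENT_STREAM` with a value of `Y` or `N` in `lsb.params`, as in the LSF configuration reference, that `#` starts a comment, and that the last occurrence wins. Confirm the parameter name and the reconfiguration procedure for your LSF version. The function name and the sample file content are hypothetical.

```python
def event_stream_enabled(lsb_params_text: str) -> bool:
    """Return True if the parameters text sets ENABLE_EVENT_STREAM to Y.

    Assumes 'NAME = VALUE' lines and '#' comments; if the parameter is
    missing, streaming is treated as disabled.
    """
    enabled = False
    for raw_line in lsb_params_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue  # skips 'Begin Parameters', 'End Parameters', blank lines
        name, value = (part.strip() for part in line.split("=", 1))
        if name == "ENABLE_EVENT_STREAM":
            enabled = value.upper() == "Y"  # assumption: last occurrence wins
    return enabled


if __name__ == "__main__":
    # Hypothetical file content for illustration only.
    sample = """
Begin Parameters
DEFAULT_QUEUE = normal
ENABLE_EVENT_STREAM = Y   # export job events to lsb.stream
End Parameters
"""
    print(event_stream_enabled(sample))
```

To check a real cluster, read the contents of the cluster's `lsb.params` file and pass the text to the function.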