If you are running Content Platform Engine 5.2.1.3-P8CPE-FP003 or later, a Hadoop storage device can be used in an advanced storage area.
In preparation for creating a Hadoop storage device on Content Platform Engine, the following security requirements must be met.
- Content Platform Engine requires the Apache Knox Gateway 0.7.0 release to connect to a Hadoop cluster. The Knox Gateway provides a single point of authentication and access to Hadoop services in a cluster. Content Platform Engine communicates with the Knox Gateway over the WebHDFS REST API. The connection configuration between the Knox Gateway and the Hadoop cluster is irrelevant to Content Platform Engine.
- The Knox Gateway is configured with SSL enabled by default. Therefore, you must install the Knox Gateway certificate on each Content Platform Engine server that accesses the Hadoop cluster.
- The Knox Gateway must be configured to use BASIC authentication with an authentication provider. Typically, an LDAP directory service is used. The Content Platform Engine Hadoop user account is the account that is used to authenticate with the Knox Gateway authentication provider. The Knox Gateway configuration determines how the Content Platform Engine authentication user is mapped to the Hadoop cluster user. If the Knox Gateway configuration changes, the Content Platform Engine user account might be mapped to a different Hadoop cluster user.
- Analytic tools that access the content directly must access the content as a superuser or as the Content Platform Engine user. As a result, the analytic tools have access to all the content. Content Platform Engine-enforced access control does not apply to content stored on Hadoop storage devices.
- For analytic tools to process the content on Hadoop storage devices, the content cannot be compressed or encrypted by Content Platform Engine.
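
To check the Knox Gateway requirements above before you create the storage device, a small script can issue a WebHDFS request through the gateway with BASIC authentication while trusting the installed gateway certificate. The following Python sketch is illustrative only and is not part of Content Platform Engine. The host name, the port 8443, the gateway path "gateway", the topology name "default", the account name, and the certificate file path are all assumptions; substitute the values from your own Knox Gateway configuration.

```python
import base64
import ssl
import urllib.request
from urllib.parse import quote


def knox_webhdfs_url(host, port, gateway_path, topology, hdfs_path, op):
    """Build a WebHDFS REST URL that is routed through a Knox Gateway topology."""
    return "https://{}:{}/{}/{}/webhdfs/v1{}?op={}".format(
        host, port, gateway_path, topology, quote(hdfs_path), op
    )


def basic_auth_header(user, password):
    """Return the value of an HTTP Authorization header for BASIC authentication."""
    token = base64.b64encode("{}:{}".format(user, password).encode("utf-8"))
    return "Basic " + token.decode("ascii")


def list_status(url, user, password, ca_file):
    """Send a LISTSTATUS-style GET request and return the raw JSON response text.

    ca_file is a PEM file that contains the Knox Gateway certificate, so the
    SSL handshake succeeds only if that certificate is trusted.
    """
    context = ssl.create_default_context(cafile=ca_file)
    request = urllib.request.Request(
        url, headers={"Authorization": basic_auth_header(user, password)}
    )
    with urllib.request.urlopen(request, context=context) as response:
        return response.read().decode("utf-8")


# Hypothetical values for illustration only.
EXAMPLE_URL = knox_webhdfs_url(
    "knox.example.com", 8443, "gateway", "default", "/tmp", "LISTSTATUS"
)
```

Calling `list_status(EXAMPLE_URL, "cpe_hadoop_user", "password", "/path/to/knox-gateway.pem")` against a reachable gateway would be expected to return a directory listing if the certificate is trusted and the account authenticates. An SSL error suggests a certificate problem, and an HTTP 401 response suggests an authentication problem.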