Deep Learning software packages

IBM provides software packages for several Deep Learning frameworks, supporting libraries, and tools:

Caffe - BVLC, IBM, and NVIDIA variants
TensorFlow
Torch
Theano
OpenBLAS
NCCL
DIGITS

All the packages are intended for use with Ubuntu 16.04 on POWER with NVIDIA CUDA 8.0 and cuDNN v5.1 packages.

For more information visit https://ibm.biz/powerai.

System setup

Operating System

The Deep Learning packages require Ubuntu 16.04 for IBM POWER8. Ubuntu installation images can be downloaded from:

http://www.ubuntu.com/download/server/power8

NOTE: After installing Ubuntu 16.04, update the libc6 package to version 2.23-0ubuntu5 or higher. That version fixes problems affecting Torch and TensorFlow. You may need to enable the updates repository to install this update.
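
One way to check whether the installed libc6 already meets that minimum is to compare versions with dpkg. The following is a minimal sketch; the helper names (version_at_least, installed_libc6, check_libc6) are illustrative and are not part of the PowerAI packages.

```shell
#!/bin/bash
# Minimum libc6 version named in the note above.
REQUIRED_LIBC6="2.23-0ubuntu5"

# version_at_least INSTALLED REQUIRED
# Succeeds when INSTALLED >= REQUIRED under Debian version ordering.
version_at_least() {
    dpkg --compare-versions "$1" ge "$2"
}

# Print the installed libc6 version (first entry only), or nothing on failure.
installed_libc6() {
    dpkg-query -W -f='${Version}\n' libc6 2>/dev/null | head -n 1
}

# Report whether libc6 is new enough; returns non-zero when it is not.
check_libc6() {
    local have
    have="$(installed_libc6)"
    if [ -n "$have" ] && version_at_least "$have" "$REQUIRED_LIBC6"; then
        echo "libc6 $have is new enough"
    else
        echo "libc6 ${have:-<unknown>} does not meet $REQUIRED_LIBC6"
        return 1
    fi
}
```

Running check_libc6 after the Ubuntu installation reports the result; if it fails, updating libc6 through apt-get (with the updates repository enabled) is the likely next step.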

NVIDIA components

The Deep Learning packages require NVIDIA CUDA 8.0 and cuDNN 5.1, which can be installed as follows:

Download and install NVIDIA CUDA 8.0 from https://developer.nvidia.com/cuda-downloads-power8
- This PowerAI release was tested with CUDA 8.0.44. That version or newer should work.
- Follow the Linux installation instructions in the CUDA Quick Start Guide (linked from https://developer.nvidia.com/cuda-downloads), including the steps describing how to set up the CUDA development environment by updating PATH and LD_LIBRARY_PATH.
Download NVIDIA cuDNN 5.1 for CUDA 8.0 Power8 Deb packages from https://developer.nvidia.com/cudnn (Registration in NVIDIA's Accelerated Computing Developer Program is required)
- cuDNN v5.1 Runtime Library for Ubuntu16.04 Power8 (Deb)
- cuDNN v5.1 Developer Library for Ubuntu16.04 Power8 (Deb)
- cuDNN v5.1 Code Samples and User Guide Power8 (Deb)
Install the cuDNN v5.1 packages
```
   $ sudo dpkg -i libcudnn5*deb
```
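
The CUDA environment setup mentioned above usually amounts to a few lines in a shell rc file. This sketch assumes CUDA 8.0 was installed to its default prefix, /usr/local/cuda-8.0; adjust CUDA_HOME if your installation differs.

```shell
# Assumption: CUDA 8.0 lives in its default install prefix.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda-8.0}"

# Prepend the CUDA tool and library directories, keeping any existing entries.
export PATH="$CUDA_HOME/bin${PATH:+:$PATH}"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Optional sanity check: print the compiler version if nvcc is now reachable.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
fi
```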

NVIDIA GPU driver update

The Deep Learning packages work with the version 361 GPU driver that ships with CUDA 8, but IBM recommends driver 361.93.03 or higher.

NVIDIA driver updates for POWER8 are available from https://www.nvidia.com (select DRIVERS then All NVIDIA Drivers).
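
To see whether the installed driver meets the recommendation, the version reported by nvidia-smi can be compared against 361.93.03. This is a sketch only: the helper names are illustrative, and the comparison relies on GNU sort -V, which is assumed to be available.

```shell
#!/bin/bash
RECOMMENDED_DRIVER="361.93.03"

# driver_at_least HAVE WANT
# Succeeds when HAVE >= WANT, using GNU version sort on dotted numbers.
driver_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Print the driver version reported for the first GPU, if nvidia-smi exists.
current_driver() {
    command -v nvidia-smi >/dev/null 2>&1 || return 1
    nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
}

if have="$(current_driver)" && [ -n "$have" ]; then
    if driver_at_least "$have" "$RECOMMENDED_DRIVER"; then
        echo "driver $have meets the recommendation"
    else
        echo "driver $have is older than $RECOMMENDED_DRIVER"
    fi
else
    echo "nvidia-smi not found; cannot determine the driver version"
fi
```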

Installing the Deep Learning Frameworks

Software repository Setup

The Deep Learning packages are published as an Ubuntu package that sets up an installation repository on the local machine. The repository can be enabled as follows:

Download the latest mldl-repo-local .deb file from https://download.boulder.ibm.com/ibmdl/pub/software/server/mldl/
Install the repository package:
```
   $ sudo dpkg -i mldl-repo-local*.deb
```
Update the package cache
```
   $ sudo apt-get update
```

Installing all frameworks at once

All the Deep Learning frameworks can be installed at once using the power-mldl meta-package:

    $ sudo apt-get install power-mldl

Installing frameworks individually

The Deep Learning frameworks can be installed individually if preferred. The framework packages are:

caffe-bvlc - Berkeley Vision and Learning Center (BVLC) upstream Caffe, v1.0.0rc3
caffe-ibm - IBM Optimized version of BVLC Caffe, v1.0.0rc3
caffe-nv - NVIDIA fork of Caffe, v0.15.13
tensorflow - Google TensorFlow, v0.9.0
theano - Theano, v0.8.2
torch - Torch, v7
digits - DIGITS, v5.0.0-rc.1

Each can be installed with:

    $ sudo apt-get install <framework>

Installation note for DIGITS

The digits and python-socketio-server packages conflict with Ubuntu's older python-socketio package. Please uninstall the python-socketio package before installing DIGITS.
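
The removal and installation can be combined so that python-socketio is removed only when it is actually present. A minimal sketch, with illustrative function names:

```shell
#!/bin/bash
# pkg_installed NAME
# Succeeds when dpkg reports the named package as installed.
pkg_installed() {
    dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q "install ok installed"
}

# Remove the conflicting package if present, then install DIGITS.
install_digits() {
    if pkg_installed python-socketio; then
        sudo apt-get remove python-socketio
    fi
    sudo apt-get install digits
}
```

Calling install_digits performs both steps; nothing is changed on the system until the function is invoked.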

Tuning recommendations

Recommended settings for optimal Deep Learning performance on the S822LC for High Performance Computing are:

Enable Performance Governor

    $ sudo apt-get install linux-tools-common cpufrequtils
    $ sudo cpupower -c all frequency-set -g performance

Enable GPU persistence mode

Use nvidia-persistenced (http://docs.nvidia.com/deploy/driver-persistence/index.html) or
```
   $ sudo nvidia-smi -pm ENABLED
```
Set GPU memory and graphics clocks
```
   $ sudo nvidia-smi -ac 715,1480
```
For TensorFlow, set the SMT mode
```
   $ sudo ppc64_cpu --smt=2
```
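
After applying these settings, it can be useful to read them back. The sketch below only queries the current state and skips any tool that is not installed; the function name show_tuning is illustrative. Note that nvidia-smi -ac takes the memory clock first and the graphics clock second, and that valid clock pairs depend on the GPU model, so the supported-clocks listing is worth checking before reusing the values above on different hardware.

```shell
#!/bin/bash
# Print the current governor, GPU persistence and clock settings, and SMT mode.
show_tuning() {
    # CPU frequency governor for CPU 0, read from sysfs when cpufreq is exposed.
    local gov=/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
    if [ -r "$gov" ]; then
        echo "governor: $(cat "$gov")"
    fi

    if command -v nvidia-smi >/dev/null 2>&1; then
        # Persistence mode and current application clocks for each GPU.
        nvidia-smi --query-gpu=name,persistence_mode,clocks.applications.memory,clocks.applications.graphics --format=csv
        # Clock pairs the installed GPUs accept for "nvidia-smi -ac".
        nvidia-smi -q -d SUPPORTED_CLOCKS
    fi

    # Current SMT mode on POWER systems.
    if command -v ppc64_cpu >/dev/null 2>&1; then
        ppc64_cpu --smt
    fi
    return 0
}
```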

Getting started with MLDL Frameworks

General setup

Each framework package provides a shell script to simplify environmental setup.

We recommend that users update their shell rc file (e.g. .bashrc) to source the desired setup scripts. For example:

    source /opt/DL/<framework>/bin/<framework>-activate

Each framework also provides a test script to verify basic function:

    $ <framework>-test
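
For users who want several frameworks active in every login shell, the rc file entry can be written as a small loop. This is a sketch: the function name activate_frameworks and the DL_ROOT variable are illustrative, and the path pattern only covers packages whose activate script is named after their directory (for Caffe, that means the system default /opt/DL/caffe).

```shell
# Root of the framework installs; overridable only to make the sketch testable.
DL_ROOT="${DL_ROOT:-/opt/DL}"

# activate_frameworks NAME...
# Sources <DL_ROOT>/<name>/bin/<name>-activate for each name that is installed.
activate_frameworks() {
    local fw script
    for fw in "$@"; do
        script="$DL_ROOT/$fw/bin/$fw-activate"
        if [ -r "$script" ]; then
            . "$script"
        else
            echo "skipping $fw: $script not found" >&2
        fi
    done
}

# Example selection; list at most one Caffe variant per session.
activate_frameworks caffe tensorflow
```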

Getting started with Caffe

Caffe alternatives

Packages are provided for upstream BVLC Caffe (/opt/DL/caffe-bvlc), IBM optimized BVLC Caffe (/opt/DL/caffe-ibm), and NVIDIA's Caffe (/opt/DL/caffe-nv). The system default Caffe (/opt/DL/caffe) can be selected using Ubuntu's alternatives system:

    $ sudo update-alternatives --config caffe
    There are 3 choices for the alternative caffe (providing /opt/DL/caffe).

      Selection    Path                Priority   Status
    ------------------------------------------------------------
    * 0            /opt/DL/caffe-ibm   100        auto mode
      1            /opt/DL/caffe-bvlc   50        manual mode
      2            /opt/DL/caffe-ibm   100        manual mode
      3            /opt/DL/caffe-nv     75        manual mode

    Press enter to keep the current choice[*], or type selection number:
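
The same selection can be made without the interactive prompt by using the --set option of update-alternatives. A minimal sketch with illustrative helper names, using the install paths listed above:

```shell
#!/bin/bash
# caffe_variant_path VARIANT
# Maps bvlc, ibm, or nv to its install path; fails on anything else.
caffe_variant_path() {
    case "$1" in
        bvlc|ibm|nv) echo "/opt/DL/caffe-$1" ;;
        *) echo "unknown Caffe variant: $1" >&2; return 1 ;;
    esac
}

# select_caffe VARIANT
# Makes VARIANT the system default Caffe without prompting.
select_caffe() {
    local path
    path="$(caffe_variant_path "$1")" || return 1
    sudo update-alternatives --set caffe "$path"
}
```

For example, select_caffe nv would point /opt/DL/caffe at the NVIDIA variant. Running update-alternatives --display caffe shows the current choice, and sudo update-alternatives --auto caffe returns to priority-based selection.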

Users can activate the system default caffe:

    source /opt/DL/caffe/bin/caffe-activate

Or they can activate a specific variant. For example:

    source /opt/DL/caffe-bvlc/bin/caffe-activate

Attempting to activate multiple Caffe packages in a single login session will cause unpredictable behavior.

Caffe samples and examples

Each Caffe package includes example scripts and sample models, etc. A script is provided to copy the sample content into a specified directory:

    $ caffe-install-samples <somedir>

More info

Visit Caffe's website (http://caffe.berkeleyvision.org/) for tutorials and example programs that you can run to get started.

Here are links to a couple of the example programs:

LeNet MNIST Tutorial - Train a neural network to understand handwritten digits
CIFAR-10 tutorial - Train a convolutional neural network to classify small images

Getting started with TensorFlow

The TensorFlow homepage (https://www.tensorflow.org/) has a variety of information, including Tutorials, How Tos, and a Getting Started guide.

Additional tutorials and examples are available from the community, for example:

The TensorFlow team provides ready-to-use models on GitHub at https://github.com/tensorflow/models

Note: The latest model code is incompatible with TensorFlow v0.9.0 due to a change in the tensorflow.image.resize_images() API (see commit 1e16f10). Model code from before that change should work with this TensorFlow package (for example, commit a9133ae).
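
One way to obtain model code from before that change is to clone the repository and check out the older commit. The helper below is a sketch; its name and argument order are illustrative.

```shell
#!/bin/bash
# fetch_models_at REPO COMMIT DEST
# Clones REPO into DEST and checks out COMMIT (leaving a detached HEAD).
fetch_models_at() {
    git clone "$1" "$3" && git -C "$3" checkout "$2"
}
```

For the commit mentioned in the note, the call would be: fetch_models_at https://github.com/tensorflow/models a9133ae models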

Note: The TensorFlow package includes code from the BoringSSL project. The following notices may apply:

    This product includes software developed by the OpenSSL Project for
    use in the OpenSSL Toolkit. (http://www.openssl.org/)

    This product includes cryptographic software written by Eric Young
    (eay@cryptsoft.com)

Getting started with Torch

The Torch Cheatsheet contains lots of info for people new to Torch, including tutorials and examples.

The Torch project has a demos repository at https://github.com/torch/demos

Tutorials can be found at https://github.com/torch/tutorials

Visit Torch's website for the latest from Torch.

Torch samples and examples

The Torch package includes example scripts and sample models. A script is provided to copy the sample content into a specified directory:

    $ torch-install-samples <somedir>

Among these are the Imagenet examples from https://github.com/soumith/imagenet-multiGPU.torch with a few modifications.

Extending Torch with additional Lua rocks

The Torch package includes several Lua rocks useful for creating Deep Learning applications. Additional Lua rocks can be installed locally to extend functionality. For example, a rock providing NCCL bindings can be installed as follows:

    $ source /opt/DL/torch/bin/torch-activate
    $ source /opt/DL/nccl/bin/nccl-activate

    $ luarocks install --local --deps-mode=all "https://raw.githubusercontent.com/ngimel/nccl.torch/master/nccl-scm-1.rockspec"
    ...
    nccl scm-1 is now built and installed in /home/user/.luarocks/ (license: BSD)

    $ luajit
    LuaJIT 2.1.0-beta1 -- Copyright (C) 2005-2015 Mike Pall. http://luajit.org/
    JIT: OFF
    > require 'torch'
    > require 'nccl'
    >

Getting started with Theano

Here are some links to help you get started with Theano:

Visit Theano's website for the latest from Theano.

Getting started with DIGITS

The first time it is run, digits-activate creates a .digits subdirectory containing the DIGITS jobs directory and the digits.log file.

Multiple instances of the DIGITS server can be run at once, including by different users, but users may need to set the network port number to avoid conflicts.

To start the DIGITS server on the default port (5000):

    $ digits-devserver

To start the DIGITS server on a specific port:

    $ digits-devserver -p <port_num>
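
When several users share a machine, one simple convention is to derive each user's port from the numeric user ID. The scheme below is only an illustration and is not something DIGITS provides; two users whose IDs differ by a multiple of 1000 would still collide.

```shell
#!/bin/bash
# digits_port [UID]
# Derives a port in the range 5000-5999 from the given (or current) user ID.
digits_port() {
    local uid="${1:-$(id -u)}"
    echo $(( 5000 + uid % 1000 ))
}
```

The server could then be started with: digits-devserver -p "$(digits_port)"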

NVIDIA's DIGITS site has more information about DIGITS.

The DIGITS Getting Started guide describes how to train a network model to classify the MNIST hand-written digits dataset.

Additional DIGITS examples are available at https://github.com/NVIDIA/DIGITS/tree/master/examples

NOTE: This DIGITS package will not work as is with Torch. Torch support needs additional pre-requisite packages. Those packages are not supplied with the PowerAI distribution at this time.