Deep Learning software packages

PowerAI release 3.4 provides software packages for several Deep Learning frameworks, supporting libraries, and tools:

Bazel
Caffe - BVLC, IBM, and NVIDIA variants
Chainer
DIGITS
NCCL
OpenBLAS
TensorFlow
Theano
Torch

All the packages are intended for use with Ubuntu 16.04 on POWER with NVIDIA CUDA 8.0 and cuDNN v5.1 packages.

More information about PowerAI is available at https://ibm.biz/powerai. Developer resources can be found at http://ibm.biz/poweraideveloper

System set up

Operating System

The Deep Learning packages require Ubuntu 16.04 for IBM POWER8. Ubuntu installation images can be downloaded from:

http://www.ubuntu.com/download/server/power8

NOTE: After installing Ubuntu 16.04, update the libc6 package to version 2.23-0ubuntu5 or higher. That version fixes problems affecting Torch and TensorFlow.

NOTE: PowerAI Release 3.4 requires the version 4.4 Linux kernel. Ubuntu 16.04 supports two different kernel versions: the base kernel (version 4.4), and the Hardware Enablement kernel (currently version 4.8; see https://wiki.ubuntu.com/Kernel/RollingLTSEnablementStack). Be sure to install the base 4.4 kernel for PowerAI.
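
A quick way to confirm which kernel series is running is to inspect the output of uname -r. The following is a minimal sketch; the function name kernel_is_base_44 is hypothetical, and the check assumes that base-kernel release strings start with "4.4." (for example 4.4.0-62-generic).

```shell
# Return success if the given kernel release string is from the 4.4 series.
kernel_is_base_44() {
    case "$1" in
        4.4.*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Check the running kernel as reported by uname -r.
if kernel_is_base_44 "$(uname -r)"; then
    echo "Kernel $(uname -r): base 4.4 series"
else
    echo "Kernel $(uname -r): not the 4.4 series required by PowerAI 3.4" >&2
fi
```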

NVIDIA components

The Deep Learning packages require NVIDIA CUDA 8.0 and cuDNN 5.1, which can be installed as follows:

Download and install NVIDIA CUDA 8.0 from https://developer.nvidia.com/cuda-downloads
- Select Operating System: Linux
- Select Architecture: ppc64le
- Select Distribution: Ubuntu
- Select Version: 16.04
- Select the Installer Type that best fits your needs
- Follow the Linux installation instructions in the CUDA Quick Start Guide (linked from https://developer.nvidia.com/cuda-downloads), including the steps describing how to set up the CUDA development environment by updating PATH and LD_LIBRARY_PATH.
Download NVIDIA cuDNN 5.1 for CUDA 8.0 Power8 Deb packages from https://developer.nvidia.com/cudnn (Registration in NVIDIA's Accelerated Computing Developer Program is required)
- cuDNN v5.1 Runtime Library for Ubuntu16.04 Power8 (Deb)
- cuDNN v5.1 Developer Library for Ubuntu16.04 Power8 (Deb)
- cuDNN v5.1 Code Samples and User Guide Power8 (Deb)
Install the cuDNN v5.1 packages
```
   $ sudo dpkg -i libcudnn5*deb
```

The required and recommended versions of these components are:

CUDA toolkit: 8.0.44 or higher required; 8.0.61 recommended
NVIDIA driver: 361.93.03 or higher required; 361.121 recommended
cuDNN: 5.1 or higher required; 5.1.10 recommended

NOTE: PowerAI Release 3.4 requires a version 361 GPU driver. Version 375 GPU drivers are available for download from NVIDIA (see below) but are not yet tested or supported with PowerAI.
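
The driver requirement above can be checked with a small script. This is a sketch under stated assumptions: the helper names version_ge and driver_supported are hypothetical, the comparison relies on GNU sort -V, and the installed version is read with nvidia-smi only if that tool is present.

```shell
# version_ge A B: succeed if version A >= version B (uses GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# driver_supported VER: succeed for a 361-series driver at 361.93.03 or higher.
driver_supported() {
    case "$1" in
        361.*) version_ge "$1" "361.93.03" ;;
        *)     return 1 ;;
    esac
}

# Query the installed driver only if nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
    ver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    if driver_supported "$ver"; then
        echo "Driver $ver is in the supported range"
    else
        echo "Driver $ver is outside the range tested with PowerAI 3.4" >&2
    fi
fi
```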

NVIDIA GPU driver update

NVIDIA driver updates for POWER8 are available from https://www.nvidia.com (select DRIVERS then All NVIDIA Drivers).

Installing the Deep Learning Frameworks

Software repository setup

The PowerAI Deep Learning packages are provided via two different installation methods:

The local repository package (mldl-repo-local) creates an installation repository on the local machine. This method is best for systems with limited internet access or where strong control of upgrades is desired.
The network repository package (mldl-repo-network) creates a reference on the local machine to the PowerAI network repository. This method is best for internet connected systems that will be readily updated to new versions of PowerAI.

These packages are mutually exclusive. Choose one or the other for your systems.

Software repository setup is similar for either method:

Download the desired repository package (.deb file) from https://public.dhe.ibm.com/software/server/POWER/Linux/mldl/ubuntu/
Install the repository package:
```
   $ sudo dpkg -i mldl-repo-*.deb
```
Update the package cache
```
   $ sudo apt-get update
```

Installing all frameworks at once

All the Deep Learning frameworks can be installed at once using the power-mldl meta-package:

    $ sudo apt-get install power-mldl

Installing frameworks individually

The Deep Learning frameworks can be installed individually if preferred. The framework packages are:

caffe-bvlc - Berkeley Vision and Learning Center (BVLC) upstream Caffe, v1.0.0rc5
caffe-ibm - IBM Optimized version of BVLC Caffe, v1.0.0rc3
caffe-nv - NVIDIA fork of Caffe, v0.15.14 and v0.14.5
chainer - Chainer, v1.20.0.1
digits - DIGITS, v5.0.0
tensorflow - Google TensorFlow, v1.0.1 and v0.12.0
theano - Theano, v0.9.0
torch - Torch, v7

Each can be installed with:

    $ sudo apt-get install <framework>

Installation note for DIGITS

The digits and python-socketio-server packages conflict with Ubuntu's older python-socketio package. Please uninstall the python-socketio package before installing DIGITS.

Upgrading from a previous release

To upgrade from a previous release:

Install the new repository package
- To use the local repository package, simply install the new local repository package over the old:
```
  $ sudo dpkg -i mldl-repo-local_3.4.1_ppc64el.deb
```
- To use the network repository package, first uninstall the old local repository package then install the new network repository package:
```
  $ sudo apt-get purge mldl-repo-local
  $ sudo dpkg -i mldl-repo-network_3.4.0_ppc64el.deb
```
Update the repository meta-data
```
   $ sudo apt-get update
```
Upgrade the packages, as desired:
- To update all packages on the system, including PowerAI and others:
```
  $ sudo aptitude dist-upgrade
```
- To update all PowerAI packages (assuming the power-mldl meta-package was installed):
```
  $ sudo aptitude upgrade power-mldl
```
- To update a single PowerAI package, for example TensorFlow:
```
  $ sudo aptitude upgrade tensorflow
```

Tuning recommendations

Recommended settings for optimal Deep Learning performance on the S822LC for High Performance Computing are:

Enable Performance Governor

   $ sudo apt-get install linux-tools-common linux-tools-generic linux-cloud-tools-generic cpufrequtils lsb-release
   $ sudo cpupower -c all frequency-set -g performance

Enable GPU persistence mode

Use nvidia-persistenced (http://docs.nvidia.com/deploy/driver-persistence/index.html) or
```
   $ sudo nvidia-smi -pm ENABLED
```
Set GPU memory and graphics clocks
```
   $ sudo nvidia-smi -ac 715,1480
```
For TensorFlow, set the SMT mode
```
   $ sudo ppc64_cpu --smt=2
```
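
The settings above could be collected into one script. The sketch below defaults to a dry run that only prints the commands; the run and apply_tuning helpers and the DRY_RUN variable are illustrative names, not part of PowerAI. The package installation step is left out, so cpupower is assumed to be installed already.

```shell
# run CMD...: print the command when DRY_RUN=1 (the default here), else execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

apply_tuning() {
    run sudo cpupower -c all frequency-set -g performance   # performance governor
    run sudo nvidia-smi -pm ENABLED                          # GPU persistence mode
    run sudo nvidia-smi -ac 715,1480                         # memory,graphics clocks
    run sudo ppc64_cpu --smt=2                               # SMT mode for TensorFlow
}

apply_tuning    # prints the four commands; set DRY_RUN=0 to execute them
```

Keeping the dry run as the default makes it easy to review the commands before they change system state.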

Getting started with MLDL Frameworks

General setup

Each framework package provides a shell script to simplify environment setup.

We recommend that users update their shell rc file (e.g. .bashrc) to source the desired setup scripts. For example:

    source /opt/DL/<framework>/bin/<framework>-activate
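
When several frameworks are activated from the same rc file, a small loop can skip any that are not installed. This is a sketch only: activate_frameworks is a hypothetical helper, the framework list is an illustration, and the path pattern it builds matches packages whose activate script carries the same name as the directory (the Caffe variants use caffe-activate and would need their own line).

```shell
# activate_frameworks BASE NAME...: source BASE/NAME/bin/NAME-activate for each
# NAME whose script exists, and report the ones that are missing.
activate_frameworks() {
    _dl_base=$1
    shift
    for _dl_fw in "$@"; do
        _dl_script="$_dl_base/$_dl_fw/bin/$_dl_fw-activate"
        if [ -r "$_dl_script" ]; then
            . "$_dl_script"
        else
            echo "skipping $_dl_fw: $_dl_script not found" >&2
        fi
    done
}

# Example for ~/.bashrc; adjust the list to the frameworks you use.
activate_frameworks /opt/DL tensorflow torch
```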

Each framework also provides a test script to verify basic function:

    $ <framework>-test

Note about python setuptools / easy_install

The python easy_install utility may interfere with the proper function of some of the PowerAI framework packages, including TensorFlow and Chainer.

The PowerAI packages include local copies of python modules such as protobuf (TensorFlow) and pillow (Chainer) because they require versions newer than those provided by Canonical / Ubuntu. The <framework>-activate scripts set up the pathing needed to make that work (they set PYTHONPATH to give the local copies priority over the system default versions).

easy_install adds a script that may cause the system's default paths to be searched ahead of PYTHONPATH entries. This may result in protobuf or pillow related failures in TensorFlow and Chainer.

If easy_install is run as root, the problematic script may be found in:

    /usr/local/lib/python2.7/dist-packages/easy-install.pth
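
To see whether a system is affected, it may help to test for that file. The check below is a minimal sketch; check_easy_install_pth is a hypothetical helper, and it only reports whether the file exists without modifying anything.

```shell
# check_easy_install_pth [FILE]: report whether an easy-install.pth file exists.
# FILE defaults to the root-install location named above.
check_easy_install_pth() {
    _pth=${1:-/usr/local/lib/python2.7/dist-packages/easy-install.pth}
    if [ -e "$_pth" ]; then
        echo "found: $_pth"
        return 0
    fi
    echo "not found: $_pth"
    return 1
}

check_easy_install_pth || true
```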

Getting started with Caffe

Caffe alternatives

Packages are provided for upstream BVLC Caffe (/opt/DL/caffe-bvlc), IBM optimized BVLC Caffe (/opt/DL/caffe-ibm), and NVIDIA's Caffe (/opt/DL/caffe-nv). The system default Caffe (/opt/DL/caffe) can be selected using Ubuntu's alternatives system:

    $ sudo update-alternatives --config caffe
    There are 3 choices for the alternative caffe (providing /opt/DL/caffe).

      Selection    Path                Priority   Status
    ------------------------------------------------------------
    * 0            /opt/DL/caffe-ibm    100       auto mode
      1            /opt/DL/caffe-bvlc   50        manual mode
      2            /opt/DL/caffe-ibm    100       manual mode
      3            /opt/DL/caffe-nv     75        manual mode

    Press <enter> to keep the current choice[*], or type selection number:

Users can activate the system default caffe:

    source /opt/DL/caffe/bin/caffe-activate

Or they can activate a specific variant. For example:

    source /opt/DL/caffe-bvlc/bin/caffe-activate

Attempting to activate multiple Caffe packages in a single login session will cause unpredictable behavior.
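
One way to guard against accidental double activation is a wrapper that remembers which Caffe directory was activated first. This is only a sketch: caffe_activate_once and the CAFFE_ACTIVE_DIR variable are hypothetical names, and the guard compares directory strings, so it does not detect that /opt/DL/caffe may point at one of the variant directories.

```shell
# caffe_activate_once DIR: source DIR/bin/caffe-activate, but refuse if a
# different Caffe directory was already activated in this shell session.
caffe_activate_once() {
    if [ -n "${CAFFE_ACTIVE_DIR:-}" ] && [ "$CAFFE_ACTIVE_DIR" != "$1" ]; then
        echo "already activated $CAFFE_ACTIVE_DIR; open a new login session" >&2
        return 1
    fi
    if [ ! -r "$1/bin/caffe-activate" ]; then
        echo "no activate script under $1" >&2
        return 1
    fi
    . "$1/bin/caffe-activate" && CAFFE_ACTIVE_DIR=$1
}

# Usage:
#   caffe_activate_once /opt/DL/caffe-bvlc
```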

Caffe samples and examples

Each Caffe package includes example scripts and sample models, etc. A script is provided to copy the sample content into a specified directory:

    $ caffe-install-samples <somedir>

Optimizations in IBM Caffe

The IBM Caffe package (caffe-ibm) in PowerAI is based on upstream BVLC Caffe commit b2982c7 (https://github.com/BVLC/caffe/tree/b2982c7eef65a1b94db6f22fb8bb7caa986e6f29).

Optimization

Our optimizations aim to reduce the running time of multi-GPU training by utilizing CPUs. In particular, gradient accumulation is offloaded to CPUs and done in parallel with the training. To gain the best performance with IBM Caffe, please close unnecessary applications that consume a high percentage of CPU.

If using a single GPU, IBM Caffe and BVLC Caffe will have similar performance.

The optimizations in IBM Caffe do not change the convergence of a neural network during training. IBM Caffe and BVLC Caffe should produce the same convergence results.

Command line options

IBM Caffe has the same set of options as BVLC Caffe, plus two additional ones:

bvlc: This option disables the IBM optimizations and runs as the original BVLC Caffe
threshold: If the number of parameters for one layer is greater than or equal to threshold, their accumulation on CPU will be done in parallel. Otherwise, the accumulation will be done using one thread. It is set to 2,000,000 by default.

Verifying the performance of IBM Caffe

This section shows how to compare IBM Caffe and BVLC Caffe in training Alexnet with a batch size of 256 per GPU.

Make a local copy of the Caffe samples and examples

   $ source /opt/DL/caffe/bin/caffe-activate
   $ caffe-install-samples $HOME/caffe       # directory of your choosing

Prepare the Imagenet (ILSVRC) 2012 dataset in LMDB format as described at:
- http://caffe.berkeleyvision.org/gathered/examples/imagenet.html
- See also in the local examples copy (e.g. $HOME/caffe):
  - examples/imagenet/readme.md
  - examples/imagenet/create_imagenet.sh
Update the local copy of the sample Alexnet model
- In $HOME/caffe, edit models/bvlc_alexnet/train_val.prototxt
- Update the dataset locations to specify where you placed the ILSVRC2012 LMDB dataset
- Update the mean_file and source parameters in the data layer for both phase: TRAIN and phase: TEST

Get results for IBM Caffe

Open a new login session

Activate IBM Caffe

  $ source /opt/DL/caffe-ibm/bin/caffe-activate

Train Alexnet with 4 GPUs: 0,1,2,3

  $ cd $HOME/caffe
  $ caffe train -solver models/bvlc_alexnet/solver.prototxt -gpu 0,1,2,3

Get results for BVLC Caffe

Open a new login session

Activate BVLC Caffe

  $ source /opt/DL/caffe-bvlc/bin/caffe-activate

Train Alexnet with 4 GPUs: 0,1,2,3

  $ cd $HOME/caffe
  $ caffe train -solver models/bvlc_alexnet/solver.prototxt -gpu 0,1,2,3
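
To put numbers on the comparison, one simple approach is to record the wall-clock time of each run. The helper below is a sketch, not part of PowerAI: timed_run is a hypothetical name, and the comparison is only meaningful if both runs use identical solver settings (for example the same max_iter).

```shell
# timed_run LABEL CMD...: run CMD, then print LABEL and elapsed wall-clock seconds.
timed_run() {
    _label=$1
    shift
    _start=$(date +%s)
    "$@"
    _status=$?
    _end=$(date +%s)
    echo "$_label: $((_end - _start)) s (exit status $_status)"
    return $_status
}

# Usage, in the session where the corresponding Caffe variant is activated:
#   timed_run caffe-ibm  caffe train -solver models/bvlc_alexnet/solver.prototxt -gpu 0,1,2,3
#   timed_run caffe-bvlc caffe train -solver models/bvlc_alexnet/solver.prototxt -gpu 0,1,2,3
```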

NVIDIA Caffe versions

This PowerAI release includes packages for both NV Caffe 0.14.5 and 0.15.14. The versions may behave differently (e.g. in performance or convergence) with different models.

NV Caffe 0.15.14 is NCCL enabled and will be installed by default. Version 0.14.5 can be installed as follows:

Uninstall 0.15.14 (if needed)

   $ sudo apt-get purge caffe-nv
   ...
   The following packages were automatically installed and are no longer required:
     bazel caffe-bvlc caffe-ibm chainer digits libnccl1 libopenblas tensorflow theano
     torch
   Use 'sudo apt autoremove' to remove them.
   The following packages will be REMOVED:
     caffe-nv* power-mldl*
   ...
   Do you want to continue? [Y/n]

Install the 0.14.5 version specifically

   $ sudo apt-get install caffe-nv=0.14.5-3ibm1

More info

Visit Caffe's website (http://caffe.berkeleyvision.org/) for tutorials and example programs that you can run to get started.

Here are links to a couple of the example programs:

LeNet MNIST Tutorial - Train a neural network to understand handwritten digits
CIFAR-10 tutorial - Train a convolutional neural network to classify small images

Getting started with Chainer

The Chainer home page at http://chainer.org/ includes documentation for the Chainer project, including a Quick Start example.

Getting started with TensorFlow

The TensorFlow homepage (https://www.tensorflow.org/) has a variety of information, including Tutorials, How Tos, and a Getting Started guide.

Additional tutorials and examples are available from the community.

API changes and sample models

Note that the TensorFlow API is updated in version 1.0, so programs written for earlier versions of TensorFlow may need to be updated. The TensorFlow v1.0.0 release notes describe the changes and link to a conversion tool. See: https://github.com/tensorflow/tensorflow/releases/tag/v1.0.0

The TensorFlow team provides example models on GitHub at https://github.com/tensorflow/models. Some of the example models may not be updated for the new API.

For the inception/imagenet_train example:

For TensorFlow 1.0.0

Commit ef84162c from fork repo https://github.com/ibmsoe/tensorflow-models (i.e. branch inception-imagenet-1.0) should work.

For TensorFlow 0.12.0

Commit 91c7b91f from upstream repo https://github.com/tensorflow/models should work. The example will print many API warnings as it starts up, but should run to completion.

Additional features

The PowerAI TensorFlow packages include TensorBoard. See: https://www.tensorflow.org/get_started/summaries_and_tensorboard

The TensorFlow 1.0.1 package includes support for additional features:

HDFS
NCCL
experimental XLA JIT compilation (see https://www.tensorflow.org/versions/master/experimental/xla/)

TensorFlow versions

This PowerAI release includes packages for both TensorFlow version 1.0.1 and 0.12. The versions may behave differently (e.g. in performance or convergence) with different models.

TensorFlow version 1.0.1 will be installed by default. TensorFlow 0.12 can be installed as follows:

If the PowerAI packages are not yet installed you can specify TensorFlow 0.12 during the initial install:
```
    $ sudo apt-get install tensorflow=0.12.0-3ibm1 power-mldl
```

If TensorFlow 1.0.1 is already installed, we recommend uninstalling it before installing 0.12:

    $ sudo apt-get purge tensorflow
    ...
    The following packages were automatically installed and are no longer required:
      bazel caffe-bvlc caffe-ibm caffe-nv chainer digits libopenblas python-engineio
      python-flask-socketio python-lmdb python-mako python-scikit-fmm python-socketio-server
      theano torch
    ...
    The following packages will be REMOVED:
      power-mldl* tensorflow*
    ...
    Do you want to continue? [Y/n] y
    ...
    Removing power-mldl (3.4.1) ...
    Removing tensorflow (1.0.1-3ibm1) ...

    $ sudo apt-get install tensorflow=0.12.0-3ibm1 power-mldl

Getting started with Torch

The Torch Cheatsheet contains lots of info for people new to Torch, including tutorials and examples.

The Torch project has a demos repository at https://github.com/torch/demos

Tutorials can be found at https://github.com/torch/tutorials

Visit Torch's website for the latest from Torch.

Torch samples and examples

The Torch package includes example scripts and sample models. A script is provided to copy the sample content into a specified directory:

    $ torch-install-samples <somedir>

Among these are the Imagenet examples from https://github.com/soumith/imagenet-multiGPU.torch with a few modifications.

Extending Torch with additional Lua rocks

The Torch package includes several Lua rocks useful for creating Deep Learning applications. Additional Lua rocks can be installed locally to extend functionality. For example a rock providing NCCL bindings can be installed by:

    $ source /opt/DL/torch/bin/torch-activate
    $ source /opt/DL/nccl/bin/nccl-activate

    $ luarocks install --local --deps-mode=all "https://raw.githubusercontent.com/ngimel/nccl.torch/master/nccl-scm-1.rockspec"
    ...
    nccl scm-1 is now built and installed in /home/user/.luarocks/ (license: BSD)

    $ luajit
    LuaJIT 2.1.0-beta1 -- Copyright (C) 2005-2015 Mike Pall. http://luajit.org/
    JIT: OFF
    > require 'torch'
    > require 'nccl'
    >

Getting started with Theano

Here are some links to help you get started with Theano:

Visit Theano's website for the latest from Theano.

Theano 0.9.0 deprecates support for the old GPU backend (e.g. THEANO_FLAGS=device=gpu) and adds support for the gpuarray backend (e.g. THEANO_FLAGS=device=cuda0). The old GPU backend will likely be removed in a future Theano update.
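
As an illustration of selecting the gpuarray backend, the sketch below builds a THEANO_FLAGS value for a given GPU index. The helper name theano_gpu_flags and the script name train.py are placeholders, and adding floatX=float32 is an assumption about a typical GPU setup rather than a PowerAI requirement.

```shell
# theano_gpu_flags [N]: print a THEANO_FLAGS value that selects GPU N
# (default 0) through the gpuarray backend.
theano_gpu_flags() {
    echo "device=cuda${1:-0},floatX=float32"
}

# Usage (train.py stands in for your own script):
#   THEANO_FLAGS=$(theano_gpu_flags 1) python train.py
```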

Getting started with DIGITS

The first time it is run, digits-activate creates a .digits subdirectory containing the DIGITS jobs directory and the digits.log file.

Multiple instances of the DIGITS server can be run at once, including by different users, but users may need to set the network port number to avoid conflicts.

To start DIGITS server with default port (5000):

    $ digits-devserver

To start DIGITS server with a specific port:

    $ digits-devserver -p <port_num>
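
On a shared machine, one possible convention for avoiding port conflicts is to derive the port from the numeric user ID. This is a hypothetical scheme, not something DIGITS or PowerAI provides: digits_port_for_uid is an illustrative helper, and two user IDs that differ by a multiple of 1000 would still collide.

```shell
# digits_port_for_uid UID: map a numeric user ID to a port in 5000-5999.
digits_port_for_uid() {
    echo $((5000 + $1 % 1000))
}

# Print the command this user would run under that convention.
port=$(digits_port_for_uid "$(id -u)")
echo "digits-devserver -p $port"
```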

NVIDIA's DIGITS site has more information about DIGITS.

The DIGITS Getting Started guide describes how to train a network model to classify the MNIST hand-written digits dataset.

Additional DIGITS examples are available at https://github.com/NVIDIA/DIGITS/tree/master/examples

The PowerAI Torch package is updated to work with DIGITS. Manual installation of individual lua rocks is no longer required.

Installing DIGITS plugins

The DIGITS package supports Python plugins, installed with pip, that provide additional features on the DIGITS server page. These plugins are included in the PowerAI distribution and can be installed in one of two ways.

Install Plugins under the current user

    $ pip install /opt/DL/digits/plugins/data/imageGradients
    $ pip install /opt/DL/digits/plugins/view/imageGradients

Install Plugins for all users

    $ sudo pip install /opt/DL/digits/plugins/data/imageGradients
    $ sudo pip install /opt/DL/digits/plugins/view/imageGradients

Examples using DIGITS plugins can be found in the DIGITS examples folder.

Legal Notices

IBM, the IBM logo, ibm.com, POWER, Power, POWER8, and Power systems are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

The TensorFlow package includes code from the BoringSSL project. The following notices may apply:

    This product includes software developed by the OpenSSL Project for
    use in the OpenSSL Toolkit. (http://www.openssl.org/)

    This product includes cryptographic software written by Eric Young
    (eay@cryptsoft.com)

This document is current as of the initial date of publication and may be changed by IBM at any time. Not all offerings are available in every country in which IBM operates.

THE INFORMATION IN THIS DOCUMENT IS PROVIDED "AS IS" WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT. IBM products are warranted according to the terms and conditions of the agreements under which they are provided.