Release Notes for the Advance Toolchain 3.0 Version 3.0-0

Features:

  o The Advance Toolchain is a self-contained toolchain that does not rely
    on the base system toolchain.
  o Decimal Floating Point support in the following packages:
    - GCC-4.4.4-ibm-r160160 [C,C++ (g++), fortran]
    - GNU Binutils-2.20.51-20100526
    - GLIBC-2.11-ibm
    - LIBDFP 1.0.3
    - GDB-7.1
    - GMP-4.2.4
  o Releases of the following packages in support of the Advance Toolchain:
    - Valgrind-3.4.1
    - OProfile-0.9.6 with Java (1.5 or later) support
    - MPFR-2.4.1
  o Power6 enablement.
  o Power6 Optimized scheduler.
  o Power6 Native DFP instruction support.
  o Power6 VMX enablement with auto-vector.
  o Power7 enablement.
  o Power7 Optimized scheduler.
  o Power7 Native DFP instruction support.
  o Power7 VMX/VSX enablement with auto-vector.
  o ppc970, POWER4, POWER5, POWER5+, POWER6, POWER6x and POWER7 optimized system libraries.
  o Libhugetlbfs 1.0 support.
  
What's new (apart from package versions listed above):

  o Binaries are all 64-bit.
  o The compiler now defaults to 64-bit.
  o Better Thread Local Storage support.
  o Several POWER7-specific optimizations for GCC and GLIBC.

Documentation for each component can be found at:

  o GCC: http://gcc.gnu.org/onlinedocs/gcc-4.4.4/gcc/
  o Binutils: http://sourceware.org/binutils/docs/
  o GLIBC: http://www.gnu.org/software/libc/manual/html_node/index.html
  o LIBDFP: http://www.eglibc.org/cgi-bin/viewcvs.cgi/*checkout*/libdfp/trunk/README.user?rev=10733l
  o GDB: http://sourceware.org/gdb/current/onlinedocs/gdb/
  o GMP: http://gmplib.org/manual/
  o MPFR: http://www.mpfr.org/mpfr-current/mpfr.html
  o Valgrind: http://valgrind.org/docs/manual/manual.html
  o OProfile: http://oprofile.sourceforge.net/doc/

Support:

  Customer support for the Advance Toolchain (AT) is provided in one of three ways:

  1. If you are using AT as directed by an IBM product team (e.g. IBM XL Compiler or PowerVM Lx86),
  please report suspected AT problems to IBM Support using that product name and entitlement.

  2. IBM's Support Line for Linux Offering in the United States now provides support for AT as well.
  If you have a U.S. Support Line for Linux contract, place a call to IBM Support:
    o 1-800-426-IBM-SERV
    o Option #2 (Other business products or solutions)
    o Option #2 (Software)
    o Option #7 (Other OS/Linux)

  3. All other users can use an electronic forum that is monitored Monday
  through Friday. For questions regarding the use of AT, or to report a suspected defect in AT, go to:
  http://www-128.ibm.com/developerworks/forums/forum.jspa?forumID=1518 
    o Open the Advance Toolchain topic.
    o Select 'Post a New Reply'
    o Enter and submit your question or problem
    o An initial response will be attempted within 2 business days

Installation:

  The gpg public key gpg-pubkey-00f50ac5-45e497dc will be provided in the
  repository where these release notes were found.  This pubkey can be used to
  verify the authenticity of both the Advance Toolchain rpms and the
  repository contents.

  Download this gpg-pubkey and import it into your rpm database using
  the following:

  rpm --import gpg-pubkey-00f50ac5-45e497dc
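
  Once the key is imported, each downloaded rpm can be checked against it
  before installation.  The following is a minimal sketch using rpm's
  --checksig option.  The function name check_at_rpms and the RPM_CMD
  override are hypothetical and are not part of the Advance Toolchain.

```shell
# Sketch: check the signature of every Advance Toolchain rpm in a directory.
# RPM_CMD is a hypothetical override that allows a dry run (RPM_CMD=echo).
check_at_rpms() {
    dir=$1
    rc=0
    for pkg in "$dir"/advance-toolchain-*.rpm; do
        # An unmatched glob stays literal, so confirm the file exists.
        if [ ! -e "$pkg" ]; then
            echo "no advance-toolchain rpms found in $dir" >&2
            return 1
        fi
        # Remember any failure but keep checking the remaining packages.
        "${RPM_CMD:-rpm}" --checksig "$pkg" || rc=1
    done
    return "$rc"
}
```

  For example, 'check_at_rpms .' checks the rpms in the current directory and
  returns non-zero if any signature check fails.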
   
  Note: on SLES10, please install zlib, ncurses and python packages (64-bit)
  before installing the Advance Toolchain.

  YaST:
    To install execute 'yast' as root and select 'Add-on Product'.

    Select the FTP Protocol:
      (x) FTP...
    Under "Server Name":
      linuxpatch.ncsa.uiuc.edu
    Under "Directory on Server":
      
      /toolchain/at/at3.0/suse/SLES_10
   
    You will get a warning about there being no product information available
    at the given location.  This is because the repomd based repository
    doesn't contain the YaST product information.  This is not a bug.  Select
    [Continue].

    Under the "Software Management" interface search for "advance toolchain"
    and mark the runtime and devel versions for installation as necessary and
    click [Accept].  
  

  Note: If you're installing the rpms by hand you will need to install the
  rpms in the following order due to prerequisites:

    advance-toolchain-runtime-3.0-0
    advance-toolchain-devel-3.0-0
    advance-toolchain-perf-3.0-0

  A batch install should install them in the correct order, i.e.

    rpm -i advance-toolchain*

  Note on timezone files: If you need to use something different than Factory,
  then you should copy the timezone file you want from /opt/at3.0/share/zoneinfo 
  to /opt/at3.0/etc/localtime.
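
  The copy described above could be scripted roughly as follows.  This is a
  sketch only: the function name set_at_timezone and the AT_ROOT variable are
  hypothetical, and the zone name in the usage note is just an example.

```shell
# Sketch: copy one zone file from the toolchain's zoneinfo tree to its
# etc/localtime.  AT_ROOT is a hypothetical override; default is /opt/at3.0.
set_at_timezone() {
    zone=$1
    root=${AT_ROOT:-/opt/at3.0}
    src=$root/share/zoneinfo/$zone
    # Refuse to overwrite localtime if the requested zone file is missing.
    if [ ! -f "$src" ]; then
        echo "unknown time zone: $zone" >&2
        return 1
    fi
    cp "$src" "$root/etc/localtime"
}
```

  Run as root, 'set_at_timezone America/Chicago' would copy that zone file
  into place.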

Usage:

  The Advance Toolchain currently provides Decimal Floating Point compiler,
  library, debugger and performance analysis support in a standalone
  toolchain.

  In order to be able to reference DFP defined symbols (including constants
  and functions) one's source code must define the following:

    #define __STDC_WANT_DEC_FP__

  Or one may build the source with the following define flag:
    -D__STDC_WANT_DEC_FP__

  GNU99 compatibility is required to pick up some DFP prototypes.  It will
  define __USE_ISOC99.  Use the following compilation flag: -std=gnu99

  NOTE: -std=gnu99 IS NOT THE SAME AS __USE_ISOC99 though -std=gnu99 DOES
  DEFINE __USE_ISOC99!  Additionally, simply using -std=c99 isn't enough!

  NOTE: If you forget to use -std=gnu99 you may get incorrect results when
  you call DFP math functions.  If the compiler can't find the prototypes
  (due to missing defines) it falls back to an implicit declaration, which
  has the incorrect return type.

  Compile with -Wall to pick up undefined prototype warnings.

  The following include files provide the constants and function prototypes
  provided by DFP but only when __STDC_WANT_DEC_FP__ is defined.

    /* fe_dec_getround(), fe_dec_setround(), and rounding mode
     * enumeration types provided by an implicit #include
     * <bits/dfpfenv.h>.  */
     #include <fenv.h>

    /* All math function prototypes for d32/d64/d128, polymorphic
     * classification macros, comparison macros, and DEC_NAN and
     * DEC_INFINITY macros.  This includes an implicit #include
     * <bits/dfpcalls.h> to pick up new DFP only prototypes defined
     * in include/bits/dfpcalls.h.  */
     #include <math.h>

     /* Type dependent floating point macros for DFP.  */
     #include <float.h>

  The Decimal Floating Point types are as follows:

    _Decimal32
    _Decimal64
    _Decimal128

  The printf length modifiers follow:

    %H - for _Decimal32
    %D - for _Decimal64
    %DD - for _Decimal128

  The scanf length modifiers follow:

    %H - for _Decimal32
    %D - for _Decimal64
    %DD - for _Decimal128

  The floating point suffix for DFP constants follows:

    'DF' for _Decimal32, e.g. _Decimal32 d32 = 1.045DF;
    'DD' for _Decimal64, e.g. _Decimal64 d64 = 1.4738273DD;
    'DL' for _Decimal128, e.g. _Decimal128 d128 = 1.0823382394823945DL;

  NOTE: Assigning a constant without a DFP suffix to a DFP variable performs
  a binary to decimal conversion and, depending on the precision, can assign
  an incorrect number.  Always use the decimal floating point suffix!

  A compilation and link for a DFP program will look like the following:

    /opt/at3.0/bin/gcc -Wall test_dfp.c -o dfp -D__STDC_WANT_DEC_FP__ -std=gnu99 -ldfp -ldecnumber
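
  Because a missing -std=gnu99 or -D__STDC_WANT_DEC_FP__ leads to wrong
  results instead of a build failure, it may help to keep the required flags
  in one place.  A minimal sketch follows; the function name dfp_build and
  the AT_GCC override are hypothetical.

```shell
# Sketch: compile and link one DFP source file with the flags listed above.
# AT_GCC is a hypothetical override for the compiler path.
dfp_build() {
    src=$1
    out=$2
    "${AT_GCC:-/opt/at3.0/bin/gcc}" -Wall "$src" -o "$out" \
        -D__STDC_WANT_DEC_FP__ -std=gnu99 -ldfp -ldecnumber
}
```

  With this in place, 'dfp_build test_dfp.c dfp' runs the same command as the
  example above.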

Unsupported/Non-Standard Additions:

  Libdfp provides a non-standard method for output of the decoded Densely
  Packed Decimal representation using the decoded[32|64|128]() functions.  The
  output format is:

    [sign][MSD],[decoded-declet-1],...,[decoded-declet-n][E][+|-][decoded exponent]

  Examples:

    +0,000,000E+0 = decoded32(0DF)
    +0,000,000,000,001,000E-1 = decoded64(100.0DD)
    -0,000,000,000,000,000,000,000,000,039,654,003E-3 = decoded128(-39654.003DL)
    +9,876,543E+22 = decoded32(9.876543E+28DF)

  WARNING:  Do NOT rely on these methods for user space code.  They're only
  provided for toolchain development debug support and will not be exported
  from Libdfp for the release.

  A header file providing the prototype for these functions is not provided by
  the Advance Toolchain.  In order to use them define the following prototypes
  in your program:

    /* char * should ref a 14 byte char array, +0,000,000E+0\0  */
    extern char * decoded32 (_Decimal32, char*);
    /* char * should ref a 26 byte char array, +0,000,000,000,000,000E+0\0  */
    extern char * decoded64 (_Decimal64, char*);
    /* char * should ref a 50 byte char array, * +0,000,000,000,000,000,000,000,000,000,000,000E+0\0  */
    extern char * decoded128 (_Decimal128, char*);

Limitations:

  o Libdfp uses libdecnumber's decimal[32|64|128]ToString functions for
    printf support.  These functions take formatting into their own hands.
    You'll notice some interesting features.  It will use Engineering format
    on its own sometimes.  Another interesting feature is that when you do the
    following:

      _Decimal32 d32 = 0.0DF;
      printf("%Hf\n", d32);

    You'll notice that libdecnumber prints out:

      0

    The Advance Toolchain has provided limited support for the precision and
    width format codes: e.g.

      _Decimal32 d32 = 1.12315DF;
      printf("'%12.4Hf'\n",d32);
      '1.1232      '
      printf("'%-12.4Hf'\n",d32);
      '      1.1232'
      printf("'%2.5Hf'\n",d32);
      '1.12315'

    The printf function can be used to reduce the precision of a _Decimal*
    value using _Decimal to string to _Decimal conversions but in general
    the quantized[32|64|128]() function is a better choice.

  o IEEE754r currently has an addendum awaiting vote whereby the default
    quantum for conversions involving zero will go to a zero exponent (e.g.
    0 equals 0.0).  The current IEEE754r specification dictates that the
    quantum shall go to the largest supported by the data type, e.g.
    _Decimal32 0.0E191, _Decimal64 0.0E767, _Decimal128 0.0E12287.

    Observation of the advance toolchain results will show that we don't
    follow any particular convention.  This may change in the future.

    For the following examples notice the DPD encoding on both power6[x] and
    non-power6:

      _Decimal32 d32 = 0.0DF;
      _Decimal64 d64 = 0.0DD;
      _Decimal128 d128 = 0.0DL;

      (_Decimal128)0.0DF: [+0,000,000E+0]
      (_Decimal128)0.0DD: [+0,000,000,000,000,000E+0]
      (_Decimal128)0.0DL: [+0,000,000,000,000,000,000,000,000,000,000,000E+0]

    On power6[x] notice the representation of zero after an [int|long|long
    long] conversion to _Decimal[32|64|128] respectively:

      (_Decimal32)0DF = (int)0: [+0,000,000E+0]
      (_Decimal32)0.0DF = (float)0.000000: [+0,000,000E+0]
      (_Decimal64)0DD = (long)0: [+0,000,000,000,000,000E+0]
      (_Decimal64)0.0DD = (double)0.000000: [+0,000,000,000,000,000E+0]
      (_Decimal128)0DL = (long long)0: [+0,000,000,000,000,000,000,000,000,000,000,000E+0]
      (_Decimal128)0.0DL = (long double)0.000000: [+0,000,000,000,000,000,000,000,000,000,000,000E+0]

    Notice the difference on non-Power6:

      (_Decimal32)0.0DF = (int)0: [+0,000,000E-1]
      (_Decimal32)0.0DF = (float)0.000000: [+0,000,000E+0]
      (_Decimal64)0.0DD = (long)0: [+0,000,000,000,000,000E-1]
      (_Decimal64)0.0DD = (double)0.000000: [+0,000,000,000,000,000E+0]
      (_Decimal128)0.0DL = (long long)0: [+0,000,000,000,000,000,000,000,000,000,000,000E-1]
      (_Decimal128)0.0DL = (long double)0.000000: [+0,000,000,000,000,000,000,000,000,000,000,000E+0]

    Namely the negative sign of the exponent on non-power6 for int to _Decimal conversions.

  o The scanf() function mostly works, except when more digits than the
  precision are given, and it does not preserve the input quantum.
  
  o Valgrind currently doesn't support POWER7-specific instructions.
  
  o OProfile currently doesn't support profiling using JVMPI.
  
  o OProfile Java profiling restriction:  To use OProfile for profiling Java
  VMs 1.5 or greater, users generally have two options available when
  invoking the JVM:  -agentlib or -agentpath.  However, when using the
  Advance Toolchain's OProfile, you must use
  "-agentpath:/opt/at3.0/lib64/oprofile/libjvmti_oprofile.so" for 64-bit
  JVMs or "-agentpath:/opt/at3.0/lib/oprofile/libjvmti_oprofile.so" for
  32-bit JVMs.
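
  The choice of agent library for the OProfile Java restriction can be
  captured in a small helper.  This is a sketch only; the function name
  oprofile_agent_opt is hypothetical, and the paths are the ones given above.

```shell
# Sketch: print the -agentpath option for a 32-bit or 64-bit JVM.
oprofile_agent_opt() {
    case $1 in
        64) libdir=lib64 ;;
        32) libdir=lib ;;
        *)  echo "usage: oprofile_agent_opt 32|64" >&2
            return 1 ;;
    esac
    # printf avoids any echo quirks with a string that starts with '-'.
    printf '%s\n' "-agentpath:/opt/at3.0/$libdir/oprofile/libjvmti_oprofile.so"
}
```

  A 64-bit JVM could then be started with something like
  'java "$(oprofile_agent_opt 64)" YourMainClass', where YourMainClass is a
  placeholder.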

Optimization Selection:

  Directing GCC to build an application for a particular cpu can take
  advantage of processor specific instruction selection.  In some cases it
  can significantly improve performance.  Building without selecting a
  particular cpu simply causes GCC to select the default (lowest common
  denominator) instruction set.

    -mcpu=power4
    -mcpu=970
    -mcpu=power5
    -mcpu=power5+
    -mcpu=power6
    -mcpu=power6x
    -mcpu=power7

  * Note: when using -mcpu=power7, do not disable Altivec (i.e. -mno-altivec)
  without also disabling VSX (i.e. -mno-vsx). The combination:

  -mcpu=power7 -mno-altivec

  is illegal.
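
  A build script can catch this combination before invoking the compiler.
  The following is a rough sketch; the function name check_power7_flags is
  hypothetical, and it only scans for the three flags named above without
  modelling how GCC resolves repeated -mcpu options.

```shell
# Sketch: reject -mcpu=power7 with -mno-altivec unless -mno-vsx is also given.
check_power7_flags() {
    cpu7=no
    noaltivec=no
    novsx=no
    for flag in "$@"; do
        case $flag in
            -mcpu=power7) cpu7=yes ;;
            -mno-altivec) noaltivec=yes ;;
            -mno-vsx)     novsx=yes ;;
        esac
    done
    if [ "$cpu7" = yes ] && [ "$noaltivec" = yes ] && [ "$novsx" = no ]; then
        echo "error: -mcpu=power7 -mno-altivec also requires -mno-vsx" >&2
        return 1
    fi
    return 0
}
```

  For example, 'check_power7_flags $CFLAGS || exit 1' near the top of a build
  script stops early on the unsupported combination.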

Debugging:
  GDB can be asked to output _Decimal[32|64|128] formatted floating point
  registers by default, using GDB's 'printf' command.

  When using 'objdump' to inspect POWER6 code, make sure to use the '-Mpower6'
  flag (e.g. objdump -d -Mpower6 your_file). The same applies to POWER7 code
  (e.g. objdump -d -Mpower7 your_file).

Relinking a pre-built application with the Advance Toolchain:

  1.) Locate all of the application's .o files. You can also link .a files to
      pick them all up at once. These will be needed for the relink.

  2.) Locate the paths to all of the necessary linked shared-object files,
      e.g.

    /usr/X11R6/lib for libXrender
    /opt/gnome/lib for libgtk-x11-2.0

  3.) Edit /opt/at3.0/etc/ld.so.conf and add the directories that contain
      the shared object files to the end of this file. Don't forget 'lib64'
      for the 64-bit equivalent libraries if applicable, e.g.

    /opt/gnome/lib/
    /opt/gnome/lib64/
    /usr/X11R6/lib
    /usr/X11R6/lib64/

  4.) Run the Advance Toolchain's ldconfig application to regenerate
      /opt/at3.0/etc/ld.so.cache, e.g.

    sudo /opt/at3.0/sbin/ldconfig

  The loader uses /opt/at3.0/etc/ld.so.cache to find the libraries the
  application was linked against.

  5.) Re-link using the Advance Toolchain's compiler:

    /opt/at3.0/bin/gcc -g -O2 -o <application_name> <list_of_dot_o_files> \
    <list_of_dot_a_files> -L<path_to_libraries> \
    -l<one_for_each_library_needed_for_the_link>

  e.g.

    /opt/at3.0/bin/gcc -g -O2 -o mandelbrot callbacks.o  interface.o \
    main.o quadmand.o  support.o mandel_internals.a \
    -L/usr/X11R6/lib -L/usr/X11R6/lib64 -L/opt/gnome/lib -lgtk-x11-2.0 \
    -lgdk-x11-2.0 -latk-1.0 -lgdk_pixbuf-2.0 \
    -lpangocairo-1.0 -lpango-1.0 -lcairo -lgobject-2.0 -lgmodule-2.0 -ldl \
    -lglib-2.0 -lfreetype -lfontconfig \
    -lXrender -lX11 -lXext -lpng12 -lz -lglitz -lm -lstdc++ -lpthread \
    -lgthread-2.0

  6.) If ld gives an error like the following then you're missing the path to
      that library in the link stage. Add it with -L<path to library>, e.g.

    /opt/at3.0/bin/ld: cannot find -lgtk-x11-2.0

  Add -L/opt/gnome/lib/ to the gnome compilation line. You need to tell the
  linker where to find all of the libraries.

  7.) When running the re-linked application if you get an error like the
      following:

    ./mandelbrot: error while loading shared libraries: libglib-2.0.so.0:
    cannot open shared object file: No such file or directory.

  You need to add the path to the library in question to
  /opt/at3.0/etc/ld.so.conf and rerun /opt/at3.0/sbin/ldconfig. The Advance
  Toolchain's loader needs to know where to find the libraries and uses the
  generated /opt/at3.0/etc/ld.so.cache to find them.

  8.) You can verify that the Advance Toolchain libraries were picked up by
      running the application prefaced with LD_DEBUG=libs, e.g.

    LD_DEBUG=libs ./mandelbrot

  * WARNING: do NOT use LD_LIBRARY_PATH to point to the Advance Toolchain
  libraries if your applications aren't relinked with the Advance Toolchain.
  Doing so can result in ld.so and libc.so version mismatch and cause runtime
  failures.
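
  Steps 3 and 4 can be scripted so that repeated runs do not add duplicate
  entries.  The following is a minimal sketch; the function name
  add_at_libdirs and the AT_LDCONF override are hypothetical.  Paths are
  compared as exact lines, so /opt/gnome/lib and /opt/gnome/lib/ count as
  different entries.

```shell
# Sketch: append library directories to the toolchain's ld.so.conf, skipping
# any that are already listed.  AT_LDCONF is a hypothetical override.
add_at_libdirs() {
    conf=${AT_LDCONF:-/opt/at3.0/etc/ld.so.conf}
    for dir in "$@"; do
        # -x matches whole lines only, -F treats the path as a fixed string.
        if ! grep -qxF -- "$dir" "$conf" 2>/dev/null; then
            echo "$dir" >> "$conf"
        fi
    done
}
```

  Run as root, for example:
  'add_at_libdirs /opt/gnome/lib /opt/gnome/lib64 && /opt/at3.0/sbin/ldconfig'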

Library search paths

  /opt/at3.0/etc/ld.so.conf already contains "include /etc/ld.so.conf" in the
  search order, but you may need to re-run /opt/at3.0/sbin/ldconfig in order
  to populate /opt/at3.0/etc/ld.so.cache with the specialized search paths
  you've added to /etc/ld.so.conf.

Using the AT with Libhugetlbfs

  The Advance Toolchain will work with the default system libhugetlbfs linker
  wrapper and linker scripts.

  The /opt/at3.0/scripts/createldhuge.sh script is provided which copies
  /opt/at3.0/bin/ld to /opt/at3.0/bin/ld.orig and creates a wrapper script
  in /opt/at3.0/bin/ld.  You only need to run this if you want the Advance
  Toolchain to work with libhugetlbfs.

  The new /opt/at3.0/bin/ld is a wrapper script which detects whether the
  --hugetlbfs-link or --hugetlbfs-align switches have been passed to the linker.

  If so then it sets a script-local LD environment variable to /opt/at3.0/bin/ld.orig
  and invokes the system's ld.hugetlbfs, e.g.

    LD="/opt/at3.0/bin/ld.orig" /usr/share/libhugetlbfs/ld.hugetlbfs *switches*

  If it doesn't detect the hugetlbfs-link/hugetlbfs-align switch then it simply forwards
  the linker invocation to /opt/at3.0/bin/ld.orig directly.
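
  The dispatch described above can be pictured with the following sketch.
  It is an illustration of the logic only and is not the script that
  createldhuge.sh generates; the function names pick_linker and run_linker
  are hypothetical.

```shell
# Sketch: print the linker a wrapper would run for the given arguments.
pick_linker() {
    for arg in "$@"; do
        case $arg in
            --hugetlbfs-link*|--hugetlbfs-align*)
                echo /usr/share/libhugetlbfs/ld.hugetlbfs
                return 0 ;;
        esac
    done
    # No hugetlbfs switch seen: use the original toolchain linker.
    echo /opt/at3.0/bin/ld.orig
}

# Sketch: run the selected linker, setting LD only for the hugetlbfs case.
run_linker() {
    linker=$(pick_linker "$@")
    if [ "$linker" = /usr/share/libhugetlbfs/ld.hugetlbfs ]; then
        LD=/opt/at3.0/bin/ld.orig "$linker" "$@"
    else
        "$linker" "$@"
    fi
}
```

  Splitting the decision (pick_linker) from the invocation (run_linker) makes
  it possible to check which linker would be chosen without running it.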

  If hugetlbfs support is desired, the first thing to do is back up the
  original Advance Toolchain linker, just in case there are problems and you
  need to restore it manually.

    cp -p /opt/at3.0/bin/ld /opt/at3.0/bin/ld.backup

  The scripts in /opt/at3.0/scripts/  will do the rest of the work for you:

    createldhuge.sh
    restoreld.sh

  Invoke createldhuge.sh to create the wrapper ld:

    sudo sh createldhuge.sh /usr/share/libhugetlbfs/ld.hugetlbfs /opt/at3.0

  NOTE: This MUST be executed with sudo (or as root) for the ld wrapper script
  to be created properly.

  When/If you want to restore the original Advance Toolchain linker simply
  run:

    sudo sh restoreld.sh

  The Advance Toolchain GCC always ignores the -B/usr/share/libhugetlbfs
  directive because it has been built to always invoke /opt/at3.0/bin/ld
  directly.  You can use the GCC invocation you've always used, e.g.

    /opt/at3.0/bin/gcc temp.c -v -o temp -B/usr/share/libhugetlbfs/ -Wl,--hugetlbfs-link=BDT

  Note, if you invoke /opt/at3.0/bin/ld --hugetlbfs-link=BDT directly you'll
  need to supply a -m* flag which is normally provided by GCC directly (man ld
  for supported emulations).

Using AT with XLC 10.1 and XLF 12.1

  When compiling binaries using XLC 10.1 or XLF 12.1, the user must add the
  "-F path-to-cfg-file" option to the compiler command line. The Advance Toolchain
  provides config files for XLC 10.1 and XLF 12.1 in /opt/at3.0/scripts. For XLC, use
  "vac-AT3.0.cfg", and for XLF, "xlf-AT3.0.cfg".