FDPR - Feedback-Directed Post-link Optimization for Linux on POWER
fdpr [--instrument file] [--train workload [--reset] ] [--optimize] [--log file] [-f, --profile-file file] [-o, --output-file file] [-V, --version] [-v, --verbose] [-h, --help] [fdprpro-options] [--] program
fdprpro -a action [fdprpro-options] program
FDPR is a performance-tuning utility for reducing the execution time and the real-memory utilization of user-level application programs. The tool optimizes the executable image of a program by collecting information on the program's behavior under a typical workload and creating a new version of the program optimized for that workload. The new program generated by the post-link optimizer typically runs faster and uses less real memory than the original program.
Note: The post-link optimizer applies advanced optimization techniques to programs. Some aggressive optimizations may produce programs that do not behave as expected, so test the optimized program with at least the same test suite that was used to test the original program. An optimized program is not supported as input to the optimizer.
The post-link optimizer builds an optimized executable program in three distinct phases:
1. Instrumentation: creates an instrumented executable program and an empty template profile file.
2. Training: runs the instrumented program and updates the profile data.
3. Optimization: generates the optimized executable program file, given optimization options.
See the corresponding options for further details.
The three-phase process can be achieved by using fdpr or fdprpro.
fdpr provides a convenient user interface, enabling the three phases, or any legal combination thereof, to be performed in one command.
More experienced users may prefer to use fdprpro, which performs the actual processing. fdprpro gives explicit control over the processing and requires a separate invocation for each of the instrumentation and optimization phases. The phase is selected with the action option -a|--action action, where action is "instr" to perform instrumentation or "opt" to perform optimization.
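As a rough sketch of that explicit control, the shell function below runs the three phases with fdprpro directly. It assumes fdprpro is on the PATH, that -f and -o name the profile and output files as described under the fdprpro options, and that the second argument is a workload script that takes the instrumented program's path, like the one fdpr --train expects. The function name fdpr_by_hand is hypothetical.

```shell
# Hypothetical helper: the three FDPR phases driven explicitly with fdprpro.
# $1 = input program, $2 = workload script (called with the instrumented path).
fdpr_by_hand() {
    prog=$1
    workload=$2
    profile=$prog.nprof

    # Run the instrumented copy by explicit path so no PATH lookup happens.
    case $prog in
        */*) instr=$prog.instr ;;
        *)   instr=./$prog.instr ;;
    esac

    # Phase 1: instrumentation, creating $prog.instr and the profile file.
    fdprpro -a instr -f "$profile" -o "$prog.instr" "$prog" || return 1

    # Phase 2: training; every run accumulates counts into the profile.
    "$workload" "$instr" || return 1

    # Phase 3: optimization with the basic -O set, reading the profile.
    fdprpro -a opt -O -f "$profile" -o "$prog.fdpr" "$prog"
}
```

For example, fdpr_by_hand myprog ./test performs roughly what fdpr --instr --train test --opt myprog does, but leaves each fdprpro command visible for adding further options.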
Note: The instrumented executable, created in the instrumentation phase and run in the training phase, typically runs several times slower than the original program. Because of this, the training workload should keep execution as short as possible while still fully exercising the code areas of interest.
Creates an instrumented executable program with the specified name (default program.instr). Default: no instrumentation phase.
Normally, each time the instrumented program runs, it accumulates profile information in the profile file (see --profile-file). Specifying this option copies the initial profile file, saved in profile-name.template, to the profile file, which resets the profile information to its empty state. The option requires --train to be specified as well.
Runs the instrumented program and creates the profile data. The workload is a script that accepts one parameter: the program to execute. fdpr invokes the script with the path to the instrumented program. If the instrumentation phase is not specified, the instrumented program is assumed to be program.instr. Default: no profiling phase.
Generates the optimized executable program file. Users can specify optimizations explicitly by passing optimization options to fdprpro (see fdprpro options below). If no fdprpro optimization option is specified, the fdprpro -O option is used.
The optimized output file. The default is program.fdpr.
The profile file. This is used as an output file in the instrumentation phase and as an input file in the optimization phase. The default is program.nprof.
Prints version information and exits.
Prints progress indication and statistical information during processing.
Prints usage information and exits.
The above options can be abbreviated to any unique prefix.
To disambiguate option parsing, separate the options from the program name with '--'. For example, because the parameter to --instrument is optional, the following command is illegal:
$ fdpr --instr myprog
Instead, use the command:
$ fdpr --instr -- myprog
The input file to fdpr should be an ELF executable or a shared library (.so file). Both ELF32 and ELF64 are supported.
Note: The executable program should be built with relocation information. fdpr supports both the GCC and XLC compilers and the GNU linker. To leave the relocation information in the executable file, use the linker with the --emit-relocs (or -q) option. This can be specified in the GCC command by -Wl,-q.
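A small sketch of that build step follows, assuming GCC and binutils' readelf are installed; the helper names are hypothetical. The check relies on GNU ld's --emit-relocs keeping relocation sections for code, such as .rela.text, in the linked executable, which an ordinary link drops.

```shell
# Hypothetical helper: compile and link $2 into $1, keeping relocation
# information in the executable (-Wl,-q passes -q, i.e. --emit-relocs, to ld).
build_for_fdpr() {
    gcc -O2 -Wl,-q -o "$1" "$2"
}

# Hypothetical check: an executable linked with --emit-relocs still contains
# relocation sections for its code (.rela.text, or .rel.text on REL targets).
has_emitted_relocs() {
    readelf -S -W "$1" | grep -Eq '\.rela?\.text'
}
```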
Along with the instrumented file, fdpr creates the profile file. The file is then filled with profile information (i.e., counts at various points in the program), while the instrumented program runs with its specified workload.
Note: The instrumented program requires a shared library called libfdprinst32.so (or libfdprinst64.so for ELF64 programs). A proper installation from the RPM file ensures that the libraries are found. Alternatively, make sure the environment variable LD_LIBRARY_PATH is set to the directory containing these libraries.
The instrumented program expects the profile file to be in the same directory as the instrumented program. To override this, set the environment variable FDPR_PROF_DIR to the required directory. Specifying the profile file with its full path name, either with -f profile_file or via FDPR_PROF_DIR, is important if the program changes its working directory at run time or is executed from a directory other than the one where it was built.
By default, fdpr performs code reordering optimization together with the optimizations of branch prediction, branch folding, code alignment, and removal of redundant NOOP instructions (see the fdprpro option -O below).
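The runtime setup described above can be sketched as a small wrapper. FDPR_LIB_DIR and run_instrumented are hypothetical names: set FDPR_LIB_DIR to wherever the RPM installed libfdprinst32.so or libfdprinst64.so. The wrapper resolves the profile directory to an absolute path, so the instrumented program still finds its profile after changing its working directory.

```shell
# Hypothetical wrapper: run an instrumented program (and its arguments) with
# the FDPR runtime libraries on LD_LIBRARY_PATH and FDPR_PROF_DIR set to the
# absolute path of the directory holding the profile file.
# Usage: run_instrumented PROFILE_DIR COMMAND [ARGS...]
run_instrumented() {
    if [ -z "$FDPR_LIB_DIR" ]; then
        echo "run_instrumented: set FDPR_LIB_DIR first" >&2
        return 1
    fi
    prof_dir=$(CDPATH= cd "$1" && pwd) || return 1   # absolute; cwd unchanged
    shift
    LD_LIBRARY_PATH=$FDPR_LIB_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH} \
    FDPR_PROF_DIR=$prof_dir \
        "$@"
}
```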
Additional optimizations are available explicitly by indicating specific fdprpro options (see below).
The following are typical usage examples of fdpr.
In this simple example, fdpr performs all three phases. Here, myprog is the input executable and test is a shell script that invokes myprog.
$ fdpr --instr --train test --opt myprog
The test script should look something like this:
# code to exercise myprog
$1 arg1 arg2 ...
fdpr writes the instrumented program to myprog.instr, runs the script test, performs the default optimizations, and generates the output file in myprog.fdpr.
Perform specific optimizations, producing the output in myprog.lro:
$ fdpr --opt --link-register-optimization -RC -o myprog.lro myprog
This command performs only link-register optimization and code reordering, using the profile information in myprog.nprof.
fdprpro accepts a host of optimization-specific options. In addition, there are several options that create auxiliary files for debugging purposes (e.g., code disassembly).
Analyze objects written in Assembly.
Provide a configuration file of analysis information (advanced option).
Analyze static data objects as distinct data elements for data reordering (unsafe for certain compilers).
Limit analysis phase to compiler generated code.
Apply special analysis for an input executable that was compiled with the -qfuncsect compiler option.
Input file format: can be LM (load module) or PO (program object).
Set the ignored function list. The file contains the names of functions that are considered unsafe and are therefore not modified.
Ignore .info sections produced with the -qfdpr option during compile time.
Perform embedded instrumentation. The profile will be collected into the application's global data area. When the application terminates, the collected data will be lost.
Set the file descriptor number to be used when opening the profile file. By default, Fdesc is the maximum allowed number of open files.
Instrument the values of parameters passed in function calls.
Perform value profiling of RA and RB operands in mullX instructions.
Perform value profiling of RA and RB operands in load/store indexed instructions.
Ensure that additional stack space is properly allocated for the instrumented run. Use this option if your application uses the stack extensively (e.g., when the program uses alloca()). Note that this option adds extra overhead to the instrumentation code.
Set the offset from the stack pointer, a negative number, at which the instrumentation's register save area is kept at run time. Use with care.
Set the shared memory segment address for profiling. An alternative address is needed when the instrumented application conflicts with the shared-memory addresses reserved for profiling. Typical alternative values are 0x40000000, 0x50000000, ... up to 0xC0000000. The default is set to 0x3000000.
Use shared memory key instead of file mapping to obtain a shared memory area for the profile data.
Instrument the input program file to collect profile information about indirect branches via registers. The default is set to collect the profile information.
Save the floating point registers in the instrumented code. The default is set to save floating point registers.
Specify a shared memory key to use when creating a shared memory area for the profile. The default key is created by hashing the profile file name (with ftok).
Set the name of a text format profile file containing profile information.
Accept the old profile file collected on previous versions of the input program file (requires the -f flag).
Set the profile file name. The profile file is created during the instrumentation phase and read during the optimization phase. The profile file is updated each time you run the instrumented program.
Set the run-time location of the profile file. During the profiling phase, the profile is searched for at this location. The default location is the path given in the profile file name (-f option). Applicable only in the instrumentation phase.
Specify code alignment strategy. 1: Use grouping rules of target machine (default), 2: Same as 1 but consider also hotness of branch targets. See -m for the selected machine model.
Align basic blocks that are hotter than the average by a given (float) factor. This is a lower-level machine-specific alignment compared to --align-code. Value of -1 (the default) disables this option.
Eliminate branch to branch instructions.
Preserve the original order of code that is executed less frequently than the given threshold.
Build a Data Connectivity Graph (DCG) for enhanced data reordering (applicable only with the -RD flag).
Set branch prediction bit for conditional branches according to the collected profile.
Eliminate load instructions used when accessing branch tables.
Perform selective inlining of functions that produce long hot chains of code.
Convert BSS section into a data section. This is useful for more aggressive tocload and RD optimizations.
Perform conservative static data reordering by packing together all frequently referenced static variables.
Eliminate instructions related to unused local variables within frequently executed functions. This is useful mainly after applying function inlining optimization.
Insert data-cache prefetch instructions to improve data-cache performance.
Set data placement algorithm hotness threshold between (0,1), where 0 reorders the static variables in large groups based on the control flow, and 1 reorders the variables in very small groups based on their access frequency. (This is applicable only with the -RD flag).
Set data placement algorithm normalization factor between (0,1), where 0 causes static variables to be reordered regardless of their size, and 1 locates only small sized variables first. (Applicable only with the -RD flag).
Reduce code size by merging common instruction sequences in function epilogs into a single shared copy.
Inflate constant areas in the code section by adding num_of_bytes bytes (all set to 255) to each constant area.
Inflate the data section by adding num_of_bytes bytes (all set to 255) to each data basic unit.
Inflate the code section by adding num_of_nop NOP instructions to each code basic block.
Edit existing binary code (advanced option).
Enable function cloning phase only during function inlining optimizations (applicable only with function inlining flags: -i, -si, -ihf, -isf, -shci).
Relocate instructions from frequently executed code to rarely executed code areas, when possible.
Set the aggressiveness of the -hr optimization option according to a factor value between (0,1), where 0 is the least aggressive factor (applicable only with the -hr option).
Relocate TOC store instructions from frequently executed code to rarely executed code areas, when possible.
Same as --selective-inline with --inline-small-funcs 12.
Inline all function call sites to functions that have a frequency count greater than the given pct frequency percentage.
Inline all functions that are smaller than or equal to the given size in bytes.
Eliminate stores and restores of registers that are killed (overwritten) after frequently executed function calls.
Eliminate load instructions of variable addresses by re-using pre-loaded addresses of adjacent variables.
Add NOP instructions to place each load instruction further apart following a store instruction that references the same memory address.
Optimize inefficient memory access patterns in order to avoid load-after-store events.
Optimize inefficient memory access patterns in order to avoid load-after-store events. The optimization is possible only if a PM_MRK_LSU_REJECT_LHS profile is available.
Eliminate saves and restores of the link register in frequently-executed functions.
Unroll short loops containing one to several basic blocks according to an aggressiveness factor between (1,9), where 1 is the least aggressive unrolling option for very hot and short loops.
Set the number of unrolled iterations in each unrolled loop. The allowed range is between (2,50). Default is set to 2. (Applicable only with the -lu flag).
Unroll hot loops using the given unrolling factor. The allowed values are integers that are powers of 2. Value -1 disables the optimization; value 1 calculates the unrolling factor automatically, given a machine model.
Remove NOP instructions from reordered code.
Switch on basic optimizations only. Same as -RC -nop -bp -bf.
Switch on less aggressive optimization flags. Same as -O -hr -pto -isf 8 -tlo -kr -see 0.
Switch on aggressive optimization flags. Same as -O2 -RD -isf 12 -si -lro -las -vro -btcar (for XCOFF files) -lu 9 -rt 0 -so -see 1 -oderat.
Switch on aggressive optimization flags together with aggressive function inlining. Same as -O3 -sidf 50 -ihf 20 -sdp 9 -shci 90 and -bldcg (for XCOFF files).
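Because each level above is defined as the previous level plus extra flags, the full flag set of a level can be expanded mechanically. The function below is only a transcription of those four lines for ELF inputs (the XCOFF-only -btcar and -bldcg are omitted), assuming the levels are named -O, -O2, -O3, and -O4; it is not taken from fdprpro, and the statistics file written with --verbose remains the authoritative record of what a given release actually enables.

```shell
# Hypothetical helper: expand an fdprpro optimization level into the flags
# listed for it on this page (ELF inputs only; XCOFF-only flags omitted).
# Each level is the previous level plus additional flags, so later values
# (e.g. -isf 12 after -isf 8) appear after earlier ones.
fdpr_level_flags() {
    case $1 in
        -O)  printf '%s\n' "-RC -nop -bp -bf" ;;
        -O2) printf '%s\n' "$(fdpr_level_flags -O) -hr -pto -isf 8 -tlo -kr -see 0" ;;
        -O3) printf '%s\n' "$(fdpr_level_flags -O2) -RD -isf 12 -si -lro -las -vro -lu 9 -rt 0 -so -see 1 -oderat" ;;
        -O4) printf '%s\n' "$(fdpr_level_flags -O3) -sidf 50 -ihf 20 -sdp 9 -shci 90" ;;
        *)   echo "fdpr_level_flags: unknown level '$1'" >&2; return 1 ;;
    esac
}
```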
Specialize function calls according to the values of their passed parameters.
Optimize mullX instructions by adding a run-time check on RA and RB and performing equivalent operations with lower penalty. The optimization requires the use of -imullX in the instrumentation phase.
Optimize load/store indexed instructions by adding a run-time check on RA and RB and performing equivalent operations with lower penalty. The optimization requires the use of -iderat in the instrumentation phase.
Perform selective inlining of dominant hot function calls based on the control flow paths leading to hot functions.
Preserve CSects' boundaries in reordered code.
Relocate the constant variables area to the top of the code section when possible.
Preserve original location of the entry point basic block in program.
Preserve functions' boundaries in reordered code.
Perform removal of R11 load instruction in _ptrgl csect.
Perform optimization of indirect call instructions via registers by replacing them with conditional direct jumps.
Set the frequency threshold for indirect calls that are to be optimized by -pto optimization. Allowed range between 0 and 1. Default is set to 0.8. (Applicable only with -pto flag).
Set the limit of the number of conditional statements generated by -pto optimization. Allowed values are between 1 and 100. Default value is set to 3. (Applicable only with the -pto flag).
Perform code reordering.
Set the aggressiveness of code reordering optimization. Allowed values are [0 | 1 | 2], where 0 preserves the original code order and 2 is the most aggressive. Default is set to 1. (Applicable only with the -RC flag).
Set the threshold fraction that determines when to enable condition reversal for each conditional branch during code reordering. Allowed input range is between 0.0 and 1.0 where 0.0 tries to preserve original condition direction and 1.0 ignores it. Default is set to 0.8 (Applicable only with the -RC flag).
Set the threshold fraction that determines when to terminate each chain of basic blocks during code reordering. Allowed input range is between 0.0 and 1.0 where 0.0 generates long chains and 1.0 creates single basic block chains. Default is set to 0.05. (Applicable only with the -RC flag).
Perform static data reordering.
Perform cross function path profiling.
Perform edges number limitation.
Remove multiple TOC entries pointing to the same location in the input program file.
Perform removal of TOC entries according to a removal factor between (0,1), where 0 removes non-accessed TOC entries only and 1 removes all possible TOC entries.
Remove traceback tables in reordered code.
Remove csect symbols.
Perform data prefetching within frequently executed loops based on stride analysis, according to an aggressiveness factor between (1,9), where 1 is the least aggressive.
Set the number of instructions for which data is prefetched into the cache ahead of time. Default value is platform dependent. (Applicable only with the -sdp flag).
Set the minimal stride size in bytes, for which data will be considered a candidate for prefetching. Default value is set to 128 bytes. (Applicable only with the -sdp flag).
Perform data prefetching based on the events file.
Set the number of instructions for which event based prefetch is performed. Default value is platform dependent. (Applicable only with the -ebp flag).
Use simplified prolog/epilog for functions that perform conditional early-exit. Use basic optimization with level=0 and maximal with level=1.
Perform selective inlining of functions in order to decrease the total number of execution counts, so that only functions with hotness above the given percentage are inlined.
Perform selective inlining of dominant hot function calls.
Set a dominant factor percentage for selective inline optimization. The allowed range is between 0 and 100. Default is set to 80. (Applicable only with the -si and -pbsi flags).
Set a hotness threshold factor percentage for selective inline optimization to inline all dominant function calls that have a frequency count greater than the given frequency percentage. Default is set to 100. (Applicable only with the -si and -pbsi flags).
Perform branch prediction bit setting for conditional branches in spinlock code containing l*arx and st*cx instructions. (Applicable after -bp flag).
Perform data prefetching for memory access instructions preceding spinlock code containing l*arx and st*cx instructions.
Statically link hot code from specified dynamically linked libraries to the input program. The parameter consists of a comma-separated list of libraries and their profiles. IMPORTANT: Licensing rights of specified libraries should be observed when applying this copying optimization.
Set hotness threshold for the --static-link-libraries optimization. The allowed input range is between 0 (least aggressive) and 1, or -1, which does not require a profile and selects all code that might be called by the input program from the given libraries. Default is set at 0.5.
Reduce the stack frame size of functions that are called with a small number of arguments.
Shortcut PLT calls in shared libraries to local functions if they exist. Note: Resolving to external symbols is disabled for such calls.
Merge the stack frames of inlined functions with the frames of the calling functions.
Force the restructuring of traceback tables in reordered code. If -tb option is omitted, traceback tables are automatically included only for C++ applications that use the Try & Catch mechanism.
Replace each load instruction that references the TOC with a corresponding add-immediate instruction via the TOC anchor register, where possible.
Remove unreachable code and non-accessed static data.
Eliminate stores and restores of non-volatile registers in frequently executed functions by using available volatile registers.
Eliminate stores and restores of non-volatile registers in frequently executed functions by using available volatile registers, the extended version supports FP registers and transparency.
Complements partial profile information given for the basic blocks' frequencies by adding missing basic block-to-basic block edge counts.
Print the disassembled text section of the output program into output_file.dis_text file.
Dump profile information in ASCII format into program.aprof (requires the -f flag).
Print the disassembled bss section of the output program into output_file.dis_bss file.
Print the disassembled data section of the output program into output_file.dis_data file.
Dump the given profile information in ASCII format into program.aprof.init (requires the -f flag).
Dump instruction mix statistics based on gathered profile information.
Print a map of basic blocks and static variables with their respective new -> old addresses into a program.mapper file.
Set the name of the output file. The default instrumented file is program.instr. The default optimized file is program.fdpr.
Print the list of inlined functions along with their corresponding calling functions into an output_file.inl_list file (requires the -si or -i or -isf flags).
Preserve debug symbols.
Preserve linkage conventions.
Print a text format of the profiling counters into a program.counts file (requires the -f flag).
Strip the output file.
Optimize in parallel into multiple outputs as specified by option sets read from stdin.
Print the online help.
Output optimization journal information to jour_file.
Generate code for the specified machine model. Target machine can be one of the following models: power2, power3, ppc405, ppc440, power4, ppc970, power5, power6, power7, ppe, spe, spe_edp, z10, z9. Default is power7.
Set the output mode to quiet, suppressing informational messages.
Output statistics information to stat_file. If stat_file is '-', the output goes to the standard output. See --verbose for the default.
Set verbose output mode level. When set, various statistics about the output program are printed into the file program.stat. Allowed level range is between 0 and 3. Default is set to 0.
Print the version number.
Set the warning level so only errors of this level and below will be printed. The levels are: 1: errors, 2: warnings, 3: debug warning, 4: debug information. Default is 2.
As shown in the previous section, the statistics file can be used to determine the default values of options. The options listed under 'options. ...' are the user-specified options plus the options they enable. So, in the above example, specifying -O3 enabled, among others, the -hco option (Hot-Cold Optimization) and set the -hrf option (HCO Rescheduling Factor) to 0.1.
By default, the profile generated by fdprpro is in an internal binary format. To allow external tools to generate the profile, an ASCII profile is also supported (see --ascii-profile-file).
The format of the ASCII profile file is:
<Simple> address execCount </Simple>
<Cond> address execCount fallthruCount </Cond>
<Reg> address execCount fallthruCount regIndex type1 value1 execCount1 type2 value2 execCount2 ... typeN valueN execCountN </Reg>
The profile file is a set of profile entries of three kinds: Simple, Cond, and Reg. The types in <Reg> entries are Abs for absolute values, Text for text addresses, and Data for data addresses. No other tags are defined, there must be no white space between the letters of a tag, and comments are not allowed. Addresses and values can be given in decimal or in hexadecimal (starting with 0x).
For example:
<Simple> 0x100000240 10 </Simple>
<Simple> 0x100000250 20 </Simple>
<Cond> 0x100000260 20 10 </Cond>
<Simple> 0x100000270 20 </Simple>
<Reg> 0x100000260 20 10 17 Abs 23 5 Text 0x100000300 5 Data 0x200000400 10 </Reg>
The order of the profile entries is not important, although for better readability they should be sorted by address. The ASCII profile file (extension .aprof) should contain entries for code executed at least once. Code with execCount = 0 should not be included (this is not forbidden, but it provides no information to fdpr). Generally, it is sufficient to provide one profile entry for each executed basic block; the address of that entry can be any address within the basic block. Since fdprpro's internal basic block partitioning is not always known, several profile entries may be provided for a single basic block, up to a maximum of one profile entry per instruction. When several profile entries for a single basic block contain conflicting information (e.g., different execCount values), fdprpro produces a warning starting with "Conflicting profiling" ... and ignores the later conflicting information.
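External tools can emit this format with ordinary text processing. The sketch below, using the hypothetical helper name to_aprof, turns "address count" lines into <Simple> entries and "address count fallthru" lines into <Cond> entries, dropping zero counts as recommended above. It does not handle <Reg> entries; the resulting file would then be given to fdprpro with --ascii-profile-file.

```shell
# Hypothetical converter from plain counts to FDPR ASCII profile entries.
# stdin lines: "address execCount"               -> <Simple> entry
#              "address execCount fallthruCount" -> <Cond> entry
# Addresses are copied unchanged (decimal or 0x-prefixed hex both allowed).
# Entries with execCount 0 are skipped; other line shapes are ignored.
to_aprof() {
    awk '
        NF == 2 && $2 != 0 { printf "<Simple> %s %s </Simple>\n", $1, $2 }
        NF == 3 && $2 != 0 { printf "<Cond> %s %s %s </Cond>\n", $1, $2, $3 }
    '
}
```

For example, to_aprof < counts.txt > myprog.aprof, where counts.txt is whatever count dump the external tool produces.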
In addition to the optimized or instrumented program, fdprpro produces human readable output.
1. Standard output. The text that goes to standard output includes the sign-on message, progress information and sign-off message. The progress information displays the passage of fdprpro along the different phases of processing, as follows:
fdprpro (FDPR) <version> Linux/POWER
fdprpro -a opt -O3 li.linux.gcc32.base
> reading_exe ...
> adjusting_exe ...
> analyzing ...
> building_program_infrastructure ...
...
> updating_executable ...
> writing_executable ...
bye.
If the --quiet option is specified, no output is produced here.
2. Standard error. As usual, warnings and error messages are written to the standard error file. Note that fdprpro exits after the first error.
3. Statistics file. If the --verbose <level> option is selected, various kinds of statistics about the program are written to the statistics file, output_file.stat. The file consists of a list of tables, typically in the form of one <attribute> <value> pair per line. The amount of information is determined by level. The following is an example, corresponding to the above invocation:
options. group active_options
options. optimization -bf -bp -dp -hr -hrf 0.10 -kr -las -lro -lu 9 -isf 12 -nop -pr -RC -RD -rt 0.00 -si -tlo -vro
options. output -o 1.base

global.use_try_and_catch: 0
global.profile_info: not_available

file.input: li.linux.gcc32.base
file.output: 1.base
file.statistics: 1.base.stat

analysis.csects: 347
analysis.functions: 343
analysis.constants: 13
analysis.basic_blocks: 5360
analysis.function_descriptors: 0
analysis.branch_tables: 10
analysis.branch_table_entries: 374
analysis.unknown_basic_units: 17
analysis.traceback_tables: 0
...
Note that the options listed in the optimization group are the actual ones enabled by the -O3 option. See below.
Typically, fdprpro optimizes a single target module (an executable file or a shared library) without considering the cross-module flow of the program. The --static-link-libraries option allows fdprpro to go beyond the boundary of the target module and import hot (i.e., heavily used) code from other modules to which it is dynamically linked. These modules are referred to below as SLL libraries.
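Because each statistics line has the "<attribute>: <value>" shape shown above, a single value can be pulled out with awk. This is a minimal sketch assuming exactly that layout; the helper name stat_value is hypothetical.

```shell
# Hypothetical helper: print the value of one attribute from an fdprpro
# statistics file. $1 = statistics file, $2 = attribute name
# (e.g. analysis.functions). Prints nothing if the attribute is absent.
stat_value() {
    awk -v key="$2:" '
        $1 == key { sub(/^[^:]*:[ \t]*/, ""); print; exit }
    ' "$1"
}
```

For example, stat_value 1.base.stat analysis.basic_blocks would print 5360 for the file above.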
For example, to import hot code from mylib.so into myprog, using the library's profile mylib.so.prof, use the following command:
$ fdprpro -sll mylib.so:mylib.so.prof -O3 -o myprog.fdpr -f myprog.prof myprog
For better performance results, it is highly recommended that users collect the profiles of the specified SLL libraries with the same workload as the one used for training the target program.
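When several libraries are involved, the comma-separated list of library:profile pairs can be assembled by a small helper. This sketch assumes, as in the example above, that each library's profile is named <library>.prof; the helper name sll_arg is hypothetical.

```shell
# Hypothetical helper: build the comma-separated "library:profile" list for
# -sll (--static-link-libraries), assuming each profile is <library>.prof.
sll_arg() {
    list=""
    for lib in "$@"; do
        list="${list:+$list,}$lib:$lib.prof"
    done
    printf '%s\n' "$list"
}
```

For example: fdprpro -sll "$(sll_arg mylib.so libother.so)" -O3 -o myprog.fdpr -f myprog.prof myprog.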
IMPORTANT: If an SLL library is later upgraded, the optimization must be rerun with the upgraded library to keep the correspondence valid between that library and the target module.
IMPORTANT: It is the responsibility of the user to ensure that code copying from SLL libraries is compliant with the usage license of these libraries.
Starting with release 5.4.0.18, fdprpro provides special optimizations that look for operations with specific values and replace them with an optimized sequence. Such optimizations, which are typically target-specific, require corresponding instrumentation that profiles the code to identify potential sequences. The first optimization that uses LVP is -omullX, which performs strength reduction on selected instances of integer multiplication. Specify -imullX for instrumentation and -omullX for optimization. To tune the optimization for POWER6, also specify -m power6.
The data reordering algorithm of fdprpro is enabled by the -RD option and is available only for ELF64 (64-bit) programs. The algorithm reorders data elements to achieve better data cache efficiency as well as more effective instruction selection. It may operate on all data elements or only on a subset of them, depending on the selected aggressiveness. By default, a conservative algorithm is selected that does not reorder the user's static data (i.e., data defined in the .bss and .data sections). This is needed to protect against data access optimizations used in GCC 4.3 and later. A more aggressive optimization, which considers all data elements, is possible with the option --analyze-static-data (-asd).
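The pairing of -imullX and -omullX can be sketched as two fdprpro runs around a training step. This assumes fdprpro is on the PATH and that $2 is a workload script taking the instrumented program's path; the function name mullx_for_power6 is hypothetical.

```shell
# Hypothetical helper: instrument with -imullX, train, then optimize with
# -omullX tuned for POWER6. $1 = program, $2 = workload script.
mullx_for_power6() {
    prog=$1
    workload=$2
    case $prog in
        */*) instr=$prog.instr ;;
        *)   instr=./$prog.instr ;;
    esac

    # The instrumentation must include -imullX so mullX operands are profiled.
    fdprpro -a instr -imullX -f "$prog.nprof" -o "$prog.instr" "$prog" || return 1
    "$workload" "$instr" || return 1
    # The optimization uses that value profile to strength-reduce mullX.
    fdprpro -a opt -omullX -m power6 -f "$prog.nprof" -o "$prog.fdpr" "$prog"
}
```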
fdprpro inserts code stubs during instrumentation that perform the necessary counting. To keep the program's state intact, the registers changed by these stubs are saved at the beginning of each stub and restored at the end. Because the save area lies below the stack pointer, writing it can cause segmentation violations in rare cases. This was found to occur in applications that use the alloca() function or that employ multi-threading. To avoid these segmentation violations, use the --instrumentation-safe-stack-usage (-issu) option. The option adds code that prevents the signal at the cost of increased code size (up to 20%). The user can also set the offset of the save area from the stack pointer, which must be negative, using the --instrumentation-stack-offset (-iso) option.
The alignment flag -A (--align-code) indicates the alignment strategy to use. The strategy codes are:
1 - An alignment strategy based on the instruction grouping of the selected target machine. See the -m (--machine) option for the possible machine models and the default value of this option.
2 - An alignment strategy based on instruction grouping as in (1) above, while also considering the hotness of the branch targets. This typically makes prefetching the target instruction stream more efficient.
In exceptional conditions during profiling (training), the instrumentation code produces warnings and error messages. These messages are written to a special file named profile_file.errors_pid_tid, so that they are not interleaved with the regular text produced by the user's program. As with the profile file itself, the directory where the profile error file resides can be set with the environment variable FDPR_PROF_DIR (see the Instrumentation and Profiling section above). If the program changes its working directory while running, make sure FDPR_PROF_DIR is defined with the full path name.
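After a training run, any such error files can be listed with a short helper. This sketch assumes the files are named <profile>.errors_<pid>_<tid> and live in $FDPR_PROF_DIR when it is set, otherwise in the given directory; the helper name fdpr_error_files is hypothetical.

```shell
# Hypothetical helper: print the instrumentation error files left for a
# profile. $1 = profile file name, $2 = directory (default: current one).
# FDPR_PROF_DIR, when set, takes precedence over $2.
fdpr_error_files() {
    profile=$1
    dir=${FDPR_PROF_DIR:-${2:-.}}
    for f in "$dir/$profile".errors_*; do
        [ -e "$f" ] && printf '%s\n' "$f"   # skip the unmatched glob itself
    done
    return 0
}
```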
The wrapper script for fdprpro (by default installed_dir is /opt/ibm/fdprpro).
The actual executable (binary) program.
The shared library used during profiling for ELF32 executable files.
The shared library used during profiling for ELF64 executable files.
The disassembly file of program text, produced by the --disassemble-text option.
The disassembly file of program data, produced by the --disassemble-data option.
The disassembly file of the program's BSS section, produced by the --disassemble-bss option.
The map of basic block and static variables. See the --dump-mapper option.
The initial profile information in ASCII format. See the --dump-initial-ascii-profile option.
The ASCII-formatted profile file. See the --dump-ascii-profile option.
In case of error, the file contains information related to the error. Please send it with the bug report to fdpr@il.ibm.com.
If --verbose <level> is specified the file will contain certain statistics about the target program or about the optimization process.