Polyhedral optimization framework

From LLVM

Jump to: navigation, search

Contents

[edit] Polly - The polyhedral optimization framework for LLVM

Polly uses a mathematical representation, the polyhedral model, to represent and transform loops and other control flow structures. Using an abstract representation it is possible to reason about transformations in a more general way and to use highly optimized linear programming libraries to figure out the optimal loop structure. These transformations can be used to do constant propagation through arrays, remove dead loop iterations, optimize loops for cache locality, optimize arrays, apply advanced automatic parallelization, drive vectorization, or they can be used to do software pipelining.

[edit] Status of the work

  • August 2010 - RegionInfo pass committed to llvm
  • August 2010 - llvm-test suite compiles
  • July 2010 - Code generation works for normal SCoPs.
  • June 2010 - OpenSCoP import/export works (as far as openscop is finished)
  • May 2010 - The CLooG AST can be parsed
  • April 2010 - SCoPs can automatically be detected (WIP)
  • March 2010 - The RegionInfo framework is almost completed.
  • February 2010 - Translating a simple loop to Polly-IR and passing it to CLooG-isl to regenerate a loop structure works.
  • February 2010 - ISL and CLooG are integrated.
  • January 2010 - The RegionInfo pass is finished.
  • End of 2009 - Work on the infrastructure started.

[edit] Phone calls

We meet (more or less regularly) on the phone. Please ask for the number and when the next meeting is.

[edit] Architecture

File:LLVMPoly.png

[edit] ToDo

[edit] Get something working

The first iteration of this project aims to create a minimal working version of this framework, that is capable to transform an LLVM-IR program to the polyhedral model and back to LLVM-IR without applying any transformations.

Frontend
Task Bug tracker Status Owner
Region detection RegionInfo works + Committed upstream Tobias + Ether
Access Functions Working John + Ether
Alias sets
Scalar evolution to affine expression [#5936] Done John + Ether
SCoP extraction Working Tobias + Ether
SCoPs to polyhedral model Working Tobias + Ether
Middle end
Task Bug tracker Status Owner
Define polyhedral description Working Tobias
Import/Export using openscop working Tobias
Backend
Task Bug tracker Status Owner
Create LLVM-IR using CLooG Working Andi + Tobias
General tasks
Task Bug tracker Status Owner
Setup git repositories Done Tobias
Add CLooG/isl to build system Works on Unix Tobias

[edit] Further projects

There are several great projects related to polly that can already be started or are possible in the near future.

[edit] 1. Extend the post dominance analysis for infinite loops (small)

At the moment the post dominance analysis cannot handle infinite loops. All basic blocks in the CFG that do not return are - at the moment - not part of the post dominance tree. However by adding some virtual edges, they could be added to the post dominator tree. Where to add the edges needs some research.

This is a small project, that is is well defined. As it is directly in LLVM it can be easily committed upstream. It is useful for polly, as the RegionInfo pass will be able to detect regions in parts of the CFG that never return.

A good starter to get into LLVM

[edit] 2. Polly projects

[edit] 2.1 Export to scoplib (small)

SCoPlib is a library that is used to read/write an interchangeable polyhedral library format. It is/will be used in many projects like Graphite, Clan, Cloog, Pluto, ... For Polly it will be used to connect Pluto. Pluto can achieve some great performance improvements by just restructuring the execution schedule. Even more by creating OpenMP parallelism or vectorized code.

example/matmult in pluto:

% cd examples/matmul
% clang -O3 matmul.c 
% time ./a.out 
./a.out  84.94s user 0.12s system 99% cpu 1:25.32 total
% clang -O3 matmul.tiled.c
% time ./a.out            
./a.out  5.48s user 0.05s system 99% cpu 5.550 total
gcc -O3 matmul.par.c 
% time ./a.out            
./a.out  6.08s user 0.05s system 99% cpu 6.142 total
% gcc -O3 -fopenmp matmul.par.c 
% time ./a.out                 
./a.out  12.59s user 0.06s system 364% cpu 3.471 total

clang without pluto optimizations:                1:25.32 min
clang with pluto tiling:                          0:05.55 min
gcc with parallel code (without OpenMP):          0:06.14 min
gcc with parallel code (with OpenMP on 2 cores):  0:03.47 min

Technical/engineering project. Easy to start without a strong polyhedron background. And nice to finish as it might show some impressive numbers

[edit] 2.2 Extending Polly

After iteration one is finished Polly can be extended to support conditions, infinite loops, irregular control flow, ...

[edit] 2.3. Vectorization

It is planned to use Polly to support vectorization in LLVM.

The basic idea is to use Polly and the polyhedral tools to transform code such that the innermost loops can be executed in parallel. Afterwords during code generation in LLVM the loops will be created using vector instructions.

To start we plan to use Pluto to transform the loop nests. Pluto can generate vector parallel code and annotate the vector parallel loops. Some impressive results were shown on code that was afterwords vectorized by the icc enforcing vectorization. We believe LLVM can do even better, as it can interact directly with the polyhedral information.

As an example simple matrix multiplication:

  for(i=0; i<M; i++)
    for(j=0; j<N; j++)
      for(k=0; k<K; k++)
        C[i][j] = beta*C[i][j] + alpha*A[i][k] * B[k][j];

After plutos transformations with added tiling and vectorization hints:

if ((K >= 1) && (M >= 1) && (N >= 1))
  for (t1=0;t1<=floord(M-1,32);t1++)
    for (t2=0;t2<=floord(N-1,32);t2++)
      for (t3=0;t3<=floord(K-1,32);t3++)
        for (t4=32*t1;t4<=min(M-1,32*t1+31);t4++)
          for (t5=32*t3;t5<=min(K-1,32*t3+31);t5++) {
            lbv=32*t2; ubv=min(N-1,32*t2+31);
#pragma ivdep
#pragma vector always
            for (t6=lbv; t6<=ubv; t6++)
              C[t4][t6]=beta*C[t4][t6]+alpha*A[t4][t5]*B[t5][t6];;
            
          }

In this example the innermost loop is parallel without any dependencies.

[edit] 3. OpenMP in LLVM (hard)

Enable LLVM passes to create OpenMP parallelized code.

This is a very undefined project, that involves defining how to annotate OpenMP loops, work with clang to transform OpenMP to the right LLVM-IR annotations. And finally translate the OpenMP annotations to code and libomp calls. If you are really crazy you could write a BSD implementation of libOMP.

Interesting if you like to work on parallelization. Probably easy to get support upstream as OpenMP is widely known.

[edit] Try Polly

[edit] Get the code

At the moment polly needs still a patched version of LLVM. LLVM with patches enabling Polly support is available in the [Pofl repository].

Polly itself can be found in the [Polly repository]

git clone git://repo.or.cz/llvm-complete/pofl.git
cd pofl
git submodule init 
git submodule update
export POFL=`pwd`

[edit] Install necessary libraries

[edit] CLooG

Please install the development version of CLooG as described at the bottom of [CLooG Homepage].

[edit] OpenSCoP

To enable openscop support you need the HEAD of the openscop library. It is available at: https://alchemy.futurs.inria.fr/svn/users/bastoul/scoplib/branches/openscop

Polly needs a version installed with --enable-mp-version in the configure. If you forgot this, please clean both install and build directory completely and try again. Otherwise polly may segfault unexpectedly.

Furthermore two patches need to be applied at the moment:

http://polly.tobias.grosser.es/

[edit] Build Polly

At the moment just the CMAKE build is supported.

mkdir build
cd build
cmake ${POFL}
cmake -DCMAKE_PREFIX_PATH=/path/to/cloog/ .
// If you are interested in scoplib import/export execute the next line
cmake -DCMAKE_PREFIX_PATH=/path/to/scoplib/ .
cmake -DLLVM_ENABLE_PEDANTIC=0 . #Required as the CLooG header files contain non C89 stuff.
make

If CMAKE cannot find CLooG and ISL add the install prefix of CLooG to CMAKE_PREFIX_PATH.

[edit] Use Polly

Currently polly is in a state that primarly focus on research. So the best way to use it is to create LLVM-IR files and use polly on those. Here a workflow that can be used to try polly.

# Create LLVM-IR files from C code. This will create a file matmul.s.
clang -S -emit-llvm matmul.c

# Apply optimizations to prepare code for polly
opt -S -mem2reg -loopsimplify -indvars matmul.s > matmul.preopt.ll
 
# Show the possible SCoPs (optional)
opt -polly-print matmul.preopt.ll

# Export openscop files (optional)
opt -polly-export matmul.preopt.ll

# Import the (changed) openscop files and print the (changed) SCoPs. (optional)
opt -polly-import -polly-print matmul.preopt.ll 

# Codegenerate the SCoPs. This generates new code for the SCoPs detected by polly.
# If -polly-import is present, transformations specified in the imported openscop
# files will be applied.
opt -polly-import -polly-codegen matmul.preopt.ll > matmul.pollyopt.ll

# Create an executable and execute it.
llc matmul.pollyopt.ll && gcc out.s && time ./a.out

If you want to find SCoPs in a whole program it is possible to combine several LLVM-IR files using llvm-ld and optimize/analyze this file. In this case optimizations like inlining and whole program mode may help to detect even more SCoPs.

Personal tools