Polyhedral optimization framework
From LLVM
Contents |
[edit] Polly - The polyhedral optimization framework for LLVM
Polly uses a mathematical representation, the polyhedral model, to represent and transform loops and other control flow structures. Using an abstract representation it is possible to reason about transformations in a more general way and to use highly optimized linear programming libraries to figure out the optimal loop structure. These transformations can be used to do constant propagation through arrays, remove dead loop iterations, optimize loops for cache locality, optimize arrays, apply advanced automatic parallelization, drive vectorization, or they can be used to do software pipelining.
[edit] Status of the work
- August 2010 - RegionInfo pass committed to llvm
- August 2010 - llvm-test suite compiles
- July 2010 - Code generation works for normal SCoPs.
- June 2010 - OpenSCoP import/export works (as far as openscop is finished)
- May 2010 - The CLooG AST can be parsed
- April 2010 - SCoPs can automatically be detected (WIP)
- March 2010 - The RegionInfo framework is almost completed.
- February 2010 - Translating a simple loop to Polly-IR and passing it to CLooG-isl to regenerate a loop structure works.
- February 2010 - ISL and CLooG are integrated.
- January 2010 - The RegionInfo pass is finished.
- End of 2009 - Work on the infrastructure started.
[edit] Phone calls
We meet (more or less regularly) on the phone. Please ask for the number and when the next meeting is.
[edit] Architecture
[edit] ToDo
[edit] Get something working
The first iteration of this project aims to create a minimal working version of this framework, that is capable to transform an LLVM-IR program to the polyhedral model and back to LLVM-IR without applying any transformations.
| Frontend | |||
|---|---|---|---|
| Task | Bug tracker | Status | Owner |
| Region detection | RegionInfo | works + Committed upstream | Tobias + Ether |
| Access Functions | Working | John + Ether | |
| Alias sets | |||
| Scalar evolution to affine expression | [#5936] | Done | John + Ether |
| SCoP extraction | Working | Tobias + Ether | |
| SCoPs to polyhedral model | Working | Tobias + Ether | |
| Middle end | |||
| Task | Bug tracker | Status | Owner |
| Define polyhedral description | Working | Tobias | |
| Import/Export using openscop | working | Tobias | |
| Backend | |||
| Task | Bug tracker | Status | Owner |
| Create LLVM-IR using CLooG | Working | Andi + Tobias | |
| General tasks | |||
| Task | Bug tracker | Status | Owner |
| Setup git repositories | Done | Tobias | |
| Add CLooG/isl to build system | Works on Unix | Tobias | |
[edit] Further projects
There are several great projects related to polly that can already be started or are possible in the near future.
[edit] 1. Extend the post dominance analysis for infinite loops (small)
At the moment the post dominance analysis cannot handle infinite loops. All basic blocks in the CFG that do not return are - at the moment - not part of the post dominance tree. However by adding some virtual edges, they could be added to the post dominator tree. Where to add the edges needs some research.
This is a small project, that is is well defined. As it is directly in LLVM it can be easily committed upstream. It is useful for polly, as the RegionInfo pass will be able to detect regions in parts of the CFG that never return.
A good starter to get into LLVM
[edit] 2. Polly projects
[edit] 2.1 Export to scoplib (small)
SCoPlib is a library that is used to read/write an interchangeable polyhedral library format. It is/will be used in many projects like Graphite, Clan, Cloog, Pluto, ... For Polly it will be used to connect Pluto. Pluto can achieve some great performance improvements by just restructuring the execution schedule. Even more by creating OpenMP parallelism or vectorized code.
example/matmult in pluto:
% cd examples/matmul % clang -O3 matmul.c % time ./a.out ./a.out 84.94s user 0.12s system 99% cpu 1:25.32 total % clang -O3 matmul.tiled.c % time ./a.out ./a.out 5.48s user 0.05s system 99% cpu 5.550 total gcc -O3 matmul.par.c % time ./a.out ./a.out 6.08s user 0.05s system 99% cpu 6.142 total % gcc -O3 -fopenmp matmul.par.c % time ./a.out ./a.out 12.59s user 0.06s system 364% cpu 3.471 total clang without pluto optimizations: 1:25.32 min clang with pluto tiling: 0:05.55 min gcc with parallel code (without OpenMP): 0:06.14 min gcc with parallel code (with OpenMP on 2 cores): 0:03.47 min
Technical/engineering project. Easy to start without a strong polyhedron background. And nice to finish as it might show some impressive numbers
[edit] 2.2 Extending Polly
After iteration one is finished Polly can be extended to support conditions, infinite loops, irregular control flow, ...
[edit] 2.3. Vectorization
It is planned to use Polly to support vectorization in LLVM.
The basic idea is to use Polly and the polyhedral tools to transform code such that the innermost loops can be executed in parallel. Afterwords during code generation in LLVM the loops will be created using vector instructions.
To start we plan to use Pluto to transform the loop nests. Pluto can generate vector parallel code and annotate the vector parallel loops. Some impressive results were shown on code that was afterwords vectorized by the icc enforcing vectorization. We believe LLVM can do even better, as it can interact directly with the polyhedral information.
As an example simple matrix multiplication:
for(i=0; i<M; i++)
for(j=0; j<N; j++)
for(k=0; k<K; k++)
C[i][j] = beta*C[i][j] + alpha*A[i][k] * B[k][j];
After plutos transformations with added tiling and vectorization hints:
if ((K >= 1) && (M >= 1) && (N >= 1))
for (t1=0;t1<=floord(M-1,32);t1++)
for (t2=0;t2<=floord(N-1,32);t2++)
for (t3=0;t3<=floord(K-1,32);t3++)
for (t4=32*t1;t4<=min(M-1,32*t1+31);t4++)
for (t5=32*t3;t5<=min(K-1,32*t3+31);t5++) {
lbv=32*t2; ubv=min(N-1,32*t2+31);
#pragma ivdep
#pragma vector always
for (t6=lbv; t6<=ubv; t6++)
C[t4][t6]=beta*C[t4][t6]+alpha*A[t4][t5]*B[t5][t6];;
}
In this example the innermost loop is parallel without any dependencies.
[edit] 3. OpenMP in LLVM (hard)
Enable LLVM passes to create OpenMP parallelized code.
This is a very undefined project, that involves defining how to annotate OpenMP loops, work with clang to transform OpenMP to the right LLVM-IR annotations. And finally translate the OpenMP annotations to code and libomp calls. If you are really crazy you could write a BSD implementation of libOMP.
Interesting if you like to work on parallelization. Probably easy to get support upstream as OpenMP is widely known.
[edit] Try Polly
[edit] Get the code
At the moment polly needs still a patched version of LLVM. LLVM with patches enabling Polly support is available in the [Pofl repository].
Polly itself can be found in the [Polly repository]
git clone git://repo.or.cz/llvm-complete/pofl.git cd pofl git submodule init git submodule update export POFL=`pwd`
[edit] Install necessary libraries
[edit] CLooG
Please install the development version of CLooG as described at the bottom of [CLooG Homepage].
[edit] OpenSCoP
To enable openscop support you need the HEAD of the openscop library. It is available at: https://alchemy.futurs.inria.fr/svn/users/bastoul/scoplib/branches/openscop
Polly needs a version installed with --enable-mp-version in the configure. If you forgot this, please clean both install and build directory completely and try again. Otherwise polly may segfault unexpectedly.
Furthermore two patches need to be applied at the moment:
http://polly.tobias.grosser.es/
[edit] Build Polly
At the moment just the CMAKE build is supported.
mkdir build
cd build
cmake ${POFL}
cmake -DCMAKE_PREFIX_PATH=/path/to/cloog/ .
// If you are interested in scoplib import/export execute the next line
cmake -DCMAKE_PREFIX_PATH=/path/to/scoplib/ .
cmake -DLLVM_ENABLE_PEDANTIC=0 . #Required as the CLooG header files contain non C89 stuff.
make
If CMAKE cannot find CLooG and ISL add the install prefix of CLooG to CMAKE_PREFIX_PATH.
[edit] Use Polly
Currently polly is in a state that primarly focus on research. So the best way to use it is to create LLVM-IR files and use polly on those. Here a workflow that can be used to try polly.
# Create LLVM-IR files from C code. This will create a file matmul.s. clang -S -emit-llvm matmul.c # Apply optimizations to prepare code for polly opt -S -mem2reg -loopsimplify -indvars matmul.s > matmul.preopt.ll # Show the possible SCoPs (optional) opt -polly-print matmul.preopt.ll # Export openscop files (optional) opt -polly-export matmul.preopt.ll # Import the (changed) openscop files and print the (changed) SCoPs. (optional) opt -polly-import -polly-print matmul.preopt.ll # Codegenerate the SCoPs. This generates new code for the SCoPs detected by polly. # If -polly-import is present, transformations specified in the imported openscop # files will be applied. opt -polly-import -polly-codegen matmul.preopt.ll > matmul.pollyopt.ll # Create an executable and execute it. llc matmul.pollyopt.ll && gcc out.s && time ./a.out
If you want to find SCoPs in a whole program it is possible to combine several LLVM-IR files using llvm-ld and optimize/analyze this file. In this case optimizations like inlining and whole program mode may help to detect even more SCoPs.


