Notes:Machine Code Emission in an ideal world

From LLVM

[edit] Overview

NOTE: This is no longer considered "IDEAL". Please ask about the MC project if you're interested in this.


This is an 'ideal' design for the classes we'd like to see in the LLVM backend to make implementation of target-specific Machine Code Emission simpler and cleaner.

For the purposes of this design, Machine Code Generation includes:

  • Modules emitted as target- and ABI-specific assembly (.s / .asm) for input to a native assembler.
  • Modules emitted as target- and ABI-specific object files (.o, .a / .obj) for input to a native linker.
  • Modules (or portions thereof) emitted as target- and ABI-specific machine code into a running process, during JIT.

There are a number of goals which need to be supported by this 'design'.

  • When implementing a new target backend, the developer should be required to implement as little as possible. There should be sane 'defaults' for everything.
  • There should be as much code-reuse and modularity as possible, while still maintaining clear class responsibilities.
  • Backend targets should be able to easily support more than one ABI.

By separating concerns and emphasizing similarity, a more normalized and flexible design can be devised.



[edit] Schedule

Direct Object Code Emission

[edit] Module Generators

The abstract base class is called ModuleGenerator, and there is one concrete subclass for each object module format. ModuleGenerators are factory classes that generate either ObjectModules, or assembly or listing output using AssemblyPrinters.

  • ELFModuleGenerator
  • COFFModuleGenerator
  • MachOModuleGenerator
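
The factory relationship can be sketched roughly as follows. This is only an illustration: the `createObjectModule` method, the `Format` field and the `generatedFormat` helper are hypothetical names, not part of the design or of LLVM.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for an in-memory object module; only the format
// name is modelled.
struct ObjectModule {
  std::string Format;
};

// Abstract factory with one concrete subclass per object file format.
class ModuleGenerator {
public:
  virtual ~ModuleGenerator() = default;
  virtual std::unique_ptr<ObjectModule> createObjectModule() const = 0;
};

class ELFModuleGenerator : public ModuleGenerator {
public:
  std::unique_ptr<ObjectModule> createObjectModule() const override {
    return std::unique_ptr<ObjectModule>(new ObjectModule{"ELF"});
  }
};

class COFFModuleGenerator : public ModuleGenerator {
public:
  std::unique_ptr<ObjectModule> createObjectModule() const override {
    return std::unique_ptr<ObjectModule>(new ObjectModule{"COFF"});
  }
};

class MachOModuleGenerator : public ModuleGenerator {
public:
  std::unique_ptr<ObjectModule> createObjectModule() const override {
    return std::unique_ptr<ObjectModule>(new ObjectModule{"MachO"});
  }
};

// Client code depends only on the abstract factory (hypothetical helper).
std::string generatedFormat(const ModuleGenerator &G) {
  std::unique_ptr<ObjectModule> M = G.createObjectModule();
  assert(M && "a generator must always produce a module");
  return M->Format;
}
```

Code that drives generation would then only need a `ModuleGenerator` reference, which is what keeps the per-format code isolated.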

[edit] Section Management

Most ABIs define specific 'sections' into which code and data need to be emitted. These sections are used when emitting:

  • Assembly files for input to the native target assembler.
  • Object files for input to the native target linker.
  • JIT-compiled code, function stubs, and global data into an application's execution environment.
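
One way to picture shared section management is a table from section names to byte buffers that all three output paths could use. The `SectionTable` class below is a hypothetical sketch and not an existing LLVM class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical section table: maps a section name such as ".text" or
// ".data" to the bytes emitted into it so far.
class SectionTable {
  std::map<std::string, std::vector<std::uint8_t>> Sections;

public:
  // Returns the named section, creating it empty on first use.
  std::vector<std::uint8_t> &getOrCreate(const std::string &Name) {
    assert(!Name.empty() && "sections must be named");
    return Sections[Name];
  }

  // Number of distinct sections created so far.
  std::size_t count() const { return Sections.size(); }
};
```

Asking for ".text" twice returns the same buffer, so code emitted at different times accumulates in one place.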

[edit] ABIs - Application Binary Interfaces & Targets

ABI concerns can be looked after by plugin helper classes. For example, a plugin may generate exception handling (EH) information for a specific target. The plugins are provided by a specific target or subtarget.

The DWARF-like EH used in the JIT should be coded in LLVM and/or as a lowering. It could then be used as the standard EH for LLVM. This is not possible yet because ULEB and SLEB types are missing, as is support for output section selection.
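
ULEB and SLEB presumably refer to the ULEB128 and SLEB128 variable-length integer encodings that DWARF uses. As a rough sketch of what such types would have to produce, the functions below encode and decode them into a byte vector. The function names are chosen for illustration and are not a claim about any LLVM API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Unsigned LEB128: 7 value bits per byte, least significant group first.
// The high bit of a byte is set when more bytes follow.
std::vector<std::uint8_t> encodeULEB128(std::uint64_t Value) {
  std::vector<std::uint8_t> Out;
  do {
    std::uint8_t Byte = static_cast<std::uint8_t>(Value & 0x7f);
    Value >>= 7;
    if (Value != 0)
      Byte |= 0x80;
    Out.push_back(Byte);
  } while (Value != 0);
  return Out;
}

// Signed LEB128: as above, but encoding stops once the remaining value is
// pure sign extension of bit 6 of the last byte written.
// Assumes >> on a negative value is an arithmetic shift, which holds on
// mainstream compilers and is guaranteed from C++20.
std::vector<std::uint8_t> encodeSLEB128(std::int64_t Value) {
  std::vector<std::uint8_t> Out;
  bool More = true;
  while (More) {
    std::uint8_t Byte = static_cast<std::uint8_t>(Value & 0x7f);
    Value >>= 7;
    bool SignBit = (Byte & 0x40) != 0;
    if ((Value == 0 && !SignBit) || (Value == -1 && SignBit))
      More = false;
    else
      Byte |= 0x80;
    Out.push_back(Byte);
  }
  return Out;
}

// Decodes an unsigned LEB128 value from the start of In.
std::uint64_t decodeULEB128(const std::vector<std::uint8_t> &In) {
  std::uint64_t Value = 0;
  unsigned Shift = 0;
  for (std::uint8_t Byte : In) {
    assert(Shift < 64 && "ULEB128 value too large for 64 bits");
    Value |= static_cast<std::uint64_t>(Byte & 0x7f) << Shift;
    Shift += 7;
    if ((Byte & 0x80) == 0)
      break;
  }
  return Value;
}
```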

DWARF generation will be coded separately for Asm and JIT, both of which already exist, and for ELF, which still needs to be written.

[edit] Assembly versus Object Code generation

This should be very easy to abstract at this level, judging from the AsmPrinter code versus [X86]Emitter::runOnMachineFunction(). Again, this will be left until we have a functioning binary system; only then will the assembler and binary output be refactored, if that happens at all.

[edit] Object Modules

The design has changed so that the LLVM compiler has a single internal representation for Object Modules. These are then written to Object Module files by their respective object module writers.

[edit] Linker

ObjectModules are an abstract in-memory representation of a specific module's structure and semantics, and they manage sections of generated code. They are created by ObjectGenerator factories, and may be written and read by ModuleWriters and ModuleReaders.

  • ELFObjectModule
  • COFFObjectModule
  • MachOObjectModule

ObjectModules are translated to specific object module types by their respective Generators.

[edit] Machine Code Emitters

MachineCodeEmitters actually emit or generate the machine code. These are used by higher level (ELF/COFF/MachO)ObjectModuleGenerator classes to actually generate binary machine code, in the (ELF/COFF/MachO)ObjectModule objects.

  • ARMCodeEmitter
  • AlphaCodeEmitter
  • CCodeEmitter
  • CellSPUCodeEmitter
  • CxxCodeEmitter
  • IA64CodeEmitter
  • MSILCodeEmitter
  • MipsCodeEmitter
  • PIC16CodeEmitter
  • PowerPCCodeEmitter
  • SparcCodeEmitter
  • X86CodeEmitter
  • XCoreCodeEmitter

[edit] MachineCodeEmitter

This is an imaginary base class for generating/emitting machine code. It is inherited generically by all CodeEmitters, i.e. the CodeEmitters are templated on it, allowing them to specialize the actual binary code generation for specific purposes.

A MachineCodeEmitter outputs the binary machine code to a specific section within its ObjectModule object.

[edit] ObjectCodeEmitter

This class inherits from the MachineCodeEmitter class and is used for generating code to an ObjectModule's section.

[edit] JITCodeEmitter

This class inherits from the MachineCodeEmitter class and is used for emitting JIT code.

The JITCodeEmitter's responsibilities are:

  • To output code and data to an extensible area of memory.
  • To maintain local fixups for non-PIC code so that it can be relocated if it is moved when the buffer is extended. This includes MBBLocations and LabelLocations.
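
The fixup responsibility can be illustrated with a toy buffer that remembers where it has written absolute addresses and patches them when its base address changes. This is a sketch under simplifying assumptions: `RelocatableBuffer` and its methods are hypothetical, the base address is modelled as a plain integer rather than real memory, and addresses are stored as 64-bit values in host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical growable code buffer with a simulated load address.
class RelocatableBuffer {
  std::uint64_t Base;              // simulated address of byte 0
  std::vector<std::uint8_t> Bytes; // emitted code and data
  std::vector<std::size_t> Fixups; // offsets of stored absolute addresses

public:
  explicit RelocatableBuffer(std::uint64_t BaseAddr) : Base(BaseAddr) {}

  void emitByte(std::uint8_t B) { Bytes.push_back(B); }

  // Emits the absolute address of the byte at TargetOffset and records a
  // fixup so the address can be corrected if the buffer moves.
  void emitAbsoluteAddress(std::size_t TargetOffset) {
    std::uint64_t Addr = Base + TargetOffset;
    Fixups.push_back(Bytes.size());
    const std::uint8_t *P = reinterpret_cast<const std::uint8_t *>(&Addr);
    Bytes.insert(Bytes.end(), P, P + sizeof(Addr));
  }

  // Models the buffer being moved to NewBase: every recorded absolute
  // address is adjusted by the distance moved.
  void moveTo(std::uint64_t NewBase) {
    for (std::size_t Off : Fixups) {
      std::uint64_t Addr;
      std::memcpy(&Addr, &Bytes[Off], sizeof(Addr));
      Addr = Addr - Base + NewBase;
      std::memcpy(&Bytes[Off], &Addr, sizeof(Addr));
    }
    Base = NewBase;
  }

  // Reads back the 64-bit address stored at Off.
  std::uint64_t readAddress(std::size_t Off) const {
    assert(Off + sizeof(std::uint64_t) <= Bytes.size() && "read out of range");
    std::uint64_t Addr;
    std::memcpy(&Addr, &Bytes[Off], sizeof(Addr));
    return Addr;
  }
};
```

Position-independent code would need no such fixups, which is why the responsibility is stated only for non-PIC code.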

[edit] Binary Object

BinaryObject facilitates the parcelling up of binary code or data with labels, relocations, and GlobalValues. The ObjectCodeEmitter writes binary data to a BinaryObject.

An ObjectModule will have a number of BinaryObjects, which the writer will output as sections.

It is called BinaryObject to distinguish it from the differing uses of the terms segment and section in different assemblers and object module formats.
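
A minimal sketch of what such a container might hold is given below. The member and method names (`emitByte`, `addLabel`, `emitReference`, and the `Relocation` record) are assumptions made for illustration; GlobalValues are reduced to symbol names to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical relocation record: Size bytes at Offset must later be
// patched with the address of Symbol.
struct Relocation {
  std::size_t Offset;
  std::size_t Size;
  std::string Symbol;
};

// Hypothetical BinaryObject: raw bytes plus the labels and relocations
// that describe them.
class BinaryObject {
  std::vector<std::uint8_t> Data;
  std::map<std::string, std::size_t> Labels;
  std::vector<Relocation> Relocs;

public:
  void emitByte(std::uint8_t B) { Data.push_back(B); }

  // Binds Label to the current end of the data.
  void addLabel(const std::string &Label) { Labels[Label] = Data.size(); }

  // Reserves Size zero bytes and records that they refer to Symbol.
  void emitReference(const std::string &Symbol, std::size_t Size) {
    assert(Size > 0 && "a reference must occupy at least one byte");
    Relocs.push_back(Relocation{Data.size(), Size, Symbol});
    Data.insert(Data.end(), Size, std::uint8_t(0));
  }

  std::size_t size() const { return Data.size(); }

  // Offset bound to Label; throws std::out_of_range if it was never added.
  std::size_t labelOffset(const std::string &Label) const {
    return Labels.at(Label);
  }

  const std::vector<Relocation> &relocations() const { return Relocs; }
};
```

A writer could then walk `relocations()` to build the relocation table for the section it emits from each BinaryObject.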

[edit] Object Module Writers

There should be a single base class derived by all classes which generate target-specific object files for input to the native target linker. Let's call this class ObjectModuleWriter.

ObjectModuleWriter responsibilities:

  • ObjectModuleWriters write ObjectModules out to files.
  • They generate the object module file's header, sections, symbol tables, etc., from the ObjectModule object.
  • They handle the endianness of the output.
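
For the endianness item, a writer could route every multi-byte field through one helper that takes the target byte order as a parameter. The `writeInt` function below is a hypothetical sketch of such a helper, not existing LLVM code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Appends the low NumBytes bytes of Value to Out in the requested byte
// order, independently of the host's own endianness.
void writeInt(std::vector<std::uint8_t> &Out, std::uint64_t Value,
              unsigned NumBytes, bool LittleEndian) {
  assert(NumBytes >= 1 && NumBytes <= 8 && "unsupported integer width");
  for (unsigned I = 0; I < NumBytes; ++I) {
    // Little endian emits the least significant byte first, big endian
    // the most significant byte first.
    unsigned Shift = 8 * (LittleEndian ? I : NumBytes - 1 - I);
    Out.push_back(static_cast<std::uint8_t>(Value >> Shift));
  }
}
```

Because the helper works with shifts rather than by copying memory, the same writer code produces correct output when cross-compiling between hosts and targets of different byte order.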

Prospective ObjectModuleWriters:

  • ELFModuleWriter
  • COFFModuleWriter
  • MachOModuleWriter

[edit] Assembly Module Printers

There should be a single base class derived by all classes which generate target-specific assembly files for input to the native target assembler. Let's call this class AssemblyModulePrinter.

AssemblyModulePrinter responsibilities:

  • ...

Prospective AssemblyModulePrinters:

  • IntelAssemblyModulePrinter
  • AT&TAssemblyModulePrinter

It would be good if the Assembly Module Printers could provide a listing too.
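
To give a feel for how the two prospective printers would differ, the sketch below prints a 32-bit x86 register-to-register move in each syntax. The `printMove32` method is a hypothetical name, and `ATTAssemblyModulePrinter` stands in for the AT&T printer because `&` cannot appear in a C++ identifier.

```cpp
#include <cassert>
#include <string>

// Hypothetical base class: one virtual method per kind of thing printed.
class AssemblyModulePrinter {
public:
  virtual ~AssemblyModulePrinter() = default;
  // Prints a 32-bit register move from Src to Dst.
  virtual std::string printMove32(const std::string &Dst,
                                  const std::string &Src) const = 0;
};

// Intel syntax: destination first, bare register names.
class IntelAssemblyModulePrinter : public AssemblyModulePrinter {
public:
  std::string printMove32(const std::string &Dst,
                          const std::string &Src) const override {
    assert(!Dst.empty() && !Src.empty() && "register names required");
    return "mov " + Dst + ", " + Src;
  }
};

// AT&T syntax: source first, '%' register prefix, size suffix on the
// mnemonic.
class ATTAssemblyModulePrinter : public AssemblyModulePrinter {
public:
  std::string printMove32(const std::string &Dst,
                          const std::string &Src) const override {
    assert(!Dst.empty() && !Src.empty() && "register names required");
    return "movl %" + Src + ", %" + Dst;
  }
};
```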

[edit] Module Readers

Module readers are a future feature, intended to extend LLVM's capabilities so that it can also act as a linker.

  • ELFModuleReader
  • COFFModuleReader
  • MachOModuleReader

[edit] Implementation

[edit] ObjectModuleGenerators

class ObjectModuleGenerator : public ModulePass
class ELFModuleGenerator : public ObjectModuleGenerator
class COFFModuleGenerator : public ObjectModuleGenerator
class MachOModuleGenerator : public ObjectModuleGenerator

[edit] ObjectModules

class ObjectModule
class ELFObjectModule : public ObjectModule
class COFFObjectModule : public ObjectModule
class MachOObjectModule : public ObjectModule

[edit] ObjectModuleWriters

class ObjectModuleWriter : public ModulePass
class ELFModuleWriter : public ObjectModuleWriter
class COFFModuleWriter : public ObjectModuleWriter
class MachOModuleWriter : public ObjectModuleWriter

[edit] MachineCodeEmitters

class ObjectCodeEmitter;
class JITCodeEmitter;

// Templated on the emitter type, so the same pass can target either an
// ObjectCodeEmitter or a JITCodeEmitter.
template <class machineCodeEmitter>
class X86CodeEmitter : public MachineFunctionPass
{
    ...
    machineCodeEmitter MCE;
    ...
};

// Generic factory for any emitter type.
template <class machineCodeEmitter>
FunctionPass *llvm::createX86CodeEmitterPass(
    X86TargetMachine &TM, machineCodeEmitter &CE)
{
    return new X86CodeEmitter<machineCodeEmitter>(TM, CE);
}

// Convenience factory for JIT emission.
FunctionPass *llvm::createX86JITCodeEmitterPass(
    X86TargetMachine &TM, JITCodeEmitter &JCE)
{
    return new X86CodeEmitter<JITCodeEmitter>(TM, JCE);
}

// Convenience factory for object file emission.
FunctionPass *llvm::createX86ObjectCodeEmitterPass(
    X86TargetMachine &TM, ObjectCodeEmitter &OCE)
{
    return new X86CodeEmitter<ObjectCodeEmitter>(TM, OCE);
}

MachineCodeEmitter here is an imaginary class, as we do not have C++0x's concepts, which would allow us to model this relationship in a type-safe way.
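
The same idea can be tried out without any LLVM headers. In the self-contained sketch below, the two emitter types share an `emitByte` method purely by convention, and a class template relies on that convention. All names other than `ObjectCodeEmitter` and `JITCodeEmitter` are hypothetical, and the emitters are reduced to byte vectors.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Two unrelated emitter types. There is no common base class; the shared
// interface ("the imaginary MachineCodeEmitter") exists only by convention.
struct ObjectCodeEmitter {
  std::vector<std::uint8_t> Section; // bytes destined for an object file
  void emitByte(std::uint8_t B) { Section.push_back(B); }
};

struct JITCodeEmitter {
  std::vector<std::uint8_t> Memory; // bytes destined for executable memory
  void emitByte(std::uint8_t B) { Memory.push_back(B); }
};

// Toy target emitter, templated on the emitter type. It compiles for any
// type that provides emitByte(std::uint8_t).
template <class CodeEmitterT> class ToyCodeEmitter {
  CodeEmitterT &MCE;

public:
  explicit ToyCodeEmitter(CodeEmitterT &E) : MCE(E) {}

  // Emits the one-byte x86 near return instruction (opcode 0xC3).
  void emitReturn() { MCE.emitByte(0xC3); }
};

// Usage sketch: the same template drives both destinations.
void demo() {
  ObjectCodeEmitter OCE;
  JITCodeEmitter JCE;
  ToyCodeEmitter<ObjectCodeEmitter>(OCE).emitReturn();
  ToyCodeEmitter<JITCodeEmitter>(JCE).emitReturn();
  assert(OCE.Section == JCE.Memory); // identical bytes, different destinations
}
```

If an emitter type lacks `emitByte`, the error only appears when the template is instantiated, which is the lack of type safety the note about concepts refers to.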

[edit] LLVM Code bookmarks

[edit] Existing code structure

llc.cpp:280-314

include/llvm/Target/TargetMachine.h

lib/CodeGen/LLVMTargetMachine.cpp

addPassesToEmitFile

addPassesToEmitFileFinish

lib/Target/X86/X86TargetMachine.h

lib/Target/X86/X86TargetMachine.cpp

addCodeEmitter()

lib/Target/X86/X86CodeEmitter.cpp

createX86CodeEmitterPass()

Emitter::runOnMachineFunction()

lib/CodeGen/ELFWriter.h

class ELFWriter : public MachineFunctionPass

class ELFCodeEmitter : public MachineCodeEmitter

class AsmPrinter : public MachineFunctionPass

[edit] Existing code and files

include/llvm/Support/ELF.h

include/llvm/Support/Dwarf.h

include/llvm/CodeGen/ELFRelocation.h

include/llvm/CodeGen/DwarfWriter.h

[edit] Existing ELFWriter code

lib/CodeGen/ELFWriter.h

lib/CodeGen/ELFWriter.cpp

[edit] Existing MachOWriter code

lib/CodeGen/MachOWriter.h

lib/CodeGen/MachOWriter.cpp

[edit] Lessons from AsmPrinter and friends

AsmPrinter.h AsmPrinter.cpp

bool AsmPrinter::EmitSpecialLLVMGlobal(const GlobalVariable *GV)

void AsmPrinter::EmitConstantPool(MachineConstantPool *MCP)

void AsmPrinter::EmitJumpTableInfo(MachineJumpTableInfo *MJTI, MachineFunction &MF)


X86IntelAsmPrinter.h

X86IntelAsmPrinter.cpp

[edit] AsmInfo classes

include/llvm/Target/TargetAsmInfo.h


include/llvm/Target/DarwinTargetAsmInfo.h

include/llvm/Target/ELFTargetAsmInfo.h

include/llvm/Target/TargetELFWriterInfo.h

include/llvm/Target/TargetMachOWriterInfo.h

[edit] Dwarf Writer (Printer)

lib/CodeGen/AsmPrinter/DwarfWriter.cpp

[edit] Related documents

  • ELF and DWARF standards
  • PE and COFF
  • ECOFF
