HowTo: Implement 8-bit memory access on a platform which only supports 32-bit memory operations

From the mailing list:

    • Our processor only does 32-bit reads and writes to memory. Changing an 8-bit value in memory requires a 32-bit read, a modify, and a 32-bit write.
    • How should this be handled?


  • Your backend will have i32 as a legal type and i8 as an illegal type. A store of an i8 will automatically be transformed into a "truncating" store of an i32 to i8. Such a truncating store means that only the lower 8 bits of the i32 value need to be stored. When your backend sees such a truncating store of an i32 V, it will need to turn that into: read the existing i32 value (W), replace the low 8 bits of W with the low 8 bits of V (the high 8 bits on a big-endian machine), and write the new W back.

Not necessarily. You can still make the usual data types legal. You just need to do special handling for ISD::LOAD and ISD::STORE.
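
The read-modify-write sequence described above can be sketched in plain C++, outside of LLVM. This is only an illustration: the helper name storeByte, the word-array model of memory, and the little-endian byte numbering within a word are assumptions, not part of the mailing-list thread.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): store the low 8 bits of `value`
// at byte address `byteAddr`, using only 32-bit reads and writes.
// Assumption: memory is modelled as an array of 32-bit words, and bytes
// within a word are numbered little-endian (byte 0 = bits 0..7).
void storeByte(uint32_t *mem, uint32_t byteAddr, uint32_t value) {
  uint32_t wordIndex = byteAddr / 4;          // which 32-bit word holds the byte
  uint32_t shift     = (byteAddr % 4) * 8;    // bit position of the byte in that word
  uint32_t mask      = 0xFFu << shift;        // selects the 8 bits to replace

  uint32_t w = mem[wordIndex];                        // 32-bit read
  w = (w & ~mask) | ((value & 0xFFu) << shift);       // modify only the target byte
  mem[wordIndex] = w;                                 // 32-bit write
}
```

On a big-endian target the shift would be computed from the other end of the word, for example (3 - byteAddr % 4) * 8.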

    • Do any of the other current backends do this?

Yes, the CellSPU has the same read-modify-write semantics for stores. I'm not happy with the current way that I implemented ISD::LOAD and ISD::STORE in the backend. I'm planning on moving the ISD::LOAD and ISD::STORE code from the SPUTargetLowering class to the SPUDAGToDAGISel class (to reduce the number of target-specific nodes). At least you can see how things are done.

How Cell's SPU does things:

  • Read and write in 16-byte chunks (vectors-at-a-time).
  • To read an unaligned scalar: read the 16-byte chunk, then rotate the chunk so that the desired value ends up in slot 0 of the vector.
  • To write an unaligned scalar: read the chunk, generate a shuffle mask for inserting the scalar, shuffle the vector, then store the chunk.

It's probably very similar to what you're aiming to achieve.
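
The chunk-based scheme can be modelled in ordinary C++ to make the steps concrete. This is a rough sketch, not CellSPU backend code or SPU intrinsics: the names Chunk, loadRotated and storeByteViaChunk are hypothetical, memory is assumed to be a byte vector whose size is a multiple of 16, and a direct element write stands in for the shuffle-mask insert.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of one 16-byte vector register.
using Chunk = std::array<uint8_t, 16>;

// Read: fetch the aligned 16-byte chunk containing `addr`, then rotate it
// left so that the byte at `addr` ends up in position 0.
Chunk loadRotated(const std::vector<uint8_t> &mem, std::size_t addr) {
  std::size_t base = addr & ~std::size_t(15);   // start of the aligned chunk
  Chunk c;
  std::copy(mem.begin() + base, mem.begin() + base + 16, c.begin());
  std::rotate(c.begin(), c.begin() + (addr - base), c.end());
  return c;
}

// Write: read the whole chunk, replace one byte, store the whole chunk.
void storeByteViaChunk(std::vector<uint8_t> &mem, std::size_t addr,
                       uint8_t value) {
  std::size_t base = addr & ~std::size_t(15);
  Chunk c;
  std::copy(mem.begin() + base, mem.begin() + base + 16, c.begin());
  c[addr - base] = value;                       // stands in for the shuffle
  std::copy(c.begin(), c.end(), mem.begin() + base);
}
```

The same shape applies to a 32-bit-only machine, with a 4-byte word in place of the 16-byte chunk.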

Example implementation

The following example illustrates implementation of extended loads on a platform which only has 32-bit memory operations. All loads are zero-extended.

This example uses a very simple custom lowering of ISD::LOAD nodes, covering plain loads as well as ISD::EXTLOAD and ISD::ZEXTLOAD extending loads, to extract the desired value from the loaded 32-bit value.

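The plan is to do a regular 32-bit load and then clear the unwanted high bits with a SHL/SRL pair: shifting left pushes them out of the register, and the logical shift right brings the remaining bits back with zeros above them. This effectively gives us a zero-extended load.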
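
The effect of the shift pair can be checked in isolation on plain integers. The function name zeroExtendLow is an assumption made for this illustration; it is not part of LLVM.

```cpp
#include <cstdint>

// Hypothetical helper: keep the low `bits` bits of `word` and zero the rest,
// using the same shift-left / logical-shift-right pair as the lowering.
// `bits` is expected to be 8 or 16 here (it must be in the range 1..32).
uint32_t zeroExtendLow(uint32_t word, unsigned bits) {
  unsigned shift = 32 - bits;        // 24 for an i8 load, 16 for an i16 load
  return (word << shift) >> shift;   // unsigned shifts, so zeros are shifted in
}
```

For example, zeroExtendLow(0xDEADBEEF, 8) yields 0xEF and zeroExtendLow(0xDEADBEEF, 16) yields 0xBEEF. An AND with 0xFF or 0xFFFF would give the same result; the shift pair simply mirrors the DAG nodes used in the lowering.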

static SDValue Lower_LOAD(SDValue Op, SelectionDAG &DAG) {
  LoadSDNode *LN = cast<LoadSDNode>(Op);
  SDValue chain  = LN->getChain();
  MVT memoryVT   = LN->getMemoryVT();

  // The extension type of the load is deliberately ignored:
  // every narrow load is zero-extended.

  // Natural 32-bit load: re-emit it with 4-byte alignment.
  if (memoryVT == MVT::i32) {
    return DAG.getLoad(MVT::i32, chain, LN->getBasePtr(), LN->getSrcValue(), LN->getSrcValueOffset(), LN->isVolatile(), 4);
  }

  // 8-bit and 16-bit loads: load the full 32-bit word, then clear the high bits.
  if (memoryVT == MVT::i8 || memoryVT == MVT::i16) {
    SDValue dwordLoad = DAG.getLoad(MVT::i32, chain, LN->getBasePtr(), LN->getSrcValue(), LN->getSrcValueOffset(), LN->isVolatile(), 4);

    // Number of high bits to discard: 24 for i8, 16 for i16.
    unsigned shift = (memoryVT == MVT::i8) ? 24 : 16;
    SDValue shiftLeft = DAG.getNode(ISD::SHL, MVT::i32, dwordLoad, DAG.getConstant(shift, MVT::i32));
    SDValue shiftRight = DAG.getNode(ISD::SRL, MVT::i32, shiftLeft, DAG.getConstant(shift, MVT::i32));

    // A load produces two results: the loaded value and the output chain.
    // Return the masked value together with the chain of the 32-bit load.
    SDVTList load_vts = DAG.getVTList(MVT::i32, MVT::Other);
    SDValue  load_results[2] = { shiftRight, dwordLoad.getValue(1) };
    SDValue result = DAG.getNode(ISD::MERGE_VALUES, load_vts, load_results, sizeof(load_results)/sizeof(load_results[0]));
    return result;
  }

  assert(0 && "can't lower this type of load!");
  return Op;
}