HowTo: Implement 8-bit memory access on a platform which only supports 32-bit memory operations
From the mailing list:
- Our processor only does 32bit reads and writes to memory. Changing a byte requires a 32bit read, modify, 32bit write sequence to change the 8bit value in memory.
- How should this be handled?
- Your backend will have i32 as a legal type and i8 as an illegal type. A store of an i8 will be automatically transformed into a "truncating" store of an i32 to i8. Such a truncating store means that the lower 8 bits of the i32 value need to be stored. When your backend sees such a truncating store of an i32 value V, it needs to turn it into: read the existing i32 value (W), replace the low 8 bits of W with the low 8 bits of V (the high 8 bits on a big-endian machine), and write the new W back.
Not necessarily. You can still make the usual data types legal. You just need to do special handling for ISD::LOAD and ISD::STORE.
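The read-modify-write sequence described above can be sketched outside of LLVM as plain C++. This is only an illustration under stated assumptions: `WordMemory`, `byteShift`, `storeByte` and `loadByte` are hypothetical names, the memory is modelled as an array of 32-bit words, and a little-endian byte layout within each word is assumed (on a big-endian machine the shift amount would be mirrored).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a memory that only supports aligned 32-bit accesses.
struct WordMemory {
  std::vector<uint32_t> words;

  explicit WordMemory(std::size_t numWords) : words(numWords, 0) {}

  // Both accessors ignore the low two address bits, like a word-only bus.
  uint32_t loadWord(uint32_t byteAddr) const {
    assert(byteAddr / 4 < words.size());
    return words[byteAddr / 4];
  }
  void storeWord(uint32_t byteAddr, uint32_t value) {
    assert(byteAddr / 4 < words.size());
    words[byteAddr / 4] = value;
  }
};

// Bit position of the addressed byte inside its word
// (assumption: little-endian byte order within a word).
inline unsigned byteShift(uint32_t byteAddr) { return (byteAddr % 4) * 8; }

// 8-bit store: read the word W, replace one byte with the low 8 bits of V,
// and write W back.
inline void storeByte(WordMemory &mem, uint32_t byteAddr, uint32_t v) {
  unsigned shift = byteShift(byteAddr);
  uint32_t mask = uint32_t{0xFF} << shift;
  uint32_t w = mem.loadWord(byteAddr);      // 32-bit read
  w = (w & ~mask) | ((v << shift) & mask);  // modify one byte
  mem.storeWord(byteAddr, w);               // 32-bit write
}

// 8-bit load: read the containing word and extract the addressed byte.
inline uint8_t loadByte(const WordMemory &mem, uint32_t byteAddr) {
  return static_cast<uint8_t>(mem.loadWord(byteAddr) >> byteShift(byteAddr));
}
```

Note that the store touches the three neighbouring bytes as well, so on real hardware the sequence is not atomic with respect to other writers of the same word.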
- Do any of the other current backends do this?
Yes, the CellSPU has the same read-modify-write semantics for stores. I'm not happy with the current way that I implemented ISD::LOAD and ISD::STORE in the backend. I'm planning on moving the ISD::LOAD and ISD::STORE code from the SPUTargetLowering class to the SPUDAGToDAGISel class (to reduce the number of target-specific nodes). At least you can see how things are done.
How Cell's SPU does things:
- Read and write in 16-byte chunks (vectors-at-a-time).
- To read an unaligned scalar: read the 16-byte chunk, then rotate the chunk so that the desired value ends up in slot 0 of the vector.
- To write an unaligned scalar: read the chunk, generate a shuffle mask for inserting the scalar, shuffle the vector, then store the chunk.
It's probably very similar to what you're aiming to achieve.
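The chunk-based scheme described for the SPU can be modelled in a few lines of standalone C++. This is a simplified sketch, not the CellSPU backend's code: the names `Chunk`, `rotateLeftBytes`, `makeInsertMask`, `shuffle`, `readScalar` and `writeScalar` are hypothetical, the scalar is assumed to be a 32-bit value that does not straddle a chunk boundary, and bytes are assembled in big-endian order.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Chunk = std::array<uint8_t, 16>;        // one 16-byte memory chunk
using ShuffleMask = std::array<uint8_t, 16>;  // per-byte source selector

// Rotate the chunk left so that the byte at `offset` ends up at index 0.
inline Chunk rotateLeftBytes(const Chunk &c, std::size_t offset) {
  Chunk r{};
  for (std::size_t i = 0; i < c.size(); ++i)
    r[i] = c[(i + offset) % c.size()];
  return r;
}

// Read: rotate the desired value into slot 0, then take the first 4 bytes.
inline uint32_t readScalar(const Chunk &c, std::size_t offset) {
  assert(offset + 4 <= c.size());
  Chunk r = rotateLeftBytes(c, offset);
  return (uint32_t{r[0]} << 24) | (uint32_t{r[1]} << 16) |
         (uint32_t{r[2]} << 8) | uint32_t{r[3]};
}

// Mask entries 0..15 select a byte of the old chunk, 16..31 a byte of the
// vector holding the scalar (an encoding chosen for this sketch).
inline ShuffleMask makeInsertMask(std::size_t offset) {
  ShuffleMask m{};
  for (std::size_t i = 0; i < m.size(); ++i)
    m[i] = static_cast<uint8_t>(i);
  for (std::size_t i = 0; i < 4; ++i)
    m[offset + i] = static_cast<uint8_t>(16 + i);
  return m;
}

inline Chunk shuffle(const Chunk &a, const Chunk &b, const ShuffleMask &m) {
  Chunk r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = m[i] < 16 ? a[m[i]] : b[m[i] - 16];
  return r;
}

// Write: place the scalar in slot 0 of a vector, build the insertion mask,
// and shuffle it into the old chunk. The caller stores the returned chunk.
inline Chunk writeScalar(const Chunk &c, std::size_t offset, uint32_t v) {
  assert(offset + 4 <= c.size());
  Chunk s{};
  for (std::size_t i = 0; i < 4; ++i)
    s[i] = static_cast<uint8_t>(v >> (24 - 8 * i));
  return shuffle(c, s, makeInsertMask(offset));
}
```

The same pattern scales down to the 32-bit case: the "chunk" becomes a single word, the rotate becomes a shift, and the shuffle becomes a mask-and-or.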
Example implementation
The following example illustrates the implementation of extended loads on a platform which only has 32-bit memory operations. All loads are zero-extended.
This example uses a very simple custom lowering of ISD::LOAD nodes, including those whose extension type is ISD::EXTLOAD or ISD::ZEXTLOAD, to extract the desired value from the loaded 32-bit value.
The plan is to simply do a regular 32-bit load, and then clear the unwanted high bits using a SHL/SRL pair, keeping only the low 8 or 16 bits. This effectively gives us a zero-extended load.
static SDValue Lower_LOAD(SDValue Op, SelectionDAG &DAG) {
  LoadSDNode *LN = cast<LoadSDNode>(Op);
  SDValue chain = LN->getChain();
  MVT memoryVT = LN->getMemoryVT();

  if (memoryVT == MVT::i32) {
    // Natural 32-bit load: emit it as a plain word load.
    return DAG.getLoad(MVT::i32, chain, LN->getBasePtr(),
                       LN->getSrcValue(), LN->getSrcValueOffset(),
                       LN->isVolatile(), 4);
  }

  if (memoryVT == MVT::i8 || memoryVT == MVT::i16) {
    // 8-bit and 16-bit loads: load the full 32-bit word first.
    SDValue dwordLoad = DAG.getLoad(MVT::i32, chain, LN->getBasePtr(),
                                    LN->getSrcValue(), LN->getSrcValueOffset(),
                                    LN->isVolatile(), 4);

    // Shift the unwanted high bits out, then shift back with zero fill.
    unsigned shift = (memoryVT == MVT::i8) ? 24 : 16;
    SDValue shiftLeft = DAG.getNode(ISD::SHL, MVT::i32, dwordLoad,
                                    DAG.getConstant(shift, MVT::i32));
    SDValue shiftRight = DAG.getNode(ISD::SRL, MVT::i32, shiftLeft,
                                     DAG.getConstant(shift, MVT::i32));

    // A load produces two results: the value and the output chain.
    SDVTList load_vts = DAG.getVTList(MVT::i32, MVT::Other);
    SDValue load_results[2] = { shiftRight, dwordLoad.getValue(1) };
    return DAG.getNode(ISD::MERGE_VALUES, load_vts, load_results,
                       sizeof(load_results) / sizeof(load_results[0]));
  }

  assert(0 && "can't lower this type of load!");
  return Op;
}
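The effect of the SHL/SRL pair in the lowering above can be checked in isolation with ordinary integer arithmetic. The helper below is a hypothetical standalone function, not part of LLVM; it assumes the narrow value sits in the low bits of the loaded word, as the example does.

```cpp
#include <cassert>
#include <cstdint>

// Zero-extend the low `bits` bits of a 32-bit word using a shift-left /
// logical-shift-right pair, mirroring the ISD::SHL + ISD::SRL nodes.
inline uint32_t zextLowBits(uint32_t word, unsigned bits) {
  assert(bits >= 1 && bits <= 32);
  unsigned shift = 32 - bits;     // 24 for i8, 16 for i16
  return (word << shift) >> shift;
}
```

For 8 and 16 bits this is equivalent to masking with `0xFF` and `0xFFFF`; a backend with a cheap AND-immediate instruction could use an ISD::AND node instead of the two shifts.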

