One of the main backend uses of TableGen is describing target machine instructions, and that includes describing the binary encoding of instructions and their constituent parts. This requires a certain level of bit twiddling, and TableGen supports this with explicit bit (single bit) and bits (fixed-length sequence of bits) types:
class Enc<bits<7> op> {
  bits<10> Encoding;
  let Encoding{9-7} = 5;
  let Encoding{6-0} = op;
}
def InstA : Enc<0x35>;
def InstB : Enc<0x08>;
... will produce records:
def InstA { // Enc
  bits<10> Encoding = { 1, 0, 1, 0, 1, 1, 0, 1, 0, 1 };
  string NAME = ?;
}
def InstB { // Enc
  bits<10> Encoding = { 1, 0, 1, 0, 0, 0, 1, 0, 0, 0 };
  string NAME = ?;
}
So you can quite easily slice and dice bit sequences with curly braces, as long as the indices themselves are constants.
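For readers who think in shifts and masks, here is a rough C++ counterpart of the Enc example. It is only a sketch: sliceBits and encodeEnc are hypothetical helper names made up for this illustration, not anything TableGen or LLVM provides.

```cpp
#include <cassert>
#include <cstdint>

// Extracts bits hi..lo (inclusive) of value, like value{hi-lo} in TableGen.
// Hypothetical helper for illustration only.
inline std::uint32_t sliceBits(std::uint32_t value, unsigned hi, unsigned lo) {
  assert(hi >= lo && hi < 32);
  const unsigned width = hi - lo + 1;
  const std::uint32_t mask =
      width == 32 ? 0xFFFFFFFFu : ((1u << width) - 1u);
  return (value >> lo) & mask;
}

// Mirrors class Enc from the example: Encoding{9-7} = 5, Encoding{6-0} = op.
// Hypothetical helper for illustration only.
inline std::uint32_t encodeEnc(std::uint32_t op) {
  assert(op < (1u << 7) && "op must fit in bits<7>");
  return (5u << 7) | op;
}
```

With these definitions, encodeEnc(0x35) yields 0x2B5 (binary 1010110101, the Encoding of InstA) and encodeEnc(0x08) yields 0x288 (binary 1010001000, the Encoding of InstB). The curly-brace syntax lets TableGen express the same thing without spelling out the shifts and masks.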
But the real killer feature is that so-called unset initializers, represented by a question mark, aren't fully resolved in bit sequences:
class Enc<bits<3> opcode> {
  bits<8> Encoding;
  bits<3> Operand;
  let Encoding{0} = opcode{2};
  let Encoding{3-1} = Operand;
  let Encoding{5-4} = opcode{1-0};
  let Encoding{7-6} = { 1, 0 };
}
def InstA : Enc<5>;
... produces a record:
def InstA { // Enc
  bits<8> Encoding = { 1, 0, 0, 1, Operand{2}, Operand{1}, Operand{0}, 1 };
  bits<3> Operand = { ?, ?, ? };
  string NAME = ?;
}
So instead of going ahead and saying, hey, Operand{2} is ?, let's resolve that and plug it into Encoding, TableGen instead keeps the fact that bit 3 of Encoding refers to Operand{2} as part of its data structures.
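To make the idea of "keeping the reference" concrete, here is a small C++ model of such a partially resolved bit sequence. It is a sketch under stated assumptions: the Bit struct and the helpers fixedBit, varBit, fixedBits, varMask and instAEncoding are invented for this illustration and are not the classes TableGen uses internally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One bit of an encoding: either a known 0/1, or a reference to bit `index`
// of a still-unset variable. Hypothetical model, not TableGen's own types.
struct Bit {
  bool isFixed;
  bool value;       // meaningful only if isFixed
  std::string var;  // meaningful only if !isFixed
  unsigned index;   // which bit of `var` this position refers to
};

inline Bit fixedBit(bool v) { return Bit{true, v, "", 0}; }
inline Bit varBit(const std::string &var, unsigned index) {
  return Bit{false, false, var, index};
}

// Bits are stored least significant first: bits[i] models Encoding{i}.
// Returns the value formed by the fixed bits, with variable bits as 0.
inline std::uint64_t fixedBits(const std::vector<Bit> &bits) {
  assert(bits.size() <= 64);
  std::uint64_t v = 0;
  for (std::size_t i = 0; i < bits.size(); ++i)
    if (bits[i].isFixed && bits[i].value)
      v |= std::uint64_t(1) << i;
  return v;
}

// Returns a mask of the positions that refer to the variable `var`.
inline std::uint64_t varMask(const std::vector<Bit> &bits,
                             const std::string &var) {
  assert(bits.size() <= 64);
  std::uint64_t mask = 0;
  for (std::size_t i = 0; i < bits.size(); ++i)
    if (!bits[i].isFixed && bits[i].var == var)
      mask |= std::uint64_t(1) << i;
  return mask;
}

// The Encoding of InstA from the record above, listed from bit 0 up to bit 7.
inline std::vector<Bit> instAEncoding() {
  return {fixedBit(1), varBit("Operand", 0), varBit("Operand", 1),
          varBit("Operand", 2), fixedBit(1), fixedBit(0),
          fixedBit(0), fixedBit(1)};
}
```

For InstA, fixedBits gives 0x91 (binary 10010001 with the three Operand positions zeroed) and varMask for "Operand" gives 0x0E (bits 3 to 1). A backend can derive the constant part of an encoding and the shift and mask for each operand from exactly this kind of information.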
Together with some additional data, this allows a backend of TableGen to automatically generate code for instruction encoding and decoding (i.e., disassembling). For example, it will create the source for a giant C++ method with signature
uint64_t getBinaryCodeForInstr(const MCInst &MI, /* ... */) const;
which contains a giant constant array with all the fixed bits of each instruction followed by a giant switch statement with cases of the form:
case AMDGPU::S_CMP_EQ_I32:
case AMDGPU::S_CMP_EQ_U32:
case AMDGPU::S_CMP_EQ_U64:
// more cases...
case AMDGPU::S_SET_GPR_IDX_ON: {
  // op: src0
  op = getMachineOpValue(MI, MI.getOperand(0), Fixups, STI);
  Value |= op & UINT64_C(255);
  // op: src1
  op = getMachineOpValue(MI, MI.getOperand(1), Fixups, STI);
  Value |= (op & UINT64_C(255)) << 8;
  break;
}
The bitmasks and shift values are all derived from the structure of unset bits as in the example above, and some additional data (the operand DAGs) are used to identify the operand index corresponding to TableGen variables like Operand based on their name. For example, here are the relevant parts of the S_CMP_EQ_I32 record generated by the AMDGPU backend's TableGen files:
def S_CMP_EQ_I32 { // Instruction (+ other superclasses)
  field bits<32> Inst = { 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, src1{7}, src1{6}, src1{5}, src1{4}, src1{3}, src1{2}, src1{1}, src1{0}, src0{7}, src0{6}, src0{5}, src0{4}, src0{3}, src0{2}, src0{1}, src0{0} };
  dag OutOperandList = (outs);
  dag InOperandList = (ins SSrc_b32:$src0, SSrc_b32:$src1);
  bits<8> src0 = { ?, ?, ?, ?, ?, ?, ?, ? };
  bits<8> src1 = { ?, ?, ?, ?, ?, ?, ?, ? };
  // many more variables...
}
Note how Inst, which describes the 32-bit encoding as a whole, refers to the TableGen variables src0 and src1. The operand indices used in the calls to MI.getOperand() above are derived from the InOperandList, which contains nodes with the corresponding names. (SSrc_b32 is the name of a record that subclasses RegisterOperand and describes the acceptable operands, such as registers and inline constants.)
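As a standalone illustration of what the Inst layout buys you, here is a hand-written C++ sketch that encodes and decodes this one format. It is deliberately simplified and is not the code LLVM generates: kFixedBits, Operands, encode and decode are hypothetical names, and the fixed-bit constant is simply read off the sixteen leading bits of the Inst field shown in the record.

```cpp
#include <cassert>
#include <cstdint>

// Fixed bits read off the Inst field of the S_CMP_EQ_I32 record:
// the top 16 bits are 1011 1111 0000 0000, the low 16 bits are operands.
constexpr std::uint32_t kFixedBits = 0xBF000000u;
constexpr std::uint32_t kFixedMask = 0xFFFF0000u;

// Hypothetical container for the two 8-bit operand encodings.
struct Operands {
  std::uint32_t src0;
  std::uint32_t src1;
};

// Encoding: start from the fixed bits, then OR in each operand at the
// position where Inst refers to it (src0 in bits 7-0, src1 in bits 15-8).
inline std::uint32_t encode(const Operands &ops) {
  assert(ops.src0 < 256 && ops.src1 < 256);
  std::uint32_t value = kFixedBits;
  value |= ops.src0 & 0xFFu;
  value |= (ops.src1 & 0xFFu) << 8;
  return value;
}

// Decoding: check that the fixed bits match, then extract the operands
// with the same masks and shifts in reverse.
inline bool decode(std::uint32_t inst, Operands &ops) {
  if ((inst & kFixedMask) != kFixedBits)
    return false;
  ops.src0 = inst & 0xFFu;
  ops.src1 = (inst >> 8) & 0xFFu;
  return true;
}
```

For example, encode applied to src0 = 0x01 and src1 = 0x80 produces 0xBF008001, and decode recovers the same two values from that word. Every constant in the sketch (the fixed bits, the masks, the shifts) can be read mechanically from the Inst field, which is why a TableGen backend can generate this kind of code for every instruction.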
Hopefully this helped you appreciate just how convenient TableGen can be. Not resolving the ? in bit sequences is an odd little exception to an otherwise fairly regular language, but the resulting expressive power is clearly worth it. It's something to keep in mind when we discuss how variable references are resolved.