Freitag, März 09, 2018

TableGen #5: DAGs

This is the fifth part of a series; see the first part for a table of contents.

With bit sequences, we have already seen one unusual feature of TableGen that is geared towards its specific purpose. DAG nodes are another; they look a bit like S-expressions:
def op1;
def op2;
def i32:

def Example {
  dag x = (op1 $foo, (op2 i32:$bar, "Hi"));
In the example, there are two DAG nodes, represented by a DagInit object in the code. The first node has as its operation the record op1. The operation of a DAG node must be a record, but there are no other restrictions. This node has two children or arguments: the first argument is named foo but has no value. The second argument has no name, but it does have another DAG node as its value.

This second DAG node has the operation op2 and two arguments. The first argument is named bar and has value i32, the second has no name and value "Hi".

DAG nodes can have any number of arguments, and they can be nested arbitrarily. The values of arguments can have any type, at least as far as the TableGen frontend is concerned. So DAGs are an extremely free-form way of representing data, and they are really only given meaning by TableGen backends.

There are three main uses of DAGs:
  1. Describing the operands on machine instructions.
  2. Describing patterns for instruction selection.
  3. Describing register files with something called "set theory".
I have not yet had the opportunity to explore the last point in detail, so I will only give an overview of the first two uses here.

Describing the operands of machine instructions is fairly straightforward at its core, but the details can become quite elaborate.

I will illustrate some of this with the example of the V_ADD_F32 instruction from the AMDGPU backend. V_ADD_F32 is a standard 32-bit floating point addition, at least in its 32-bit-encoded variant, which the backend represents as V_ADD_F32_e32.

Let's take a look at some of the fully resolved records produced by the TableGen frontend:
def V_ADD_F32_e32 {    // Instruction AMDGPUInst ...
  dag OutOperandList = (outs anonymous_503:$vdst);
  dag InOperandList = (ins VSrc_f32:$src0, VGPR_32:$src1);
  string AsmOperands = "$vdst, $src0, $src1";

def anonymous_503 {    // DAGOperand RegisterOperand VOPDstOperand
  RegisterClass RegClass = VGPR_32;
  string PrintMethod = "printVOPDst";
As you'd expect, there is one out operand. It is named vdst and an anonymous record is used to describe more detailed information such as its register class (a 32-bit general purpose vector register) and the name of a special method for printing the operand in textual assembly output. (The string "printVOPDst" will be used by the backend that generates the bulk of the instruction printer code, and refers to the method AMDGPUInstPrinter::printVOPDst that is implemented manually.)

There are two in operands. src1 is a 32-bit general purpose vector register and requires no special handling, but src0 supports more complex operands as described in the record VSrc_f32 elsewhere.

Also note the string AsmOperands, which is used as a template for the automatically generated instruction printer code. The operand names in that string refer to the names of the operands as defined in the DAG nodes.

This was a nice warmup, but didn't really demonstrate the full power and flexibility of DAG nodes. Let's look at V_ADD_F32_e64, the 64-bit encoded version, which has some additional features: the sign bits of the inputs can be reset or inverted, and the result (output) can be clamped and/or scaled by some fixed constants (0.5, 2, and 4). This will seem familiar to anybody who has worked with the old OpenGL assembly program extensions or with DirectX shader assembly.

The fully resolved records produced by the TableGen frontend are quite a bit more involved:
def V_ADD_F32_e64 {    // Instruction AMDGPUInst ...
  dag OutOperandList = (outs anonymous_503:$vdst);
  dag InOperandList =
    (ins FP32InputMods:$src0_modifiers, VCSrc_f32:$src0,
         FP32InputMods:$src1_modifiers, VCSrc_f32:$src1,
         clampmod:$clamp, omod:$omod);
  string AsmOperands = "$vdst, $src0_modifiers, $src1_modifiers$clamp$omod";
  list<dag> Pattern =
    [(set f32:$vdst, (fadd
      (f32 (VOP3Mods0 f32:$src0, i32:$src0_modifiers,
                      i1:$clamp, i32:$omod)),
      (f32 (VOP3Mods f32:$src1, i32:$src1_modifiers))))];

def FP32InputMods {     // DAGOperand Operand InputMods FPInputMods
  ValueType Type = i32;
string PrintMethod = "printOperandAndFPInputMods";
AsmOperandClass ParserMatchClass = FP32InputModsMatchClass;

def FP32InputModsMatchClass {   // AsmOperandClass FPInputModsMatchClass
  string Name = "RegOrImmWithFP32InputMods";
  string PredicateMethod = "isRegOrImmWithFP32InputMods";
  string ParserMethod = "parseRegOrImmWithFPInputMods";
The out operand hasn't changed, but there are now many more special in operands that describe whether those additional features of the instruction are used.

You can again see how records such as FP32InputMods refer to manually implemented methods. Also note that the AsmOperands string no longer refers to src0 or src1. Instead, the printOperandAndFPInputMods method on src0_modifiers and src1_modifiers will print the source operand together with its sign modifiers. Similarly, the special ParserMethod parseRegOrImmWithFPInputMods will be used by the assembly parser.

This kind of extensibility by combining generic automatically generated code with manually implemented methods is used throughout the TableGen backends for code generation.

Something else is new here: the Pattern. This pattern, together will all the other patterns defined elsewhere, is compiled into a giant domain-specific bytecode that executes during instruction selection to turn the SelectionDAG into machine instructions. Let's take this particular pattern apart:
(set f32:$vdst, (fadd ...))
We will match an fadd selection DAG node that outputs a 32-bit floating point value, and this output will be linked to the out operand vdst. (set, fadd and many others are defined in the target-independent include/llvm/Target/
(fadd (f32 (VOP3Mods0 f32:$src0, i32:$src0_modifiers,
                      i1:$clamp, i32:$omod)),
      (f32 (VOP3Mods f32:$src1, i32:$src1_modifiers)))
Both input operands of the fadd node must be 32-bit floating point values, and they will be handled by complex patterns. Here's one of them:
def VOP3Mods { // ComplexPattern
  string SelectFunc = "SelectVOP3Mods";
  int NumOperands = 2;
As you'd expect, there's a manually implemented SelectVOP3Mods method. Its signature is
bool SelectVOP3Mods(SDValue In, SDValue &Src,
                    SDValue &SrcMods) const;
It can reject the match by returning false, otherwise it pattern matches a single input SelectionDAG node into nodes that will be placed into src1 and src1_modifiers in the particular pattern we were studying.

Patterns can be arbitrarily complex, and they can be defined outside of instructions as well. For example, here's a pattern for generating the S_BFM_B32 instruction, which generates a bitfield mask:
def anonymous_2373anonymous_2371 {    // Pattern Pat ...
  dag PatternToMatch =
    (i32 (shl (i32 (add (i32 (shl 1, i32:$a)), -1)), i32:$b));
  list<dag> ResultInstrs = [(S_BFM_B32 ?:$a, ?:$b)];
The name of this record doesn't matter. The instruction selection TableGen backend simply looks for all records that have Pattern as a superclass. In this case, we match an expression of the form ((1 << a) - 1) << b on 32-bit integers into a single machine instruction.

So far, we've mostly looked at how DAGs are interpreted by some of the key backends of TableGen. As it turns out, most backends generate their DAGs in a fairly static way, but there are some fancier techniques that can be used as well. This post is already quite long though, so we'll look at those in the next post.

Dienstag, März 06, 2018

TableGen #4: Resolving variables

This is the fourth part of a series; see the first part for a table of contents.

It's time to look at some of the guts of TableGen itself. TableGen is split into a frontend, which parses the TableGen input, instantiates all the records, resolves variable references, and so on, and many different backends that generate code based on the instantiated records. In this series I'll be mainly focusing on the frontend, which lives in lib/TableGen/ inside the LLVM repository, e.g. here on the GitHub mirror. The backends for LLVM itself live in utils/TableGen/, together with the command line tool's main() function. Clang also has its own backends.

Let's revisit what kind of variable references there are and what kind of resolving needs to be done with an example:
class Foo<int src> {
  int Src = src;
  int Offset = 1;
  int Dst = !add(Src, Offset);

multiclass Foos<int src> {
  def a : Foo<src>;
  let Offset = 2 in
  def b : Foo<src>;

foreach i = 0-3 in
defm F#i : Foos<i>;
This is actually broken in older LLVM by one of the many bugs, but clearly it should work based on what kind of features are generally available, and with my patch series it certainly does work in the natural way. We see four kinds of variable references:
  • internally within a record, such as the initializer of Dst referencing Src and Offset
  • to a class template variable, such as Src being initialized by src
  • to a multiclass template variable, such as src being passed as a template argument for Foo
  • to a foreach iteration variable
As an aside, keep in mind that let in TableGen does not mean the same thing as in the many functional programming languages that have a similar construct. In those languages let introduces a new variable, but TableGen's let instead overrides the value of a variable that has already been defined elsewhere. In the example above, the let-statement causes the value of Offset to be changed in the record that was instantiated from the Foo class to create the b prototype inside multiclass Foos.

TableGen internally represents variable references as instances of the VarInit class, and the variables themselves are simply referenced by name. This causes some embarrassing issues around template arguments which are papered over by qualifying the variable name with the template name. If you pass the above example through a sufficiently fixed version of llvm-tblgen, one of the outputs will be the description of the Foo class:
class Foo<int Foo:src = ?> {
  int Src = Foo:src;
  int Offset = 1;
  int Dst = !add(Src, Offset);
  string NAME = ?;
As you can see, Foo:src is used to refer to the template argument. In fact, the template arguments of both classes and multiclasses are temporarily added as variables to their respective prototype records. When the class or prototype in a multiclass is instantiated, all references to the template argument variables are resolved fully, and the variables are removed (or rather, some of them are removed, and making that consistent is one of the many things I set out to clean up).

Similarly, references to foreach iteration variables are resolved when records are instantiated, although those variables aren't similarly qualified. If you want to learn more about how variable names are looked up, TGParser::ParseIDValue is a good place to start.

The order in which variables are resolved is important. In order to achieve the flexibility of overriding defaults with let-statements, internal references among record variables must be resolved after template arguments.

Actually resolving variable references used to be done by the implementations of the following virtual method of the Init class hierarchy (which represents initializers, i.e. values and expressions):
virtual Init *resolveReferences(Record &R, const RecordVal *RV) const;
This method recursively resolves references in the constituent parts of the expression and then performs constant folding, and returns the resulting value (or the original value if nothing could be resolved). Its interface is somewhat magical: R represents the "current" record which is used as a frame of reference for magical lookups in the implementation of !cast; this is a topic for another time, though. At the same time, variables referencing R are supposed to be resolved, but only if RV is null. If RV is non-null, then only references to that specific variable are supposed to be resolved. Additionally, some behaviors around unset depend on this.

This is replaced in my changes with
virtual Init *resolveReferences(Resolver &R) const;
where Resolver is an abstract base class / interface which can lookup values based on their variable names:
class Resolver {
  Record *CurRec;

  explicit Resolver(Record *CurRec) : CurRec(CurRec) {}
  virtual ~Resolver() {}

  Record *getCurrentRecord() const { return CurRec; }
  virtual Init *resolve(Init *VarName) = 0;
  virtual bool keepUnsetBits() const { return false; }
The "current record" is used as a reference for the aforementioned magical !casts, and keepUnsetBits instructs the implementation of bit sequences in BitsInit not to resolve to ? (as was explained in the third part of the series). resolve itself is implemented by one of the subclasses, most notably:
  1. MapResolver: Resolve based on a dictionary of name-value pairs.
  2. RecordResolver: Resolve variable names that appear in the current record.
  3. ShadowResolver: Delegate requests to an underlying resolver, but filter out some names.
 This last type of resolver is used by the implementations of !foreach and !foldl to avoid mistakes with nesting. Consider, for example:
class Exclamation<list<string> messages> {
  list Messages = !foreach(s, messages, s # "!");

class Greetings<list<string> names>
    : Exclamation&lt!foreach(s, names, "Hello, " # s)>;

def : Greetings<["Alice", "Bob"]>;
This effectively becomes a nested !foreach. The iteration variable is named s in both, so when substituting s for the outer !foreach, we must ensure that we don't also accidentally substitute s in the inner !foreach. We achieve this by having !foreach wrap the given resolver with a ShadowResolver. The same principle applies to !foldl as well, of course.

Freitag, Februar 23, 2018

TableGen #3: Bits

This is the third part of a series; see the first part for a table of contents.

One of the main backend uses of TableGen is describing target machine instructions, and that includes describing the binary encoding of instructions and their constituents parts. This requires a certain level of bit twiddling, and TableGen supports this with explicit bit (single bit) and bits (fixed-length sequence of bits) types:
class Enc<bits<7> op> {
  bits<10> Encoding;

  let Encoding{9-7} = 5;
  let Encoding{6-0} = op;

def InstA : Enc<0x35>;
def InstB : Enc<0x08>;
... will produce records:
def InstA {     // Enc
  bits<10> Encoding = { 1, 0, 1, 0, 1, 1, 0, 1, 0, 1 };
  string NAME = ?;
def InstB {     // Enc
  bits<10> Encoding = { 1, 0, 1, 0, 0, 0, 1, 0, 0, 0 };
  string NAME = ?;
So you can quite easily slice and dice bit sequences with curly braces, as long as the indices themselves are constants.

But the real killer feature is that so-called unset initializers, represented by a question mark, aren't fully resolved in bit sequences:
class Enc<bits<3> opcode> {
  bits<8> Encoding;
  bits<3> Operand;

  let Encoding{0} = opcode{2};
  let Encoding{3-1} = Operand;
  let Encoding{5-4} = opcode{1-0};
  let Encoding{7-6} = { 1, 0 };

def InstA : Enc<5>;
... produces a  record:
def InstA {     // Enc
  bits<8> Encoding = { 1, 0, 0, 1, Operand{2}, Operand{1}, Operand{0}, 1 };
  bits<3> Operand = { ?, ?, ? };
  string NAME = ?;
So instead of going ahead and saying, hey, Operand{2} is ?, let's resolve that and plug it into Encoding, TableGen instead keeps the fact that bit 3 of Encoding refers to Operand{2} as part of its data structures.

Together with some additional data, this allows a backend of TableGen to automatically generate code for instruction encoding and decoding (i.e., disassembling). For example, it will create the source for a giant C++ method with signature
uint64_t getBinaryCodeForInstr(const MCInst &MI, /* ... */) const;
which contains a giant constant array with all the fixed bits of each instruction followed by a giant switch statement with cases of the form:
case AMDGPU::S_CMP_EQ_I32:
case AMDGPU::S_CMP_EQ_U32:
case AMDGPU::S_CMP_EQ_U64:
// more cases...
  // op: src0
  op = getMachineOpValue(MI, MI.getOperand(0), Fixups, STI);
  Value |= op & UINT64_C(255);
  // op: src1
  op = getMachineOpValue(MI, MI.getOperand(1), Fixups, STI);
  Value |= (op & UINT64_C(255)) << 8;
The bitmasks and shift values are all derived from the structure of unset bits as in the example above, and some additional data (the operand DAGs) are used to identify the operand index corresponding to TableGen variables like Operand based on their name. For example, here are the relevant parts of the S_CMP_EQ_I32 record generated by the AMDGPU backend's TableGen files:
 def S_CMP_EQ_I32 {      // Instruction (+ other superclasses)
  field bits<32> Inst = { 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, src1{7}, src1{6}, src1{5}, src1{4}, src1{3}, src1{2}, src1{1}, src1{0}, src0{7}, src0{6}, src0{5}, src0{4}, src0{3}, src0{2}, src0{1}, src0{0} };
  dag OutOperandList = (outs);
  dag InOperandList = (ins SSrc_b32:$src0, SSrc_b32:$src1);
<8> src0 = { ?, ?, ?, ?, ?, ?, ?, ? };
<8> src1 = { ?, ?, ?, ?, ?, ?, ?, ? };
  // many more variables...
Note how Inst, which describes the 32-bit encoding as a whole, refers to the TableGen variables src0 and src1. The operand indices used in the calls to MI.getOperand() above are derived from the InOperandList, which contains nodes with the corresponding names. (SSrc_b32 is the name of a record that subclasses RegisterOperand and describes the acceptable operands, such as registers and inline constants.)

Hopefully this helped you appreciate just how convenient TableGen can be. Not resolving the ? in bit sequences is an odd little exception to an otherwise fairly regular language, but the resulting expressive power is clearly worth it. It's something to keep in mind when we discuss how variable references are resolved.

Mittwoch, Februar 21, 2018

TableGen #2: Functional Programming

This is the second part of a series; see the first part for a table of contents.

When the basic pattern of having classes with variables that are filled in via template arguments or let-statements reaches the limits of its expressiveness, it can become useful to calculate values on the fly. TableGen provides string concatenation out of the box with the paste operator ('#'), and there are built-in functions which can be easily recognized since they start with an exclamation mark, such as !add, !srl, !eq, and !listconcat. There is even an !if-builtin and a somewhat broken and limited !foreach.

There is no way of defining new functions, but there is a pattern that can be used to make up for it: classes with ret-values:
class extractBit<int val, int bitnum> {
  bit ret = !and(!srl(val, bitnum), 1);

class Foo<int val> {
  bit bitFour = extractBit<val, 4>.ret;

def Foo1 : Foo<5>;
def Foo2 : Foo<17>;
This doesn't actually work in LLVM trunk right now because of the deficiencies around anonymous record instantiations that I mentioned in the first part of the series, but after a lot of refactoring and cleanups, I got it to work reliably. It turns out to be an extremely useful tool.

In case you're wondering, this does not support recursion and it's probably better that way. It's possible that TableGen is already accidentally Turing complete, but giving it that power on purpose seems unnecessary and might lead to abuse.

Without recursion, a number of builtin functions are required. There has been a !foreach for a long time, and it is a very odd duck:
def Defs {
  int num;

class Example<list<int> nums> {
  list<int> doubled = !foreach(Defs.num, nums, !add(Defs.num, Defs.num));

def MyNums : Example<[4, 1, 9, -3]>;
In many ways it does what you'd expect, except that having to define a dummy record with a dummy variable in this way is clearly odd and fragile. Until very recently it did not actually support everything you'd think even then, and even with the recent fixes there are plenty of bugs. Clearly, this is how !foreach should look instead:
class Example<list<int> nums> {
  list<int> doubled =
      !foreach(x, nums, !add(x, x));

def MyNums : Example<[4, 1, 9, -3]>;
... and that's what I've implemented.

This ends up being a breaking change (the only one in the whole series, hopefully), but !foreach isn't actually used in upstream LLVM proper anyway, and external projects can easily adapt.

A new feature that I have found very helpful is a fold-left operation:
class Enumeration<list<string> items> {
  list<string> ret = !foldl([], items, lhs, item,
      !listconcat(lhs, [!size(lhs) # ": " # item]));

def MyList : Enumeration<["foo", "bar", "baz"]>;
This produces the following record:
def MyList {    // Enumeration
  list<string> ret = ["0: foo", "1: bar", "2: baz"];
  string NAME = ?;
Needless to say, it was necessary to refactor the TableGen tool very deeply to enable this kind of feature, but I am quite happy with how it ended up.

The title of this entry is "Functional Programming", and in a sense I lied. Functions are not first-class values in TableGen even with my changes, so one of the core features of functional programming is missing. But that's okay: most of what you'd expect to have and actually need is now available in a consistent manner, even if it's still clunkier than in a "real" programming language. And again: making functions first-class would immediately make TableGen Turing complete. Do we really want that?