
Commit e667401

author
James Molloy
committed
[TableGen] Introduce a generic automaton (DFA) backend
Summary:
This patch introduces -gen-automata, a backend for generating deterministic finite-state automata.

DFAs are already generated by the -gen-dfa-packetizer backend. This backend is more generic and will hopefully be used to implement the DFA generation (and determinization) for the packetizer in the future.

This backend allows not only generation of a DFA from an NFA (nondeterministic finite-state automaton), it also emits sidetables that allow a path through the DFA under a sequence of inputs to be analyzed, and the equivalent set of all possible NFA transitions extracted.

This allows a user to not just answer "can my problem be solved?" but also "what is the solution?". Clearly this analysis is more expensive than just playing a DFA forwards so is opt-in. The DFAPacketizer has this behaviour already but this is a more compact and generic representation.

Examples are bundled in unittests/TableGen/Automata.td. Some are trivial, but the BinPacking example is a stripped-down version of the original target problem I set out to solve, where we pack values (actually immediates) into bins (an immediate pool in a VLIW bundle) subject to a set of esoteric constraints.

Reviewers: t.p.northover

Subscribers: mgorny, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67968

llvm-svn: 373718
1 parent edfb869 commit e667401
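
The determinization step mentioned in the summary can be illustrated with the textbook subset construction. The following is a minimal standalone sketch, not the code in this patch: the type names (`NfaTransitions`, `DfaState`, `Dfa`) and the function `determinize` are hypothetical and exist only for this example, and the backend's real data structures may differ.

```cpp
#include <cstdint>
#include <map>
#include <queue>
#include <set>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; not the backend's structures.
using NfaState = uint64_t;
using Symbol = char;
using NfaTransitions =
    std::map<std::pair<NfaState, Symbol>, std::set<NfaState>>;
// In the subset construction, one DFA state is a set of NFA states.
using DfaState = std::set<NfaState>;

struct Dfa {
  std::map<std::pair<DfaState, Symbol>, DfaState> Transitions;
  std::set<DfaState> States;
};

// Textbook subset construction, starting from the single NFA state Start.
Dfa determinize(const NfaTransitions &Nfa, NfaState Start,
                const std::vector<Symbol> &Alphabet) {
  Dfa D;
  std::queue<DfaState> Worklist;
  DfaState Initial{Start};
  D.States.insert(Initial);
  Worklist.push(Initial);
  while (!Worklist.empty()) {
    DfaState Cur = Worklist.front();
    Worklist.pop();
    for (Symbol A : Alphabet) {
      // Union of everything reachable on A from any NFA state in Cur.
      DfaState Next;
      for (NfaState S : Cur) {
        auto It = Nfa.find({S, A});
        if (It != Nfa.end())
          Next.insert(It->second.begin(), It->second.end());
      }
      if (Next.empty())
        continue; // No NFA transition on A: the DFA has no edge here.
      D.Transitions[{Cur, A}] = Next;
      if (D.States.insert(Next).second)
        Worklist.push(Next); // First time this DFA state is seen.
    }
  }
  return D;
}
```

Each DFA edge here remembers which NFA states it merges, which is the information a sidetable would need in order to recover the individual NFA paths afterwards.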

10 files changed

+1181
-1
lines changed


llvm/include/llvm/Support/Automaton.h

+230
@@ -0,0 +1,230 @@
1+
//===-- Automaton.h - Support for driving TableGen-produced DFAs ----------===//
2+
//
3+
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4+
// See https://llvm.org/LICENSE.txt for license information.
5+
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6+
//
7+
//===----------------------------------------------------------------------===//
8+
//
9+
// This file implements classes that drive and introspect deterministic finite-
10+
// state automata (DFAs) as generated by TableGen's -gen-automata backend.
11+
//
12+
// For a description of how to define an automaton, see
13+
// include/llvm/TableGen/Automaton.td.
14+
//
15+
// One important detail is that these deterministic automata are created from
16+
// (potentially) nondeterministic definitions. Therefore a unique sequence of
17+
// input symbols will produce one path through the DFA but multiple paths
18+
// through the original NFA. An automaton by default only returns "accepted" or
19+
// "not accepted", but frequently we want to analyze what NFA path was taken.
20+
// Finding a path through the NFA states that results in a DFA state can help
21+
// answer *what* the solution to a problem was, not just that there exists a
22+
// solution.
23+
//
24+
//===----------------------------------------------------------------------===//
25+
26+
#ifndef LLVM_SUPPORT_AUTOMATON_H
27+
#define LLVM_SUPPORT_AUTOMATON_H
28+
29+
#include "llvm/ADT/ArrayRef.h"
30+
#include "llvm/ADT/DenseMap.h"
31+
#include "llvm/ADT/SmallVector.h"
32+
#include "llvm/Support/Allocator.h"
33+
#include <deque>
34+
#include <map>
35+
#include <memory>
36+
#include <unordered_map>
37+
#include <vector>
38+
39+
namespace llvm {
40+
41+
using NfaPath = SmallVector<uint64_t, 4>;
42+
43+
/// Forward define the pair type used by the automata transition info tables.
44+
///
45+
/// Experimental results with large tables have shown a significant (multiple
46+
/// orders of magnitude) parsing speedup by using a custom struct here with a
47+
/// trivial constructor rather than std::pair<uint64_t, uint64_t>.
48+
struct NfaStatePair {
49+
uint64_t FromDfaState, ToDfaState;
50+
51+
bool operator<(const NfaStatePair &Other) const {
52+
return std::make_tuple(FromDfaState, ToDfaState) <
53+
std::make_tuple(Other.FromDfaState, Other.ToDfaState);
54+
}
55+
};
56+
57+
namespace internal {
58+
/// The internal class that maintains all possible paths through an NFA based
59+
/// on a path through the DFA.
60+
class NfaTranscriber {
61+
private:
62+
/// Cached transition table. This is a table of NfaStatePairs that contains
63+
/// zero-terminated sequences pointed to by DFA transitions.
64+
ArrayRef<NfaStatePair> TransitionInfo;
65+
66+
/// A simple linked-list of traversed states that can have a shared tail. The
67+
/// traversed path is stored in reverse order with the latest state as the
68+
/// head.
69+
struct PathSegment {
70+
uint64_t State;
71+
PathSegment *Tail;
72+
};
73+
74+
/// We allocate segment objects frequently. Allocate them upfront and dispose
75+
/// at the end of a traversal rather than hammering the system allocator.
76+
SpecificBumpPtrAllocator<PathSegment> Allocator;
77+
78+
/// Heads of each tracked path. These are not ordered.
79+
std::deque<PathSegment *> Heads;
80+
81+
/// The returned paths. This is populated during getPaths.
82+
SmallVector<NfaPath, 4> Paths;
83+
84+
/// Create a new segment and return it.
85+
PathSegment *makePathSegment(uint64_t State, PathSegment *Tail) {
86+
PathSegment *P = Allocator.Allocate();
87+
*P = {State, Tail};
88+
return P;
89+
}
90+
91+
/// Pairs defines a sequence of possible NFA transitions for a single DFA
92+
/// transition.
93+
void transition(ArrayRef<NfaStatePair> Pairs) {
94+
// Iterate over all existing heads. We will mutate the Heads deque during
95+
// iteration.
96+
unsigned NumHeads = Heads.size();
97+
for (auto HeadI = Heads.begin(), HeadE = std::next(Heads.begin(), NumHeads);
98+
HeadI != HeadE; ++HeadI) {
99+
PathSegment *Head = *HeadI;
100+
// The sequence of pairs is sorted. Select the set of pairs that
101+
// transition from the current head state.
102+
auto PI = lower_bound(Pairs, NfaStatePair{Head->State, 0ULL});
103+
auto PE = upper_bound(Pairs, NfaStatePair{Head->State, INT64_MAX});
104+
// For every transition from the current head state, add a new path
105+
// segment.
106+
for (; PI != PE; ++PI)
107+
if (PI->FromDfaState == Head->State)
108+
Heads.push_back(makePathSegment(PI->ToDfaState, Head));
109+
}
110+
// Now we've iterated over all the initial heads and added new ones,
111+
// dispose of the original heads.
112+
Heads.erase(Heads.begin(), std::next(Heads.begin(), NumHeads));
113+
}
114+
115+
public:
116+
NfaTranscriber(ArrayRef<NfaStatePair> TransitionInfo)
117+
: TransitionInfo(TransitionInfo) {
118+
reset();
119+
}
120+
121+
void reset() {
122+
Paths.clear();
123+
Heads.clear();
124+
Allocator.DestroyAll();
125+
// The initial NFA state is 0.
126+
Heads.push_back(makePathSegment(0ULL, nullptr));
127+
}
128+
129+
void transition(unsigned TransitionInfoIdx) {
130+
unsigned EndIdx = TransitionInfoIdx;
131+
while (TransitionInfo[EndIdx].ToDfaState != 0)
132+
++EndIdx;
133+
ArrayRef<NfaStatePair> Pairs(&TransitionInfo[TransitionInfoIdx],
134+
EndIdx - TransitionInfoIdx);
135+
transition(Pairs);
136+
}
137+
138+
ArrayRef<NfaPath> getPaths() {
139+
Paths.clear();
140+
for (auto *Head : Heads) {
141+
NfaPath P;
142+
while (Head->State != 0) {
143+
P.push_back(Head->State);
144+
Head = Head->Tail;
145+
}
146+
std::reverse(P.begin(), P.end());
147+
Paths.push_back(std::move(P));
148+
}
149+
return Paths;
150+
}
151+
};
152+
} // namespace internal
153+
154+
/// A deterministic finite-state automaton. The automaton is defined in
155+
/// TableGen; this object drives an automaton defined by tblgen-emitted tables.
156+
///
157+
/// An automaton accepts a sequence of input tokens ("actions"). This class is
158+
/// templated on the type of these actions.
159+
template <typename ActionT> class Automaton {
160+
/// Map from {State, Action} to {NewState, TransitionInfoIdx}.
161+
/// TransitionInfoIdx is used by the NfaTranscriber to analyze the transition.
162+
/// FIXME: This uses a std::map because ActionT can be a pair type including
163+
/// an enum. In particular DenseMapInfo<ActionT> must be defined to use
164+
/// DenseMap here.
165+
std::map<std::pair<uint64_t, ActionT>, std::pair<uint64_t, unsigned>> M;
166+
/// An optional transcription object. This uses much more state than simply
167+
/// traversing the DFA for acceptance, so is heap allocated.
168+
std::unique_ptr<internal::NfaTranscriber> Transcriber;
169+
/// The initial DFA state is 1.
170+
uint64_t State = 1;
171+
172+
public:
173+
/// Create an automaton.
174+
/// \param Transitions The Transitions table as created by TableGen. Note that
175+
/// because the action type differs per automaton, the
176+
/// table type is templated as ArrayRef<InfoT>.
177+
/// \param TranscriptionTable The TransitionInfo table as created by TableGen.
178+
///
179+
/// Providing the TranscriptionTable argument as non-empty will enable the
180+
/// use of transcription, which analyzes the possible paths in the original
181+
/// NFA taken by the DFA. NOTE: This is substantially more work than simply
182+
/// driving the DFA, so unless you require the getNfaPaths() method leave this
183+
/// empty.
184+
template <typename InfoT>
185+
Automaton(ArrayRef<InfoT> Transitions,
186+
ArrayRef<NfaStatePair> TranscriptionTable = {}) {
187+
if (!TranscriptionTable.empty())
188+
Transcriber =
189+
std::make_unique<internal::NfaTranscriber>(TranscriptionTable);
190+
for (const auto &I : Transitions)
191+
// Greedily read and cache the transition table.
192+
M.emplace(std::make_pair(I.FromDfaState, I.Action),
193+
std::make_pair(I.ToDfaState, I.InfoIdx));
194+
}
195+
196+
/// Reset the automaton to its initial state.
197+
void reset() {
198+
State = 1;
199+
if (Transcriber)
200+
Transcriber->reset();
201+
}
202+
203+
/// Transition the automaton based on input symbol A. Return true if the
204+
/// automaton transitioned to a valid state, false if the automaton
205+
/// transitioned to an invalid state.
206+
///
207+
/// If this function returns false, all methods are undefined until reset() is
208+
/// called.
209+
bool add(const ActionT &A) {
210+
auto I = M.find({State, A});
211+
if (I == M.end())
212+
return false;
213+
if (Transcriber)
214+
Transcriber->transition(I->second.second);
215+
State = I->second.first;
216+
return true;
217+
}
218+
219+
/// Obtain a set of possible paths through the input nondeterministic
220+
/// automaton that could be obtained from the sequence of input actions
221+
/// presented to this deterministic automaton.
222+
ArrayRef<NfaPath> getNfaPaths() {
223+
assert(Transcriber && "Can only obtain NFA paths if transcribing!");
224+
return Transcriber->getPaths();
225+
}
226+
};
227+
228+
} // namespace llvm
229+
230+
#endif // LLVM_SUPPORT_AUTOMATON_H
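
The path tracking done by NfaTranscriber can be sketched outside of LLVM as follows. This is a simplified illustration under stated assumptions, not the header's implementation: `ToyPair` and `ToyTranscriber` are hypothetical names, a `std::deque` stands in for the bump allocator, each transition receives its pair list directly rather than through a zero-terminated table, and a path ends at the segment with a null tail rather than at state 0.

```cpp
#include <algorithm>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical stand-in for one (from, to) NFA transition.
struct ToyPair {
  uint64_t From, To;
};

class ToyTranscriber {
  // Reverse-linked path: the newest state is the head, and several heads
  // may share the same tail.
  struct Segment {
    uint64_t State;
    const Segment *Tail;
  };
  // std::deque keeps element addresses stable across push_back, so the
  // Tail pointers stay valid. (Assumption: replaces the bump allocator.)
  std::deque<Segment> Storage;
  std::vector<const Segment *> Heads;

public:
  ToyTranscriber() { reset(); }

  void reset() {
    Storage.clear();
    Heads.clear();
    Storage.push_back({0, nullptr}); // The initial NFA state is 0.
    Heads.push_back(&Storage.back());
  }

  // Extend every tracked path by each NFA transition that leaves its head.
  // Paths with no matching transition are dropped.
  void transition(const std::vector<ToyPair> &Pairs) {
    std::vector<const Segment *> NewHeads;
    for (const Segment *Head : Heads)
      for (const ToyPair &P : Pairs)
        if (P.From == Head->State) {
          Storage.push_back({P.To, Head});
          NewHeads.push_back(&Storage.back());
        }
    Heads = std::move(NewHeads);
  }

  // Walk each head back to the initial segment and reverse the result.
  std::vector<std::vector<uint64_t>> getPaths() const {
    std::vector<std::vector<uint64_t>> Paths;
    for (const Segment *Head : Heads) {
      std::vector<uint64_t> P;
      for (const Segment *S = Head; S->Tail != nullptr; S = S->Tail)
        P.push_back(S->State);
      std::reverse(P.begin(), P.end());
      Paths.push_back(std::move(P));
    }
    return Paths;
  }
};
```

Sharing tails means a fork in the NFA costs one new segment per branch instead of a copy of the whole path, which is presumably why the header stores paths in reverse.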
+95
@@ -0,0 +1,95 @@
1+
//===- Automaton.td ----------------------------------------*- tablegen -*-===//
2+
//
3+
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4+
// See https://llvm.org/LICENSE.txt for license information.
5+
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6+
//
7+
//===----------------------------------------------------------------------===//
8+
//
9+
// This file defines the key top-level classes needed to produce a reasonably
10+
// generic finite-state automaton.
11+
//
12+
//===----------------------------------------------------------------------===//
13+
14+
// Define a record inheriting from GenericAutomaton to generate a reasonably
15+
// generic finite-state automaton over a set of actions and states.
16+
//
17+
// This automaton is defined by:
18+
// 1) a state space (explicit, always bits<32>).
19+
// 2) a set of input symbols (actions, explicit) and
20+
// 3) a transition function from state + action -> state.
21+
//
22+
// A theoretical automaton is defined by <Q, S, d, q0, F>:
23+
// Q: A set of possible states.
24+
// S: (sigma) The input alphabet.
25+
// d: (delta) The transition function f(q in Q, s in S) -> q' in Q.
26+
// F: The set of final (accepting) states.
27+
//
28+
// Because generating all possible states is tedious, we instead define the
29+
// transition function only and crawl all reachable states starting from the
30+
// initial state with all inputs under all transitions until termination.
31+
//
32+
// We define F = Q, that is, all valid states are accepting.
33+
//
34+
// To ensure the generation of the automaton terminates, the state transitions
35+
// are defined as a lattice (meaning every transitioned-to state is more
36+
// specific than the transitioned-from state, for some definition of specificity).
37+
// Concretely a transition may set one or more bits in the state that were
38+
// previously zero to one. If any bit was not zero, the transition is invalid.
39+
//
40+
// Instead of defining all possible states (which would be cumbersome), the user
41+
// provides a set of possible Transitions from state A, consuming an input
42+
// symbol, to state B. The Transition object transforms state A to state B and
43+
// acts as a predicate. This means the state space can be discovered by crawling
44+
// all the possible transitions until none are valid.
45+
//
46+
// This automaton is considered to be nondeterministic, meaning that multiple
47+
// transitions can occur from any (state, action) pair. The generated automaton
48+
// is determinized, meaning that it executes in O(k) time where k is the input
49+
// sequence length.
50+
//
51+
// In addition to a generated automaton that determines if a sequence of inputs
52+
// is accepted or not, a table is emitted that allows determining a plausible
53+
// sequence of states traversed to accept that input.
54+
class GenericAutomaton {
55+
// Name of a class that inherits from Transition. All records inheriting from
56+
// this class will be considered when constructing the automaton.
57+
string TransitionClass;
58+
59+
// Names of fields within TransitionClass that define the action symbol. This
60+
// defines the action as an N-tuple.
61+
//
62+
// Each symbol field can be of class, int, string or code type.
63+
// If the type of a field is a class, the Record's name is used verbatim
64+
// in C++ and the class name is used as the C++ type name.
65+
// If the type of a field is a string, code or int, that is also used
66+
// verbatim in C++.
67+
//
68+
// To override the C++ type name for field F, define a field called TypeOf_F.
69+
// This should be a string that will be used verbatim in C++.
70+
//
71+
// As an example, to define a 2-tuple with an enum and an int, one might:
72+
// def MyTransition : Transition {
73+
// MyEnum S1;
74+
// int S2;
75+
// }
76+
// def MyAutomaton : GenericAutomaton {
77+
// let TransitionClass = "Transition";
78+
// let SymbolFields = ["S1", "S2"];
79+
// let TypeOf_S1 = "MyEnumInCxxKind";
80+
// }
81+
list<string> SymbolFields;
82+
}
83+
84+
// All transitions inherit from Transition.
85+
class Transition {
86+
// A transition S' = T(S) is valid if, for every set bit in NewState, the
87+
// corresponding bit in S is clear. That is:
88+
// def T(S):
89+
// S' = S | NewState
90+
// return S' if S' != S else Failure
91+
//
92+
// The automaton generator uses this property to crawl the set of possible
93+
// transitions from a starting state of 0b0.
94+
bits<32> NewState;
95+
}
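
The validity rule for Transition and the state-space crawl described above can be sketched in C++ (C++17, for `std::optional`). This is an illustration only: `applyTransition` and `crawl` are hypothetical names, not functions from the backend. The sketch follows the prose rule (every set bit in NewState must be clear in S) and, as an assumption, also rejects an empty mask so that every valid transition makes progress, in line with the `S' != S` check in the pseudocode.

```cpp
#include <cstdint>
#include <optional>
#include <set>
#include <vector>

// Apply a transition whose NewState bitmask is Mask to state S.
// Returns the new state, or std::nullopt if the transition is invalid.
std::optional<uint32_t> applyTransition(uint32_t S, uint32_t Mask) {
  // Assumption: an empty mask is rejected because it makes no progress.
  if (Mask == 0 || (S & Mask) != 0)
    return std::nullopt;
  return S | Mask;
}

// Discover every state reachable from 0b0 under the given transition masks.
// This terminates because each valid transition sets at least one new bit.
std::set<uint32_t> crawl(const std::vector<uint32_t> &Masks) {
  std::set<uint32_t> Seen{0};
  std::vector<uint32_t> Worklist{0};
  while (!Worklist.empty()) {
    uint32_t S = Worklist.back();
    Worklist.pop_back();
    for (uint32_t Mask : Masks)
      if (std::optional<uint32_t> Next = applyTransition(S, Mask))
        if (Seen.insert(*Next).second)
          Worklist.push_back(*Next);
  }
  return Seen;
}
```

With two independent single-bit masks the crawl finds four states (0b00, 0b01, 0b10 and 0b11), and no further transition is valid from 0b11.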
