- Author: @eaplatanios
This document complements the original graph program extraction (GPE) whitepaper. It proposes a general high-level design for performing GPE. Even though the document uses Swift and TensorFlow as the running use case, the proposed design is more general and could be used with other languages and frameworks. It is also a very early draft, but the hope is that it will stimulate discussion and evolve into a few well-defined and well-specified design guidelines.
As described in the GPE whitepaper, GPE is effectively a program slicing method. This can be thought of as follows: working with Swift and TensorFlow means working with two different languages, executed as two separate programs that can communicate when needed. One program is the main Swift program, which also serves as the entrypoint of execution. The other is the implicitly constructed TensorFlow graph, which gets executed by the TensorFlow runtime. Ideally, we would like to:
- Have a single entrypoint for the Swift program and a single entrypoint for the TensorFlow program (i.e., use a single TF graph for the whole program, and a single session that starts executing parts of the graph at the same time that the Swift program starts executing things on the Swift side).
- Allow for communication between the two programs, wherever necessary, and make that communication explicit to the user of the API, without it being cumbersome or requiring boilerplate code.
- Avoid the need for aggressive inlining of all TensorFlow-related code on the Swift side (which is what currently happens in the Swift implementation).
- Allow for easy packaging of Swift for TF libraries that can be used efficiently by client code. By efficiently, we mean that the constructed TF graph will not contain unused code from the imported libraries.
- Make the parts of each program that lie between communication points as large as possible, as this allows for more optimizations on either side. From now on, we will refer to this as partitioning the two programs into chunks that are as large as possible.
This document mainly discusses points 1-4, but also touches upon 5, without going into much detail.
Having a single entrypoint for the Swift program needs no further discussion: in our design, the Swift program is the one that controls execution anyway, and it can invoke the TF program as needed. In order to have a single entrypoint for the TF program, we first discuss how we construct the graph such that we have a single TF graph, and then we discuss how the session executing that graph is managed.
We construct the TF graph by walking through the Swift program and creating a program with the exact same structure, but keeping only the TF-relevant parts. This means that we start at the top-level Swift code, which will be executed when we execute the Swift program. For now, we ignore functions, as those will be discussed when we tackle point 3.
For all top-level code, we check where and how `Tensor`s are used, and we build a TF graph based on the methods outlined in the GPE whitepaper. `Tensor`s that are created on the Swift side are handled by the TF graph in the way described in the section tackling point 2. After building the TF graph in this manner, we collect all sink nodes in the TF graph, whether those are nodes that send tensors back to the Swift program or nodes that only have a side effect on the TF side, and we create a new TF `group` node that we call the `sink` node. Once the `sink` node finishes executing in TF, all TF-related computation for the current Swift for TF program will have finished executing, and so the `sink` node can be thought of as the TF endpoint.
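To make this concrete, here is a minimal sketch of that final step. `NodeDef` is a toy stand-in for TF's `NodeDef` protobuf message, and the encoding follows real `GraphDef` conventions: `tf.group` amounts to a `NoOp` node, and control inputs carry a `^` prefix.

```swift
// A minimal, hypothetical sketch of how the final `sink` node could be built.
// `NodeDef` is a toy stand-in for TF's NodeDef protobuf message.
struct NodeDef {
  var name: String
  var op: String
  var inputs: [String]  // In GraphDef, control inputs carry a "^" prefix.
}

/// Builds a `NoOp` node whose control inputs are all collected sink nodes,
/// mirroring the semantics of TensorFlow's `tf.group`: the node completes
/// only once every node it depends on has executed.
func makeSinkNode(collectedSinks: [NodeDef]) -> NodeDef {
  return NodeDef(
    name: "sink",
    op: "NoOp",
    inputs: collectedSinks.map { "^\($0.name)" })
}
```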
We have ignored functions that use TF for now, but we will discuss those in the section devoted to point 3.
Given the single TF graph we have built for the current Swift for TF program, we can then add the following at the beginning of the top-level Swift program code (on the Swift side):
- Create a new TF session (with any desirable configuration).
- Add a call to `session.run(sink)`.
This last call will start execution of the TF program, so that it runs in parallel with the Swift program. It allows all parts of the graph that do not depend on communicating with Swift at any point in time to be executed in parallel with the main Swift program. The fact that we end up with a single TF session makes it easy to manage the session configuration and to profile the TF program's performance. Furthermore, no two TF sessions have to compete for the same resources. It also makes this program slicing approach cleaner, in that there are only two programs running at any point in time: a single Swift program and its corresponding TF program.
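As an illustration, the emitted prologue could look roughly like the following sketch. `TFSession` and `emitPrologue` are hypothetical names; dispatching the blocking `session.run(sink)` call onto a background queue is one assumed way to obtain the parallelism described above.

```swift
import Dispatch

// Hypothetical session wrapper; in practice this would wrap the TF C API
// (TF_NewSession / TF_SessionRun).
protocol TFSession {
  /// Runs the graph until all `targets` have finished executing.
  func run(targets: [String])
}

/// The prologue conceptually emitted at the top of the compiled Swift program.
func emitPrologue(session: TFSession) {
  // Kick off the TF program in parallel with the Swift program. The call
  // blocks until `sink` finishes (i.e., until the whole TF program is done),
  // so it is dispatched onto a background queue.
  DispatchQueue.global().async {
    session.run(targets: ["sink"])
  }
}
```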
Communication between Swift and TF can be handled in a clean manner, by following the same principles.
- Swift -> TF: The TF program deals with `Tensor` objects. Therefore, whenever a `Tensor` object is constructed on the Swift side and later used by TF ops that require tensors, a corresponding `SendToTF` op is added to the graph. This actually consists of two ops: (i) an `Enqueue` op, and (ii) a `Dequeue` op on a blocking queue. The `Dequeue` op will be used as the `Tensor` object representation for TF ops consuming it later on, and a `session.run(...)` call will be added to the Swift side, enqueueing the constructed tensor that is to be sent to TF.
- TF -> Swift: `Tensor` objects allow for various ways to obtain their content, or a reference to it. One proposal could be to have methods like `.scalar`, `.toArray`, `.iterator`, and `.handle` that return Swift-typed values. Whenever these are used, data needs to be transferred from TF to Swift, and so we would add a `ReceiveFromTF` op to the graph. This again consists of two ops: (i) an `Enqueue` op, and (ii) a `Dequeue` op on a blocking queue. The `Enqueue` op will be added to the TF graph, enqueueing the corresponding TF tensor (or its handle, for example) to the queue. At the same time, a `session.run(...)` call will be added to the Swift side, dequeueing the content that is to be received from the graph. These method calls will be required to be made explicitly by the library user and will not be added automatically by the Swift compiler, in order to avoid confusion and potential inefficiencies. A self-contained sketch of this blocking-queue rendezvous follows this list.
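Below is a self-contained toy sketch of that rendezvous pattern. `BlockingQueue` stands in for a TF FIFO queue, and the two sides of the sliced program are simulated with a background task; in the real design, the TF side would consist of `Enqueue`/`Dequeue` ops inside the graph, and the Swift side of `session.run(...)` calls.

```swift
import Dispatch
import Foundation

// Toy blocking queue standing in for a TF FIFO queue.
final class BlockingQueue<Element> {
  private var buffer: [Element] = []
  private let lock = NSLock()
  private let available = DispatchSemaphore(value: 0)

  func enqueue(_ element: Element) {
    lock.lock()
    buffer.append(element)
    lock.unlock()
    available.signal()
  }

  func dequeue() -> Element {
    available.wait()  // Block until an element is available.
    lock.lock()
    defer { lock.unlock() }
    return buffer.removeFirst()
  }
}

let swiftToTF = BlockingQueue<Double>()  // Backs a SendToTF op pair.
let tfToSwift = BlockingQueue<Double>()  // Backs a ReceiveFromTF op pair.

// The "TF program": dequeues x, computes tanh(x + x), enqueues the result.
DispatchQueue.global().async {
  let x = swiftToTF.dequeue()     // Dequeue half of SendToTF.
  tfToSwift.enqueue(tanh(x + x))  // Enqueue half of ReceiveFromTF.
}

// The "Swift program": sends x, then blocks on the explicit receive.
let x = 1.5                 // A value produced on the Swift side.
swiftToTF.enqueue(x)        // The session.run(...) call that sends x to TF.
print(tfToSwift.dequeue())  // An explicit receive, e.g., via `.scalar`.
```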
Currently, Swift for TF handles all Swift functions that use TF by aggressively inlining them. This can cause many issues, including an explosion in the size of the compiled code, as well as limited support for recursion (note that TF graph functions themselves support recursion, and thus we should be able to support it on our side too). Here we suggest using TF graph functions to handle Swift functions that use TF. We propose the following high-level design: every Swift function that uses TF will be compiled into two separate functions:
- Swift Function: The Swift part of the compiled function, containing all computation that needs to happen on the Swift side. Note that if this is a pure TF function (using only TF ops), this function could be compiled to a no-op, or optimized away entirely.
- TF Graph Function: The TF part of the compiled function, which is a TF `FunctionDef` protobuf message containing the extracted TF graph for the compiled function.
The calling code will invoke both functions in parallel, in its two generated programs: the Swift program and the TF program, respectively. Note that for the TF program, the invocation will consist of simply adding the corresponding `FunctionDef` to the graph's function library and creating an op representing that function call.
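As an illustration, consider the pure TF function below, written against the Swift for TF tensor API. Under this proposal, its Swift-side compiled counterpart reduces to a no-op, while its TF-side counterpart is the `FunctionDef` sketched in the trailing comment. The proto field names (`signature`, `node_def`, `ret`) match the real `FunctionDef` schema, but the node details are only approximate.

```swift
import TensorFlow

// A pure TF function: its Swift-side compiled counterpart can be a no-op
// (or be optimized away), and its TF-side counterpart is a FunctionDef.
func dense(_ x: Tensor<Float>, _ w: Tensor<Float>) -> Tensor<Float> {
  return tanh(matmul(x, w))
}

// Rough text-format sketch of the extracted FunctionDef:
//
//   signature {
//     name: "dense"
//     input_arg  { name: "x"   type: DT_FLOAT }
//     input_arg  { name: "w"   type: DT_FLOAT }
//     output_arg { name: "out" type: DT_FLOAT }
//   }
//   node_def { name: "MatMul" op: "MatMul" input: "x" input: "w" }
//   node_def { name: "Tanh"   op: "Tanh"   input: "MatMul:product" }
//   ret { key: "out" value: "Tanh:y" }
//
// A call site adds this FunctionDef to the graph's function library and then
// emits a single op whose type is "dense".
```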
Here we omit details related to handling mixed Swift and TF arguments, and also related to the propagation of such function calls (meaning that one function invocation may require multiple `FunctionDef`s to be added to the graph), as these can be handled relatively easily and would distract from the main proposed idea.
Note that due to our way of tackling point 3, packaging compiled libraries becomes very easy. For Swift code that is not top-level (meaning that it won't be executed when running the compiled Swift code), we only package the `FunctionDef`s for the Swift functions that use TF, alongside the SIL of their corresponding Swift functions. This might require us to also store an index that keeps track of the arguments to those functions and of the correspondence between Swift and TF functions. No main TF graph is constructed in this case because, as already mentioned when tackling point 1, we only ever construct a single TF graph when compiling top-level Swift code. A hypothetical sketch of such a package entry follows.
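For concreteness, each packaged entry could look something like the following; all names here are hypothetical and merely sketch the index described above.

```swift
/// Which side of the sliced program each argument feeds into.
enum ArgumentKind {
  case swift   // Consumed by the Swift-side function.
  case tensor  // Consumed by the TF graph function.
}

/// One packaged entry for a Swift function that uses TF. Hypothetical.
struct PackagedTFFunction {
  var swiftName: String       // Mangled name of the Swift-side function.
  var tfFunctionName: String  // Name of the corresponding TF graph function.
  var silBody: [UInt8]        // Serialized SIL, enabling later optimization.
  var functionDef: [UInt8]    // Serialized TF FunctionDef protobuf message.
  var argumentKinds: [ArgumentKind]  // Swift/TF correspondence of arguments.
}
```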
This is really beneficial because it allows us to avoid aggressive inlining, and thus to shrink the constructed single TF graph so that it only contains the `FunctionDef`s for the functions actually used in the top-level code that will be executed. It also allows all sorts of optimizations to be performed when compiling that top-level code, because the SIL representation for those functions is stored alongside the `FunctionDef`s.
Many of the optimizations that Swift for TF currently performs, or aims to perform, can still be achieved because the SIL and the `FunctionDef`s are distributed with compiled libraries. Furthermore, a simpler approach may be to let the Swift compiler optimize the Swift-side code in isolation, and likewise for the TF graph compiler (assuming that the TF graph compiler can also inline TF graph functions, if needed). This should already allow for lots of optimizations, and would most likely result in much more performant code than would be obtained using TF eager mode, for example.
Note also that many `SendToTF` and `ReceiveFromTF` ops can be avoided by being smart about how the graph is wired, and by allowing constant expressions on the Swift side to be implicitly converted to constant tensors in the TF graph, if needed; a small example follows.
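For example, in the snippet below (written against the Swift for TF tensor API), `learningRate` is a Swift-side constant expression, so the extracted graph could embed it directly as a `Const` node rather than wiring up a `SendToTF` queue pair:

```swift
import TensorFlow

let learningRate: Float = 0.01        // Constant-foldable on the Swift side.
var w = Tensor<Float>(ones: [10, 10])
let g = Tensor<Float>(ones: [10, 10]) // Stand-in for a computed gradient.

// `learningRate` can become a Const(0.01) node in the TF graph, so no
// send/receive ops are needed for it.
w -= learningRate * g
```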
- How should we handle send/receive ops? More specifically, we may want to pass around tensor handles and/or tensor content, and the answer to this question should take that distinction into account.
- How do we package TF graph functions and host functions such that certain optimizations can still be performed on the user code (i.e., code written by users of a Swift for TF library)?
- What kind of optimizations do we want to allow/consider?
I believe this document provides a good starting point towards a standard high-level design for graph program extraction (GPE) that is simpler and more modular than the current approach in the Swift for TF implementation. It is a very early draft, but the hope is that it will stimulate discussion and evolve into a few well-defined and well-specified design guidelines.