A tensor is a data structure for storing structured data, such as a scalar, a vector, a matrix, or a higher-dimensional array. The name of the ADCME backend, TensorFlow, is also derived from its core data structure, the tensor. Tensors can be viewed as symbolic versions of Julia's `Array`.
A tensor is an $n$-dimensional array. ADCME represents tensors using a `PyObject` handle to the TensorFlow `Tensor` data structure. A tensor has three important properties:
- `name`: each tensor has a unique name.
- `shape`: for scalars, the shape is always the empty tuple `()`; for $n$-dimensional vectors, the shape is `(n,)`; for matrices or higher-order tensors, the shape has the form `(n1, n2, ...)`.
- `dtype`: the type of the tensor. There is a one-to-one correspondence between most TensorFlow types and Julia types (e.g., `Float64` and `Bool`). Therefore, we use the Julia type names for tensors as well, so users have a unified interface.
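To make the `shape` and `name` properties concrete, here is a minimal sketch that inspects a few tensors. It assumes that ADCME's `size` overload for tensors returns the shape as a Julia tuple, and that the `name` attribute is read through PyCall's property access. TensorFlow generates the exact name string, so the sketch prints it without predicting its value.

```julia
using ADCME

s = constant(1.0)          # scalar tensor
v = constant(rand(5))      # vector tensor
m = constant(rand(3, 4))   # matrix tensor

# shape: () for scalars, (n,) for vectors, (n1, n2) for matrices
# (assumes `size` on a tensor returns a Julia tuple)
shape_s = size(s)
shape_v = size(v)
shape_m = size(m)

# name: TensorFlow assigns every tensor a unique name string
println(m.name)
```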
An important difference is that a tensor object stores data in row-major order, while Julia's `Array` is column-major by default. This difference may affect performance if not handled carefully. More often than not, though, it does not matter unless you convert data between Julia and Python frequently.

[Figure: a representation of ADCME]
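You can see the two memory orderings in plain Julia, without any ADCME calls. `vec` flattens an `Array` in Julia's column-major order. Flattening the permuted matrix (via `permutedims`) produces the row-major order that a TensorFlow tensor uses internally. This only illustrates the two orderings; it does not describe how ADCME copies data between the two sides.

```julia
A = [1 2 3;
     4 5 6]                        # 2×3 matrix

col_major = vec(A)                 # Julia's memory order: down each column
row_major = vec(permutedims(A))    # row-major order: along each row

println(col_major)   # [1, 4, 2, 5, 3, 6]
println(row_major)   # [1, 2, 3, 4, 5, 6]
```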
There are four ways to create tensors.

`constant`. As the name suggests, `constant` creates an immutable tensor from a Julia array.

```julia
using ADCME

constant(1.0)
constant(rand(10))
constant(rand(10,10))
```
`Variable`. In contrast to `constant`, `Variable` creates tensors that are mutable. Mutability lets us update the tensor values, for example during an optimization procedure. It is very important to understand the difference between `constant` and `Variable`. Simply put, in inverse modeling, the tensors defined as `Variable` should be the quantities you want to invert, while `constant` is a way to provide known data.

```julia
Variable(1.0)
Variable(rand(10))
Variable(rand(10,10))
```
`placeholder` is a convenient way to specify a tensor whose values are provided at runtime. One use case is trying out different values for this tensor and examining the simulation result for each.

```julia
placeholder(Float64, shape=[10,10])
placeholder(rand(10)) # default value is `rand(10)`
```
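A placeholder receives its values only when the graph is executed. The sketch below evaluates the same expression for two different inputs without rebuilding the graph. It assumes ADCME's `run` accepts `placeholder => value` pairs as a feed, acting as a thin wrapper over TensorFlow's `feed_dict`. It also assumes that a placeholder created with a default value falls back to that default when nothing is fed.

```julia
using ADCME

p = placeholder(Float64, shape=[3])   # values supplied at run time
q = 2 * p                             # symbolic expression depending on p
d = placeholder([1.0, 2.0])           # placeholder with a default value

sess = Session()
# Feed two different inputs into the same graph
# (assumes `run` accepts placeholder => value pairs)
r1 = run(sess, q, p => [1.0, 2.0, 3.0])
r2 = run(sess, q, p => [0.5, 0.5, 0.5])
r3 = run(sess, d)                     # nothing fed: the default value is used
```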
`SparseTensor` is a special data structure for storing a sparse matrix. Although sparse linear algebra gets little emphasis in machine learning, it is one of the cores of scientific computing. Strong sparse linear algebra support is therefore key to successful inverse modeling with physics-based machine learning.

```julia
using ADCME, SparseArrays

SparseTensor(sprand(10,10,0.3))
SparseTensor([1,2,3],[2,2,2],[0.1,0.3,0.5],3,3) # specify row, col, value, number of rows, number of columns
```
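Sparse solves are the workhorse of many numerical schemes, so here is a small sketch of solving a sparse linear system. It assumes ADCME overloads `\` for a `SparseTensor` matrix and a dense vector tensor. The system is diagonal only so that the answer, $x = (1, 1, 2)$, is easy to check by hand.

```julia
using ADCME, SparseArrays

# 3×3 diagonal sparse matrix built from (row, col, value) triplets
A = SparseTensor([1,2,3], [1,2,3], [2.0,4.0,5.0], 3, 3)
b = constant([2.0, 4.0, 10.0])

x = A \ b            # symbolic sparse linear solve (assumed `\` overload)

sess = Session()
xval = run(sess, x)  # executes the solve and returns a Julia Vector
```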
Now that we know how to create tensors, the next step is to perform mathematical operations on them.
An operator can be viewed as a function that takes multiple tensors as input and outputs multiple tensors. In the computational graph, operators are represented by nodes and tensors by edges. Most mathematical operators, such as `+`, `-`, `*`, and `/`, also work on tensors. So do matrix operations, such as matrix-matrix multiplication, indexing, and linear system solves.
```julia
a = constant(rand(10,10))
b = constant(rand(10))
a + 1.0  # add 1 to every entry in `a`
a * b    # matrix-vector product
a * a    # matrix-matrix product
a .* a   # element-wise product
inv(a)   # matrix inverse
```
With the aforementioned syntax to create and transform tensors, we have created a computational graph. However, at this point, all the operations are symbolic, i.e., the operators have not been executed yet.
To trigger the actual computation, TensorFlow requires you to create a session. The session drives graph-based optimization (such as detecting dependencies) and executes the operations.
```julia
a = constant(rand(10,10))
b = constant(rand(10))
c = a * b
sess = Session()
run(sess, c) # syntax for triggering the execution of the graph
```
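To confirm that the session computes what the symbolic expression describes, the sketch below evaluates the same matrix-vector product twice: once in the graph and once with plain Julia arrays. It then compares the two results. It uses only `constant`, `Session`, and `run` from the text above.

```julia
using ADCME

A = rand(10, 10)
x = rand(10)

a = constant(A)
b = constant(x)
c = a * b              # symbolic: c is a handle to a graph node, no numbers yet

sess = Session()
cval = run(sess, c)    # executes the graph and returns a Julia Vector

err = maximum(abs.(cval - A * x))   # difference from plain Julia, round-off level
```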
If your computational graph contains `Variable`s, which can be listed via `get_collection`, you must initialize the graph before any `run` command. Initialization populates the `Variable`s with their initial values.
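The sketch below shows the initialization step. It assumes ADCME provides `init(sess)`, which runs TensorFlow's global variable initializer. Without this step, evaluating an expression that depends on a `Variable` fails with an uninitialized-variable error.

```julia
using ADCME

w = Variable(3.0)     # mutable: must be initialized before use
k = constant(2.0)     # immutable: no initialization needed
y = w + k

sess = Session()
init(sess)            # populate all Variables with their initial values (assumed API)
yval = run(sess, y)
```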
Kernels provide the low-level C++ implementation of operators. ADCME supplements TensorFlow with features that it lacks but that are crucial for scientific computing, and it tailors the syntax for numerical schemes. Depending on their implementation, these kernels can run on CPU, GPU, TPU, or heterogeneous computing environments.
All intensive computations are done in either Julia or C++, so very high performance is achievable if the logic is implemented appropriately. For performance-critical parts, users can write custom kernels using `customop`, which lets you incorporate custom-designed C++ code.
In summary, ADCME performs operations on tensors. The actual computations are delegated to low-level C++ kernels via operators, and a session is needed to drive their execution. Keeping this computation model in mind makes it easier to analyze computational cost and optimize your code.
Here we show a list of commonly used operators in ADCME.
|Operation|
|---|
|Get size of dimension|
|3D Tensor indexing|
|Index relative to end|
|Extract row (most efficient)|
|Convert to dense diagonal matrix|
|Convert to sparse diagonal matrix|
|Extract diagonals as vector|
|Matrix (vector) multiplication|
|Average all elements|
|Average along dimension|
|Maximum/Minimum of all elements|
|Squeeze all single dimensions|
|Squeeze along dimension|
|Reduction (along dimension)|
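Here are a few rows of the table in action. The sketch assumes ADCME overloads `diag` (from `LinearAlgebra`), `sum` with a `dims` keyword, and `maximum` for tensors, mirroring the Julia functions of the same name. The matrix is small so the results can be checked by eye.

```julia
using ADCME, LinearAlgebra

X = [1.0 2.0;
     3.0 4.0]
x = constant(X)

d  = diag(x)          # extract diagonals as a vector (assumed overload)
s  = sum(x, dims=1)   # reduction along the first dimension (assumed overload)
mx = maximum(x)       # maximum of all elements (assumed overload)

sess = Session()
dval  = run(sess, d)
sval  = run(sess, s)
mxval = run(sess, mx)
```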
In some cases, you might find a feature that is present in TensorFlow but missing in ADCME. You can always call `tf.<function_name>` directly; it is compatible with ADCME tensors.
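As an illustration, the sketch below calls `tf.reverse` directly through the `tf` module object that ADCME exposes. Raw TensorFlow functions follow Python conventions, so axes are zero-based: `axis=[0]` refers to the first dimension, not the one-based index Julia would use.

```julia
using ADCME

x = constant([1.0, 2.0, 3.0])
y = tf.reverse(x, axis=[0])   # raw TensorFlow call; axis 0 is the first dimension

sess = Session()
yval = run(sess, y)
```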