Parallel Computing
Manually Place Operators on Devices
The ADCME backend, TensorFlow, treats each operator as the smallest computation unit. Users can manually assign a device to each operator. This is usually done with the @pywith tf.device("/cpu:0") syntax. For example, to create a variable a and compute $\sin(a)$ on GPU:0, we can write
@pywith tf.device("/GPU:0") begin
    global a = Variable(1.0)
    global b = sin(a)
end
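For a quick check, the following is a minimal runnable sketch of this placement (it assumes a GPU is visible to TensorFlow; on a CPU-only machine, replace /GPU:0 with /CPU:0):
using ADCME

@pywith tf.device("/GPU:0") begin
    global a = Variable(1.0)
    global b = sin(a)
end

sess = Session(); init(sess)
@show run(sess, b)   # ≈ sin(1.0) ≈ 0.8415
@show b.device       # the requested device string, e.g. "/device:GPU:0"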
Custom Device Placement Functions
This syntax is simple and lets us place operators on a specific device without changing the original code. However, sometimes we want different kinds of operators to be placed on different devices. This can be done by implementing a custom assign_to_device function. As an example, to place all Variables on CPU:0 while placing all other operators on GPU:0, the function has the following form
# operator types that should always live on the parameter-server device
PS_OPS = ["Variable", "VariableV2", "AutoReloadVariable"]

function assign_to_device(device, ps_device="/device:CPU:0")
    # returns a device function: variable-type operators go to `ps_device`,
    # all other operators go to `device`
    function _assign(op)
        node_def = pybuiltin("isinstance")(op, tf.NodeDef) ? op : op.node_def
        if node_def.op in PS_OPS
            return ps_device
        else
            return device
        end
    end
    return _assign
end
Then we can write something like
@pywith tf.device(assign_to_device("/device:GPU:0")) begin
    global a = Variable(1.0)
    global b = sin(a)
end
We can check the locations of a and b by inspecting their device attributes:
julia> a.device
"/device:CPU:0"
julia> b.device
"/device:GPU:0"
Colocate Gradient Operators
When we call gradients, TensorFlow creates a set of new operators, one for each operator in the forward computation. By default, those operators are placed on the default device (GPU:0 if a GPU is available; otherwise CPU:0). Sometimes we want the operators created by gradients to be placed on the same devices as the corresponding forward operators. For example, if the operator b (sin) in the last example is on GPU:0, we want the corresponding gradient computation (cos) to be on GPU:0 as well. This can be done by passing the colocate keyword argument to gradients:
@pywith tf.device(assign_to_device("/device:GPU:0")) begin
    global a = Variable(1.0)
    global b = sin(a)
end

@pywith tf.device("/CPU:0") begin
    global c = cos(b)
end
g = gradients(c, a, colocate=true)
The following figure shows the effect of the colocate option for the code above. The test code snippet is
# profile with gradient colocation enabled
g = gradients(c, a, colocate=true)
sess = Session(); init(sess)
run_profile(sess, g+c)
save_profile("true.json")

# profile with gradient colocation disabled
g = gradients(c, a, colocate=false)
sess = Session(); init(sess)
run_profile(sess, g+c)
save_profile("false.json")
Batch Normalization Update Operators
If you use bn (batch normalization) on multiple GPUs, you must be careful to update the batch normalization parameters on the CPU. This can be done by explicitly specifying
@pywith tf.device("/cpu:0") begin
global update_ops = get_collection(tf.GraphKeys.UPDATE_OPS)
end
and binding update_ops to an active operator (or explicitly executing it in run(sess,...)).
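One common way to bind the update operators is to make the training operator depend on them via tf.control_dependencies. The sketch below is illustrative and assumes a scalar loss tensor loss, built from layers that use bn, has already been defined:
using ADCME

update_ops = get_collection(tf.GraphKeys.UPDATE_OPS)
@pywith tf.control_dependencies(update_ops) begin
    # minimizing `loss` now also triggers the batch normalization updates
    global train_op = AdamOptimizer().minimize(loss)
end
# each run(sess, train_op) executes the updates together with the training step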