Normalizing Flows
In this article, we introduce the ADCME module for flow-based generative models. Flow-based generative models can be used to model the joint distribution of high-dimensional random variables. They construct a sequence of invertible transformations of distributions
\[x = f(u) \quad u \sim \pi(u)\]
based on the change of variables formula
\[p(x) = \pi(f^{-1}(x)) \left|\det\left(\frac{\partial f^{-1}(x)}{\partial x}\right)\right|\]
As an example, consider the transformation $f: \mathbb{R}\rightarrow [0,1]$, $f(x) = \mathrm{sigmoid}(x) = \frac{1}{1+e^{-x}}$, and the random variable
\[u \sim \mathcal{N}(0,1)\]
We want to find the probability density function $p(x)$, where $x=f(u)$. To this end, we have
\[f^{-1}(x)=\log(x) - \log(1-x) \Rightarrow \frac{\partial f^{-1}(x)}{\partial x} = \frac{1}{x} + \frac{1}{1-x}\]
Therefore, we have
\[\begin{aligned} p(x) &= \pi(f^{-1}(x))\left( \frac{1}{x} + \frac{1}{1-x}\right) \\ &= \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{(\log(x)-\log(1-x))^2}{2}\right)\left( \frac{1}{x} + \frac{1}{1-x}\right) \end{aligned}\]
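We can sanity-check this density numerically by pushing samples of $u$ through $f$ and comparing a histogram against the analytic formula. The following is a minimal sketch that uses only base Julia and PyPlot (no ADCME functionality):
using PyPlot
u = randn(100_000)                            # u ~ N(0, 1)
x = 1 ./ (1 .+ exp.(-u))                      # x = sigmoid(u)
p(x) = exp(-(log(x)-log(1-x))^2/2)/sqrt(2π) * (1/x + 1/(1-x))
hist(x, bins=100, density=true, alpha=0.5, label="empirical")
xs = 0.01:0.01:0.99
plot(xs, p.(xs), "r", label="analytic")
legend()
The histogram of the transformed samples should closely match the analytic density.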
Compared with other generative models such as the variational autoencoder (VAE) and the generative adversarial network (GAN), flow-based generative models give us explicit formulas for the density functions. For model training, we can directly minimize the negative log likelihood in flow-based generative models, while VAEs are trained with an approximate likelihood (a variational lower bound) and GANs with adversarial training. In general, the flow-based generative model is easier to train than the VAE and the GAN. In the following, we give some examples of using flow-based generative models in ADCME.
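Concretely, for training data $x_1, \ldots, x_n$, taking the logarithm of the change of variables formula above gives the loss function
\[\mathcal{L} = -\sum_{i=1}^n \left[\log \pi(f^{-1}(x_i)) + \log\left|\det\left(\frac{\partial f^{-1}(x_i)}{\partial x_i}\right)\right|\right]\]
These two terms correspond to the prior_logpdf and logdet quantities computed in the code examples below.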
Type Hierarchy
The flow-based generative model is organized as follows, from the bottom level to the top level:
- FlowOp. This consists of unit invertible transformations, such as AffineConstantFlow and Invertible1x1Conv.
- NormalizingFlow. This is basically a sequence of FlowOps. It is not exposed to users.
- NormalizingFlowModel. This is a container of the sequence of FlowOps and a prior distribution. NormalizingFlowModel is callable and can "normalize" the data distribution. We can also sample from NormalizingFlowModel, where the prior distribution is transformed to the data distribution.
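To make the hierarchy concrete, the following sketch assembles a deliberately tiny model from a single FlowOp and exercises both directions. It only uses constructors and calls that appear elsewhere in this article:
prior = ADCME.MultivariateNormalDiag(loc=zeros(2))   # prior distribution
flows = [AffineConstantFlow(2, shift=false)]         # a list of FlowOps
model = NormalizingFlowModel(prior, flows)           # top-level container
# The model is callable: it "normalizes" data into the latent space
zs, prior_logpdf, logdet = model(rand(16, 2))
# The model also supports sampling: the prior is transformed to the data distribution
model_samples = rand(model, 16)
sess = Session(); init(sess)
run(sess, model_samples[end])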
A Simple Example
Let's consider a simple example of transforming the two moons dataset to a bivariate Gaussian distribution. First, we adapt a function from nmoons (https://github.com/wildart/nmoons) and use it to generate the dataset
using Revise
using ADCME
using PyCall
using PyPlot
using Random
# `nmoons` is adapted from https://github.com/wildart/nmoons
function nmoons(::Type{T}, n::Int=100, c::Int=2;
                shuffle::Bool=false, ε::Real=0.1, d::Int = 2,
                translation::Vector{T}=zeros(T, d),
                rotations::Dict{Pair{Int,Int},T} = Dict{Pair{Int,Int},T}(),
                seed::Union{Int,Nothing}=nothing) where {T <: Real}
    rng = seed === nothing ? Random.GLOBAL_RNG : MersenneTwister(Int(seed))
    ssize = floor(Int, n/c)
    ssizes = fill(ssize, c)
    ssizes[end] += n - ssize*c
    @assert sum(ssizes) == n "Incorrect partitioning"
    pi = convert(T, π)
    R(θ) = [cos(θ) -sin(θ); sin(θ) cos(θ)]
    X = zeros(d,0)
    for (i, s) in enumerate(ssizes)
        circ_x = cos.(range(zero(T), pi, length=s)).-1.0
        circ_y = sin.(range(zero(T), pi, length=s))
        C = R(-(i-1)*(2*pi/c)) * hcat(circ_x, circ_y)'
        C = vcat(C, zeros(d-2, s))
        dir = zeros(d)-C[:,end] # translation direction
        X = hcat(X, C .+ dir.*translation)
    end
    y = vcat([fill(i,s) for (i,s) in enumerate(ssizes)]...)
    if shuffle
        idx = randperm(rng, n)
        X, y = X[:, idx], y[idx]
    end
    # Add noise to the dataset
    if ε > 0.0
        X += randn(rng, size(X)).*convert(T,ε/d)
    end
    # Rotate dataset
    for ((i,j),θ) in rotations
        X[[i,j],:] .= R(θ)*view(X,[i,j],:)
    end
    return X, y
end
function sample_moons(n)
    X, _ = nmoons(Float64, n, 2, ε=0.05, d=2, translation=[0.25, -0.25])
    return Array(X')   # n × 2 matrix of samples
end
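As a quick check, we can visualize a batch of the generated data (a short sketch reusing PyPlot, which is already loaded above):
X = sample_moons(1000)
scatter(X[:,1], X[:,2], s=5)
axis("scaled")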
Next we construct a flow-based generative model. The model requires a list of FlowOps; here we use nine AffineHalfFlow operations (a RealNVP-style construction, the same one used in the complete script at the end of this article), each parameterized by two small multilayer perceptrons:
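function mlp(x, k, id)
    x = constant(x)
    variable_scope("layer$k$id") do
        x = dense(x, 24, activation="leaky_relu")
        x = dense(x, 24, activation="leaky_relu")
        x = dense(x, 24, activation="leaky_relu")
        x = dense(x, 1)
    end
    return x
end
# two conditioning networks per AffineHalfFlow
flows = [AffineHalfFlow(2, mod(i,2)==1, x->mlp(x, i, 0), x->mlp(x, i, 1)) for i = 0:8]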
prior = ADCME.MultivariateNormalDiag(loc=zeros(2))
model = NormalizingFlowModel(prior, flows)
We can print the model by simply typing model in the REPL:
( MultivariateNormalDiag )
↓
AffineHalfFlow
↓
AffineHalfFlow
↓
AffineHalfFlow
↓
AffineHalfFlow
↓
AffineHalfFlow
↓
AffineHalfFlow
↓
AffineHalfFlow
↓
AffineHalfFlow
↓
AffineHalfFlow
Finally, we train the model by minimizing the negative log likelihood using AdamOptimizer:
x = placeholder(rand(128,2))             # placeholder for a batch of training data
zs, prior_logpdf, logdet = model(x)      # normalize the data through the flows
log_pdf = prior_logpdf + logdet          # log p(x) from the change of variables formula
loss = -sum(log_pdf)                     # negative log likelihood
model_samples = rand(model, 128*8)       # samples obtained by transforming the prior
opt = AdamOptimizer(1e-4).minimize(loss)
sess = Session(); init(sess)             # create the session after the optimizer adds its variables
for i = 1:10000
    _, l = run(sess, [opt, loss], x=>sample_moons(128))
    if mod(i,100)==0
        @info i, l
    end
end
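After training, the tensors defined above can be reused to evaluate the learned log density on new data and to draw samples from the model (a short sketch based on the variables defined in this example):
batch = sample_moons(128)
lp = run(sess, log_pdf, x=>batch)      # log density of each point under the trained model
z = run(sess, model_samples[end])      # samples transformed from the prior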
Models
ADCME has implemented some popular flow-based generative models, for example, AffineConstantFlow, AffineHalfFlow, SlowMAF, MAF, IAF, ActNorm, Invertible1x1Conv, NormalizingFlow, NormalizingFlowModel, and NeuralCouplingFlow.
The following script shows how to use those functions:
# Adapted from https://github.com/karpathy/pytorch-normalizing-flows
# This script reuses the nmoons and sample_moons functions defined in the example above.
#------------------------------------------------------------------------------------------
# RealNVP
function mlp(x, k, id)
    x = constant(x)
    variable_scope("layer$k$id") do
        x = dense(x, 24, activation="leaky_relu")
        x = dense(x, 24, activation="leaky_relu")
        x = dense(x, 24, activation="leaky_relu")
        x = dense(x, 1)
    end
    return x
end
flows = [AffineHalfFlow(2, mod(i,2)==1, x->mlp(x, i, 0), x->mlp(x, i, 1)) for i = 0:8]
#------------------------------------------------------------------------------------------
# NICE
# function mlp(x, k, id)
#     x = constant(x)
#     variable_scope("layer$k$id") do
#         x = dense(x, 24, activation="leaky_relu")
#         x = dense(x, 24, activation="leaky_relu")
#         x = dense(x, 24, activation="leaky_relu")
#         x = dense(x, 1)
#     end
#     return x
# end
# flow1 = [AffineHalfFlow(2, mod(i,2)==1, missing, x->mlp(x, i, 1)) for i = 0:4]
# flow2 = [AffineConstantFlow(2, shift=false)]
# flows = [flow1;flow2]
#------------------------------------------------------------------------------------------
# SlowMAF
# function mlp(x, k, id)
#     x = constant(x)
#     variable_scope("layer$k$id") do
#         x = dense(x, 24, activation="leaky_relu")
#         x = dense(x, 24, activation="leaky_relu")
#         x = dense(x, 24, activation="leaky_relu")
#         x = dense(x, 2)
#     end
#     return x
# end
# flows = [SlowMAF(2, mod(i,2)==1, [x->mlp(x, i, 0)]) for i = 0:3]
#------------------------------------------------------------------------------------------
# MAF
# flows = [MAF(2, mod(i,2)==1, [24, 24, 24], name="layer$i") for i = 0:3]
#------------------------------------------------------------------------------------------
# IAF
# flows = [IAF(2, mod(i,2)==1, [24, 24, 24], name="layer$i") for i = 0:3]
#------------------------------------------------------------------------------------------
# Insert ActNorm into any of the flows
# flow2 = [ActNorm(2, "ActNorm$i") for i = 1:length(flows)]
# flows = permutedims(hcat(flow2, flows))[:]
#------------------------------------------------------------------------------------------
# GLOW
# function mlp(x, k, id)
#     x = constant(x)
#     variable_scope("layer$k$id") do
#         x = dense(x, 24, activation="leaky_relu")
#         x = dense(x, 24, activation="leaky_relu")
#         x = dense(x, 24, activation="leaky_relu")
#         x = dense(x, 1)
#     end
#     return x
# end
# flows = [Invertible1x1Conv(2, "conv$i") for i = 0:2]
# norms = [ActNorm(2, "ActNorm$i") for i = 0:2]
# couplings = [AffineHalfFlow(2, mod(i, 2)==1, x->mlp(x, i, 0), x->mlp(x, i, 1)) for i = 0:length(flows)-1]
# flows = permutedims(hcat(norms, flows, couplings))[:]
#------------------------------------------------------------------------------------------
# Neural Splines Coupling
# K = 8
# function mlp(x, k, id)
#     x = constant(x)
#     variable_scope("fc$k$id") do
#         x = dense(x, 16, activation="leaky_relu")
#         x = dense(x, 16, activation="leaky_relu")
#         x = dense(x, 16, activation="leaky_relu")
#         x = dense(x, 3K-1)
#     end
#     return x
# end
# flows = [NeuralCouplingFlow(2, x->mlp(x, i, 0), x->mlp(x, i, 1), K) for i = 0:2]
# convs = [Invertible1x1Conv(2, "conv$i") for i = 0:2]
# norms = [ActNorm(2, "ActNorm$i") for i = 0:2]
# flows = permutedims(hcat(norms, convs, flows))[:]
#------------------------------------------------------------------------------------------
prior = ADCME.MultivariateNormalDiag(loc=zeros(2))
model = NormalizingFlowModel(prior, flows)
x = placeholder(rand(128,2))
zs, prior_logpdf, logdet = model(x)
log_pdf = prior_logpdf + logdet
loss = -sum(log_pdf)
model_samples = rand(model, 128*8)
opt = AdamOptimizer(1e-4).minimize(loss)
sess = Session(); init(sess)   # create the session after the optimizer adds its variables
for i = 1:10000
    _, l = run(sess, [opt, loss], x=>sample_moons(128))
    if mod(i,100)==0
        @info i, l
    end
end
z = run(sess, model_samples[end])   # samples generated by the trained flow
x = sample_moons(128*8)             # ground-truth data for comparison
scatter(x[:,1], x[:,2], c="b", s=5, label="data")
scatter(z[:,1], z[:,2], c="r", s=5, label="prior --> posterior")
axis("scaled"); xlabel("x"); ylabel("y"); legend()