Normalizing Flows

In this article we introduce the ADCME module for flow-based generative models. Flow-based generative models can be used to model the joint distribution of high-dimensional random variables. They construct a sequence of invertible transformations of distributions

\[x = f(u) \quad u \sim \pi(u)\]

based on the change of variables formula

\[p(x) = \pi(f^{-1}(x)) \left|\det\left(\frac{\partial f^{-1}(x)}{\partial x}\right)\right|\]

Example

Consider the transformation $f: \mathbb{R}\rightarrow (0,1)$ given by $f(u) = \mathrm{sigmoid}(u) = \frac{1}{1+e^{-u}}$, and consider the random variable

\[u \sim \mathcal{N}(0,1)\]

We want to find the probability density function $p(x)$ of $x=f(u)$. To this end, we have

\[f^{-1}(x)=\log(x) - \log(1-x) \Rightarrow \frac{\partial f^{-1}(x)}{\partial x} = \frac{1}{x} + \frac{1}{1-x}\]

Therefore, we have

\[\begin{aligned} p(x) &= \pi(f^{-1}(x))\left( \frac{1}{x} + \frac{1}{1-x}\right) \\ &= \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{(\log(x)-\log(1-x))^2}{2}\right)\left( \frac{1}{x} + \frac{1}{1-x}\right) \end{aligned}\]
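
As a quick sanity check, the following minimal plain-Julia snippet (independent of ADCME) compares this analytic density with a Monte Carlo estimate obtained by binning samples of x = sigmoid(u):

using Random

# analytic density of x = sigmoid(u), u ~ N(0,1), derived above
p(x) = exp(-(log(x) - log(1-x))^2/2) / sqrt(2π) * (1/x + 1/(1-x))

# Monte Carlo estimate: fraction of samples in a small bin around x0, divided by the bin width
Random.seed!(0)
u = randn(10^6)
x = @. 1/(1 + exp(-u))
x0, h = 0.3, 0.01
empirical = count(v -> abs(v - x0) < h/2, x) / (length(x)*h)
println("analytic: ", p(x0), ", empirical: ", empirical) # the two numbers should roughly agree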

Compared to other generative models such as variational autoencoders (VAE) and generative adversarial networks (GAN), flow-based generative models give us explicit formulas for the density functions. For model training, we can directly maximize the log-likelihood in flow-based generative models, while VAE relies on an approximate likelihood (a variational lower bound) and GAN relies on adversarial training. In general, flow-based generative models are easier to train than VAE and GAN.
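
In practice $f$ is a composition of many simple flows, $f = f_L \circ \cdots \circ f_1$, with intermediate states $z_0 = u$, $z_l = f_l(z_{l-1})$, and $z_L = x$. Since the determinant of a product is the product of determinants, the change of variables formula telescopes into a sum of log-determinant terms

\[\log p(x) = \log \pi(u) + \sum_{l=1}^{L} \log\left|\det\left(\frac{\partial f_l^{-1}(z_l)}{\partial z_l}\right)\right|\]

which is exactly the quantity prior_logpdf + logdet computed in the training code below. In the following, we give some examples of using flow-based generative models in ADCME.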

Type Hierarchy

The flow-based generative model is organized as follows, from the bottom level to the top level:

  • FlowOp. A unit invertible transformation, such as AffineConstantFlow and Invertible1x1Conv.
  • NormalizingFlow. A sequence of FlowOps. It is not exposed to users.
  • NormalizingFlowModel. A container of a sequence of FlowOps together with a prior distribution. NormalizingFlowModel is callable and can "normalize" the data distribution. We can also sample from NormalizingFlowModel, in which case the prior distribution is transformed to the data distribution.
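
As a minimal sketch of how the three levels fit together (using the ActNorm and NormalizingFlowModel constructors that appear in the examples below):

using ADCME

prior = ADCME.MultivariateNormalDiag(loc=zeros(2)) # the prior distribution
flows = [ActNorm(2, "ActNorm$i") for i = 1:3]      # a sequence of FlowOps
model = NormalizingFlowModel(prior, flows)         # top-level container of the FlowOps and the prior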

A Simple Example

Let's consider a simple example: transforming the two moons dataset to a two-dimensional standard Gaussian distribution. First, we adapt a function from the nmoons repository (https://github.com/wildart/nmoons) and use it to generate the dataset

using Revise
using ADCME
using PyCall
using PyPlot
using Random

# `nmoons` is adapted from https://github.com/wildart/nmoons
function nmoons(::Type{T}, n::Int=100, c::Int=2;
    shuffle::Bool=false, ε::Real=0.1, d::Int = 2,
    translation::Vector{T}=zeros(T, d),
    rotations::Dict{Pair{Int,Int},T} = Dict{Pair{Int,Int},T}(),
    seed::Union{Int,Nothing}=nothing) where {T <: Real}
    rng = seed === nothing ? Random.GLOBAL_RNG : MersenneTwister(Int(seed))
    ssize = floor(Int, n/c)
    ssizes = fill(ssize, c)
    ssizes[end] += n - ssize*c
    @assert sum(ssizes) == n "Incorrect partitioning"
    pi = convert(T, π)
    R(θ) = [cos(θ) -sin(θ); sin(θ) cos(θ)]
    X = zeros(d,0)
    for (i, s) in enumerate(ssizes)
        circ_x = cos.(range(zero(T), pi, length=s)) .- 1.0
        circ_y = sin.(range(zero(T), pi, length=s))
        C = R(-(i-1)*(2*pi/c)) * hcat(circ_x, circ_y)'
        C = vcat(C, zeros(d-2, s))
        dir = zeros(d) - C[:,end] # translation direction
        X = hcat(X, C .+ dir.*translation)
    end
    y = vcat([fill(i,s) for (i,s) in enumerate(ssizes)]...)
    if shuffle
        idx = randperm(rng, n)
        X, y = X[:, idx], y[idx]
    end
    # Add noise to the dataset
    if ε > 0.0
        X += randn(rng, size(X)).*convert(T,ε/d)
    end
    # Rotate dataset
    for ((i,j),θ) in rotations
        X[[i,j],:] .= R(θ)*view(X,[i,j],:)
    end
    return X, y
end

function sample_moons(n)
    X, _ = nmoons(Float64, n, 2, ε=0.05, d=2, translation=[0.25, -0.25])
    return Array(X')
end

Next we construct a flow-based generative model from a sequence of FlowOps and a prior distribution.
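
The flows variable is a sequence of FlowOps. Here we use nine AffineHalfFlow layers (the RealNVP-style construction from the full script later in this article), each parameterized by a small fully connected network:

function mlp(x, k, id)
    x = constant(x)
    variable_scope("layer$k$id") do
        x = dense(x, 24, activation="leaky_relu")
        x = dense(x, 24, activation="leaky_relu")
        x = dense(x, 24, activation="leaky_relu")
        x = dense(x, 1)
    end
    return x
end
flows = [AffineHalfFlow(2, mod(i,2)==1, x->mlp(x, i, 0), x->mlp(x, i, 1)) for i = 0:8]

We then combine flows with a two-dimensional standard Gaussian prior: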

prior = ADCME.MultivariateNormalDiag(loc=zeros(2))
model = NormalizingFlowModel(prior, flows)

We can print the model by simply typing model in the REPL:

( MultivariateNormalDiag )
        ↓
AffineHalfFlow
        ↓
AffineHalfFlow
        ↓
AffineHalfFlow
        ↓
AffineHalfFlow
        ↓
AffineHalfFlow
        ↓
AffineHalfFlow
        ↓
AffineHalfFlow
        ↓
AffineHalfFlow
        ↓
AffineHalfFlow

Finally, we maximize the log-likelihood, i.e., we minimize the negative log-likelihood, using AdamOptimizer:

x = placeholder(rand(128,2)) # placeholder for a batch of 128 two-dimensional samples
zs, prior_logpdf, logdet = model(x) # normalize the data: intermediate states and log-densities
log_pdf = prior_logpdf + logdet # log p(x) by the change of variables formula
loss = -sum(log_pdf) # negative log-likelihood of the batch

model_samples = rand(model, 128*8) # draw 128*8 samples by pushing the prior through the flows
opt = AdamOptimizer(1e-4).minimize(loss)
sess = Session(); init(sess)
for i = 1:10000
    _, l = run(sess, [opt, loss], x=>sample_moons(128))
    if mod(i,100)==0
        @info i, l 
    end
end
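
After training, we can inspect the samples drawn from the model; the last entry of model_samples holds the output of the final flow, i.e., samples in the data space:

z = run(sess, model_samples[end]) # an array of 128*8 samples from the trained model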

Models

ADCME has implemented several popular flow-based generative models, including AffineConstantFlow, AffineHalfFlow, SlowMAF, MAF, IAF, ActNorm, Invertible1x1Conv, NormalizingFlow, NormalizingFlowModel, and NeuralCouplingFlow.

The following script shows how to use these functions. The RealNVP block is active by default; uncomment one of the other blocks to construct the corresponding flows instead:

# Adapted from https://github.com/karpathy/pytorch-normalizing-flows
using Revise
using ADCME
using PyCall
using PyPlot
using Random

# `nmoons` and `sample_moons` are defined as in the previous section


#------------------------------------------------------------------------------------------
# RealNVP
function mlp(x, k, id)
    x = constant(x)
    variable_scope("layer$k$id") do
        x = dense(x, 24, activation="leaky_relu")
        x = dense(x, 24, activation="leaky_relu")
        x = dense(x, 24, activation="leaky_relu")
        x = dense(x, 1)
    end
    return x
end
flows = [AffineHalfFlow(2, mod(i,2)==1, x->mlp(x, i, 0), x->mlp(x, i, 1)) for i = 0:8]

#------------------------------------------------------------------------------------------
# NICE
# function mlp(x, k, id)
#     x = constant(x)
#     variable_scope("layer$k$id") do
#         x = dense(x, 24, activation="leaky_relu")
#         x = dense(x, 24, activation="leaky_relu")
#         x = dense(x, 24, activation="leaky_relu")
#         x = dense(x, 1)
#     end
#     return x
# end
# flow1 = [AffineHalfFlow(2, mod(i,2)==1, missing, x->mlp(x, i, 1)) for i = 0:4]
# flow2 = [AffineConstantFlow(2, shift=false)]
# flows = [flow1;flow2]


#------------------------------------------------------------------------------------------
# SlowMAF
# function mlp(x, k, id)
#     x = constant(x)
#     variable_scope("layer$k$id") do
#         x = dense(x, 24, activation="leaky_relu")
#         x = dense(x, 24, activation="leaky_relu")
#         x = dense(x, 24, activation="leaky_relu")
#         x = dense(x, 2)
#     end
#     return x
# end
# flows = [SlowMAF(2, mod(i,2)==1, [x->mlp(x, i, 0)]) for i = 0:3]

#------------------------------------------------------------------------------------------
# MAF
# flows = [MAF(2, mod(i,2)==1, [24, 24, 24], name="layer$i") for i = 0:3]



#------------------------------------------------------------------------------------------
# IAF
# flows = [IAF(2, mod(i,2)==1, [24, 24, 24], name="layer$i") for i = 0:3]
# prior = ADCME.MultivariateNormalDiag(loc=zeros(2))
# model = NormalizingFlowModel(prior, flows)

#------------------------------------------------------------------------------------------
# Insert ActNorm into any of the flows
# flow2 = [ActNorm(2, "ActNorm$i") for i = 1:length(flows)]
# flows = permutedims(hcat(flow2, flows))[:]


#------------------------------------------------------------------------------------------
# GLOW
# function mlp(x, k, id)
#     x = constant(x)
#     variable_scope("layer$k$id") do
#         x = dense(x, 24, activation="leaky_relu")
#         x = dense(x, 24, activation="leaky_relu")
#         x = dense(x, 24, activation="leaky_relu")
#         x = dense(x, 1)
#     end
#     return x
# end
# flows = [Invertible1x1Conv(2, "conv$i") for i = 0:2]
# norms = [ActNorm(2, "ActNorm$i") for i = 0:2]
# couplings = [AffineHalfFlow(2, mod(i, 2)==1, x->mlp(x, i, 0), x->mlp(x, i, 1)) for i = 0:length(flows)-1]
# flows = permutedims(hcat(norms, flows, couplings))[:]

#------------------------------------------------------------------------------------------ 
# Neural Splines Coupling
# function mlp(x, k, id)
#     x = constant(x)
#     variable_scope("fc$k$id") do
#         x = dense(x, 16, activation="leaky_relu")
#         x = dense(x, 16, activation="leaky_relu")
#         x = dense(x, 16, activation="leaky_relu")
#         x = dense(x, 3K-1)
#     end
#     return x
# end
# K = 8
# flows = [NeuralCouplingFlow(2, x->mlp(x, i, 0), x->mlp(x, i, 1), K) for i = 0:2]
# convs = [Invertible1x1Conv(2, "conv$i") for i = 0:2]
# norms = [ActNorm(2, "ActNorm$i") for i = 0:2]
# flows = permutedims(hcat(norms, convs, flows))[:]

#------------------------------------------------------------------------------------------ 


prior = ADCME.MultivariateNormalDiag(loc=zeros(2))
model = NormalizingFlowModel(prior, flows)


x = placeholder(rand(128,2))
zs, prior_logpdf, logdet = model(x)
log_pdf = prior_logpdf + logdet
loss = -sum(log_pdf)

model_samples = rand(model, 128*8)
opt = AdamOptimizer(1e-4).minimize(loss)
sess = Session(); init(sess)
for i = 1:10000
    _, l = run(sess, [opt, loss], x=>sample_moons(128))
    if mod(i,100)==0
        @info i, l 
    end
end

z = run(sess, model_samples[end]) # samples from the trained model in the data space
x = sample_moons(128*8)
scatter(x[:,1], x[:,2], c="b", s=5, label="data")
scatter(z[:,1], z[:,2], c="r", s=5, label="prior --> posterior")
axis("scaled"); xlabel("x"); ylabel("y"); legend()