description: Build the Masked Autoregressive Density Estimator (Germain et al., 2015).
```python
tfp.substrates.numpy.bijectors.masked_autoregressive_default_template(
    hidden_layers, shift_only=False, activation=tf.nn.relu,
    log_scale_min_clip=-5.0, log_scale_max_clip=3.0,
    log_scale_clip_gradient=False, name=None, *args, **kwargs
)
```
This will be wrapped in a `make_template` to ensure the variables are only created once. It takes the input and returns the `loc` ('mu' in [Germain et al. (2015)][1]) and `log_scale` ('alpha' in [Germain et al. (2015)][1]) from the MADE network.
Warning: This function uses `masked_dense` to create randomly initialized `tf.Variable`s. It is presumed that these will be fit, just as you would any other neural architecture which uses `tf.layers.dense`.
Each element of `hidden_layers` should be greater than the `input_depth` (i.e., `input_depth = tf.shape(input)[-1]` where `input` is the input to the neural network). This is necessary to ensure the autoregressivity property.
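To see why the hidden width matters, consider the MADE masking scheme of Germain et al. (2015): every unit is assigned a "degree", hidden degrees are drawn from `1..input_depth - 1`, and a narrow hidden layer can miss some degree entirely, severing every path that would carry that input forward. The following numpy sketch (not the library's implementation, just an illustration of the idea) builds single-hidden-layer masks and checks the autoregressive property:

```python
import numpy as np

def made_masks(input_depth, hidden_units, seed=0):
    """Sketch of MADE-style masks (Germain et al., 2015) for one hidden layer."""
    rng = np.random.default_rng(seed)
    deg_in = np.arange(1, input_depth + 1)          # input degrees 1..D
    # Hidden degrees are drawn from 1..D-1. If hidden_units is too small,
    # some degree may be missing entirely, severing all paths through it --
    # hence the requirement that hidden layers be wider than input_depth.
    deg_hid = rng.integers(1, input_depth, size=hidden_units)
    deg_out = np.arange(1, input_depth + 1)         # output degrees 1..D
    # A connection is kept only if it cannot create a path from input i
    # to an output that predicts input <= i.
    mask_in = (deg_hid[:, None] >= deg_in[None, :]).astype(float)   # (H, D)
    mask_out = (deg_out[:, None] > deg_hid[None, :]).astype(float)  # (D, H)
    return mask_in, mask_out

m_in, m_out = made_masks(input_depth=4, hidden_units=16)
reach = m_out @ m_in   # reach[d, i] > 0 iff output d can see input i
# Autoregressivity: output d depends only on inputs with index < d, so the
# diagonal and upper triangle of `reach` must be exactly zero.
assert np.all(np.triu(reach) == 0)
```

The strictly lower-triangular `reach` matrix is exactly the autoregressive structure the docstring requires.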
This function also optionally clips the `log_scale` (but possibly not its gradient). This is useful because if `log_scale` is too small/large it might underflow/overflow, making it impossible for the `MaskedAutoregressiveFlow` bijector to implement a bijection. Additionally, the `log_scale_clip_gradient` `bool` indicates whether the gradient should also be clipped. The default does not clip the gradient; this is useful because it still provides gradient information (for fitting) yet solves the numerical stability problem. I.e., `log_scale_clip_gradient = False` means `grad[exp(clip(x))] = grad[x] exp(clip(x))` rather than the usual `grad[clip(x)] exp(clip(x))`.
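The practical difference is easiest to see numerically. In the sketch below (plain Python, hand-written derivatives rather than TF autodiff, using the default clip bounds of -5 and 3), a `log_scale` far outside the clip range gets zero gradient under the "usual" convention but keeps a useful gradient when the clip is treated as identity in the backward pass:

```python
import math

LOG_SCALE_MIN_CLIP = -5.0  # mirrors the defaults documented above
LOG_SCALE_MAX_CLIP = 3.0

def scale_and_grads(log_scale):
    """Compare the two gradient conventions for scale = exp(clip(log_scale))."""
    clipped = min(max(log_scale, LOG_SCALE_MIN_CLIP), LOG_SCALE_MAX_CLIP)
    scale = math.exp(clipped)
    # log_scale_clip_gradient=True: d/dx exp(clip(x)) = clip'(x) * exp(clip(x)),
    # and clip'(x) is exactly zero whenever x is outside the clip range.
    outside = log_scale < LOG_SCALE_MIN_CLIP or log_scale > LOG_SCALE_MAX_CLIP
    grad_clipped = 0.0 if outside else scale
    # log_scale_clip_gradient=False: the clip acts as identity in the backward
    # pass, so the gradient stays informative: d/dx = exp(clip(x)).
    grad_preserved = scale
    return scale, grad_clipped, grad_preserved

scale, g_clip, g_keep = scale_and_grads(10.0)  # well above the max clip
# g_clip is 0.0 (no learning signal); g_keep is exp(3.0), still informative.
```

Inside the clip range the two conventions agree; they only differ once `log_scale` saturates the bounds.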
Args | |
---|---|
`hidden_layers` | Python `list`-like of non-negative integer scalars indicating the number of units in each hidden layer. Default: `[512, 512]`.
`shift_only` | Python `bool` indicating if only the `shift` term shall be computed. Default: `False`.
`activation` | Activation function (callable). Explicitly setting to `None` implies a linear activation.
`log_scale_min_clip` | `float`-like scalar `Tensor`, or a `Tensor` with the same shape as `log_scale`. The minimum value to clip by. Default: -5.
`log_scale_max_clip` | `float`-like scalar `Tensor`, or a `Tensor` with the same shape as `log_scale`. The maximum value to clip by. Default: 3.
`log_scale_clip_gradient` | Python `bool` indicating that the gradient of `tf.clip_by_value` should be preserved. Default: `False`.
`name` | A name for ops managed by this function. Default: 'masked_autoregressive_default_template'.
`*args` | `tf.layers.dense` arguments.
`**kwargs` | `tf.layers.dense` keyword arguments.
Returns | |
---|---|
`shift` | `Float`-like `Tensor` of shift terms (the 'mu' in [Germain et al. (2015)][1]).
`log_scale` | `Float`-like `Tensor` of log(scale) terms (the 'alpha' in [Germain et al. (2015)][1]).
Raises | |
---|---|
`NotImplementedError` | if rightmost dimension of inputs is unknown prior to graph execution.
[1]: Mathieu Germain, Karol Gregor, Iain Murray, and Hugo Larochelle. MADE: Masked Autoencoder for Distribution Estimation. In International Conference on Machine Learning, 2015. https://arxiv.org/abs/1502.03509