Using discrete random variables as likelihood¶
Here we introduce how to specify the generative model when a discrete likelihood is used.
Imports¶
In [1]:
Copied!
import os
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "0"
from typing import Any
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
import elicito as el
tfd = tfp.distributions
import os
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "0"
from typing import Any
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
import elicito as el
tfd = tfp.distributions
2025-11-12 14:14:17.366822: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used. 2025-11-12 14:14:17.412961: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-11-12 14:14:19.069331: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-11-12 14:14:20.716784: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
Gradients for discrete RVs¶
Challenge¶
- we can't compute gradients for discrete random variables (RVs)
- possible solution: gumbel-softmax trick see Jang et al. (2016)
Jang, E., Gu, S., & Poole, B. (2016). Categorical reparameterization with gumbel-softmax. https://doi.org/10.48550/arXiv.1611.01144.
Example: Binomial model¶
$$ \begin{align*} \beta_0 &\sim \text{Normal}(\mu_0, \sigma_0) &\text{priors}\\ \beta_1 &\sim \text{Normal}(\mu_1, \sigma_1) &\\ \mu &= \text{sigmoid}(\beta_0 + \beta_1X) &\text{link+linear predictor} \\ y_i &\sim \text{Binomial}(\mu, N) &\text{likelihood} \end{align*} $$
- using the
el.utils.gumbel_softmax_trick()function in the generative model
In [2]:
Copied!
help(el.utils.gumbel_softmax_trick)
help(el.utils.gumbel_softmax_trick)
Help on function gumbel_softmax_trick in module elicito.utils:
gumbel_softmax_trick(likelihood: Any, upper_thres: float, temp: float = 1.6) -> Any
Apply softmax-gumble trick
The softmax-gumbel trick computes a continuous approximation of ypred from
a discrete likelihood and thus allows for the computation of gradients for
discrete random variables.
Currently, this approach is only implemented for models without upper
boundary (e.g., Poisson model).
References
----------
- Maddison, C. J., Mnih, A. & Teh, Y. W. The concrete distribution:
A continuous relaxation of discrete random variables in International
Conference on Learning Representations (2017).
https://doi.org/10.48550/arXiv.1611.00712
- Jang, E., Gu, S. & Poole, B. Categorical reparameterization with
gumbel-softmax in International Conference on Learning Representations
(2017). https://openreview.net/forum?id=rkE3y85ee.
- Joo, W., Kim, D., Shin, S. & Moon, I.-C. Generalized gumbel-softmax
gradient estimator for generic discrete random variables. Preprint
at https://doi.org/10.48550/arXiv.2003.01847 (2020).
Parameters
----------
likelihood
shape = [B, num_samples, num_obs, 1]
likelihood function used in the generative model.
Must be a tfp.distributions object.
upper_thres
upper threshold at which the distribution of the outcome variable is
truncated. For double-bounded distribution (e.g. Binomial) this is
simply the "total count" information. Lower-bounded distribution
(e.g. Poisson) must be truncated to create an artificial
double-boundedness.
temp
temperature hyperparameter of softmax function. A temperature going
towards zero yields approximates a categorical distribution, while
a temperature >> 0 approximates a continuous distribution.
Returns
-------
ypred :
continuously approximated ypred from the discrete likelihood.
Raise
-----
ValueError
if rank of ``likelihood`` is not 4. The shape of the likelihood obj
must have an extra final dimension, i.e., (B, num_samples, num_obs, 1),
for the softmax-gumbel computation. Use for example
``tf.expand_dims(mu,-1)`` for expanding the batch-shape of the
likelihood.
if likelihood is not in tfp.distributions module. The likelihood
must be a tfp.distributions object.
The generative model¶
In [3]:
Copied!
class ToyModel:
"""
generative model with discrete likelihood
"""
def __call__(
self, prior_samples: Any, design_matrix: Any, total_count: int, temp: float
) -> dict[str, Any]:
"""
Sample from the generative model
Parameters
----------
prior_samples
samples from the prior distribution
design_matrix
design matrix
total_count
number of trials per draw from the Binomial
temp
temperature parameter for softmax gumbel trick
Returns
-------
:
dictionary with target quantities
"""
B = prior_samples.shape[0]
S = prior_samples.shape[1]
# preprocess shape of design matrix
X = tf.broadcast_to(design_matrix[None, None, :], (B, S, len(design_matrix)))
# linear predictor (= mu)
epred = tf.add(
prior_samples[:, :, 0][:, :, None],
tf.multiply(prior_samples[:, :, 1][:, :, None], X),
)
# data-generating model
likelihood = tfd.Binomial(
total_count=total_count, probs=tf.math.sigmoid(tf.expand_dims(epred, -1))
)
# prior predictive distribution
ypred = el.utils.gumbel_softmax_trick(likelihood, total_count, temp)
# selected observations
y_X0, y_X1, y_X2 = (ypred[:, :, 0], ypred[:, :, 1], ypred[:, :, 2])
return dict(y_X0=y_X0, y_X1=y_X1, y_X2=y_X2)
class ToyModel:
"""
generative model with discrete likelihood
"""
def __call__(
self, prior_samples: Any, design_matrix: Any, total_count: int, temp: float
) -> dict[str, Any]:
"""
Sample from the generative model
Parameters
----------
prior_samples
samples from the prior distribution
design_matrix
design matrix
total_count
number of trials per draw from the Binomial
temp
temperature parameter for softmax gumbel trick
Returns
-------
:
dictionary with target quantities
"""
B = prior_samples.shape[0]
S = prior_samples.shape[1]
# preprocess shape of design matrix
X = tf.broadcast_to(design_matrix[None, None, :], (B, S, len(design_matrix)))
# linear predictor (= mu)
epred = tf.add(
prior_samples[:, :, 0][:, :, None],
tf.multiply(prior_samples[:, :, 1][:, :, None], X),
)
# data-generating model
likelihood = tfd.Binomial(
total_count=total_count, probs=tf.math.sigmoid(tf.expand_dims(epred, -1))
)
# prior predictive distribution
ypred = el.utils.gumbel_softmax_trick(likelihood, total_count, temp)
# selected observations
y_X0, y_X1, y_X2 = (ypred[:, :, 0], ypred[:, :, 1], ypred[:, :, 2])
return dict(y_X0=y_X0, y_X1=y_X1, y_X2=y_X2)
Construct the predictor¶
In [4]:
Copied!
# numeric, standardized predictor
def std_predictor(N: int, quantiles: list[float]) -> np.ndarray:
"""
Compute the design matrix for the numeric predictor
Parameters
----------
N
number of observations
quantiles
list of quantiles
Returns
-------
:
design matrix
"""
X = tf.cast(np.arange(N), tf.float32)
X_std = (X - tf.reduce_mean(X)) / tf.math.reduce_std(X)
X_sel = tfp.stats.percentile(X_std, [int(p * 100) for p in quantiles])
return X_sel
design_matrix = std_predictor(N=200, quantiles=[0.25, 0.50, 0.75])
design_matrix.numpy()
# numeric, standardized predictor
def std_predictor(N: int, quantiles: list[float]) -> np.ndarray:
"""
Compute the design matrix for the numeric predictor
Parameters
----------
N
number of observations
quantiles
list of quantiles
Returns
-------
:
design matrix
"""
X = tf.cast(np.arange(N), tf.float32)
X_std = (X - tf.reduce_mean(X)) / tf.math.reduce_std(X)
X_sel = tfp.stats.percentile(X_std, [int(p * 100) for p in quantiles])
return X_sel
design_matrix = std_predictor(N=200, quantiles=[0.25, 0.50, 0.75])
design_matrix.numpy()
Out[4]:
array([-0.85737586, 0.00866036, 0.85737586], dtype=float32)
Oracle as expert¶
- we simulate from a ground truth to obtain the expert data
$$ \begin{align*} \beta_0 &\sim \text{Normal}(0.1, 0.4) \\ \beta_1 &\sim \text{Normal}(0.2, 0.2) \end{align*} $$
In [5]:
Copied!
ground_truth = {
"beta0": tfd.Normal(loc=0.1, scale=0.4),
"beta1": tfd.Normal(loc=0.2, scale=0.2),
}
ground_truth = {
"beta0": tfd.Normal(loc=0.1, scale=0.4),
"beta1": tfd.Normal(loc=0.2, scale=0.2),
}
In [6]:
Copied!
eliobj = el.Elicit(
model=el.model(obj=ToyModel, design_matrix=design_matrix, total_count=30, temp=1.6),
parameters=[
el.parameter(
name="beta0",
family=tfd.Normal,
hyperparams=dict(loc=el.hyper("mu0"), scale=el.hyper("sigma0", lower=0)),
),
el.parameter(
name="beta1",
family=tfd.Normal,
hyperparams=dict(loc=el.hyper("mu1"), scale=el.hyper("sigma1", lower=0)),
),
],
targets=[
el.target(
name=f"y_X{i}",
query=el.queries.quantiles((0.05, 0.25, 0.50, 0.75, 0.95)),
loss=el.losses.MMD2(kernel="energy"),
weight=1.0,
)
for i in range(3)
],
expert=el.expert.simulator(ground_truth=ground_truth, num_samples=10_000),
optimizer=el.optimizer(
optimizer=tf.keras.optimizers.Adam, learning_rate=0.05, clipnorm=1.0
),
trainer=el.trainer(method="parametric_prior", seed=0, epochs=300, progress=0),
initializer=el.initializer(
method="sobol",
iterations=32,
distribution=el.initialization.uniform(radius=1.0, mean=0.0),
),
)
eliobj = el.Elicit(
model=el.model(obj=ToyModel, design_matrix=design_matrix, total_count=30, temp=1.6),
parameters=[
el.parameter(
name="beta0",
family=tfd.Normal,
hyperparams=dict(loc=el.hyper("mu0"), scale=el.hyper("sigma0", lower=0)),
),
el.parameter(
name="beta1",
family=tfd.Normal,
hyperparams=dict(loc=el.hyper("mu1"), scale=el.hyper("sigma1", lower=0)),
),
],
targets=[
el.target(
name=f"y_X{i}",
query=el.queries.quantiles((0.05, 0.25, 0.50, 0.75, 0.95)),
loss=el.losses.MMD2(kernel="energy"),
weight=1.0,
)
for i in range(3)
],
expert=el.expert.simulator(ground_truth=ground_truth, num_samples=10_000),
optimizer=el.optimizer(
optimizer=tf.keras.optimizers.Adam, learning_rate=0.05, clipnorm=1.0
),
trainer=el.trainer(method="parametric_prior", seed=0, epochs=300, progress=0),
initializer=el.initializer(
method="sobol",
iterations=32,
distribution=el.initialization.uniform(radius=1.0, mean=0.0),
),
)
Train the eliobj¶
In [7]:
Copied!
eliobj.fit()
eliobj.fit()
In [8]:
Copied!
el.plots.loss(eliobj, figsize=(6, 2))
el.plots.loss(eliobj, figsize=(6, 2))
Out[8]:
(<Figure size 600x200 with 2 Axes>,
array([<Axes: title={'center': 'total loss'}, xlabel='epochs'>,
<Axes: title={'center': 'weighted individual losses'}, xlabel='epochs'>],
dtype=object))
Convergence - Hyperparameter¶
In [9]:
Copied!
el.plots.hyperparameter(eliobj, figsize=(6, 2))
el.plots.hyperparameter(eliobj, figsize=(6, 2))
Out[9]:
(<Figure size 600x200 with 4 Axes>,
array([<Axes: title={'center': 'sigma0'}, xlabel='epochs'>,
<Axes: title={'center': 'sigma1'}, xlabel='epochs'>,
<Axes: title={'center': 'mu1'}, xlabel='epochs'>,
<Axes: title={'center': 'mu0'}, xlabel='epochs'>], dtype=object))
Expert data (oracle) vs. simulated data¶
In [10]:
Copied!
el.plots.elicits(eliobj, cols=3, figsize=(6, 2))
el.plots.elicits(eliobj, cols=3, figsize=(6, 2))
Out[10]:
(<Figure size 600x200 with 3 Axes>,
array([<Axes: title={'center': 'quantiles_y_X0'}, xlabel='expert', ylabel='model-sim.'>,
<Axes: title={'center': 'quantiles_y_X1'}, xlabel='expert', ylabel='model-sim.'>,
<Axes: title={'center': 'quantiles_y_X2'}, xlabel='expert', ylabel='model-sim.'>],
dtype=object))
Learned prior distributions¶
In [11]:
Copied!
el.plots.prior_marginals(eliobj, figsize=(6, 2))
el.plots.prior_marginals(eliobj, figsize=(6, 2))
INFO: Reset cols=2
Out[11]:
(<Figure size 600x200 with 2 Axes>,
array([<Axes: title={'center': 'beta0'}, xlabel='θ', ylabel='density'>,
<Axes: title={'center': 'beta1'}, xlabel='θ', ylabel='density'>],
dtype=object))