Using discrete random variables as likelihood¶

Here we introduce how to specify the generative model when a discrete likelihood is used.

Imports¶

In [1]:

Copied!

import os

os.environ["TF_ENABLE_ONEDNN_OPTS"] = "0"

from typing import Any

import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

import elicito as el

tfd = tfp.distributions
import os

os.environ["TF_ENABLE_ONEDNN_OPTS"] = "0"

from typing import Any

import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

import elicito as el

tfd = tfp.distributions

2025-11-12 14:14:17.366822: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-11-12 14:14:17.412961: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.

2025-11-12 14:14:19.069331: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.

2025-11-12 14:14:20.716784: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)

Gradients for discrete RVs¶

Challenge¶

we can't compute gradients for discrete random variables (RVs)
possible solution: gumbel-softmax trick see Jang et al. (2016)

Jang, E., Gu, S., & Poole, B. (2016). Categorical reparameterization with gumbel-softmax. https://doi.org/10.48550/arXiv.1611.01144.

Example: Binomial model¶

$$ \begin{align*} \beta_0 &\sim \text{Normal}(\mu_0, \sigma_0) &\text{priors}\\ \beta_1 &\sim \text{Normal}(\mu_1, \sigma_1) &\\ \mu &= \text{sigmoid}(\beta_0 + \beta_1X) &\text{link+linear predictor} \\ y_i &\sim \text{Binomial}(\mu, N) &\text{likelihood} \end{align*} $$

using the el.utils.gumbel_softmax_trick() function in the generative model

In [2]:

Copied!

help(el.utils.gumbel_softmax_trick)
help(el.utils.gumbel_softmax_trick)

Help on function gumbel_softmax_trick in module elicito.utils:

gumbel_softmax_trick(likelihood: Any, upper_thres: float, temp: float = 1.6) -> Any
    Apply softmax-gumble trick
    
    The softmax-gumbel trick computes a continuous approximation of ypred from
    a discrete likelihood and thus allows for the computation of gradients for
    discrete random variables.
    
    Currently, this approach is only implemented for models without upper
    boundary (e.g., Poisson model).
    
    References
    ----------
    - Maddison, C. J., Mnih, A. & Teh, Y. W. The concrete distribution:
      A continuous relaxation of discrete random variables in International
      Conference on Learning Representations (2017).
      https://doi.org/10.48550/arXiv.1611.00712
    - Jang, E., Gu, S. & Poole, B. Categorical reparameterization with
      gumbel-softmax in International Conference on Learning Representations
      (2017). https://openreview.net/forum?id=rkE3y85ee.
    - Joo, W., Kim, D., Shin, S. & Moon, I.-C. Generalized gumbel-softmax
      gradient estimator for generic discrete random variables. Preprint
      at https://doi.org/10.48550/arXiv.2003.01847 (2020).
    
    Parameters
    ----------
    likelihood
        shape = [B, num_samples, num_obs, 1]
        likelihood function used in the generative model.
        Must be a tfp.distributions object.
    
    upper_thres
        upper threshold at which the distribution of the outcome variable is
        truncated. For double-bounded distribution (e.g. Binomial) this is
        simply the "total count" information. Lower-bounded distribution
        (e.g. Poisson) must be truncated to create an artificial
        double-boundedness.
    
    temp
        temperature hyperparameter of softmax function. A temperature going
        towards zero yields approximates a categorical distribution, while
        a temperature >> 0 approximates a continuous distribution.
    
    Returns
    -------
    ypred :
        continuously approximated ypred from the discrete likelihood.
    
    Raise
    -----
    ValueError
        if rank of ``likelihood`` is not 4. The shape of the likelihood obj
        must have an extra final dimension, i.e., (B, num_samples, num_obs, 1),
        for the softmax-gumbel computation. Use for example
        ``tf.expand_dims(mu,-1)`` for expanding the batch-shape of the
        likelihood.
    
        if likelihood is not in tfp.distributions module. The likelihood
        must be a tfp.distributions object.

The generative model¶

In [3]:

Copied!





class ToyModel:
    """
    generative model with discrete likelihood
    """

    def __call__(
        self, prior_samples: Any, design_matrix: Any, total_count: int, temp: float
    ) -> dict[str, Any]:
        """
        Sample from the generative model

        Parameters
        ----------
        prior_samples
            samples from the prior distribution

        design_matrix
            design matrix

        total_count
            number of trials per draw from the Binomial

        temp
            temperature parameter for softmax gumbel trick

        Returns
        -------
        :
            dictionary with target quantities
        """
        B = prior_samples.shape[0]
        S = prior_samples.shape[1]

        # preprocess shape of design matrix
        X = tf.broadcast_to(design_matrix[None, None, :], (B, S, len(design_matrix)))
        # linear predictor (= mu)
        epred = tf.add(
            prior_samples[:, :, 0][:, :, None],
            tf.multiply(prior_samples[:, :, 1][:, :, None], X),
        )
        # data-generating model
        likelihood = tfd.Binomial(
            total_count=total_count, probs=tf.math.sigmoid(tf.expand_dims(epred, -1))
        )

        # prior predictive distribution
        ypred = el.utils.gumbel_softmax_trick(likelihood, total_count, temp)

        # selected observations
        y_X0, y_X1, y_X2 = (ypred[:, :, 0], ypred[:, :, 1], ypred[:, :, 2])

        return dict(y_X0=y_X0, y_X1=y_X1, y_X2=y_X2)
class ToyModel:
    """
    generative model with discrete likelihood
    """

    def __call__(
        self, prior_samples: Any, design_matrix: Any, total_count: int, temp: float
    ) -> dict[str, Any]:
        """
        Sample from the generative model

        Parameters
        ----------
        prior_samples
            samples from the prior distribution

        design_matrix
            design matrix

        total_count
            number of trials per draw from the Binomial

        temp
            temperature parameter for softmax gumbel trick

        Returns
        -------
        :
            dictionary with target quantities
        """
        B = prior_samples.shape[0]
        S = prior_samples.shape[1]

        # preprocess shape of design matrix
        X = tf.broadcast_to(design_matrix[None, None, :], (B, S, len(design_matrix)))
        # linear predictor (= mu)
        epred = tf.add(
            prior_samples[:, :, 0][:, :, None],
            tf.multiply(prior_samples[:, :, 1][:, :, None], X),
        )
        # data-generating model
        likelihood = tfd.Binomial(
            total_count=total_count, probs=tf.math.sigmoid(tf.expand_dims(epred, -1))
        )

        # prior predictive distribution
        ypred = el.utils.gumbel_softmax_trick(likelihood, total_count, temp)

        # selected observations
        y_X0, y_X1, y_X2 = (ypred[:, :, 0], ypred[:, :, 1], ypred[:, :, 2])

        return dict(y_X0=y_X0, y_X1=y_X1, y_X2=y_X2)

Construct the predictor¶

In [4]:

Copied!





# numeric, standardized predictor
def std_predictor(N: int, quantiles: list[float]) -> np.ndarray:
    """
    Compute the design matrix for the numeric predictor

    Parameters
    ----------
    N
        number of observations

    quantiles
        list of quantiles

    Returns
    -------
    :
        design matrix
    """
    X = tf.cast(np.arange(N), tf.float32)
    X_std = (X - tf.reduce_mean(X)) / tf.math.reduce_std(X)
    X_sel = tfp.stats.percentile(X_std, [int(p * 100) for p in quantiles])
    return X_sel


design_matrix = std_predictor(N=200, quantiles=[0.25, 0.50, 0.75])
design_matrix.numpy()
# numeric, standardized predictor
def std_predictor(N: int, quantiles: list[float]) -> np.ndarray:
    """
    Compute the design matrix for the numeric predictor

    Parameters
    ----------
    N
        number of observations

    quantiles
        list of quantiles

    Returns
    -------
    :
        design matrix
    """
    X = tf.cast(np.arange(N), tf.float32)
    X_std = (X - tf.reduce_mean(X)) / tf.math.reduce_std(X)
    X_sel = tfp.stats.percentile(X_std, [int(p * 100) for p in quantiles])
    return X_sel


design_matrix = std_predictor(N=200, quantiles=[0.25, 0.50, 0.75])
design_matrix.numpy()

Out[4]:

array([-0.85737586,  0.00866036,  0.85737586], dtype=float32)

Oracle as expert¶

we simulate from a ground truth to obtain the expert data

$$ \begin{align*} \beta_0 &\sim \text{Normal}(0.1, 0.4) \\ \beta_1 &\sim \text{Normal}(0.2, 0.2) \end{align*} $$

In [5]:

Copied!





ground_truth = {
    "beta0": tfd.Normal(loc=0.1, scale=0.4),
    "beta1": tfd.Normal(loc=0.2, scale=0.2),
}
ground_truth = {
    "beta0": tfd.Normal(loc=0.1, scale=0.4),
    "beta1": tfd.Normal(loc=0.2, scale=0.2),
}

Parametric approach¶

Specify the eliobj¶

In [6]:

Copied!





eliobj = el.Elicit(
    model=el.model(obj=ToyModel, design_matrix=design_matrix, total_count=30, temp=1.6),
    parameters=[
        el.parameter(
            name="beta0",
            family=tfd.Normal,
            hyperparams=dict(loc=el.hyper("mu0"), scale=el.hyper("sigma0", lower=0)),
        ),
        el.parameter(
            name="beta1",
            family=tfd.Normal,
            hyperparams=dict(loc=el.hyper("mu1"), scale=el.hyper("sigma1", lower=0)),
        ),
    ],
    targets=[
        el.target(
            name=f"y_X{i}",
            query=el.queries.quantiles((0.05, 0.25, 0.50, 0.75, 0.95)),
            loss=el.losses.MMD2(kernel="energy"),
            weight=1.0,
        )
        for i in range(3)
    ],
    expert=el.expert.simulator(ground_truth=ground_truth, num_samples=10_000),
    optimizer=el.optimizer(
        optimizer=tf.keras.optimizers.Adam, learning_rate=0.05, clipnorm=1.0
    ),
    trainer=el.trainer(method="parametric_prior", seed=0, epochs=300, progress=0),
    initializer=el.initializer(
        method="sobol",
        iterations=32,
        distribution=el.initialization.uniform(radius=1.0, mean=0.0),
    ),
)
eliobj = el.Elicit(
    model=el.model(obj=ToyModel, design_matrix=design_matrix, total_count=30, temp=1.6),
    parameters=[
        el.parameter(
            name="beta0",
            family=tfd.Normal,
            hyperparams=dict(loc=el.hyper("mu0"), scale=el.hyper("sigma0", lower=0)),
        ),
        el.parameter(
            name="beta1",
            family=tfd.Normal,
            hyperparams=dict(loc=el.hyper("mu1"), scale=el.hyper("sigma1", lower=0)),
        ),
    ],
    targets=[
        el.target(
            name=f"y_X{i}",
            query=el.queries.quantiles((0.05, 0.25, 0.50, 0.75, 0.95)),
            loss=el.losses.MMD2(kernel="energy"),
            weight=1.0,
        )
        for i in range(3)
    ],
    expert=el.expert.simulator(ground_truth=ground_truth, num_samples=10_000),
    optimizer=el.optimizer(
        optimizer=tf.keras.optimizers.Adam, learning_rate=0.05, clipnorm=1.0
    ),
    trainer=el.trainer(method="parametric_prior", seed=0, epochs=300, progress=0),
    initializer=el.initializer(
        method="sobol",
        iterations=32,
        distribution=el.initialization.uniform(radius=1.0, mean=0.0),
    ),
)

Train the eliobj¶

In [7]:

Copied!

eliobj.fit()
eliobj.fit()

Results¶

Convergence - Loss¶

In [8]:

Copied!

el.plots.loss(eliobj, figsize=(6, 2))
el.plots.loss(eliobj, figsize=(6, 2))

Out[8]:

(<Figure size 600x200 with 2 Axes>,
 array([<Axes: title={'center': 'total loss'}, xlabel='epochs'>,
        <Axes: title={'center': 'weighted individual losses'}, xlabel='epochs'>],
       dtype=object))

No description has been provided for this image

Convergence - Hyperparameter¶

In [9]:

Copied!

el.plots.hyperparameter(eliobj, figsize=(6, 2))
el.plots.hyperparameter(eliobj, figsize=(6, 2))

Out[9]:

(<Figure size 600x200 with 4 Axes>,
 array([<Axes: title={'center': 'sigma0'}, xlabel='epochs'>,
        <Axes: title={'center': 'sigma1'}, xlabel='epochs'>,
        <Axes: title={'center': 'mu1'}, xlabel='epochs'>,
        <Axes: title={'center': 'mu0'}, xlabel='epochs'>], dtype=object))

Expert data (oracle) vs. simulated data¶

In [10]:

Copied!

el.plots.elicits(eliobj, cols=3, figsize=(6, 2))
el.plots.elicits(eliobj, cols=3, figsize=(6, 2))

Out[10]:

(<Figure size 600x200 with 3 Axes>,
 array([<Axes: title={'center': 'quantiles_y_X0'}, xlabel='expert', ylabel='model-sim.'>,
        <Axes: title={'center': 'quantiles_y_X1'}, xlabel='expert', ylabel='model-sim.'>,
        <Axes: title={'center': 'quantiles_y_X2'}, xlabel='expert', ylabel='model-sim.'>],
       dtype=object))

Learned prior distributions¶

In [11]:

Copied!

el.plots.prior_marginals(eliobj, figsize=(6, 2))
el.plots.prior_marginals(eliobj, figsize=(6, 2))

INFO: Reset cols=2

Out[11]:

(<Figure size 600x200 with 2 Axes>,
 array([<Axes: title={'center': 'beta0'}, xlabel='θ', ylabel='density'>,
        <Axes: title={'center': 'beta1'}, xlabel='θ', ylabel='density'>],
       dtype=object))