Synthesis Validation

For synthesis validation, only feature-based evaluation metrics are available.

Feature-based

Quality

`pymdma.tabular.measures.synthesis_val.ImprovedPrecision`

Improved Precision Metric for assessing fidelity of generative models.

Objective: Fidelity

Parameters:

Name	Type	Description	Default
`k`	`int`	Number of nearest neighbors to consider in the hypersphere estimation. Defaults to 5.	`5`
`metric`	`str`	The metric to use when calculating distance between instances. For the available metrics, see the documentation of `sklearn.metrics.pairwise_distances`.	`"euclidean"`
`n_workers`	`int`	Number of workers for computing pairwise distances. Defaults to 4.	`4`
`**kwargs`		Additional keyword arguments for compatibility.	`{}`

References

Kynkaanniemi et al., Improved Precision and Recall Metric for Assessing Generative Models (2019). https://arxiv.org/abs/1904.06991

Code adapted from: improved-precision-and-recall-metric: Improved Precision and Recall Metric for Assessing Generative Models — Official TensorFlow Implementation. https://github.com/kynkaat/improved-precision-and-recall-metric

Hypersphere estimation code was taken from: generative-evaluation-prdc, Reliable Fidelity and Diversity Metrics for Generative Models. https://github.com/clovaai/generative-evaluation-prdc
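
As a rough illustration of the hypersphere estimation that `k` controls, the dependency-free sketch below gives each real sample a radius equal to the distance to its k-th nearest real neighbour, then marks a generated sample as precise when it falls inside at least one of those hyperspheres. The helper names `kth_nn_radii`, `inside`, and `improved_precision_sketch` are hypothetical. The sketch assumes the Euclidean metric and that a sample is not counted as its own neighbour; it is not the library's implementation.

```python
import math


def kth_nn_radii(points, k):
    """Distance from each point to its k-th nearest neighbour (self excluded)."""
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(dists[k - 1])
    return radii


def inside(distance, radius):
    """True when distance <= radius, tolerating floating-point rounding."""
    return distance < radius or math.isclose(distance, radius)


def improved_precision_sketch(real, fake, k):
    """Return (dataset-level precision, per-fake-sample 0/1 flags)."""
    radii = kth_nn_radii(real, k)
    flags = [
        int(any(inside(math.dist(r, f), rad) for r, rad in zip(real, radii)))
        for f in fake
    ]
    return sum(flags) / len(flags), flags


# Toy data: four real points on a unit square, one fake point at its
# centre and one far away from every real point.
real = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
fake = [(0.5, 0.5), (5.0, 5.0)]
precision, flags = improved_precision_sketch(real, fake, k=1)
print(precision, flags)  # 0.5 [1, 0]
```

The dataset-level score is the mean of the per-sample flags, which is why it stays between 0 and 1.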

Examples:

>>> improved_precision = ImprovedPrecision()
>>> real_features = np.random.rand(100, 100)
>>> fake_features = np.random.rand(100, 100)
>>> result: MetricResult = improved_precision.compute(real_features, fake_features)

Source code in src/pymdma/general/measures/prdc.py

class ImprovedPrecision(FeatureMetric):
    """Improved Precision Metric for assessing fidelity of generative models.

    **Objective**: Fidelity

    Parameters
    ----------
    k : int, optional
        Number of nearest neighbors to consider in the hypersphere estimation. Defaults to 5.
    metric : str, optional, default="euclidean"
        The metric to use when calculating distance between instances.
        For the available metrics, see the documentation of `sklearn.metrics.pairwise_distances`.
    n_workers : int, optional
        Number of workers for computing pairwise distances. Defaults to 4.
    **kwargs
        Additional keyword arguments for compatibility.

    References
    ----------
    Kynkaanniemi et al., Improved Precision and Recall Metric for Assessing Generative Models (2019).
    https://arxiv.org/abs/1904.06991

    Code adapted from:
    improved-precision-and-recall-metric: Improved Precision and Recall Metric for Assessing Generative Models — Official TensorFlow Implementation.
    https://github.com/kynkaat/improved-precision-and-recall-metric

    Hypersphere estimation code was taken from:
    generative-evaluation-prdc, Reliable Fidelity and Diversity Metrics for Generative Models.
    https://github.com/clovaai/generative-evaluation-prdc

    Examples
    --------
    >>> improved_precision = ImprovedPrecision()
    >>> real_features = np.random.rand(100, 100)
    >>> fake_features = np.random.rand(100, 100)
    >>> result: MetricResult = improved_precision.compute(real_features, fake_features)
    """

    reference_type = ReferenceType.DATASET
    evaluation_level = [EvaluationLevel.INSTANCE, EvaluationLevel.DATASET]
    metric_group = MetricGroup.QUALITY

    higher_is_better: bool = True
    min_value: float = 0.0
    max_value: float = 1.0

    def __init__(
        self,
        k: int = 5,
        metric: str = "euclidean",
        n_workers: int = 4,
        **kwargs,
    ):
        super().__init__(**kwargs)
        self.k = k
        self.metric = metric
        self.n_workers = n_workers

    def compute(self, real_features: np.ndarray, fake_features: np.ndarray, **kwargs) -> MetricResult:
        """Compute the Improved Precision metric.

        Parameters
        ----------
        real_features : np.ndarray
            Array of shape (n_samples, n_features) containing the real features.
        fake_features : np.ndarray
            Array of shape (n_samples, n_features) containing the fake features.

        Notes
        -----
        Intermediate computations can be stored in the `context` dictionary of the `kwargs` parameter.
        Useful when calculating multiple metrics that share the same intermediate computations.

        Returns
        -------
        result: MetricResult
            Dataset-level and instance-level results for the precision metric.
        """
        state = kwargs.get("context", {})
        if "real_nn_distances" not in state:
            state["real_nn_distances"] = compute_nearest_neighbour_distances(
                real_features,
                nearest_k=self.k,
                metric=self.metric,
                n_workers=self.n_workers,
            )

        if "real_fake_distances" not in state:
            state["real_fake_distances"] = compute_pairwise_distance(
                real_features,
                fake_features,
                metric=self.metric,
                n_workers=self.n_workers,
            )

        precision = (
            np.logical_or(
                (state["real_fake_distances"] < np.expand_dims(state["real_nn_distances"], axis=1)),
                np.isclose(state["real_fake_distances"], np.expand_dims(state["real_nn_distances"], axis=1)),
            )
            .any(axis=0)
            .astype(int)
        )

        return MetricResult(
            dataset_level={"dtype": OutputsTypes.NUMERIC, "subtype": "float", "value": precision.mean()},
            instance_level={"dtype": OutputsTypes.ARRAY, "subtype": "int", "value": precision.tolist()},
        )

`pymdma.tabular.measures.synthesis_val.ImprovedRecall`

Improved Recall Metric for assessing diversity of generative models.

Objective: Diversity

Parameters:

Name	Type	Description	Default
`k`	`int`	Number of nearest neighbors to consider in the hypersphere estimation. Defaults to 5.	`5`
`metric`	`str`	The metric to use when calculating distance between instances. For the available metrics, see the documentation of `sklearn.metrics.pairwise_distances`.	`"euclidean"`
`n_workers`	`int`	Number of workers for computing pairwise distances. Defaults to 4.	`4`
`**kwargs`		Additional keyword arguments for compatibility.	`{}`

References

Kynkaanniemi et al., Improved Precision and Recall Metric for Assessing Generative Models (2019). https://arxiv.org/abs/1904.06991

Code adapted from: improved-precision-and-recall-metric: Improved Precision and Recall Metric for Assessing Generative Models — Official TensorFlow Implementation. https://github.com/kynkaat/improved-precision-and-recall-metric

Hypersphere estimation code was taken from: generative-evaluation-prdc, Reliable Fidelity and Diversity Metrics for Generative Models. https://github.com/clovaai/generative-evaluation-prdc
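
Recall swaps the roles of the two sets: the hyperspheres are built around the generated samples, and a real sample counts as recalled when it lies inside at least one of them. The sketch below is a minimal, dependency-free illustration under the same assumptions as for precision (Euclidean metric, self excluded from the neighbour search). The helper names are hypothetical. The second return value counts, for each generated sample, the real samples inside its hypersphere.

```python
import math


def kth_nn_radii(points, k):
    """Distance from each point to its k-th nearest neighbour (self excluded)."""
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(dists[k - 1])
    return radii


def inside(distance, radius):
    """True when distance <= radius, tolerating floating-point rounding."""
    return distance < radius or math.isclose(distance, radius)


def improved_recall_sketch(real, fake, k):
    """Return (dataset-level recall, per-fake-sample counts of real samples covered)."""
    radii = kth_nn_radii(fake, k)
    # mask[i][j] is True when real sample i lies inside the hypersphere of fake sample j.
    mask = [
        [inside(math.dist(r, f), rad) for f, rad in zip(fake, radii)]
        for r in real
    ]
    recalled = [int(any(row)) for row in mask]   # one flag per real sample
    counts = [sum(col) for col in zip(*mask)]    # one count per fake sample
    return sum(recalled) / len(recalled), counts


# Two real points are close to the generated data, the third is an isolated mode
# that the generator misses.
real = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
fake = [(0.0, 0.5), (1.0, 0.5)]
recall, counts = improved_recall_sketch(real, fake, k=1)
print(round(recall, 3), counts)  # 0.667 [1, 1]
```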

Examples:

>>> improved_recall = ImprovedRecall()
>>> real_features = np.random.rand(100, 100)
>>> fake_features = np.random.rand(100, 100)
>>> result: MetricResult = improved_recall.compute(real_features, fake_features)

Source code in src/pymdma/general/measures/prdc.py

class ImprovedRecall(FeatureMetric):
    """Improved Recall Metric for assessing diversity of generative models.

    **Objective**: Diversity

    Parameters
    ----------
    k : int, optional
        Number of nearest neighbors to consider in the hypersphere estimation. Defaults to 5.
    metric : str, optional, default="euclidean"
        The metric to use when calculating distance between instances.
        For the available metrics, see the documentation of `sklearn.metrics.pairwise_distances`.
    n_workers : int, optional
        Number of workers for computing pairwise distances. Defaults to 4.
    **kwargs
        Additional keyword arguments for compatibility.

    References
    ----------
    Kynkaanniemi et al., Improved Precision and Recall Metric for Assessing Generative Models (2019).
    https://arxiv.org/abs/1904.06991

    Code adapted from:
    improved-precision-and-recall-metric: Improved Precision and Recall Metric for Assessing Generative Models — Official TensorFlow Implementation.
    https://github.com/kynkaat/improved-precision-and-recall-metric

    Hypersphere estimation code was taken from:
    generative-evaluation-prdc, Reliable Fidelity and Diversity Metrics for Generative Models.
    https://github.com/clovaai/generative-evaluation-prdc

    Examples
    --------
    >>> improved_recall = ImprovedRecall()
    >>> real_features = np.random.rand(100, 100)
    >>> fake_features = np.random.rand(100, 100)
    >>> result: MetricResult = improved_recall.compute(real_features, fake_features)
    """

    reference_type = ReferenceType.DATASET
    evaluation_level = [EvaluationLevel.INSTANCE, EvaluationLevel.DATASET]
    metric_group = MetricGroup.QUALITY

    higher_is_better: bool = True
    min_value: float = 0.0
    max_value: float = 1.0

    def __init__(
        self,
        k: int = 5,
        metric: str = "euclidean",
        n_workers: int = 4,
        **kwargs,
    ):
        super().__init__(**kwargs)
        self.k = k
        self.metric = metric
        self.n_workers = n_workers

    def compute(self, real_features: np.ndarray, fake_features: np.ndarray, **kwargs) -> MetricResult:
        """Compute the Improved Recall metric.

        Parameters
        ----------
        real_features : np.ndarray
            Array of shape (n_samples, n_features) containing the real features.
        fake_features : np.ndarray
            Array of shape (n_samples, n_features) containing the fake features.

        Notes
        -----
        Intermediate computations can be stored in the `context` dictionary of the `kwargs` parameter.
        Useful when calculating multiple metrics that share the same intermediate computations.

        Returns
        -------
        result: MetricResult
            Dataset-level and instance-level results for the recall metric.
        """
        state = kwargs.get("context", {})
        if "fake_nn_distances" not in state:
            state["fake_nn_distances"] = compute_nearest_neighbour_distances(
                fake_features,
                nearest_k=self.k,
                metric=self.metric,
                n_workers=self.n_workers,
            )

        if "real_fake_distances" not in state:
            state["real_fake_distances"] = compute_pairwise_distance(
                real_features,
                fake_features,
                metric=self.metric,
                n_workers=self.n_workers,
            )

        recall_mask = np.logical_or(
            state["real_fake_distances"] < np.expand_dims(state["fake_nn_distances"], axis=0),
            np.isclose(state["real_fake_distances"], np.expand_dims(state["fake_nn_distances"], axis=0)),
        )
        recall = recall_mask.any(axis=1).astype(int)

        # boolean matrix with (R, F) shape -> .sum(axis=0) -> array with (F,) shape
        # for each fake sample, counts how many real samples fall within its hypersphere
        recall_counts = recall_mask.sum(axis=0)

        return MetricResult(
            dataset_level={"dtype": OutputsTypes.NUMERIC, "subtype": "float", "value": recall.mean()},
            instance_level={"dtype": OutputsTypes.ARRAY, "subtype": "int", "value": recall_counts.tolist()},
        )

`pymdma.tabular.measures.synthesis_val.Density`

Density Metric for assessing fidelity of the generated samples. Unlike Improved Precision, it is robust towards outliers in the real/reference data.

Objective: Fidelity

Parameters:

Name	Type	Description	Default
`k`	`int`	Number of nearest neighbors to consider in the hypersphere estimation. Defaults to 5.	`5`
`metric`	`str`	The metric to use when calculating distance between instances. For the available metrics, see the documentation of `sklearn.metrics.pairwise_distances`.	`"euclidean"`
`n_workers`	`int`	Number of workers for computing pairwise distances. Defaults to 4.	`4`
`**kwargs`		Additional keyword arguments for compatibility.	`{}`

References

Naeem et al., Reliable Fidelity and Diversity Metrics for Generative Models (2020). https://arxiv.org/abs/2002.09797

Code was adapted from: generative-evaluation-prdc, Reliable Fidelity and Diversity Metrics for Generative Models. https://github.com/clovaai/generative-evaluation-prdc
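
Where precision only asks whether a generated sample lies in any real hypersphere, density counts how many real hyperspheres contain it and divides that count by `k`. The following sketch is a dependency-free illustration with hypothetical helper names, assuming the Euclidean metric and self-exclusion in the neighbour search. It is not the library's code.

```python
import math


def kth_nn_radii(points, k):
    """Distance from each point to its k-th nearest neighbour (self excluded)."""
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(dists[k - 1])
    return radii


def inside(distance, radius):
    """True when distance <= radius, tolerating floating-point rounding."""
    return distance < radius or math.isclose(distance, radius)


def density_sketch(real, fake, k):
    """Return (dataset-level density, per-fake-sample density values)."""
    radii = kth_nn_radii(real, k)
    per_fake = [
        # Number of real hyperspheres containing this fake sample, scaled by 1/k.
        sum(inside(math.dist(r, f), rad) for r, rad in zip(real, radii)) / k
        for f in fake
    ]
    return sum(per_fake) / len(per_fake), per_fake


real = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
fake = [(0.5, 0.5), (5.0, 5.0)]
density, per_fake = density_sketch(real, fake, k=2)
print(density, per_fake)  # 1.0 [2.0, 0.0]
```

In this toy case the central sample sits inside all four real hyperspheres, so its per-sample value is 4 / 2 = 2.0. A count divided by `k` can therefore exceed 1 when a sample falls in a densely populated region.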

Examples:

>>> density = Density()
>>> real_features = np.random.rand(100, 100)
>>> fake_features = np.random.rand(100, 100)
>>> result: MetricResult = density.compute(real_features, fake_features)

Source code in src/pymdma/general/measures/prdc.py

class Density(FeatureMetric):
    """Density Metric for assessing fidelity of the generated samples. Unlike
    Improved Precision, it is robust towards outliers in the real/reference
    data.

    **Objective**: Fidelity

    Parameters
    ----------
    k : int, optional
        Number of nearest neighbors to consider in the hypersphere estimation. Defaults to 5.
    metric : str, optional, default="euclidean"
        The metric to use when calculating distance between instances.
        For the available metrics, see the documentation of `sklearn.metrics.pairwise_distances`.
    n_workers : int, optional
        Number of workers for computing pairwise distances. Defaults to 4.
    **kwargs
        Additional keyword arguments for compatibility.

    References
    ----------
    Naeem et al., Reliable Fidelity and Diversity Metrics for Generative Models (2020).
    https://arxiv.org/abs/2002.09797

    Code was adapted from:
    generative-evaluation-prdc, Reliable Fidelity and Diversity Metrics for Generative Models.
    https://github.com/clovaai/generative-evaluation-prdc

    Examples
    --------
    >>> density = Density()
    >>> real_features = np.random.rand(100, 100)
    >>> fake_features = np.random.rand(100, 100)
    >>> result: MetricResult = density.compute(real_features, fake_features)
    """

    reference_type = ReferenceType.DATASET
    evaluation_level = [EvaluationLevel.INSTANCE, EvaluationLevel.DATASET]
    metric_group = MetricGroup.QUALITY

    higher_is_better: bool = True
    min_value: float = 0.0
    max_value: float = 1.0

    def __init__(
        self,
        k: int = 5,
        metric: str = "euclidean",
        n_workers: int = 4,
        **kwargs,
    ):
        super().__init__(**kwargs)
        self.k = k
        self.metric = metric
        self.n_workers = n_workers

    def compute(self, real_features: np.ndarray, fake_features: np.ndarray, **kwargs) -> MetricResult:
        """Compute the Density metric.

        Parameters
        ----------
        real_features : np.ndarray
            Array of shape (n_samples, n_features) containing the real features.
        fake_features : np.ndarray
            Array of shape (n_samples, n_features) containing the fake features.

        Notes
        -----
        Intermediate computations can be stored in the `context` dictionary of the `kwargs` parameter.
        Useful when calculating multiple metrics that share the same intermediate computations.

        Returns
        -------
        result: MetricResult
            Dataset-level and instance-level results for the density metric.
        """
        state = kwargs.get("context", {})
        if "real_nn_distances" not in state:
            state["real_nn_distances"] = compute_nearest_neighbour_distances(
                real_features,
                nearest_k=self.k,
                metric=self.metric,
                n_workers=self.n_workers,
            )

        if "real_fake_distances" not in state:
            state["real_fake_distances"] = compute_pairwise_distance(
                real_features,
                fake_features,
                metric=self.metric,
                n_workers=self.n_workers,
            )

        density = np.logical_or(
            (state["real_fake_distances"] < np.expand_dims(state["real_nn_distances"], axis=1)),
            np.isclose(state["real_fake_distances"], np.expand_dims(state["real_nn_distances"], axis=1)),
        )
        density = (1.0 / float(self.k)) * density.sum(axis=0)

        return MetricResult(
            dataset_level={"dtype": OutputsTypes.NUMERIC, "subtype": "float", "value": density.mean()},
            instance_level={"dtype": OutputsTypes.ARRAY, "subtype": "float", "value": density.tolist()},
        )

`pymdma.tabular.measures.synthesis_val.Coverage`

Coverage Metric for assessing diversity of the generated samples. Unlike Improved Recall, it is robust towards outliers in the real/reference data.

Objective: Diversity

Parameters:

Name	Type	Description	Default
`k`	`int`	Number of nearest neighbors to consider in the hypersphere estimation. Defaults to 5.	`5`
`metric`	`str`	The metric to use when calculating distance between instances. For the available metrics, see the documentation of `sklearn.metrics.pairwise_distances`.	`"euclidean"`
`n_workers`	`int`	Number of workers for computing pairwise distances. Defaults to 4.	`4`
`**kwargs`		Additional keyword arguments for compatibility.	`{}`

References

Naeem et al., Reliable Fidelity and Diversity Metrics for Generative Models (2020). https://arxiv.org/abs/2002.09797

Code was adapted from: generative-evaluation-prdc, Reliable Fidelity and Diversity Metrics for Generative Models. https://github.com/clovaai/generative-evaluation-prdc
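
Coverage keeps the hyperspheres around the real samples and asks, for each one, whether its nearest generated sample falls inside it. The sketch below is a minimal illustration under the same assumptions as before (Euclidean metric, self excluded from the neighbour search, hypothetical helper names). It returns one flag per real sample, which is a simplification chosen for readability.

```python
import math


def kth_nn_radii(points, k):
    """Distance from each point to its k-th nearest neighbour (self excluded)."""
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(dists[k - 1])
    return radii


def inside(distance, radius):
    """True when distance <= radius, tolerating floating-point rounding."""
    return distance < radius or math.isclose(distance, radius)


def coverage_sketch(real, fake, k):
    """Return (dataset-level coverage, per-real-sample 0/1 flags)."""
    radii = kth_nn_radii(real, k)
    covered = [
        # A real sample is covered when its closest fake sample is within its radius.
        int(inside(min(math.dist(r, f) for f in fake), rad))
        for r, rad in zip(real, radii)
    ]
    return sum(covered) / len(covered), covered


# Two real clusters; the single generated sample only reaches the first one.
real = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (11.0, 10.0)]
fake = [(0.2, 0.0)]
coverage, covered = coverage_sketch(real, fake, k=1)
print(coverage, covered)  # 0.5 [1, 1, 0, 0]
```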

Examples:

>>> coverage = Coverage()
>>> real_features = np.random.rand(100, 100)
>>> fake_features = np.random.rand(100, 100)
>>> result: MetricResult = coverage.compute(real_features, fake_features)

Source code in src/pymdma/general/measures/prdc.py

class Coverage(FeatureMetric):
    """Coverage Metric for assessing diversity of the generated samples. Unlike
    Improved Recall, it is robust towards outliers in the real/reference data.

    **Objective**: Diversity

    Parameters
    ----------
    k : int, optional
        Number of nearest neighbors to consider in the hypersphere estimation. Defaults to 5.
    metric : str, optional, default="euclidean"
        The metric to use when calculating distance between instances.
        For the available metrics, see the documentation of `sklearn.metrics.pairwise_distances`.
    n_workers : int, optional
        Number of workers for computing pairwise distances. Defaults to 4.
    **kwargs
        Additional keyword arguments for compatibility.

    References
    ----------
    Naeem et al., Reliable Fidelity and Diversity Metrics for Generative Models (2020).
    https://arxiv.org/abs/2002.09797

    Code was adapted from:
    generative-evaluation-prdc, Reliable Fidelity and Diversity Metrics for Generative Models.
    https://github.com/clovaai/generative-evaluation-prdc

    Examples
    --------
    >>> coverage = Coverage()
    >>> real_features = np.random.rand(100, 100)
    >>> fake_features = np.random.rand(100, 100)
    >>> result: MetricResult = coverage.compute(real_features, fake_features)
    """

    reference_type = ReferenceType.DATASET
    evaluation_level = [EvaluationLevel.INSTANCE, EvaluationLevel.DATASET]
    metric_group = MetricGroup.QUALITY

    higher_is_better: bool = True
    min_value: float = 0.0
    max_value: float = 1.0

    def __init__(
        self,
        k: int = 5,
        metric: str = "euclidean",
        n_workers: int = 4,
        **kwargs,
    ):
        super().__init__(**kwargs)
        self.k = k
        self.metric = metric
        self.n_workers = n_workers

    def compute(self, real_features: np.ndarray, fake_features: np.ndarray, **kwargs) -> MetricResult:
        """Compute the Coverage metric.

        Parameters
        ----------
        real_features : np.ndarray
            Array of shape (n_samples, n_features) containing the real features.
        fake_features : np.ndarray
            Array of shape (n_samples, n_features) containing the fake features.

        Notes
        -----
        Intermediate computations can be stored in the `context` dictionary of the `kwargs` parameter.
        Useful when calculating multiple metrics that share the same intermediate computations.

        Returns
        -------
        result: MetricResult
            Dataset-level and instance-level results for the coverage metric.
        """
        state = kwargs.get("context", {})
        if "real_nn_distances" not in state:
            state["real_nn_distances"] = compute_nearest_neighbour_distances(
                real_features,
                nearest_k=self.k,
                metric=self.metric,
                n_workers=self.n_workers,
            )

        if "real_fake_distances" not in state:
            state["real_fake_distances"] = compute_pairwise_distance(
                real_features,
                fake_features,
                metric=self.metric,
                n_workers=self.n_workers,
            )

        coverage = np.logical_or(
            state["real_fake_distances"].min(axis=1) < state["real_nn_distances"],
            np.isclose(state["real_fake_distances"].min(axis=1), state["real_nn_distances"]),
        )

        # boolean matrix with (R, F) shape -> .sum(axis=0) -> array with (F,) shape
        # for each fake sample, counts how many real hyperspheres contain it
        coverage_counts = np.logical_or(
            state["real_fake_distances"] < np.expand_dims(state["real_nn_distances"], axis=1),
            np.isclose(state["real_fake_distances"], np.expand_dims(state["real_nn_distances"], axis=1)),
        ).sum(axis=0)

        return MetricResult(
            dataset_level={"dtype": OutputsTypes.NUMERIC, "subtype": "float", "value": coverage.mean()},
            instance_level={"dtype": OutputsTypes.ARRAY, "subtype": "int", "value": coverage_counts.tolist()},
        )
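
The `compute` methods listed above read intermediate results from a `context` dictionary passed through `kwargs` and fill it in when a key is missing. The caching pattern can be sketched with stand-in functions; `pairwise`, `real_fake_distances`, `closest_pair`, and `farthest_pair` are hypothetical and not part of pymdma. A call counter shows that the distance matrix is built only once when the same dictionary is shared.

```python
import math

calls = {"pairwise": 0}


def pairwise(real, fake):
    """Stand-in for an expensive pairwise-distance computation."""
    calls["pairwise"] += 1
    return [[math.dist(r, f) for f in fake] for r in real]


def real_fake_distances(real, fake, **kwargs):
    """Fetch the distance matrix from the shared context, computing it on a miss."""
    state = kwargs.get("context", {})
    if "real_fake_distances" not in state:
        state["real_fake_distances"] = pairwise(real, fake)
    return state["real_fake_distances"]


def closest_pair(real, fake, **kwargs):
    """Smallest real-to-fake distance."""
    return min(min(row) for row in real_fake_distances(real, fake, **kwargs))


def farthest_pair(real, fake, **kwargs):
    """Largest real-to-fake distance."""
    return max(max(row) for row in real_fake_distances(real, fake, **kwargs))


real = [(0.0, 0.0), (3.0, 4.0)]
fake = [(0.0, 0.0), (6.0, 8.0)]

shared = {}  # the same dictionary is handed to both calls
nearest = closest_pair(real, fake, context=shared)
farthest = farthest_pair(real, fake, context=shared)
print(nearest, farthest, calls["pairwise"])  # 0.0 10.0 1
```

With the classes documented here, the equivalent would be to pass one dictionary as `context=` to each `compute` call. As far as the listings show, the cache keys do not encode `k` or `metric`, so sharing a context between metrics configured differently would reuse values computed with the first configuration.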

`pymdma.tabular.measures.synthesis_val.StatisticalSimScore`

Computes a dataset-level statistical similarity score between real and synthetic data.

This metric assesses how closely the statistical properties of the synthetic dataset resemble those of the real dataset, providing a fidelity measure for synthetic data generation.

Objective: Fidelity

Parameters:

Name	Type	Description	Default
`col_map`	`dict`	A mapping of column names to their types and properties. This is used to determine how to compute similarity for each column.	`None`
`**kwargs`	`dict`	Additional keyword arguments passed to the parent class.	`{}`

References

Yang et al., Structured evaluation of synthetic tabular data (2024). https://arxiv.org/abs/2403.10424

Returns:

Type	Description
`MetricResult`	A MetricResult object containing the similarity scores and their statistics.

Examples:

>>> # Example 1: Evaluating statistical similarity for a dataset with discrete and continuous variables
>>> import numpy as np
>>> real_data = np.array([
...     [1, 2.5],
...     [1, 3.0],
...     [2, 3.5],
...     [2, 4.0]
... ])
>>> syn_data = np.array([
...     [1, 2.6],
...     [1, 3.1],
...     [2, 3.4],
...     [2, 4.2]
... ])
>>> col_map = {
...     "column1": {"type": {"tag": "discrete"}},
...     "column2": {"type": {"tag": "continuous"}},
... }
>>> sim_score = StatisticalSimScore(col_map=col_map)
>>> result: MetricResult = sim_score.compute(real_data, syn_data)
>>> dataset_level, _ = result.value # Output: similarity scores for each column

>>> # Example 2: Evaluating similarity with mismatched column types
>>> real_data = np.array([
...     [1, 2],
...     [2, 3],
...     [3, 4]
... ])
>>> syn_data = np.array([
...     [1, 2],
...     [2, 3],
...     [3, 5]
... ])
>>> col_map = {
...     "column1": {"type": {"tag": "discrete"}},
...     "column2": {"type": {"tag": "discrete"}},
... }
>>> sim_score = StatisticalSimScore(col_map=col_map)
>>> result: MetricResult = sim_score.compute(real_data, syn_data)
>>> dataset_level, _ = result.value  # Output: similarity scores for each column
>>> dataset_stats, _ = result.stats  # Output: mean and std of similarity scores

Source code in src/pymdma/tabular/measures/synthesis_val/data/similarity.py

class StatisticalSimScore(Metric):
    """Computes a dataset-level statistical similarity score between real and
    synthetic data.

    This metric assesses how closely the statistical properties of the synthetic dataset
    resemble those of the real dataset, providing a fidelity measure for synthetic data generation.

    **Objective**: Fidelity

    Parameters
    ----------
    col_map : dict, optional, default=None
        A mapping of column names to their types and properties. This is used to determine
        how to compute similarity for each column.
    **kwargs : dict
        Additional keyword arguments passed to the parent class.

    References
    ----------
    Yang et al., Structured evaluation of synthetic tabular data (2024).
    https://arxiv.org/abs/2403.10424

    Returns
    -------
    MetricResult
        A MetricResult object containing the similarity scores and their statistics.

    Examples
    --------
    >>> # Example 1: Evaluating statistical similarity for a dataset with discrete and continuous variables
    >>> import numpy as np
    >>> real_data = np.array([
    ...     [1, 2.5],
    ...     [1, 3.0],
    ...     [2, 3.5],
    ...     [2, 4.0]
    ... ])
    >>> syn_data = np.array([
    ...     [1, 2.6],
    ...     [1, 3.1],
    ...     [2, 3.4],
    ...     [2, 4.2]
    ... ])
    >>> col_map = {
    ...     "column1": {"type": {"tag": "discrete"}},
    ...     "column2": {"type": {"tag": "continuous"}},
    ... }
    >>> sim_score = StatisticalSimScore(col_map=col_map)
    >>> result: MetricResult = sim_score.compute(real_data, syn_data)
    >>> dataset_level, _ = result.value # Output: similarity scores for each column

    >>> # Example 2: Evaluating similarity with mismatched column types
    >>> real_data = np.array([
    ...     [1, 2],
    ...     [2, 3],
    ...     [3, 4]
    ... ])
    >>> syn_data = np.array([
    ...     [1, 2],
    ...     [2, 3],
    ...     [3, 5]
    ... ])
    >>> col_map = {
    ...     "column1": {"type": {"tag": "discrete"}},
    ...     "column2": {"type": {"tag": "discrete"}},
    ... }
    >>> sim_score = StatisticalSimScore(col_map=col_map)
    >>> result: MetricResult = sim_score.compute(real_data, syn_data)
    >>> dataset_level, _ = result.value  # Output: similarity scores for each column
    >>> dataset_stats, _ = result.stats  # Output: mean and std of similarity scores
    """

    reference_type = ReferenceType.DATASET
    evaluation_level = EvaluationLevel.DATASET
    metric_group = MetricGroup.QUALITY

    higher_is_better: bool = True
    min_value: float = 0.0
    max_value: float = 1.0

    def __init__(
        self,
        col_map: Optional[Dict[str, Dict[str, str]]] = None,
        **kwargs,
    ):
        super().__init__(**kwargs)
        self.col_map = col_map

    def _stat_sim_1d(
        self,
        real_col: np.ndarray,
        syn_col: np.ndarray,
        kind: str = "discrete",
        **kwargs,
    ):
        """Computes a 1D statistical similarity.

        Parameters
        ----------
        real_col : np.ndarray
            The real data for the specific attribute.
        syn_col : np.ndarray
            The synthetic data for the specific attribute.
        kind : str
            The type of data ('discrete' or 'continuous').
        **kwargs : dict
            Additional keyword arguments for computation.

        Returns
        -------
        float
            The computed similarity score for the attribute.
        """
        # variable type assignment
        kind_ = kind.lower() if kind.lower() in ["discrete", "continuous"] else "continuous"

        # mapper
        kind_mapper = {
            "discrete": _get_tv_similarity,
            "continuous": _get_ks_similarity,
        }

        # score
        sim_score = kind_mapper.get(kind_)(
            real_col,
            syn_col,
        )
        return sim_score

    def compute(self, real_data: np.ndarray, syn_data: np.ndarray, **kwargs) -> MetricResult:
        """Computes the statistical similarity score between real and synthetic
        datasets.

        Parameters
        ----------
        real_data : np.ndarray
            The real dataset for comparison.
        syn_data : np.ndarray
            The synthetic dataset to evaluate.
        **kwargs : dict
            Additional keyword arguments for computation.

        Returns
        -------
        MetricResult
            A MetricResult object containing the similarity scores and their statistics.
        """

        # checkpoint
        assert real_data.shape[1] == syn_data.shape[1], "Mismatched columns. Please fix before computing metrics."

        # column map
        col_map_exists = isinstance(self.col_map, dict)
        cols = self.col_map.keys() if col_map_exists else [f"att_{idx}" for idx in range(real_data.shape[1])]

        # similarity map
        sim_score = {}

        # column similarity
        for idx, col in enumerate(cols):
            # continuous OR discrete
            if col_map_exists:
                # dtype
                vtype = self.col_map.get(col).get("type").get("tag")
            else:
                # dtype
                vtype = "discrete" if is_categorical(real_data[:, idx]) else "continuous"

            # compute similarity
            sim_ = self._stat_sim_1d(
                real_data[:, idx],
                syn_data[:, idx],
                kind=vtype,
            )

            # assign
            sim_score[col] = sim_

        # global scores
        global_d = {
            "mean": np.mean(list(sim_score.values())).round(2),
            "std": np.std(list(sim_score.values())).round(2),
        }

        return MetricResult(
            dataset_level={
                "dtype": OutputsTypes.KEY_VAL,
                "subtype": "float",
                "value": sim_score,
                "stats": global_d,
            },
        )
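
The helpers `_get_tv_similarity` and `_get_ks_similarity` referenced in `_stat_sim_1d` are not shown in this listing. Assuming their names refer to the total variation distance and the two-sample Kolmogorov-Smirnov statistic, each turned into a similarity as one minus the distance, a dependency-free sketch could look like this. The functions `tv_similarity` and `ks_similarity` are hypothetical stand-ins, and the library's helpers may differ in detail.

```python
from bisect import bisect_right
from collections import Counter


def tv_similarity(real_col, syn_col):
    """1 - total variation distance between the empirical category frequencies."""
    p, q = Counter(real_col), Counter(syn_col)
    tv = 0.5 * sum(
        abs(p[c] / len(real_col) - q[c] / len(syn_col)) for c in set(p) | set(q)
    )
    return 1.0 - tv


def ks_similarity(real_col, syn_col):
    """1 - two-sample Kolmogorov-Smirnov statistic (largest gap between ECDFs)."""
    r, s = sorted(real_col), sorted(syn_col)
    ks = max(
        # bisect_right(sorted, x) / n is the empirical CDF evaluated at x.
        abs(bisect_right(r, x) / len(r) - bisect_right(s, x) / len(s))
        for x in r + s
    )
    return 1.0 - ks


# Columns taken from Example 1 above.
discrete_real, discrete_syn = [1, 1, 2, 2], [1, 1, 2, 2]
continuous_real, continuous_syn = [2.5, 3.0, 3.5, 4.0], [2.6, 3.1, 3.4, 4.2]

print(tv_similarity(discrete_real, discrete_syn))      # 1.0
print(ks_similarity(continuous_real, continuous_syn))  # 0.75
```

Both scores lie in [0, 1], with 1 meaning the two empirical distributions coincide, which matches the `min_value`, `max_value`, and `higher_is_better` attributes declared on the class.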

`pymdma.tabular.measures.synthesis_val.StatisticalDivergenceScore`

Computes a statistical divergence score for each column, specifically the Jensen-Shannon (JS) and Kullback-Leibler (KL) divergence scores.

Objective: Fidelity

Parameters:

Name	Type	Description	Default
`column_names`	`list of str`	List of the names of the columns (features) in the dataset.	`None`
`score`	`str`	Specifies the divergence score to compute ('js' for Jensen-Shannon, 'kl' for Kullback-Leibler, 'all' for both). By default, it is set to 'kl'.	`'kl'`
`**kwargs`	`dict`	Additional keyword arguments passed to the parent class.	`{}`

References

Fonseca and Bacao, Tabular and latent space synthetic data generation: a literature review (2023). https://doi.org/10.1186/s40537-023-00792-7

Returns:

Type	Description
`MetricResult`	A MetricResult object containing the divergence scores and their statistics.
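
For reference, the two divergences named above can be sketched on empirical frequencies as follows. This is a minimal, dependency-free illustration with hypothetical helper names (`empirical`, `kl_divergence`, `js_divergence`). It treats column values as categories, because how the library bins continuous columns is not shown here, and it should not be read as the library's implementation.

```python
import math
from collections import Counter


def empirical(col, support):
    """Relative frequency of each value of `support` in `col`."""
    counts = Counter(col)
    return [counts[v] / len(col) for v in support]


def kl_divergence(p, q):
    """KL(p || q) in nats; infinite when q is zero where p is not."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # terms with p = 0 contribute nothing
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Values of the second column of the documented example data.
real_col = [2, 3, 4]
syn_col = [2, 2, 3]
support = sorted(set(real_col) | set(syn_col))
p, q = empirical(real_col, support), empirical(syn_col, support)

print(kl_divergence(p, q))  # inf, because the synthetic column never takes the value 4
print(js_divergence(p, q))  # finite, and never larger than log(2)
```

The example shows a practical difference between the two options: KL divergence becomes infinite as soon as the synthetic data misses a value present in the real data, while the JS divergence stays bounded and symmetric.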

Examples:

>>> # Example 1: Evaluating statistical divergence for a dataset
>>> import numpy as np
>>> real_data = np.array([
...     [1, 2, 3],
...     [2, 3, 4],
...     [3, 4, 5]
... ])
>>> syn_data = np.array([
...     [1, 2, 2],
...     [2, 2, 3],
...     [3, 3, 4]
... ])
>>> col_map = {
...     "column1": {"type": {"tag": "continuous"}},
...     "column2": {"type": {"tag": "continuous"}},
...     "column3": {"type": {"tag": "continuous"}},
... }
>>> divergence_score = StatisticalDivergenceScore(col_map=col_map, score='kl')
>>> result: MetricResult = divergence_score.compute(real_data, syn_data)
>>> dataset_level, _ = result.value  # Output: divergence scores for each column

>>> # Example 2: Using JS divergence instead of KL
>>> divergence_score_js = StatisticalDivergenceScore(col_map=col_map, score='js')
>>> result_js: MetricResult = divergence_score_js.compute(real_data, syn_data)
>>> dataset_level_js, _ = result_js.value  # Output: JS divergence scores for each column
>>> dataset_stats_js, _ = result_js.stats  # Output: mean and std of JS divergence scores

Source code in src/pymdma/tabular/measures/synthesis_val/data/similarity.py

class StatisticalDivergenceScore(Metric):
    """Computes a statistical divergence score for each column, specifically
    the Jensen-Shannon (JS) and Kullback-Leibler (KL) divergence scores.

    **Objective**: Fidelity

    Parameters
    ----------
    column_names : list of str, optional, default=None
        List of the names of the columns (features) in the dataset.
    score : str, optional, default='kl'
        Specifies the divergence score to compute ('js' for Jensen-Shannon, 'kl' for Kullback-Leibler, 'all' for both).
        By default, it is set to 'kl'.
    **kwargs : dict
        Additional keyword arguments passed to the parent class.

    References
    ----------
    Fonseca and Bacao,  Tabular and latent space synthetic data generation: a literature review (2023).
    https://doi.org/10.1186/s40537-023-00792-7

    Returns
    -------
    MetricResult
        A MetricResult object containing the divergence scores and their statistics.

    Examples
    --------
    >>> # Example 1: Evaluating statistical divergence for a dataset
    >>> import numpy as np
    >>> real_data = np.array([
    ...     [1, 2, 3],
    ...     [2, 3, 4],
    ...     [3, 4, 5]
    ... ])
    >>> syn_data = np.array([
    ...     [1, 2, 2],
    ...     [2, 2, 3],
    ...     [3, 3, 4]
    ... ])
    >>> column_names = ["column1", "column2", "column3"]
    >>> divergence_score = StatisticalDivergenceScore(column_names=column_names, score='kl')
    >>> result: MetricResult = divergence_score.compute(real_data, syn_data)
    >>> dataset_level, _ = result.value  # Output: divergence scores for each column

    >>> # Example 2: Using JS divergence instead of KL
    >>> divergence_score_js = StatisticalDivergenceScore(column_names=column_names, score='js')
    >>> result_js: MetricResult = divergence_score_js.compute(real_data, syn_data)
    >>> dataset_level_js, _ = result_js.value  # Output: JS divergence scores for each column
    >>> dataset_stats_js, _ = result_js.stats  # Output: mean and std of JS divergence scores
    """

    reference_type = ReferenceType.DATASET
    evaluation_level = EvaluationLevel.DATASET
    metric_group = MetricGroup.QUALITY

    higher_is_better: bool = False
    min_value: float = -np.inf
    max_value: float = np.inf

    def __init__(
        self,
        column_names: Optional[List[str]] = None,
        score: Literal["js", "kl", "all"] = "kl",
        **kwargs,
    ):
        """Initializes the StatisticalDivergenceScore metric to evaluate the
        divergence between real and synthetic datasets based on defined column
        characteristics.

        Parameters
        ----------
        column_names : list of str, optional, default=None
            List of the names of the columns (features) in the dataset.
        score : str, optional, default='kl'
            Specifies the divergence score to compute ('js' for Jensen-Shannon,
            'kl' for Kullback-Leibler, 'all' for both). Default is 'kl'.
        **kwargs : dict
            Additional keyword arguments passed to the parent class.
        """

        super().__init__(**kwargs)

        # column names
        self.column_names = column_names

        # score type
        self.score = score

    def _diverg_score_1d(
        self,
        real_col: np.ndarray,
        syn_col: np.ndarray,
        score: Literal["js", "kl", "all"] = "all",
        **kwargs,
    ):
        """Computes a column-level statistical divergence.

        Parameters
        ----------
        real_col : np.ndarray
            The real data for the specific attribute.
        syn_col : np.ndarray
            The synthetic data for the specific attribute.
        score : str, optional, default='all'
            The type of divergence to compute ('js', 'kl', or 'all' for both).
        **kwargs : dict
            Additional keyword arguments for computation.

        Returns
        -------
        dict
            A dictionary containing the computed divergence scores.
        """

        # score map
        score_map = {"js": _get_js_divergence, "kl": _get_kl_divergence}

        # score tags
        score_tag = ["js", "kl"] if score.lower() == "all" else [score]

        # get probability distributions
        real_pdf, syn_pdf, bins = _get_nn_pdf(real_col, syn_col)

        # compute divergence scores
        div_score = {tag: score_map.get(tag)(real_pdf, syn_pdf) for tag in score_tag if tag in score_map.keys()}

        return div_score

    def compute(self, real_data: np.ndarray, syn_data: np.ndarray, **kwargs) -> MetricResult:
        """Computes the statistical divergence score between real and synthetic
        datasets.

        Parameters
        ----------
        real_data : np.ndarray
            The real dataset for comparison.
        syn_data : np.ndarray
            The synthetic dataset to evaluate.
        **kwargs : dict
            Additional keyword arguments for computation.

        Returns
        -------
        MetricResult
            A MetricResult object containing the divergence scores and their statistics.
        """

        # checkpoint
        assert real_data.shape[1] == syn_data.shape[1], "Mismatched columns. Please fix before computing metrics."

        # columns
        cols = (
            self.column_names
            if isinstance(self.column_names, list)
            else [f"att_{idx}" for idx in range(real_data.shape[1])]
        )

        # divergence map
        div_score = {}

        # column-wise
        for idx, col in enumerate(cols):
            # compute scores
            sim_ = self._diverg_score_1d(
                real_data[:, idx],
                syn_data[:, idx],
                score=self.score,
            )

            # assign
            div_score[col] = list(sim_.values())

        # global scores
        # auxiliary score dataframe
        aux_df = pd.DataFrame.from_dict(div_score.values())

        # aggregates
        mean_g, std_g = aux_df.mean(0).to_dict(), aux_df.std(0).to_dict()

        # global
        glob_d = {
            f"{self.score}_mean": mean_g[0],
            f"{self.score}_std": std_g[0],
        }

        return MetricResult(
            dataset_level={
                "dtype": OutputsTypes.KEY_ARRAY,
                "subtype": "float",
                "value": div_score,
                "stats": glob_d,
            },
        )
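The `stats` entry returned by `compute` aggregates the per-column scores into a mean and a standard deviation. The sketch below mirrors that aggregation with NumPy on hypothetical scores. It uses `ddof=1` because `pandas.DataFrame.std`, which the listing above calls, defaults to the sample standard deviation. The score values are made up for illustration.

```python
import numpy as np

# Hypothetical per-column KL scores, shaped like the "value" mapping above
div_score = {"att_0": [0.10], "att_1": [0.30], "att_2": [0.20]}

# Rows are dataset columns, columns are score types (only one here)
scores = np.array(list(div_score.values()))

mean_g = scores.mean(axis=0)
std_g = scores.std(axis=0, ddof=1)  # sample std, matching the pandas default

stats = {"kl_mean": float(mean_g[0]), "kl_std": float(std_g[0])}
print(stats)
```

Reading the listing above, with `score='all'` each per-column list holds two entries (JS first, then KL), while the global statistics read only index 0 and are keyed as `all_mean` and `all_std`. They would therefore appear to summarize the JS scores only.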

`pymdma.tabular.measures.synthesis_val.CoherenceScore`

Computes the coherence score between the correlation matrices of the target and synthetic datasets. A higher coherence score indicates better fidelity between the datasets in terms of their correlation structures.

Objective: Fidelity

Parameters:

Name	Type	Description	Default
`weights`	`ndarray`	Weights for the correlations, allowing for weighted contributions to the coherence score. If None, uniform weights are applied.	`None`
`corr_type`	`str`	The type of correlation to compute ('pearson' by default). The value is passed to `pandas.DataFrame.corr`, which accepts 'pearson', 'kendall', and 'spearman'.	`'pearson'`
`**kwargs`	`dict`	Additional keyword arguments passed to the parent class.	`{}`

References

Yang et al., Structured evaluation of synthetic tabular data (2024). https://arxiv.org/abs/2403.10424

Returns:

Type	Description
`MetricResult`	A MetricResult object containing the coherence score.
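With uniform weights, the score computed in the source reduces to one minus half the mean absolute difference between the off-diagonal entries of the two correlation matrices. The sketch below reproduces that reduction with NumPy for the Pearson case. The function name `coherence` is introduced for this example, and the rounding to three decimals done in the source is omitted.

```python
import numpy as np


def coherence(real_data, syn_data):
    """Uniform-weight coherence between two Pearson correlation matrices."""
    # Columns are variables; NaNs (e.g. constant columns) are replaced by 1
    real_corr = np.nan_to_num(np.corrcoef(real_data, rowvar=False), nan=1.0)
    syn_corr = np.nan_to_num(np.corrcoef(syn_data, rowvar=False), nan=1.0)

    delta = np.abs(real_corr - syn_corr)
    off_diag = ~np.eye(delta.shape[0], dtype=bool)  # ignore the diagonal

    # Each |difference| lies in [0, 2], so halving maps the score to [0, 1]
    return 1.0 - delta[off_diag].mean() / 2.0


real = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])
syn_flipped = np.array([[1.0, 4.0], [2.0, 3.0], [3.0, 2.0]])

same = coherence(real, real)            # identical correlation structure
opposite = coherence(real, syn_flipped)  # correlation sign reversed
print(same, opposite)
```

Identical correlation structures give a score at the top of the range, and a fully reversed correlation gives a score at the bottom, which matches the `min_value` and `max_value` declared on the class.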

Examples:

>>> # Example 1: Evaluating coherence score for a dataset
>>> import numpy as np
>>> real_data = np.array([
...     [1, 2, 3],
...     [2, 3, 4],
...     [3, 4, 5]
... ])
>>> syn_data = np.array([
...     [1, 2, 3],
...     [1, 2, 3],
...     [3, 4, 5]
... ])
>>> coherence_score = CoherenceScore(corr_type='pearson')
>>> result: MetricResult = coherence_score.compute(real_data, syn_data)
>>> dataset_level, _ = result.value  # Output: coherence score

>>> # Example 2: Evaluating with custom weights
>>> weights = np.array([0.5, 1.0, 1.5])  # Example weights
>>> coherence_score_weighted = CoherenceScore(weights=weights, corr_type='spearman')
>>> result_weighted: MetricResult = coherence_score_weighted.compute(real_data, syn_data)
>>> dataset_level, _ = result_weighted.value  # Output: weighted coherence score

Source code in src/pymdma/tabular/measures/synthesis_val/data/similarity.py

class CoherenceScore(Metric):
    """Computes the coherence score between the correlation matrices of the
    target and synthetic datasets. A higher coherence score indicates better
    fidelity between the datasets in terms of their correlation structures.

    **Objective**: Fidelity

    Parameters
    ----------
    weights : np.ndarray, optional, default=None
        Weights for the correlations, allowing for weighted contributions
        to the coherence score. If None, uniform weights are applied.
    corr_type : str, optional, default='pearson'
        The type of correlation to compute. The value is passed to
        `pandas.DataFrame.corr`, which accepts 'pearson', 'kendall', and 'spearman'.
    **kwargs : dict
        Additional keyword arguments passed to the parent class.

    References
    ----------
    Yang et al., Structured evaluation of synthetic tabular data (2024).
    https://arxiv.org/abs/2403.10424

    Returns
    -------
    MetricResult
        A MetricResult object containing the coherence score.

    Examples
    --------
    >>> # Example 1: Evaluating coherence score for a dataset
    >>> import numpy as np
    >>> real_data = np.array([
    ...     [1, 2, 3],
    ...     [2, 3, 4],
    ...     [3, 4, 5]
    ... ])
    >>> syn_data = np.array([
    ...     [1, 2, 3],
    ...     [1, 2, 3],
    ...     [3, 4, 5]
    ... ])
    >>> coherence_score = CoherenceScore(corr_type='pearson')
    >>> result: MetricResult = coherence_score.compute(real_data, syn_data)
    >>> dataset_level, _ = result.value  # Output: coherence score

    >>> # Example 2: Evaluating with custom weights
    >>> weights = np.array([0.5, 1.0, 1.5])  # Example weights
    >>> coherence_score_weighted = CoherenceScore(weights=weights, corr_type='spearman')
    >>> result_weighted: MetricResult = coherence_score_weighted.compute(real_data, syn_data)
    >>> dataset_level, _ = result_weighted.value  # Output: weighted coherence score
    """

    reference_type = ReferenceType.DATASET
    evaluation_level = EvaluationLevel.DATASET
    metric_group = MetricGroup.QUALITY

    higher_is_better: bool = True
    min_value: float = 0.0
    max_value: float = 1.0

    def __init__(
        self,
        weights: Optional[np.ndarray] = None,
        corr_type: Optional[str] = "pearson",
        **kwargs,
    ):
        """Initializes the CoherenceScore metric to evaluate the coherence
        between the correlation matrices of real and synthetic datasets.

        Parameters
        ----------
        weights : np.ndarray, optional, default=None
            Weights for the correlations, allowing for weighted contributions
            to the coherence score. If None, uniform weights are applied.
        corr_type : str, optional, default='pearson'
            The type of correlation to compute ('pearson' by default).
        **kwargs : dict
            Additional keyword arguments passed to the parent class.
        """

        super().__init__(**kwargs)

        # correlation type
        self.corr = corr_type

        # weights array
        self.weights = weights

    def compute(
        self,
        real_data: np.ndarray,
        syn_data: np.ndarray,
        **kwargs,
    ) -> MetricResult:
        """Computes the coherence score between the correlation matrices of
        real and synthetic datasets.

        Parameters
        ----------
        real_data : np.ndarray
            The real dataset for comparison.
        syn_data : np.ndarray
            The synthetic dataset to evaluate.
        **kwargs : dict
            Additional keyword arguments for computation.

        Returns
        -------
        MetricResult
            A MetricResult object containing the coherence score.
        """

        # compute correlation matrices
        real_corr = pd.DataFrame(real_data).corr(self.corr).replace(np.nan, 1).to_numpy()
        syn_corr = pd.DataFrame(syn_data).corr(self.corr).replace(np.nan, 1).to_numpy()

        # number columns
        n_cols = len(real_corr)

        # compute similarity between real and syn matrices
        delta_corr = np.abs(
            np.nan_to_num(real_corr) - np.nan_to_num(syn_corr),
        )  # differences

        # weight matrix
        id_mask = np.abs(np.identity(n_cols) - 1)

        if self.weights is not None:
            w_mask = np.array([self.weights] * n_cols) * id_mask
        else:
            w_mask = np.ones((n_cols, n_cols)) * id_mask

        # norm
        w_mask /= sum(w_mask)

        # correlation similarity (weighted avg.)
        corr_sim = np.sum(delta_corr * w_mask) / np.sum(w_mask)
        # ((n_cols * (n_cols - 1)))

        # average correlation
        avg_corr_coh = np.mean(np.round(1 - corr_sim / 2, 3))

        return MetricResult(
            dataset_level={
                "dtype": OutputsTypes.NUMERIC,
                "subtype": "float",
                "value": avg_corr_coh,
            },
        )
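Replaying the mask construction from the listing above in isolation suggests that, for a 1-D `weights` vector with non-zero entries, the column-wise normalization rescales every off-diagonal entry to `1 / (n_cols - 1)`, so the resulting mask matches the uniform case. This is an observation about the code as shown on this page, not a statement about intended behavior. `weight_mask` is a helper name introduced for the example.

```python
import numpy as np


def weight_mask(n_cols, weights=None):
    """Replays the weight-mask steps from the `compute` listing above."""
    id_mask = np.abs(np.identity(n_cols) - 1)  # 0 on the diagonal, 1 elsewhere
    if weights is not None:
        # Every row is a copy of `weights`, so entry [i, j] equals weights[j]
        w_mask = np.array([weights] * n_cols) * id_mask
    else:
        w_mask = np.ones((n_cols, n_cols)) * id_mask
    # Built-in sum adds the rows together, giving one total per column
    w_mask /= sum(w_mask)
    return w_mask


uniform = weight_mask(3)
weighted = weight_mask(3, np.array([0.5, 1.0, 1.5]))

print(uniform)
print(weighted)
print(np.allclose(uniform, weighted))
```

If per-pair weighting is needed, it may be worth checking the score against a hand-computed weighted average before relying on the `weights` argument.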

Privacy

`pymdma.tabular.measures.synthesis_val.Authenticity`

Authenticity Metric for assessing the authenticity of the generated samples. A synthetic sample is considered authentic if it is significantly distinct from any real sample.

Objective: Privacy

Parameters:

Name	Type	Description	Default
`metric`	`str`	The metric to use when calculating distance between instances. For the available metrics, see the documentation of `sklearn.metrics.pairwise_distances`.	`"euclidean"`
`n_workers`	`int`	Number of workers for computing pairwise distances. Defaults to 4.	`4`
`**kwargs`		Additional keyword arguments for compatibility.	`{}`

Notes

A fake sample is flagged as non-authentic when its distance to some real sample is no greater than that real sample's distance to its nearest real neighbor. The dataset-level score is the fraction of fake samples that remain authentic.
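The check can be sketched in plain NumPy as follows. This is a simplified illustration that assumes Euclidean distance and excludes each real sample's zero distance to itself. The helper name `authenticity_mask` and the toy data are made up for the example, and the library's own distance helpers are not used.

```python
import numpy as np


def authenticity_mask(real, fake):
    """Return True for each fake sample that is not 'too close' to any real one."""
    # Pairwise distances: rows index real samples, columns index fake samples
    real_fake = np.linalg.norm(real[:, None, :] - fake[None, :, :], axis=-1)

    # Distance from each real sample to its nearest *other* real sample
    real_real = np.linalg.norm(real[:, None, :] - real[None, :, :], axis=-1)
    np.fill_diagonal(real_real, np.inf)  # ignore self-distance
    nearest_real = real_real.min(axis=1)

    # A fake sample is too close to real sample i if it lies within (or on)
    # the radius defined by i's nearest real neighbor
    radius = nearest_real[:, None]
    too_close = (real_fake < radius) | np.isclose(real_fake, radius)
    return ~too_close.any(axis=0)


real = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
fake = np.array([[0.1, 0.0], [10.0, 10.0], [5.0, 4.0]])

mask = authenticity_mask(real, fake)
print(mask.astype(int).tolist())  # instance-level flags
print(mask.mean())                # dataset-level score
```

Note that an isolated real sample has a large nearest-neighbor radius, so a fake sample can be flagged as non-authentic even when it is not especially close to that real sample in absolute terms.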

References

Alaa et al., How Faithful Is Your Synthetic Data? Sample-Level Metrics for Evaluating and Auditing Generative Models (2022). https://doi.org/10.48550/arXiv.2102.08921

Hypersphere estimation code was adapted from: generative-evaluation-prdc, Reliable Fidelity and Diversity Metrics for Generative Models. https://github.com/clovaai/generative-evaluation-prdc

Examples:

>>> authenticity = Authenticity()
>>> real_features = np.random.rand(100, 100)
>>> fake_features = np.random.rand(100, 100)
>>> result: MetricResult = authenticity.compute(real_features, fake_features)

Source code in src/pymdma/general/measures/prdc.py

class Authenticity(FeatureMetric):
    """Authenticity Metric for assessing the authenticity of the generated
    samples. A synthetic sample is considered authentic if it is significantly
    distinct from any real sample.

    **Objective**: Privacy

    Parameters
    ----------
    metric : str, optional, default="euclidean"
        The metric to use when calculating distance between instances.
        For the available metrics, see the documentation of `sklearn.metrics.pairwise_distances`.
    n_workers : int, optional, default=4
        Number of workers for computing pairwise distances.
    **kwargs
        Additional keyword arguments for compatibility.

    Notes
    -----
    A fake sample is flagged as non-authentic when its distance to some real sample is no greater than that real
    sample's distance to its nearest real neighbor. The dataset-level score is the fraction of fake samples that remain authentic.

    References
    ----------
    Alaa et al., How Faithful Is Your Synthetic Data? Sample-Level Metrics for Evaluating and Auditing Generative Models (2022).
    https://doi.org/10.48550/arXiv.2102.08921

    Hypersphere estimation code was adapted from:
    generative-evaluation-prdc, Reliable Fidelity and Diversity Metrics for Generative Models.
    https://github.com/clovaai/generative-evaluation-prdc

    Examples
    --------
    >>> authenticity = Authenticity()
    >>> real_features = np.random.rand(100, 100)
    >>> fake_features = np.random.rand(100, 100)
    >>> result: MetricResult = authenticity.compute(real_features, fake_features)
    """

    reference_type = ReferenceType.DATASET
    evaluation_level = [EvaluationLevel.INSTANCE, EvaluationLevel.DATASET]
    metric_group = MetricGroup.PRIVACY

    higher_is_better: bool = True
    min_value: float = 0.0
    max_value: float = 1.0

    def __init__(
        self,
        metric: str = "euclidean",
        n_workers: int = 4,
        **kwargs,
    ):
        super().__init__(**kwargs)
        self.metric = metric
        self.n_workers = n_workers

    def compute(self, real_features: np.ndarray, fake_features: np.ndarray, **kwargs) -> MetricResult:
        """Compute the Authenticity metric.

        Parameters
        ----------
        real_features : np.ndarray
            Array of shape (n_samples, n_features) containing the real features.
        fake_features : np.ndarray
            Array of shape (n_samples, n_features) containing the fake features.

        Notes
        -----
        Intermediate computations can be stored in the `context` dictionary of the `kwargs` parameter.
        This is useful when calculating multiple metrics that share the same intermediate computations.

        Returns
        -------
        result: MetricResult
            Dataset-level and instance-level results for the authenticity metric
        """
        state = kwargs.get("context", {})

        if "real_fake_distances" not in state:
            state["real_fake_distances"] = compute_pairwise_distance(
                real_features,
                fake_features,
                metric=self.metric,
                n_workers=self.n_workers,
            )

        # compute distance to closest real samples
        state["real_closest_real_distances"] = compute_nearest_neighbour_distances(
            real_features,
            nearest_k=1,
            metric=self.metric,
            n_workers=self.n_workers,
        )

        # check if any fake sample is closer to Ri than Ri is to any other Rj
        authenticity = np.logical_or(
            state["real_fake_distances"] < np.expand_dims(state["real_closest_real_distances"], axis=1),
            np.isclose(state["real_fake_distances"], np.expand_dims(state["real_closest_real_distances"], axis=1)),
        )

        # mask of the values that are considered authentic in the fake dataset
        authenticity_mask = ~authenticity.any(axis=0)

        return MetricResult(
            dataset_level={"dtype": OutputsTypes.NUMERIC, "subtype": "float", "value": authenticity_mask.mean()},
            instance_level={
                "dtype": OutputsTypes.ARRAY,
                "subtype": "int",
                "value": authenticity_mask.astype(int).tolist(),
            },
        )
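The `context` pattern used in `compute` can be illustrated in isolation. In the sketch below, two stand-in metric functions share one dictionary so that the real-to-fake distance matrix is computed only once. `pairwise`, `metric_a`, and `metric_b` are hypothetical names for this example and are not part of pymdma. Only the `context` keyword and the `real_fake_distances` key are taken from the listing above.

```python
import numpy as np

calls = {"pairwise": 0}  # counts how often the expensive step runs


def pairwise(a, b):
    """Euclidean distance matrix between the rows of a and the rows of b."""
    calls["pairwise"] += 1
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)


def metric_a(real, fake, **kwargs):
    state = kwargs.get("context", {})
    if "real_fake_distances" not in state:
        state["real_fake_distances"] = pairwise(real, fake)
    return float(state["real_fake_distances"].min())


def metric_b(real, fake, **kwargs):
    state = kwargs.get("context", {})
    if "real_fake_distances" not in state:
        state["real_fake_distances"] = pairwise(real, fake)
    return float(state["real_fake_distances"].mean())


rng = np.random.default_rng(0)
real, fake = rng.random((20, 3)), rng.random((20, 3))

context = {}  # shared across both calls
metric_a(real, fake, context=context)
metric_b(real, fake, context=context)
print(calls["pairwise"])
```

When no `context` is passed, each call falls back to a fresh empty dictionary and recomputes the distances, so the shared dictionary only helps if the caller creates it and passes the same object to every metric.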
