Implementing Metrics

Before adding any metrics, please consult the contributing guidelines and the developer documentation.

Important: We only accept metrics that are published in peer-reviewed journals. Every metric in the library must have a valid reference to a published paper or be widely known in the field.

Base Metric Class

pymdma.common.definitions.Metric

Source code in src/pymdma/common/definitions.py
class Metric(ABC):
    # evaluation params
    evaluation_level: EvaluationLevel = EvaluationLevel.DATASET
    metric_group: MetricGroup
    reference_type: ReferenceType = ReferenceType.NONE

    # metric specific
    higher_is_better: Optional[bool] = None
    min_value: Optional[float] = None
    max_value: Optional[float] = None

    def __init__(self, **kwargs) -> None:
        super().__init__()

    @abstractmethod
    def compute(self, *args, **kwargs) -> MetricResult:
        pass

Every metric inherits from the Metric abstract class in pymdma.common.definitions. A new metric must do the same and implement the following methods:

  • __init__(): Initialize the metric with the provided keyword arguments. The keyword arguments must be documented in the class docstring.
  • compute(): Compute the metric value. This method consumes raw data and must return a MetricResult object. Make sure that the inputs and the outputs are well documented and are consistent with the data types used in similar metrics.
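The contract above can be illustrated with a minimal sketch. To keep it runnable without pymdma installed, it declares a stand-in Metric base class that mirrors the listing above; IncompleteMetric, MeanMetric and can_instantiate are hypothetical names used only for illustration, and the stand-in compute() returns a plain float rather than a MetricResult.

```python
from abc import ABC, abstractmethod


class Metric(ABC):
    """Stand-in for pymdma.common.definitions.Metric (illustration only)."""

    def __init__(self, **kwargs) -> None:
        super().__init__()

    @abstractmethod
    def compute(self, *args, **kwargs):
        pass


class IncompleteMetric(Metric):
    """Hypothetical subclass that forgets to implement compute()."""


class MeanMetric(Metric):
    """Hypothetical subclass that implements compute()."""

    def compute(self, data, **kwargs):
        # Simplification: a real metric must return a MetricResult here
        return sum(data) / len(data)


def can_instantiate(metric_cls) -> bool:
    """Return True if the class can be instantiated without arguments."""
    try:
        metric_cls()
    except TypeError:
        # Raised by ABC when an abstract method is left unimplemented
        return False
    return True
```

Because compute() is declared with @abstractmethod, a subclass that omits it fails at instantiation time rather than at the first call.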

Class Attributes

Every metric is categorized with specific attributes that should be overridden in the metric class:

  • reference_type: Indicates whether the compute method expects both a reference and a target dataset.
  • evaluation_level: Either a single EvaluationLevel or a list of them. Indicates whether the metric is dataset-wise, instance-wise, or both.
  • metric_group: Indicates the metric category (consult the hierarchy diagram on the homepage).
  • higher_is_better: If a higher metric result is better, set this to True.
  • min_value: Minimum possible value for the metric.
  • max_value: Highest possible value for the metric.
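As a rough illustration of why the value-related attributes matter, the sketch below shows how code consuming a metric could use them. HypotheticalErrorRate, is_within_bounds and pick_better are hypothetical names and not part of pymdma, and treating None as "unbounded" is an assumption based on the defaults in the base class.

```python
from typing import Optional


class HypotheticalErrorRate:
    """Illustrative stand-in carrying only the three value-related attributes."""

    higher_is_better: Optional[bool] = False  # a lower error rate is better
    min_value: Optional[float] = 0.0
    max_value: Optional[float] = 1.0


def is_within_bounds(metric_cls, value: float) -> bool:
    """Check a score against min_value/max_value (None is assumed to mean unbounded)."""
    low, high = metric_cls.min_value, metric_cls.max_value
    return (low is None or value >= low) and (high is None or value <= high)


def pick_better(metric_cls, first: float, second: float) -> float:
    """Return the preferable of two scores according to higher_is_better."""
    if metric_cls.higher_is_better is None:
        raise ValueError("This metric defines no preferred direction")
    return max(first, second) if metric_cls.higher_is_better else min(first, second)
```

Setting these attributes accurately lets downstream tooling compare and sanity-check scores without knowing the metric's internals.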

Metric Documentation

In addition to a clear description, the class documentation must include the following sections:

  • Objective: Describe the objective of the metric, e.g., "Similarity", "Authenticity", etc.
  • Parameters: List the parameters of the metric __init__ method, e.g., fs: int, optional, default=2048
  • References: List relevant references for the metric, such as journal articles, conference papers, or books. If the metric is published in a peer-reviewed journal, please provide the DOI link.
  • Example: Provide a simple example of how to use the metric in code, e.g., metric = YourMetric(...).

Following is an example of a new metric class:

from pymdma.common.definitions import Metric
from pymdma.common.output import MetricResult

from pymdma.constants import EvaluationLevel, MetricGroup, OutputsTypes, ReferenceType


class NewMetric(Metric):
    """Metric description

    **Objective**: An objective

    Parameters
    ----------
    **kwargs : dict, optional
        Additional keyword arguments for compatibility.

    References
    ----------
    Author et al. Paper title (year). <link to paper>.

    Examples
    --------
    >>> import numpy as np
    >>> new_metric = NewMetric()
    >>> data = np.random.rand(100, 100)
    >>> result: MetricResult = new_metric.compute(data)
    """

    reference_type = ReferenceType.NONE
    evaluation_level = EvaluationLevel.INSTANCE
    metric_group = MetricGroup.QUALITY

    higher_is_better: bool = False
    min_value: float = 0.0
    max_value: float = 1.0

    def __init__(
        self,
        **kwargs,
    ):
        super().__init__(**kwargs)

    def compute(self, data, **kwargs) -> MetricResult:
        """Computes the metric for the given data.

        Parameters
        ----------
        data : type
            description

        Returns
        -------
        result: MetricResult
            small description
        """
        # Placeholder: replace with the actual per-instance computation
        scores = [0.0 for _ in data]
        # Delete one of the level results if the metric only has a single evaluation level
        return MetricResult(
            dataset_level={
                "dtype": OutputsTypes.NUMBER,
                "subtype": "float",
                "value": 0.0,
            },
            instance_level={
                "dtype": OutputsTypes.ARRAY,
                "subtype": "float",
                "value": scores,
            },
        )

Metric Result Class

The MetricResult class is used to store and validate the metric outputs. It is mostly based on pydantic validation models. The class is defined in pymdma.common.output and should be instantiated and returned in the compute method of the metric class.

pymdma.common.output.MetricResult

Source code in src/pymdma/common/output.py
class MetricResult(BaseModel):
    dataset_level: Optional[EvalLevelOutput] = Field(None, title="Dataset Level", description="Dataset level output")
    instance_level: Optional[EvalLevelOutput] = Field(None, title="Instance Level", description="Instance level output")
    errors: Optional[List[str]] = Field(
        None,
        title="Errors",
        description="Errors during the metric computation",
    )

    @model_validator(mode="after")
    def check_levels(self):
        if self.dataset_level is None and self.instance_level is None:
            raise ValueError("At least one of the output levels should be provided")
        return self

    @property
    def value(self) -> Tuple[Optional[EvalLevelOutput], Optional[EvalLevelOutput]]:
        """Return the dataset-level and instance-level tuple of the metric
        result.

        Returns
        -------
        Tuple[Optional[EvalLevelOutput], Optional[EvalLevelOutput]]
            The dataset-level and instance-level output of the metric result.
        """
        return self.dataset_level.value if self.dataset_level else None, (
            self.instance_level.value if self.instance_level else None
        )

    @property
    def stats(self) -> Tuple[Optional[Dict[str, Any]], Optional[Dict[str, Any]]]:
        """Return the dataset-level and instance-level statistics of the metric
        result.

        Returns
        -------
        Tuple[Optional[Dict[str, Any]], Optional[Dict[str, Any]]]
            The dataset-level and instance-level statistics of the metric result.
        """
        return self.dataset_level.stats if self.dataset_level else None, (
            self.instance_level.stats if self.instance_level else None
        )

    @property
    def verbose_value(self) -> Tuple[Dict[str, Any], Dict[str, Any]]:
        """Return the dataset-level and instance-level verbose value
        (dictionary) of the metric result.

        Returns
        -------
        Tuple[Dict[str, Any], Dict[str, Any]]
            The dataset-level and instance-level verbose value of the metric result.
        """
        return self.dataset_level.model_dump() if self.dataset_level else None, (
            self.instance_level.model_dump() if self.instance_level else None
        )

    # TODO validate plot_params

    def plot(self, title: str = "", ax: Optional[plt.Axes] = None, **plot_kwargs):
        """Plot the metric results.

        Parameters
        ----------
        title : str, optional
            The title of the plot, by default "".
        ax : Optional[plt.Axes], optional
            The axes to plot the results on, by default None.
        **plot_kwargs
            Additional keyword arguments passed to the plotting function.
        """
        plot_from_results(
            title=title,
            instance_level=self.instance_level,
            dataset_level=self.dataset_level,
            ax=ax,
            **plot_kwargs,
        )

It has the following attributes:

  • dataset_level: Result of the metric for the entire dataset (if available).
  • instance_level: Result of the metric for each instance (if available).
  • errors: An optional list of errors that occurred during the metric computation (defaults to None and can usually be left unset).
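The behaviour of the level attributes and the value property can be sketched without pydantic. The classes below are simplified stand-ins written with dataclasses (LevelSketch and ResultSketch are hypothetical names); they only mirror the check_levels validator and the value property from the listing above and are not the library's implementation.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass
class LevelSketch:
    """Stand-in for EvalLevelOutput, reduced to its value field."""

    value: Any


@dataclass
class ResultSketch:
    """Stand-in for MetricResult built on dataclasses instead of pydantic."""

    dataset_level: Optional[LevelSketch] = None
    instance_level: Optional[LevelSketch] = None

    def __post_init__(self) -> None:
        # Mirrors check_levels: at least one level must be provided
        if self.dataset_level is None and self.instance_level is None:
            raise ValueError("At least one of the output levels should be provided")

    @property
    def value(self) -> Tuple[Optional[Any], Optional[Any]]:
        # Mirrors MetricResult.value: (dataset-level value, instance-level value)
        return (
            self.dataset_level.value if self.dataset_level else None,
            self.instance_level.value if self.instance_level else None,
        )


# An instance-wise metric leaves dataset_level unset
instance_only = ResultSketch(instance_level=LevelSketch(value=[0.1, 0.4]))
dataset_value, instance_values = instance_only.value
```

Note that value always returns a two-element tuple, so callers should unpack it and expect None for any level the metric does not produce.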

pymdma.common.output.EvalLevelOutput

Source code in src/pymdma/common/output.py
class EvalLevelOutput(BaseModel):
    dtype: OutputsTypes = Field(..., title="Data Type", description="Internal type representation of the output value")
    subtype: Literal["int", "float", "str"] = Field(..., title="Subtype", description="Subtype of the output value")
    value: SUPPORTED_OUTPUTS = Field(..., title="Value", description="Metric value")
    stats: Optional[Dict[str, SUPPORTED_OUTPUTS]] = Field(
        None,
        title="Stats",
        description="Additional statistics of the metric value",
    )
    plot_params: Optional[PlotParams] = Field(
        None,
        title="Plot Params",
        description="Plotting parameters for the metric",
    )

    @model_validator(mode="after")
    def _validate_plot_level(self):
        if self.plot_params is None:
            return self

        plot_params = self.plot_params
        if plot_params.x_key:
            assert OutputsTypes(self.dtype) in {
                OutputsTypes.KEY_ARRAY,
                OutputsTypes.KEY_VAL,
            }, "X key can only be provided when the output type is KEY_ARRAY or KEY_VAL"
            assert plot_params.x_key in self.value, "The provided X key is not present in the output value"
        if plot_params.y_key:
            assert OutputsTypes(self.dtype) in {
                OutputsTypes.KEY_VAL,
                OutputsTypes.KEY_ARRAY,
            }, "Y key can only be provided when the output type is KEY_VAL or ARRAY"
            assert plot_params.y_key in self.value, "The provided Y key is not present in the output value"
        return self

Each level attribute is a pydantic model with the following fields:

  • dtype: The data type of the metric result.
  • subtype: The data subtype of the metric result.
  • value: The metric result value.
  • stats: Additional statistics of the metric result.
  • plot_params: Plotting parameters for the metric.
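As a hedged sketch of how these fields might be filled in, the helper below builds both level dictionaries from a list of per-instance scores. The OutputsTypes enum here is a stand-in with placeholder member values (the real one lives in pymdma.constants), build_level_payloads is a hypothetical helper, and both the choice of the mean as the dataset-level value and the stats keys are assumptions made for illustration.

```python
from enum import Enum
from statistics import mean, pstdev
from typing import Any, Dict, List


class OutputsTypes(Enum):
    """Stand-in for pymdma.constants.OutputsTypes; member values are placeholders."""

    NUMBER = "number"
    ARRAY = "array"


def build_level_payloads(scores: List[float]) -> Dict[str, Dict[str, Any]]:
    """Build dataset_level and instance_level dictionaries from per-instance scores."""
    if not scores:
        raise ValueError("scores must not be empty")
    return {
        "dataset_level": {
            "dtype": OutputsTypes.NUMBER,
            "subtype": "float",
            # Assumption: the dataset-level score is the mean of instance scores
            "value": mean(scores),
        },
        "instance_level": {
            "dtype": OutputsTypes.ARRAY,
            "subtype": "float",
            "value": list(scores),
            # The stats keys are illustrative, not a pymdma convention
            "stats": {"mean": mean(scores), "std": pstdev(scores)},
        },
    }


payloads = build_level_payloads([0.25, 0.5, 0.75])
```

In a real metric, dictionaries of this shape (built with the library's own OutputsTypes) would be passed as the dataset_level and instance_level arguments of MetricResult, as in the NewMetric example.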

Contributing your metric

Remember to read the contributing guidelines and the developer documentation. For any new metric, you must adhere to the metric hierarchy diagram on the homepage. If the metric does not fit the hierarchy, please raise an issue in the GitHub repository.

Once the hierarchy is clear, you can create a new metric class. Start by creating a new Python script under src/pymdma/<data_modality>/measures/<validation_domain>/<metric_category>/<your_metric>.py, then follow the steps below.

  1. Define a new metric class that inherits from the Metric abstract class in pymdma.common.definitions and implements the methods described above.
  2. Avoid introducing any third-party dependencies.
  3. Functions used for intermediate computation should be included in this script.
  4. At the end of the file define the __all__ variable to export the metric class.
  5. Add a metric import to the __init__.py file in the src/pymdma/<data_modality>/measures/<validation_domain> module and add the name to the __all__ variable of the same file.
  6. Develop at least one test case for the metric using the pytest framework. Create a test script with the name test_<your_metric>.py under the tests/ folder and write at least one method starting with test_ that uses the metric. Try to cover edge cases when possible.
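The testing step could look roughly like the sketch below. To keep it runnable on its own, the metric is a hypothetical stand-in defined inline (HypotheticalRangeMetric) that returns a plain list of scores; a real test file would import your metric from its module and inspect the returned MetricResult instead. The functions use plain assert statements, which pytest collects and reports.

```python
from typing import List


class HypotheticalRangeMetric:
    """Stand-in metric defined inline; a real test would import your metric instead."""

    min_value = 0.0
    max_value = 1.0

    def compute(self, data: List[List[float]]) -> List[float]:
        # Per-instance score: value range of each sample, clipped to [0, 1]
        return [min(max(max(sample) - min(sample), 0.0), 1.0) for sample in data]


def test_scores_stay_within_bounds():
    metric = HypotheticalRangeMetric()
    scores = metric.compute([[0.0, 0.5], [0.2, 0.9], [3.0, 9.0]])
    assert len(scores) == 3
    assert all(metric.min_value <= s <= metric.max_value for s in scores)


def test_constant_sample_scores_zero():
    # Edge case: a sample with no variation
    metric = HypotheticalRangeMetric()
    assert metric.compute([[0.7, 0.7, 0.7]]) == [0.0]


def test_empty_dataset_returns_no_scores():
    # Edge case: no instances at all
    assert HypotheticalRangeMetric().compute([]) == []
```

Checking scores against min_value and max_value is a cheap way to catch inconsistencies between the class attributes and the actual computation.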

Once you're done, feel free to open a pull request in the GitHub repository. If you have any questions or run into any issues along the way, just leave a comment in the PR; we're happy to help!