Addressing Different Evaluation Environments for Information Retrieval through Pivot Systems

Gabriela Nicole González Sáez
Lorraine Goeuriot
Philippe Mulhem
DOI: 10.24348/coria.2021.long_6
Abstract

Classical evaluations of Information Retrieval systems under the Cranfield paradigm compare several systems within a single evaluation environment, composed of settings such as the corpus, topics, assessments, and evaluation measures. This paper proposes a framework able to handle the comparison of systems across several evaluation environments. To achieve this goal, we use pivot systems, which allow an indirect comparison of systems across evaluation environments by computing Result Deltas, i.e. the differences between their evaluation measure values. We study the feasibility of our proposal according to different features of the pivots. We create altered environments that differ in their topic sets, using the 2018 and 2020 CLEF eHealth evaluation campaigns. We explore the behaviour of the metrics and pivots by measuring the correlation between the result deltas, and by comparing the ranking of systems obtained through the pivots against the official ranking of the systems. Both studied elements are impacted by the metric and the pivot system selected. We show that some pivot/metric pairs achieve high correlation values across the altered environments, with a ranking of systems similar to the official ranking.
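
For concreteness, the following is a minimal sketch, not taken from the paper, of how pivot-based Result Deltas might be computed: each system's score is expressed as a delta relative to a shared pivot system, and the ranking induced by those deltas is compared to an official ranking with Kendall's tau. All system names, scores, and the choice of kendalltau are illustrative assumptions.

```python
# Hypothetical sketch of pivot-based Result Deltas; not the paper's implementation.
from scipy.stats import kendalltau

# Evaluation measure values (e.g. NDCG) per system in two environments;
# the systems and scores below are purely illustrative.
env_a = {"sys1": 0.52, "sys2": 0.47, "pivot": 0.50}
env_b = {"sys3": 0.61, "sys4": 0.58, "pivot": 0.55}

def result_deltas(env, pivot="pivot"):
    """Result Delta: difference between a system's score and the pivot's score."""
    return {s: score - env[pivot] for s, score in env.items() if s != pivot}

# Deltas make systems from different environments indirectly comparable
# through the shared pivot system.
deltas = {**result_deltas(env_a), **result_deltas(env_b)}
pivot_ranking = sorted(deltas, key=deltas.get, reverse=True)

# Agreement between the pivot-based ranking and an (illustrative) official one.
official_ranking = ["sys3", "sys4", "sys1", "sys2"]
tau, _ = kendalltau([official_ranking.index(s) for s in pivot_ranking],
                    range(len(pivot_ranking)))
print(pivot_ranking, tau)
```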