SIGIR '15 Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval
For many tasks in evaluation campaigns, especially those modeling narrow domain-specific challenges, lack of participation leads to a potential pooling bias due to the scarce number of pooled runs. It is well known that the reliability of a test collection is proportional to the number of topics and relevance assessments provided for each topic, but also to same extent to the diversity in participation in the challenges. Hence, in this paper we present a new perspective in reducing the pool bias by studying the effect of merging an unpooled run with the pooled runs. We also introduce an indicator used by the bias correction method to decide whether the correction needs to be applied or not. This indicator gives strong clues about the potential of a "good" run tested on an "unfriendly" test collection (i.e. a collection where the pool was contributed to by runs very different from the one at hand). We demonstrate the correctness of our method on a set of fifteen test collections from the Text REtrieval Conference (TREC). We observe a reduction in system ranking error and absolute score difference error.
Computational Science and Engineering