ICTIR '15 Proceedings of the 2015 International Conference on The Theory of Information Retrieval
We approach the problem of retrievability from an analytical perspective, starting with modeling conjunctive and disjunctive queries in a boolean model. We show that this represents an upper bound on retrievability for all other best match algorithms. We follow this with an observation of imbalance in the distribution of retrievability, using the Gini coefficient. Simulation-based experiments show the behavior of the Gini coefficient for retrievability under different types and lengths of queries, as well as different assumptions about the document length distribution in a collection.
Computational Science and Engineering