lucene-dev mailing list archives

From Michael McCandless
Subject Re: postings lists deduplication
Date Thu, 06 Jun 2013 10:24:14 GMT
Neat idea!

Would this idea allow a single term to point to (the union of) N other
posting lists?  It seems like that's necessary e.g. to handle the
exact/inexact case.

And then, to produce the DocsEnum/DocsAndPositionsEnum, you'd need to do a
merge sort across those N posting lists?
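
A minimal sketch of what such a merge could look like, assuming each posting list is just a sorted array of doc IDs. It does not use the Lucene postings API; the class name PostingsUnion and the method union are hypothetical, and positions are left out:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Lucene: a k-way merge over sorted doc ID arrays.
public class PostingsUnion {

    /** Merges N sorted doc ID arrays into one sorted, duplicate-free list. */
    public static List<Integer> union(int[][] postings) {
        // Heap entries are {docId, listIndex, offsetInList}, ordered by docId.
        PriorityQueue<int[]> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int i = 0; i < postings.length; i++) {
            if (postings[i].length > 0) {
                heap.add(new int[] {postings[i][0], i, 0});
            }
        }

        List<Integer> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            // Emit a doc ID only once, even if several lists contain it.
            if (merged.isEmpty() || merged.get(merged.size() - 1) != top[0]) {
                merged.add(top[0]);
            }
            // Advance the list that the smallest entry came from.
            int next = top[2] + 1;
            if (next < postings[top[1]].length) {
                heap.add(new int[] {postings[top[1]][next], top[1], next});
            }
        }
        return merged;
    }
}
```

A real enum would advance lazily rather than materialize the whole list, and would also have to merge positions for documents that appear in more than one list.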

Such a thing might also be doable as a runtime-only wrapper around the
postings API (FieldsProducer), if you could do the reverse expansion at
runtime (e.g. stem -> all of its surface forms).
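
A rough sketch of the reverse-expansion idea, assuming the stem -> surface forms table is available at search time. Plain maps stand in for the index here; StemExpandingLookup and docsForStem are hypothetical names, and nothing below is the actual FieldsProducer API:

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical runtime wrapper: only surface forms are stored, stems are expanded on lookup.
public class StemExpandingLookup {
    // Assumed reverse-expansion table: stem -> all of its surface forms.
    private final Map<String, List<String>> surfaceFormsByStem;
    // Stand-in for the stored index: surface form -> sorted doc IDs.
    private final Map<String, int[]> postingsByTerm;

    public StemExpandingLookup(Map<String, List<String>> surfaceFormsByStem,
                               Map<String, int[]> postingsByTerm) {
        this.surfaceFormsByStem = surfaceFormsByStem;
        this.postingsByTerm = postingsByTerm;
    }

    /** Returns the union of the postings of every surface form of the given stem. */
    public SortedSet<Integer> docsForStem(String stem) {
        SortedSet<Integer> docs = new TreeSet<>();
        List<String> forms =
                surfaceFormsByStem.getOrDefault(stem, Collections.emptyList());
        for (String form : forms) {
            int[] postings = postingsByTerm.get(form);
            if (postings == null) {
                continue; // surface form known to the stemmer but absent from the index
            }
            for (int doc : postings) {
                docs.add(doc);
            }
        }
        return docs;
    }
}
```

The stemmed term then never needs its own postings list on disk; the cost moves to query time, where every surface form has to be visited.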

Mike McCandless

On Thu, Jun 6, 2013 at 3:51 AM, Dmitry Kan wrote:

> Robert Muir and I discussed what Robert eventually named "postings
> lists deduplication" at the bbuzz 2013 conference in Berlin.
> The idea is to allow multiple terms to point to the same postings list to
> save space.
> This would benefit synonyms, exact / inexact terms, leading wildcard
> support via storing reversed terms, etc.
> At the moment, to support both exact (unstemmed) and inexact (stemmed)
> searches, we store both the unstemmed and the stemmed variant of each word
> form, which bloats the index. For example, we had to remove leading
> wildcard support via reversing a token at index and query time because of
> the same index size considerations.
> Would you like a jira for this?
> Thanks,
> Dmitry Kan
