lucene-dev mailing list archives

From Grant Ingersoll
Subject Re: postings without position information ?
Date Thu, 07 Feb 2008 20:46:47 GMT
Search the archive for "flexible indexing". There have been a number of
discussions along these lines. I don't know that your specific issue
was ever covered, but it seems to fit that model. I think there was
even a patch at one point.

On Feb 7, 2008, at 1:43 PM, robert engels wrote:

> I think there are many uses of Lucene that would benefit from 'enum'  
> fields, aka categories.
> When classifying documents, they are often in one or more categories.
> Lucene could write these postings very efficiently using VINT and RLE
> (run length encoding) if the position information was not stored
> (since it is not really useful in these typical cases).
> The format would be
> StartingDocNum|NumberOfDocuments...StartingDocNum|NumberOfDocuments,
> using a bit of the StartingDocNum to flag whether it starts a series.
> When a lot of documents are in the same category, and they are added
> at the same time, the document numbers would be nearly sequential,
> allowing very efficient compression.
> Has anyone worked on this? Our previous custom IndexReaderWriter  
> supported it, and I was wondering if this has made it into the core.  
> I checked the docs/email and could not find anything.
> Thanks.
> Robert
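
For readers of the archive, the run-length scheme described in the quoted message could be sketched roughly as below. This is only an illustration under stated assumptions, not Lucene code and not Robert's actual IndexReaderWriter: the class name RunLengthPostings and all of its methods are hypothetical, the low bit of each delta-coded start value is assumed to be the "series" flag, and the variable-length integer layout (7 data bits per byte, high bit as continuation) is assumed to match the usual VInt convention. Start values are stored as gaps from the end of the previous run, which is an added assumption to keep the numbers small.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: position-free postings stored as (start, runLength) pairs.
final class RunLengthPostings {

    // Variable-length int: 7 data bits per byte, high bit set on all but the last byte.
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    static int readVInt(byte[] data, int[] pos) {
        int b = data[pos[0]++];
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = data[pos[0]++];
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    // Encodes strictly increasing doc numbers. Each entry is the gap from the
    // end of the previous run, shifted left by one; the low bit (assumed flag)
    // is 1 when a run length follows and 0 for a single document.
    // Limitation of this sketch: gaps of 2^30 or more would overflow the shift.
    static byte[] encode(int[] sortedDocs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int next = 0; // first doc number not covered by the previous run
        int i = 0;
        while (i < sortedDocs.length) {
            int j = i + 1;
            while (j < sortedDocs.length && sortedDocs[j] == sortedDocs[j - 1] + 1) {
                j++;
            }
            int start = sortedDocs[i];
            int runLength = j - i;
            int gap = start - next;
            if (runLength == 1) {
                writeVInt(out, gap << 1);
            } else {
                writeVInt(out, (gap << 1) | 1);
                writeVInt(out, runLength);
            }
            next = start + runLength;
            i = j;
        }
        return out.toByteArray();
    }

    // Expands the encoded runs back into individual doc numbers.
    static List<Integer> decode(byte[] data) {
        List<Integer> docs = new ArrayList<>();
        int[] pos = {0};
        int next = 0;
        while (pos[0] < data.length) {
            int code = readVInt(data, pos);
            int start = next + (code >>> 1);
            int runLength = (code & 1) != 0 ? readVInt(data, pos) : 1;
            for (int k = 0; k < runLength; k++) {
                docs.add(start + k);
            }
            next = start + runLength;
        }
        return docs;
    }
}
```

With this layout, a category whose documents were added together (say, doc numbers 0 through 999) collapses to a single flagged start value plus a run length, while isolated documents cost one VInt each.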
