lucene-dev mailing list archives

From robert engels <reng...@ix.netcom.com>
Subject postings without position information ?
Date Thu, 07 Feb 2008 18:43:33 GMT
I think there are many uses of Lucene that would benefit from 'enum'  
fields, aka categories.

When classifying documents, they are often in one or more categories.

Lucene could write these postings very efficiently using VInt and RLE  
(run-length encoding) if the position information were not stored  
(since it is not really useful in these typical cases).

StartingDocNum|NumberOfDocuments...StartingDocNum|NumberOfDocuments  
using one bit of the StartingDocNum to flag whether the entry is a series  
(a run of consecutive documents) or a single document.

When a lot of documents are in the same category, and they are added  
at the same time, the document numbers would be nearly sequential,  
allowing very efficient compression.
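
To make the idea concrete, here is a rough sketch of such a position-free,
run-length-encoded postings list. It is only an illustration, not existing
Lucene code: the class name RunLengthPostings and its methods are
hypothetical, and two details are my own assumptions rather than part of the
format above: the flag is the low bit of the start value, and each start is
stored as a delta from the last document of the previous entry.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: postings without positions, stored as
// VInt((startDelta << 1) | isRun) followed by VInt(runLength) when isRun is set.
final class RunLengthPostings {

    // Variable-length int: 7 data bits per byte, high bit set on all but the last byte.
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    static int readVInt(ByteArrayInputStream in) {
        int b = in.read();
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.read();
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    // Expects strictly increasing doc numbers, each below 2^30 (assumption:
    // one bit of the start value is given up for the run flag).
    static byte[] encode(int[] sortedDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0; // last doc number of the previous entry
        int i = 0;
        while (i < sortedDocIds.length) {
            int start = sortedDocIds[i];
            int j = i;
            // Extend the run while doc numbers stay consecutive.
            while (j + 1 < sortedDocIds.length && sortedDocIds[j + 1] == sortedDocIds[j] + 1) {
                j++;
            }
            int runLength = j - i + 1;
            int delta = start - prev;
            if (runLength > 1) {
                writeVInt(out, (delta << 1) | 1); // low bit set: a series follows
                writeVInt(out, runLength);
            } else {
                writeVInt(out, delta << 1);       // low bit clear: single document
            }
            prev = sortedDocIds[j];
            i = j + 1;
        }
        return out.toByteArray();
    }

    static int[] decode(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        List<Integer> docs = new ArrayList<>();
        int prev = 0;
        while (in.available() > 0) {
            int header = readVInt(in);
            int start = prev + (header >>> 1);
            int runLength = (header & 1) != 0 ? readVInt(in) : 1;
            for (int k = 0; k < runLength; k++) {
                docs.add(start + k);
            }
            prev = start + runLength - 1;
        }
        return docs.stream().mapToInt(Integer::intValue).toArray();
    }
}
```

With this layout, RunLengthPostings.encode(new int[] {5, 6, 7, 8, 20, 30, 31})
produces 5 bytes, and 1000 consecutive documents starting at 0 fit in 3 bytes.
Isolated documents cost about the same as a plain delta-encoded doc list, so
the gain depends entirely on how sequential the document numbers are.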

Has anyone worked on this? Our previous custom IndexReaderWriter  
supported it, and I was wondering if this has made it into the core.  
I checked the docs/email and could not find anything.

Thanks.

Robert




