lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Renaud Delbru <>
Subject Modification of positional information encoding
Date Mon, 13 Oct 2008 13:53:26 GMT

We are trying to modify the positional encoding of a term occurrence for 
experimentation purposes. One solution we adopt is to use payloads to 
sotre our own positional information encoding, but with this solution, 
it becomes difficult to measure the increase or decrease of index size. 
It is why we would like to directly change the positional encoding.

I have seen that Michael McCandless recently refactored the 
DocumentsWriter into a flexible indexer chain (see LUCENE-1301). By 
analysing the code, I have noticed that only two classes should be 
modified when writing a document:

When adding a document with IndexWriter.addDocument
- FieldInvertState (increment and store positions)
- FreqProxTermsWriterPerField.writeProx

When flushing documents:
- FreqProxTermsWriterPerField.appendPostings

I have noticed that only one class should be modified when reading a 
- SegmentTermPositions.nextPositions, and
- SegmentTermPositions.readDeltaPositions

Could a member of the Lucene team approve my modifications ? Do I forget 
to modify some classes ?

Another question, since the lucene core classes are kind of close, what 
is the best way to implement these modifications ? Make a branch of 
lucene, and add my new classes to the lucene package 
org.apache.lucene.index ? Or do a more elegant solution is possible ?

Thanks in advance,
Renaud Delbru

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message