lucene-java-user mailing list archives

From Alex vB
Subject Re: Implementing indexing of Versioned Document Collections
Date Tue, 16 Nov 2010 20:39:37 GMT

Hi again,

my payloads are working fine, as I have figured out now (I had not seen the
nextPosition method). My real problem is adding the bitvectors.
Currently I create them during tokenization. As already mentioned, they are
therefore only complete once all fields have been tokenized, because I add
every new term occurrence to a HashMap and create or update the linked
bitvector during analysis. I read in another post that changing or updating
payloads that have already been set isn't possible. Furthermore, I need to
store the payload only ONCE per term and not at every term position. For
example, for the wiki article on April I would have around 5000 occurrences
of the term "April"! Storing the payload only once would save a lot of memory.
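
To make the bookkeeping concrete, here is a minimal sketch of how the term-to-bitvector map is filled. It uses plain Java without any Lucene classes. The class name VersionBitvectors, the method collect, and the reading that bit v stands for "the term occurs in version v" are assumptions for illustration only; the real filter does this inside the analysis chain.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class VersionBitvectors {
    // Hypothetical helper: versions.get(v) holds the already-tokenized terms of version v.
    static Map<String, BitSet> collect(List<List<String>> versions) {
        Map<String, BitSet> bitvectors = new HashMap<String, BitSet>();
        for (int v = 0; v < versions.size(); v++) {
            for (String term : versions.get(v)) {
                BitSet bits = bitvectors.get(term);
                if (bits == null) {
                    // First occurrence of this term: create its bitvector.
                    bits = new BitSet(versions.size());
                    bitvectors.put(term, bits);
                }
                // Assumption: bit v means "term occurs in version v".
                bits.set(v);
            }
        }
        return bitvectors;
    }
}
```

The map is only complete after the last version has been processed, which is exactly why the bitvector cannot be attached as a payload while the tokens are still being emitted.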

1) Is it possible to pre-analyze fields, perhaps by analyzing twice? The first
pass would only build the bitvectors (without writing them!) and the second
pass would do the normal index writing with the bitvector payloads.
2) Alternatively, I could still add the bitvectors during tokenization if I
were able to set the current term in my custom filter (which extends
TokenFilter). My HashMap holds <Term, BitVector> pairs, and I could
iterate over all term keys. Is it possible to set the current term and the
corresponding payload manually? I tried something like this after all fields
and streams had been tokenized (without success):

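for (Map.Entry<String, BitSet> e : map.entrySet()) {
	String key = e.getKey();      // the term text
	BitSet value = e.getValue();  // the bitvector collected for that term

	// toByteArray is my own helper that serializes the BitSet
	Payload bitvectorPayload = new Payload(toByteArray(value));
	// ... here I would need to make key the current term and attach the payload
}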

3) Can I use payloads without term positions? 
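
For reference, the toByteArray helper used in the snippet above is not shown in this mail. A minimal sketch of what such a helper could look like follows. The class name BitSetBytes, the bit order (bit i goes into byte i/8, least significant bit first), and the inverse method fromByteArray are assumptions for illustration, not necessarily what my code does.

```java
import java.util.BitSet;

public class BitSetBytes {
    // Packs bit i into byte i / 8, least significant bit first.
    static byte[] toByteArray(BitSet bits) {
        // bits.length() is the index of the highest set bit plus one.
        byte[] bytes = new byte[(bits.length() + 7) / 8];
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            bytes[i / 8] |= (byte) (1 << (i % 8));
        }
        return bytes;
    }

    // Inverse of toByteArray, e.g. for decoding a payload at search time.
    static BitSet fromByteArray(byte[] bytes) {
        BitSet bits = new BitSet(bytes.length * 8);
        for (int i = 0; i < bytes.length * 8; i++) {
            if ((bytes[i / 8] & (1 << (i % 8))) != 0) {
                bits.set(i);
            }
        }
        return bits;
    }
}
```

With this layout a bitvector over n versions needs at most (n + 7) / 8 bytes, so storing it once per term instead of once per position makes a real difference for frequent terms.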

If my questions are unclear, please tell me! :)

Best regards
