lucene-dev mailing list archives

From Matt Chaput
Subject Writing out the term count when merging
Date Mon, 19 Mar 2007 23:13:44 GMT
Hi all!

I'm reimplementing a very Lucene-like search library as a learning 
experience and I've run into a snag. Before I go deep code diving, I 
thought I'd ask here in case someone has the time to answer.

The term dictionary file includes the term count in a header. But when 
I'm merging segments, I can't know the collected number of UNIQUE terms 
in the merging segments before I've read them, so I can't write the 
header before I start merging the segments.

The ways I can see to do this are (a) to scan the term lists of the 
segments first and build the collected term list in memory before 
merging, (b) to leave space in the file for the term count and go back 
and overwrite it later, or (c) to do something much more clever that 
Lucene does but that I haven't figured out yet.
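
For illustration, here is a minimal sketch of a variant of (a) that counts 
the unique terms in a first pass without keeping the whole list in memory. 
It assumes each segment exposes its terms as an already-sorted iterable 
that can be read a second time for the real merge. The function name 
count_unique_terms is hypothetical and nothing here is taken from Lucene.

```python
import heapq


def count_unique_terms(segments):
    """Count distinct terms across several sorted term iterables.

    Assumption: every iterable in `segments` yields its terms in
    sorted order, as a term dictionary normally would.
    """
    count = 0
    previous = None
    # heapq.merge lazily interleaves the sorted inputs, so only one
    # term per segment is held in memory at a time.
    for term in heapq.merge(*segments):
        if term != previous:
            count += 1
            previous = term
    return count
```

The cost is that every segment's term list is read twice: once to count 
and once to merge.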

(b) is undesirable for me, because I'd like the option of using 
compressed streams in the backend, which must be written serially.
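
To make the constraint concrete, this is roughly what (b) looks like on a 
seekable file. The record layout (a 4-byte count followed by 
length-prefixed UTF-8 terms) and the function name are made up for the 
example and are not Lucene's actual format.

```python
import io
import struct


def write_terms_with_backpatched_count(out, merged_terms):
    """Write a placeholder count, then the terms, then patch the count.

    Assumptions: `out` is a seekable binary file object and
    `merged_terms` yields terms in sorted order (duplicates allowed).
    """
    header_pos = out.tell()
    out.write(struct.pack(">i", 0))  # placeholder for the term count
    count = 0
    previous = None
    for term in merged_terms:
        if term == previous:
            continue  # skip duplicates coming from different segments
        data = term.encode("utf-8")
        out.write(struct.pack(">H", len(data)))
        out.write(data)
        count += 1
        previous = term
    end_pos = out.tell()
    out.seek(header_pos)  # this backward seek needs a seekable stream
    out.write(struct.pack(">i", count))
    out.seek(end_pos)
    return count
```

The backward seek is the step a serially written compressed stream cannot 
do, which is why this option is ruled out here.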

Anyway, if someone more familiar with the code could point me in the 
right direction, I'd appreciate it very much.


