lucene-dev mailing list archives

From robert engels
Subject Re: Writing out the term count when merging
Date Mon, 19 Mar 2007 23:29:47 GMT
But a better solution: since you probably need an index file into
the terms file anyway, you might not need the term count at all. You
should read that index file into memory in any case (it holds one
entry for every 16 terms, etc.), at which point you will know the
number of terms in the file.
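
To make that concrete, here is a rough sketch in Python. It is not how Lucene itself lays out its files. The record layout, the function names (merge_terms, write_terms, read_index, count_terms) and the choice to scan the tail after the last indexed term are all assumptions made for illustration. The writer only ever appends, so no header has to be patched afterwards; the reader does seek once, which would need rethinking if the terms file is read back through a compressed stream.

```python
import heapq
import io
import struct

INDEX_INTERVAL = 16  # assumption: mirrors the "every 16 entries" figure above


def merge_terms(segments):
    """Yield each unique term once from several sorted term iterables."""
    previous = None
    for term in heapq.merge(*segments):
        if term != previous:
            yield term
            previous = term


def write_terms(terms, terms_out, index_out, interval=INDEX_INTERVAL):
    """Append terms serially and record every interval-th term in a side index.

    No count is written up front and nothing is overwritten later.
    Returns the number of terms written.
    """
    count = 0
    offset = 0
    for term in terms:
        data = term.encode("utf-8")
        record = struct.pack(">H", len(data)) + data
        if count % interval == 0:
            # Index entry: term ordinal, byte offset in the terms file, term.
            index_out.write(struct.pack(">IQ", count, offset) + record)
        terms_out.write(record)
        offset += len(record)
        count += 1
    return count


def read_index(index_in):
    """Load the whole sparse index as a list of (ordinal, offset, term)."""
    entries = []
    while True:
        head = index_in.read(12)
        if len(head) < 12:
            break
        ordinal, offset = struct.unpack(">IQ", head)
        (length,) = struct.unpack(">H", index_in.read(2))
        entries.append((ordinal, offset, index_in.read(length).decode("utf-8")))
    return entries


def count_terms(entries, terms_in):
    """Derive the term count: last indexed ordinal plus the records from there on."""
    if not entries:
        return 0
    ordinal, offset, _ = entries[-1]
    terms_in.seek(offset)
    count = ordinal
    while True:
        head = terms_in.read(2)
        if len(head) < 2:
            break
        (length,) = struct.unpack(">H", head)
        terms_in.read(length)
        count += 1
    return count


# Two already-sorted segments sharing the term "cat".
segments = [["apple", "cat", "dog"], ["bird", "cat", "emu"]]
terms_out, index_out = io.BytesIO(), io.BytesIO()
written = write_terms(merge_terms(segments), terms_out, index_out, interval=2)
index_out.seek(0)
entries = read_index(index_out)
assert count_terms(entries, terms_out) == written == 5
```

The in-memory index alone pins the count down to within one interval; scanning the short tail after the last indexed term makes it exact. If that extra read is unwelcome, the count could just as well be appended to the end of the index file once the merge finishes, since the index is written after the fact anyway.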

On Mar 19, 2007, at 6:13 PM, Matt Chaput wrote:

> Hi all!
> I'm reimplementing a very Lucene-like search library as a learning  
> experience and I've run into a snag. Before I go deep code diving,  
> I thought I'd ask here in case someone has the time to answer.
> The term dictionary file includes the term count in a header. But  
> when I'm merging segments, I can't know the collected number of  
> UNIQUE terms in the merging segments before I've read them, so I  
> can't write the header before I start merging the segments.
> The ways I can see to do this are (a) to scan the term lists of the  
> segments first and build the collected term list in memory before  
> merging, (b) leave space in the file for the term count and go back  
> and overwrite it later, or (c) something much more clever that  
> Lucene does but I haven't figured out yet.
> (b) is undesirable for me, because I'd like the option of using  
> compressed streams in the backend, which must be written serially.
> Anyway, if someone more familiar with the code could point me in  
> the right direction, I'd appreciate it very much.
> Thanks!
> Matt
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
