lucene-java-user mailing list archives

From "Kevin A. Burton" <bur...@newsmonster.org>
Subject Re: Most efficient way to index 14M documents (out of memory/file handles)
Date Wed, 07 Jul 2004 18:04:02 GMT
Doug Cutting wrote:

> Julien,
>
> Thanks for the excellent explanation.
>
> I think this thread points to a documentation problem. We should 
> improve the javadoc for these parameters to make it easier for folks to
>
> In particular, the javadoc for mergeFactor should mention that very 
> large values (>100) are not recommended, since they can run into file 
> handle limitations with FSDirectory. The maximum number of open files 
> while merging is around mergeFactor * (5 + number of indexed fields). 
> Perhaps mergeFactor should be tagged an "Expert" parameter to 
> discourage folks playing with it, as it is such a common source of 
> problems.
>
> The javadoc should instead encourage using minMergeDocs to increase 
> indexing speed by using more memory. This parameter is unfortunately 
> poorly named. It should really be called something like maxBufferedDocs.
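The file-handle rule of thumb quoted above can be turned into a quick back-of-the-envelope check. The sketch below is plain Java and is not part of Lucene. The class `MergeFileEstimate`, the method `maxOpenFiles`, and the field count of 10 are hypothetical names and values chosen for illustration. The formula is only the approximation Doug gives, so treat the result as a rough estimate rather than an exact count of what FSDirectory opens.

```java
/**
 * Rough estimate of the file handles that may be open while merging
 * segments in an FSDirectory, using the rule of thumb quoted above:
 * about mergeFactor * (5 + number of indexed fields).
 * Hypothetical helper for illustration only; not a Lucene API.
 */
public class MergeFileEstimate {

    // Applies the approximation directly. It does not inspect a real index.
    static long maxOpenFiles(int mergeFactor, int indexedFields) {
        if (mergeFactor < 2 || indexedFields < 0) {
            throw new IllegalArgumentException(
                    "expected mergeFactor >= 2 and indexedFields >= 0");
        }
        return (long) mergeFactor * (5L + indexedFields);
    }

    public static void main(String[] args) {
        int indexedFields = 10; // hypothetical number of indexed fields
        for (int mergeFactor : new int[] {10, 100, 1000}) {
            System.out.printf("mergeFactor=%d, fields=%d -> ~%d open files%n",
                    mergeFactor, indexedFields,
                    maxOpenFiles(mergeFactor, indexedFields));
        }
    }
}
```

With 10 indexed fields this gives roughly 150 files at mergeFactor 10, 1,500 at 100, and 15,000 at 1000, which is why values above 100 can easily exceed a per-process limit such as the 1024 that many Linux systems use by default. The advice above is to leave mergeFactor alone and raise minMergeDocs instead, spending memory rather than file handles to gain indexing speed.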

I'd like to see something like this done...

BTW, I'm willing to add it to the wiki in the interim.

This conversation has happened a few times now...

Kevin

-- 

Please reply using PGP.

    http://peerfear.org/pubkey.asc    
    
    NewsMonster - http://www.newsmonster.org/
    
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
       AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
  IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster



