lucene-java-user mailing list archives

From "Kevin A. Burton" <bur...@newsmonster.org>
Subject Re: Way to repair an index broking during 1/2 optimize?
Date Fri, 09 Jul 2004 01:56:50 GMT
Doug Cutting wrote:

>
> Something sounds very wrong for there to be that many files.
>
> The maximum number of files should be around:
>
> (7 + numIndexedFields) * (mergeFactor-1) * 
> (log_base_mergeFactor(numDocs/minMergeDocs))
>
> With 14M documents, log_10(14M/1000) is 4, which gives, for you:
>
> (7 + numIndexedFields) * 36 = 230k
> 7*36 + numIndexedFields*36 = 230k
> numIndexedFields = (230k - 7*36) / 36 =~ 6k
>
> So you'd have to have around 6k unique field names to get 230k files. 
> Or something else must be wrong. Are you running on win32, where file 
> deletion can be difficult?
>
> With the typical handful of fields, one should never see more than 
> hundreds of files.
>
We only have 13 fields... Though to be honest I'm worried that even if I 
COULD do the optimize that it would run out of file handles.

This is very strange...
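
As a sanity check, the quoted formula can be evaluated directly. The sketch below is only an illustration: the function names are made up, mergeFactor = 10 and minMergeDocs = 1000 are assumed from the arithmetic in the quoted message (9 * 4 = 36 and 14M/1000), and the logarithm is left unrounded, so the results differ slightly from the rounded figures above.

```python
import math

def estimated_max_files(num_indexed_fields, merge_factor, num_docs, min_merge_docs):
    """Evaluate the quoted estimate:
    (7 + numIndexedFields) * (mergeFactor - 1) * log_mergeFactor(numDocs / minMergeDocs)
    """
    levels = math.log(num_docs / min_merge_docs, merge_factor)
    return (7 + num_indexed_fields) * (merge_factor - 1) * levels

def fields_needed_for(file_count, merge_factor, num_docs, min_merge_docs):
    """Invert the same formula: how many indexed fields would explain file_count files?"""
    levels = math.log(num_docs / min_merge_docs, merge_factor)
    return file_count / ((merge_factor - 1) * levels) - 7

# 13 fields, 14M documents; mergeFactor and minMergeDocs are assumed values.
with_13_fields = estimated_max_files(13, 10, 14_000_000, 1000)
implied_fields = fields_needed_for(230_000, 10, 14_000_000, 1000)

print(f"estimated max files with 13 fields: {with_13_fields:.0f}")
print(f"fields needed to explain 230k files: {implied_fields:.0f}")
```

Under these assumptions, 13 fields should give at most a few hundred files, while 230k files would take on the order of 6k distinct field names, which matches the quoted reasoning.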

I'm going to increase minMergeDocs to 10000 and then run the full 
conversion on one box, and try to optimize the corrupt index on 
another box. We'll see which one finishes first.

I assume the speed of optimize() can be increased the same way that 
indexing is increased...

Kevin

-- 

Please reply using PGP.

    http://peerfear.org/pubkey.asc    
    
    NewsMonster - http://www.newsmonster.org/
    
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
       AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
  IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster



