lucene-java-user mailing list archives

From "Devon H. O'Dell" <devon.od...@gmail.com>
Subject Re: Thread locking while merging (ConcurrentMergeScheduler issue?)
Date Wed, 03 Aug 2011 23:31:32 GMT
For what it's worth, I've seen this happen too (using the stock Lucene
3.3 Java APIs), but it requires me to index many millions of
documents, and doesn't start being a really big problem until the
indexes get to be closer to 250GB in size. When they reach around 1TB,
it will take around an hour for the merge to complete (which is
frustrating). Similar to Pierre-Henri, I see virtually no disk I/O
when it happens. The system in question is one of the Amazon EC2
"Huge" instances (so, something like 8 cores and 32GB RAM), and disk
I/O during indexing pushes around 100MB/s.
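
To put those figures in perspective, here is a rough back-of-the-envelope sketch. It assumes the 100MB/s rate could be sustained for a full hour and that sizes are decimal (1GB = 1000MB); the class and method names are made up for illustration and are not part of Lucene. It also computes the average indexing rate implied by the 33GB / 75 minute run quoted below.

```java
import java.util.Locale;

// Hypothetical helper for back-of-the-envelope I/O estimates; not a Lucene API.
class MergeIoEstimate {

    // Megabytes that can be moved in `minutes` at a sustained rate (MB/s).
    static double megabytesMoved(double rateMBPerSec, double minutes) {
        return rateMBPerSec * minutes * 60.0;
    }

    // Average throughput in MB/s for a given volume (MB) and duration (minutes).
    static double throughputMBPerSec(double megabytes, double minutes) {
        return megabytes / (minutes * 60.0);
    }

    public static void main(String[] args) {
        // Assumption: the indexing-time rate of ~100MB/s holds for one hour.
        double movedGB = megabytesMoved(100.0, 60.0) / 1000.0;
        System.out.println(String.format(Locale.ROOT,
                "1 hour at 100 MB/s moves about %.0f GB", movedGB));

        // Quoted run: 33GB (taken as 33,000 MB) indexed in 75 minutes.
        double rate = throughputMBPerSec(33_000.0, 75.0);
        System.out.println(String.format(Locale.ROOT,
                "33 GB in 75 min averages about %.1f MB/s", rate));
    }
}
```

If the disks were busy for the whole hour, a merge could move on the order of 360GB, so an hour-long merge with virtually no disk I/O suggests the time is going somewhere other than raw I/O.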

If it would be useful to see additional reports / information from
this scenario, I'm sure I can get something put together.

--dho

2011/8/3 Pierre-Henri Toussaint <pierrehenri.toussaint@gmail.com>:
> OK so the problem definitely comes from the slow merging.
> I slightly increased the merge count and thread count to avoid the problem
> described previously. But as expected, that just delayed it!
>
> Results: 75 minutes to index the 33GB XML file, and 150 minutes to finish
> the merge after indexer.close.
> See the uploaded log file http://lucene.472066.n3.nabble.com/file/n3223874/slowmerge
> containing: logs (timems:numberofdocsindexed/current_title) +
> infoStream + a random thread dump.
> You can spot "indexer.close (no optimize)" (line 5721), which marks indexing
> completion and the beginning of the merging nightmare.
>
> conf:
> conf.setRAMBufferSizeMB(512);
> ConcurrentMergeScheduler mergeScheduler = new ConcurrentMergeScheduler();
> mergeScheduler.setMaxMergeCount(6);
> mergeScheduler.setMaxThreadCount(4);
> conf.setMergeScheduler(mergeScheduler);
> writer = new ThreadedIndexWriter(directory, analyzer, true, 2, 5, conf);
> (everything else default; optimize not called)
> documents:
> pageDocument.add(new Field("title", page.getTitle(), Field.Store.YES,
>     Field.Index.NO));
> pageDocument.add(new Field("text", page.getText(), Field.Store.NO,
>     Field.Index.ANALYZED));
> if (page.getContributorUserName() != null)
>     pageDocument.add(new Field("contributorUserName",
>         page.getContributorUserName(), Field.Store.NO, Field.Index.ANALYZED));
> infoStream info:
> setInfoStream
> deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@2dafae45
> dir=org.apache.lucene.store.NIOFSDirectory@/Users/ptoussaint/Documents/workspace/wikisearch/index2
> lockFactory=org.apache.lucene.store.NativeFSLockFactory@39dd3812
> index=
> version=4.0-SNAPSHOT
> matchVersion=LUCENE_40
> analyzer=org.pache.soundcloud.wikisearch.Indexer$WikiAnalyzer
> delPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy
> commit=null
> openMode=CREATE_OR_APPEND
> similarityProvider=org.apache.lucene.search.DefaultSimilarityProvider
> termIndexInterval=32
> mergeScheduler=org.apache.lucene.index.ConcurrentMergeScheduler
> default WRITE_LOCK_TIMEOUT=1000
> writeLockTimeout=1000
> maxBufferedDeleteTerms=-1
> ramBufferSizeMB=512.0
> maxBufferedDocs=-1
> mergedSegmentWarmer=null
> codecProvider=org.apache.lucene.index.codecs.CoreCodecProvider@6a8c436b
> mergePolicy=[TieredMergePolicy: maxMergeAtOnce=10,
> maxMergeAtOnceExplicit=30, maxMergedSegmentMB=5120.0, floorSegmentMB=2.0,
> expungeDeletesPctAllowed=10.0, segmentsPerTier=10.0, useCompoundFile=true,
> noCFSRatio=0.1
> indexerThreadPool=org.apache.lucene.index.ThreadAffinityDocumentsWriterThreadPool@1e9e5c73
> readerPooling=false
> readerTermsIndexDivisor=1
> flushPolicy=org.apache.lucene.index.FlushByRamOrCountsPolicy@2ec791b9
> perThreadHardLimitMB=1945
>
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Thread-locking-while-merging-ConcurrentMergeScheduler-issue-tp3222427p3223874.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
