lucene-dev mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: [jira] Commented: (LUCENE-1994) EnwikiConentSource does not work with parallel tasks
Date Mon, 19 Oct 2009 16:15:21 GMT
I don't think some of the stat tracking works right with parallel tasks
either: to get the total time, it adds up the times at which each thread
finished. E.g., if thread 1 finishes at second 30 and thread 2 at second
32, it reports that the run took 62 seconds total.

     [java] ------------> algorithm:
     [java] Seq {
     [java]     Rounds_2 {
     [java]         ResetSystemErase
     [java]         Populate {
     [java]             CreateIndex
     [java]             Par_8 [
     [java]                 MAddDocs_2500 {
     [java]                     AddDoc
     [java]                 } * 2500
     [java]             ] * 8
     [java]             Optimize
     [java]             CommitIndex
     [java]             CloseIndex
     [java]         }
     [java]         RepSumByPref MAddDocs
     [java]         NewRound
     [java]     } * 2
     [java]     RepSumByNameRound
     [java]     RepSumByName
     [java]     RepSumByPrefRound MAddDocs
     [java] }
     [java] ------------> starting task: Seq
     [java] ------------> starting task: Rounds_2
     [java] ------------> starting task: ResetSystemErase
     [java] ------------> starting task: Populate
     [java] 55.84 sec --> Thread-2 added 2000 docs
     [java] 60.94 sec --> Thread-6 added 2000 docs
     [java] 74.82 sec --> Thread-0 added 2000 docs
     [java] 77.48 sec --> Thread-3 added 2000 docs
     [java] 81.21 sec --> Thread-1 added 2000 docs
     [java] 90.72 sec --> Thread-5 added 2000 docs
     [java] 96.46 sec --> Thread-7 added 2000 docs
     [java] 97.17 sec --> Thread-4 added 2000 docs
     [java] ------------> Report Sum By Prefix (MAddDocs) (1 about 8 out of 20016)
     [java] Operation     round mrg flush cmpnd   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
     [java] MAddDocs_2500     0  20 48.00 false        8         2500        28.01      713.99   135,359,120    273,850,368
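
To make the difference concrete, here is a minimal sketch of the two ways
of aggregating per-thread times. ElapsedTimes and both method names are
hypothetical and are not taken from the benchmark code; the sketch also
assumes every thread starts at second 0. Note that the 28.01 rec/s in the
report above matches 8 * 2500 records divided by the 713.99 elapsedSec
figure, so the rate appears to be derived from the summed time as well.

```java
// Hypothetical helper for illustration; not part of contrib/benchmark.
class ElapsedTimes {
    // Adds up every thread's finish time (what the report seems to do).
    static double sumOfFinishTimes(double[] finishSec) {
        double sum = 0;
        for (double t : finishSec) {
            sum += t;
        }
        return sum;
    }

    // Wall-clock time of a parallel block whose threads all start at
    // second 0: the block is done when the slowest thread finishes.
    static double wallClock(double[] finishSec) {
        double max = 0;
        for (double t : finishSec) {
            max = Math.max(max, t);
        }
        return max;
    }
}
```

With finish times of 30 and 32 seconds, sumOfFinishTimes gives 62 while
wallClock gives 32, which is the number you would expect for two threads
running side by side.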

Shai Erera (JIRA) wrote:
>     [ https://issues.apache.org/jira/browse/LUCENE-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767343#action_12767343 ]
>
> Shai Erera commented on LUCENE-1994:
> ------------------------------------
>
> Yes I agree (to both comments). Basically for a ContentSource to be
> supported by parallel tasks, its getNextDocData should be made
> synchronized, or it finds another way to sync on the important stuff
> (for example TrecContentSource).
>
>   
>> EnwikiConentSource does not work with parallel tasks
>> ----------------------------------------------------
>>
>>                 Key: LUCENE-1994
>>                 URL: https://issues.apache.org/jira/browse/LUCENE-1994
>>             Project: Lucene - Java
>>          Issue Type: Bug
>>          Components: contrib/benchmark
>>    Affects Versions: 2.9
>>            Reporter: Mark Miller
>>            Priority: Minor
>>
>>     
>
>
>   
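
As a rough illustration of the synchronized approach Shai describes, the
sketch below guards a shared, non-thread-safe iterator so that several
threads can pull documents from it. SharedDocSource and drainInParallel
are hypothetical stand-ins, not the real ContentSource API; only the
method name getNextDocData is borrowed from the comment above, and its
signature here is an assumption.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a benchmark content source; not the Lucene class.
class SharedDocSource {
    // Shared state: a plain iterator is not safe to advance from several threads.
    private final Iterator<String> docs;

    SharedDocSource(List<String> docs) {
        this.docs = docs.iterator();
    }

    // synchronized: only one thread at a time checks and advances the iterator.
    synchronized String getNextDocData() {
        return docs.hasNext() ? docs.next() : null;
    }

    // Drains the source from nThreads threads and returns every doc seen.
    static Set<String> drainInParallel(SharedDocSource src, int nThreads) {
        Set<String> seen = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            workers[i] = new Thread(() -> {
                String doc;
                while ((doc = src.getNextDocData()) != null) {
                    seen.add(doc);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining", e);
        }
        return seen;
    }
}
```

Synchronizing the whole method is the bluntest option. A source that does
expensive parsing could instead lock only around the shared read and do
the parsing outside the lock, which is presumably what "another way to
sync on the important stuff" refers to.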


-- 
- Mark

http://www.lucidimagination.com



