lucene-java-user mailing list archives

From Jibo John <jiboj...@mac.com>
Subject Re: ThreadedIndexWriter vs. IndexWriter
Date Mon, 03 Aug 2009 16:37:42 GMT
Mike,

I verified that I have the latest source code.
Here are the alg files and the CheckIndex output for each index.


----------------------------------------- indexwriter alg -----------------------------------------

analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
doc.maker=org.apache.lucene.benchmark.byTask.feeds.LineDocMaker
directory=FSDirectory

doc.stored = true                                                    #A
docs.file=wikipedia.lines.txt
ram.flush.mb=50
compound=false
merge.factor=5
doc.add.log.step=1000
doc.term.vector=false
doc.term.vector.positions=false
doc.term.vector.offsets=false

{ "Rounds"                                                           #B
  ResetSystemErase
  { "BuildIndex"
   -CreateIndex()
   [ { "AddDocs" AddDoc > : 40000 ] : 5                              #C
   -CloseIndex()
  }
  NewRound
} : 1

RepSumByPrefRound BuildIndex                                         #D

----------------------------------------- threadedindexwriter alg -----------------------------------------

analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
doc.maker=org.apache.lucene.benchmark.byTask.feeds.LineDocMaker
directory=FSDirectory

doc.stored = true                                                    #A
docs.file=wikipedia.lines.txt
ram.flush.mb=50
compound=false
merge.factor=5
doc.add.log.step=1000
doc.term.vector=false
doc.term.vector.positions=false
doc.term.vector.offsets=false
writer.num.threads=15
writer.max.thread.queue.size=75
work.dir=work_t


{ "Rounds"                                                           #B
  ResetSystemErase
  { "BuildIndex"
   -CreateThreadedIndex()
    { "AddDocs" AddDoc > : 200000
   -CloseIndex()
  }
  NewRound
} : 1

RepSumByPrefRound BuildIndex                                         #D


----------------------------------------- threadedindexwriter checkindex -----------------------------------------


$ java -classpath /Users/jibo/Desktop/iwork/lucene/java/trunk/build/lucene-core-2.9-dev.jar org.apache.lucene.index.CheckIndex /Users/jibo/Desktop/iwork/lucene/java/trunk/contrib/benchmark/work_t/index

NOTE: testing will be more thorough if you run java with '-ea:org.apache.lucene...', so assertions are enabled

Opening index @ /Users/jibo/Desktop/iwork/lucene/java/trunk/contrib/benchmark/work_t/index

Segments file=segments_3 numSegments=1 version=FORMAT_DIAGNOSTICS [Lucene 2.9]
  1 of 1: name=_p docCount=199941
    compound=true
    hasProx=true
    numFiles=3
    size (MB)=317.1
    diagnostics = {java.version=1.5.0_19, lucene.version=2.9-dev 779767M - 2009-05-28 17:02:17, os=Mac OS X, os.arch=i386, optimize=true, mergeDocStores=false, java.vendor=Apple Inc., os.version=10.5.7, source=merge, mergeFactor=5}
    docStoreOffset=0
    docStoreSegment=_0
    docStoreIsCompoundFile=false
    no deletions
    test: open reader.........OK
    test: fields, norms.......OK [4 fields]
    test: terms, freq, prox...OK [1269552 terms; 67887116 terms/docs pairs; 133241176 tokens]
    test: stored fields.......OK [199941 total field count; avg 1 fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]

No problems were detected with this index.

----------------------------------------- indexwriter checkindex -----------------------------------------

$ java -classpath /Users/jibo/Desktop/iwork/lucene/java/trunk/build/lucene-core-2.9-dev.jar org.apache.lucene.index.CheckIndex /Users/jibo/Desktop/iwork/lucene/java/trunk/contrib/benchmark/work/index

NOTE: testing will be more thorough if you run java with '-ea:org.apache.lucene...', so assertions are enabled

Opening index @ /Users/jibo/Desktop/iwork/lucene/java/trunk/contrib/benchmark/work/index

Segments file=segments_a numSegments=1 version=FORMAT_DIAGNOSTICS [Lucene 2.9]
  1 of 1: name=_18 docCount=200000
    compound=true
    hasProx=true
    numFiles=1
    size (MB)=427.445
    diagnostics = {java.version=1.5.0_19, lucene.version=2.9-dev 779767M - 2009-05-28 17:02:17, os=Mac OS X, os.arch=i386, optimize=true, mergeDocStores=true, java.vendor=Apple Inc., os.version=10.5.7, source=merge, mergeFactor=4}
    no deletions
    test: open reader.........OK
    test: fields, norms.......OK [4 fields]
    test: terms, freq, prox...OK [3512343 terms; 80020204 terms/docs pairs; 163219760 tokens]
    test: stored fields.......OK [200000 total field count; avg 1 fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]

No problems were detected with this index.

---------------------------------------------------------------------------------------------------------
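
For reference, the two reports above differ in docCount: 200000 for the plain IndexWriter run and 199941 for the threaded run, i.e. 59 fewer documents. The sketch below illustrates the kind of failure Mike's question (quoted further down) is probing for, namely a thread-pool writer that must drain its queue before closing. It is only an illustration under assumptions: PoolWriterSketch, addDocument, finish, and run are hypothetical names, the "documents" are just a counter, and none of this is the book's ThreadedIndexWriter or a Lucene API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a thread-pool-backed writer; not Lucene code.
class PoolWriterSketch {
    private final ExecutorService pool;
    private final AtomicInteger added = new AtomicInteger();

    PoolWriterSketch(int numThreads) {
        this.pool = Executors.newFixedThreadPool(numThreads);
    }

    // Queue one "document"; a worker thread counts it as added later.
    void addDocument() {
        pool.execute(added::incrementAndGet);
    }

    // Drain the pool: stop accepting work, then wait for all queued jobs.
    private void finish() {
        pool.shutdown();
        try {
            while (!pool.awaitTermination(1, TimeUnit.SECONDS)) {
                // keep waiting until every queued job has run
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining", e);
        }
    }

    // Mirrors the shape of the quoted close(): finish() first, then close.
    // Skipping finish() here could report a count lower than the number queued.
    int close() {
        finish();
        return added.get();
    }

    // Queue `docs` documents, close, and return how many were actually added.
    static int run(int docs) {
        PoolWriterSketch writer = new PoolWriterSketch(4);
        for (int i = 0; i < docs; i++) {
            writer.addDocument();
        }
        return writer.close();
    }

    public static void main(String[] args) {
        System.out.println(run(200000)); // prints 200000
    }
}
```

Because close() waits for the pool to terminate before reading the counter, every queued document is counted. Whether a missing drain step is what actually happened in the threaded run above is not established by this sketch.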




Thanks,
-Jibo







On Aug 1, 2009, at 2:08 AM, Michael McCandless wrote:

> (Please note that ThreadedIndexWriter is source code available with
> the upcoming revision to Lucene in Action.)
>
> Jibo, is it possible you are using an older version of the book's
> source code?  In particular, can you check whether your version of
> ThreadedIndexWriter.java has this:
>
>  public void close(boolean doWait) throws CorruptIndexException, IOException {
>    finish();
>    super.close(doWait);
>  }
>
> (I vaguely remember that being missing from earlier releases, which
> could explain what you're seeing).  If you are missing that, can you
> download the current code from http://www.manning.com/hatcher3 and try
> again?
>
> If that's not the problem... can you post the benchmark alg you are
> using in each case?
>
> Mike
>
> On Fri, Jul 31, 2009 at 8:26 PM, Jibo John<jibojohn@mac.com> wrote:
>> Hi Phil,
>>
>> It's 5 threads for IndexWriter.
>>
>> For ThreadedIndexWriter, I used:
>>
>> writer.num.threads=16
>> writer.max.thread.queue.size=80
>>
>> Thanks,
>> -Jibo
>>
>> On Jul 31, 2009, at 5:01 PM, Phil Whelan wrote:
>>
>>> Hi Jibo,
>>>
>>> Your mergeFactor is different, and the resulting numFiles (segment
>>> files) is different. Maybe each thread is responsible for a segment
>>> file. Just curious - do you have 3 threads?
>>>
>>> Phil
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org



