lucene-java-user mailing list archives

From Jibo John <jiboj...@mac.com>
Subject Re: ThreadedIndexWriter vs. IndexWriter
Date Fri, 31 Jul 2009 23:22:15 GMT
Mike,

Here you go:


IndexWriter:
----------------
$ java -classpath /Users/jibo/Desktop/iwork/lucene/java/trunk/build/lucene-core-2.9-dev.jar org.apache.lucene.index.CheckIndex /Users/jibo/Desktop/iwork/lucene/java/trunk/contrib/benchmark/work/index

NOTE: testing will be more thorough if you run java with '-ea:org.apache.lucene...', so assertions are enabled

Opening index @ /Users/jibo/Desktop/iwork/lucene/java/trunk/contrib/benchmark/work/index

Segments file=segments_a numSegments=1 version=FORMAT_DIAGNOSTICS [Lucene 2.9]
  1 of 1: name=_18 docCount=200000
    compound=true
    hasProx=true
    numFiles=1
    size (MB)=427.448
    diagnostics = {java.version=1.5.0_19, lucene.version=2.9-dev 779767M - 2009-05-28 17:02:17, os=Mac OS X, os.arch=i386, optimize=true, mergeDocStores=true, java.vendor=Apple Inc., os.version=10.5.7, source=merge, mergeFactor=4}
    no deletions
    test: open reader.........OK
    test: fields, norms.......OK [4 fields]
    test: terms, freq, prox...OK [3512343 terms; 80020204 terms/docs pairs; 163219760 tokens]
    test: stored fields.......OK [200000 total field count; avg 1 fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]

No problems were detected with this index.


ThreadedIndexWriter:
-----------------------------

$ java -classpath /Users/jibo/Desktop/iwork/lucene/java/trunk/build/lucene-core-2.9-dev.jar org.apache.lucene.index.CheckIndex /Users/jibo/Desktop/iwork/lucene/java/trunk/contrib/benchmark/work/index

NOTE: testing will be more thorough if you run java with '-ea:org.apache.lucene...', so assertions are enabled

Opening index @ /Users/jibo/Desktop/iwork/lucene/java/trunk/contrib/benchmark/work/index

Segments file=segments_3 numSegments=1 version=FORMAT_DIAGNOSTICS [Lucene 2.9]
  1 of 1: name=_q docCount=199970
    compound=true
    hasProx=true
    numFiles=3
    size (MB)=319.107
    diagnostics = {java.version=1.5.0_19, lucene.version=2.9-dev 779767M - 2009-05-28 17:02:17, os=Mac OS X, os.arch=i386, optimize=true, mergeDocStores=false, java.vendor=Apple Inc., os.version=10.5.7, source=merge, mergeFactor=6}
    docStoreOffset=0
    docStoreSegment=_0
    docStoreIsCompoundFile=false
    no deletions
    test: open reader.........OK
    test: fields, norms.......OK [4 fields]
    test: terms, freq, prox...OK [1227086 terms; 69244121 terms/docs pairs; 134390948 tokens]
    test: stored fields.......OK [199970 total field count; avg 1 fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]

No problems were detected with this index.


$



On Jul 31, 2009, at 2:52 PM, Michael McCandless wrote:

> Hmmm... can you run CheckIndex on both indexes and post the results?
>
>  java org.apache.lucene.index.CheckIndex /path/to/index
>
> Mike
>
> On Fri, Jul 31, 2009 at 2:38 PM, Jibo John<jibojohn@mac.com> wrote:
>> The number of docs is the same in the index for both cases (200,000).
>> I haven't altered the benchmark/ code, but I used a profiler to verify
>> that the Benchmark main thread is closed only after all other threads
>> are closed.
>>
>> Thanks,
>> -Jibo
>>
>>
>> On Jul 31, 2009, at 2:34 AM, Michael McCandless wrote:
>>
>>> Hmm... this doesn't sound right.
>>>
>>> That example (ThreadedIndexWriter) is meant to be a drop-in
>>> replacement, wherever you use an IndexWriter, that keeps an
>>> under-the-hood thread pool (using java.util.concurrent.*) to
>>> add/update documents with multiple threads.
>>>
>>> It should not result in a smaller index.
>>>
>>> Can you sanity check the index?  Eg is numDocs() the same for both?
>>> You definitely called close() on the writer, right?  That method waits
>>> for all threads to finish their work before actually closing.
>>>
>>> Mike
>>>
>>> On Thu, Jul 30, 2009 at 8:01 PM, Jibo John<jibojohn@mac.com> wrote:
>>>>
>>>> While trying out a few tuning options using contrib/benchmark, as
>>>> described in the LIA (2nd edition) book, I made an interesting
>>>> observation.
>>>>
>>>> If I use a ThreadedIndexWriter (the example from lia2e, page 356)
>>>> instead of IndexWriter, the index size is reduced by 40% compared to
>>>> using IndexWriter.
>>>> The index-related configuration was the same for both tests in the
>>>> alg file.
>>>>
>>>> I am curious how using a threaded index writer can have an impact on
>>>> the index size.
>>>>
>>>> Appreciate your input.
>>>>
>>>> Thanks,
>>>> -Jibo
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>
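
For anyone following the thread without the book at hand: the pattern Mike describes in the quoted message (a wrapper that hands add/update calls to a java.util.concurrent thread pool and drains that pool in close()) can be sketched roughly as below. This is not the lia2e listing and it does not touch Lucene at all. `ThreadedWriterSketch`, `DocSink`, and `failures()` are hypothetical names made up for illustration, with `DocSink` standing in for whatever IndexWriter call is being wrapped.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadedWriterSketch {

    /** Hypothetical stand-in for the wrapped writer's add call. */
    public interface DocSink {
        void add(String doc) throws Exception;
    }

    private final DocSink sink;
    private final ExecutorService pool;
    private final AtomicInteger failures = new AtomicInteger();

    public ThreadedWriterSketch(DocSink sink, int numThreads) {
        this.sink = sink;
        this.pool = Executors.newFixedThreadPool(numThreads);
    }

    /** Queues the document; the actual add runs later on a pool thread. */
    public void addDocument(final String doc) {
        pool.execute(() -> {
            try {
                sink.add(doc);
            } catch (Exception e) {
                // A failed add is only counted here, so a caller has to check
                // failures() to notice documents that never reached the sink.
                failures.incrementAndGet();
            }
        });
    }

    /** Stops accepting work and blocks until every queued add has run. */
    public void close() {
        pool.shutdown();
        try {
            while (!pool.awaitTermination(1, TimeUnit.SECONDS)) {
                // still draining queued adds
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public int failures() {
        return failures.get();
    }

    public static void main(String[] args) {
        AtomicInteger added = new AtomicInteger();
        ThreadedWriterSketch writer =
            new ThreadedWriterSketch(doc -> added.incrementAndGet(), 4);
        for (int i = 0; i < 1000; i++) {
            writer.addDocument("doc" + i);
        }
        writer.close();
        System.out.println("added=" + added.get() + " failed=" + writer.failures());
    }
}
```

The point of the sketch is the ordering in close(): the count of documents that reached the sink is only meaningful after the pool has been shut down and fully drained, which is why the question about close() matters when the two indexes are compared.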



