lucene-java-user mailing list archives

From Li Li <fancye...@gmail.com>
Subject Re: instantiated contrib
Date Fri, 27 Aug 2010 03:35:11 GMT
"It is strange that it should take 20 seconds to gather fields,"
The 20 s covers both the search and gathering the fields; it is the total time.
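
One way to tell the two costs apart is to time the search phase and the field-gathering phase separately. The following is only a minimal sketch in modern Java (8 or later), not the code used in this thread: `PhaseTimer`, `Timed`, and the two stand-in workloads are hypothetical names and placeholders, and the real `is.search(...)` and stored-field calls would go where the comments indicate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class PhaseTimer {
    /** Result of one phase together with its wall-clock duration. */
    public static final class Timed<T> {
        public final T value;
        public final long nanos;
        Timed(T value, long nanos) { this.value = value; this.nanos = nanos; }
    }

    /** Runs a phase once and measures it with System.nanoTime(). */
    public static <T> Timed<T> time(Supplier<T> phase) {
        long start = System.nanoTime();
        T value = phase.get();
        return new Timed<>(value, System.nanoTime() - start);
    }

    /** Truncating conversion from nanoseconds to milliseconds. */
    public static long nanosToMillis(long nanos) {
        return nanos / 1_000_000L;
    }

    public static void main(String[] args) {
        // Stand-in for the search call: pretend every third doc id is a hit.
        Timed<List<Integer>> search = time(() -> {
            List<Integer> hits = new ArrayList<>();
            for (int doc = 0; doc < 70_000; doc += 3) hits.add(doc);
            return hits;
        });

        // Stand-in for loading stored fields of each hit.
        Timed<Integer> gather = time(() -> {
            int chars = 0;
            for (int doc : search.value) chars += ("doc-" + doc).length();
            return chars;
        });

        System.out.println("search ms: " + nanosToMillis(search.nanos));
        System.out.println("gather ms: " + nanosToMillis(gather.nanos));
        System.out.println("total  ms: " + nanosToMillis(search.nanos + gather.nanos));
    }
}
```

Reporting the phases separately would show whether the 20 s is dominated by the search itself or by reading field values. A warm-up run before measuring is also advisable, since the first pass includes JIT compilation.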

2010/8/27 Karl Wettin <karl.wettin@gmail.com>:
> My mail client died while sending this mail.. Sorry for any duplicate.
>
> It is strange that it should take 20 seconds to gather fields; this is the
> only thing that really surprises me. I'd expect it to be instant compared to
> RAMDirectory. It is hard to say from the information you provided. Did you
> perhaps lazy load field values from your RAMDirectory and not retrieve them,
> or something like that?
>
> Why your queries are slow is also hard to say; there can be many reasons. 70k
> documents can be quite a lot for II (InstantiatedIndex) if they contain
> enough text. Here are a few questions that may or may not be helpful:
>
> What is the content of the documents? Do they contain a lot of the same
> text? Or are they all rather unique? The major thing that makes II faster
> than RAMDirectory is that it does not have to deserialize values from the
> bytestream. As the index grows, binary searching for documents containing a
> given term will start to consume more time than deserializing the index.
>
> What speed do you see if you only load 10% (7k)?
>
> Did you see the graphics in the package level javadocs?
> http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/store/instantiated/package-summary.html
>
>
>        karl
>
>
> On 26 Aug 2010, at 09:24, Li Li wrote:
>
>> I have about 70k documents; the total indexed size is about 15 MB (the
>> size of the original text files).
>>        dir=new RAMDirectory();
>>        IndexWriter writer=new IndexWriter(dir,...);
>>        for(loop){
>>             writer.addDocument(doc);
>>        }
>>        writer.optimize();
>>        writer.close();
>>
>>        IndexReader ir=IndexReader.open(dir,true);
>>        InstantiatedIndex ii=new InstantiatedIndex(ir);
>>        InstantiatedIndexReader iir=new InstantiatedIndexReader(ii);
>>        is=new IndexSearcher(ir);
>>        is2=new IndexSearcher(iir);
>>
>> I calculate the time by:
>>        long searchStart=System.nanoTime();
>>        TopDocs docs=is.search(bQuery,Integer.MAX_VALUE);
>>        long searchEnd=System.nanoTime();
>>
>> I searched 10,000 documents with both RAMDirectory and instantiated;
>> the times were time1: 21s (21812978000 ns) and time2: 20s (20713817000 ns).
>> I also calculated the time including getting the field values:
>>        total1: 23852ms total2: 22610ms
>> It seems instantiated is not much faster than RAMDirectory. Is there
>> anything wrong with how I used it? My max memory is 4GB.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
