lucene-java-user mailing list archives

From jg lin <linji...@gmail.com>
Subject Re: about contrib instantiated
Date Wed, 07 Jul 2010 02:57:54 GMT
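Something that could be settled in a few sentences of Chinese has to be dragged out into a whole rambling paragraph. The foreign language, in one word: lousy...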

2010/7/3 Karl Wettin <karl.wettin@gmail.com>

>
> On 2 Jul 2010, at 08:32, Li Li wrote:
>
>
>> I have an index of about 8,000,000 documents and the current index size is
>> about 30GB. Is it possible to use this contrib to speed up my search? I have
>> enough memory for it.
>>
>
>
> In order to answer your question you'll need to benchmark using a lot of
> typical queries. My guess is that it will probably be about as fast as a
> RAMDirectory while consuming a lot more memory. It's hard to say for sure
> though.
>
> II (InstantiatedIndex) is faster than RD (RAMDirectory) mainly because RD
> needs to unmarshal information from a byte stream into Java instances, while
> II keeps those instances in memory, hence the name. As the index grows, the
> time spent in RD unmarshalling will shrink compared to the time spent seeking
> (mainly in DocsEnum/DocsAndPositionsEnum) and scoring documents. Thus
> executing queries on a large index using terms that are only available in a
> small portion of the documents should be faster on II than on RD, while
> executing queries using frequently occurring terms will consume about as much
> time on either.
>
> (Perhaps the documentation should explain it this way rather than just
> state "Mileage may vary depending on term saturation".)
>
> While benchmarking, remember that RD might require a warm-up period while II
> does not.
>
> Feel free to report back with any findings.
>
>
>
>        karl
>
>
>
>
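The benchmarking advice in the quoted message (use many typical queries, and allow for a warm-up period) could be sketched roughly as below. This is only an illustrative harness, not part of Lucene: the class `SearchBenchmark`, the method `meanNanosPerQuery`, and the toy term-to-hit-count map are all hypothetical names introduced here. In a real comparison, the `search` function would presumably wrap a searcher over each backend (RAMDirectory or InstantiatedIndex) and return its hit count.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

// Hypothetical harness for illustration; none of these names come from Lucene.
class SearchBenchmark {

    /**
     * Runs every query once per round and returns the mean time per query in
     * nanoseconds, measured only over the rounds that follow the warm-up.
     */
    static double meanNanosPerQuery(ToIntFunction<String> search,
                                    List<String> queries,
                                    int warmupRounds,
                                    int measuredRounds) {
        if (queries.isEmpty() || measuredRounds <= 0) {
            throw new IllegalArgumentException(
                "need at least one query and one measured round");
        }
        long sink = 0; // accumulate results so the calls are less likely to be optimized away

        // Warm-up: executed but not timed, so caches and the JIT can settle.
        for (int r = 0; r < warmupRounds; r++) {
            for (String q : queries) sink += search.applyAsInt(q);
        }

        // Measured rounds: the same query set, timed as a whole.
        long start = System.nanoTime();
        for (int r = 0; r < measuredRounds; r++) {
            for (String q : queries) sink += search.applyAsInt(q);
        }
        long elapsed = System.nanoTime() - start;

        if (sink == Long.MIN_VALUE) System.out.println(sink); // keeps sink observable
        return (double) elapsed / ((long) measuredRounds * queries.size());
    }

    public static void main(String[] args) {
        // Toy stand-in for an index: term -> hit count (an assumption for the demo).
        Map<String, Integer> toyIndex = new HashMap<>();
        toyIndex.put("lucene", 3);
        toyIndex.put("index", 7);
        ToIntFunction<String> toySearch = q -> toyIndex.getOrDefault(q, 0);

        List<String> queries = Arrays.asList("lucene", "index", "missing");
        double mean = meanNanosPerQuery(toySearch, queries, 1_000, 10_000);
        System.out.printf("mean time per query: %.1f ns%n", mean);
    }
}
```

Running the same query list through one such function per backend, with identical round counts, gives two comparable numbers. The query list should mix rare and frequently occurring terms, since the quoted explanation expects the difference to show up mainly on rare terms.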
