lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: Sharding Techniques
Date Wed, 11 May 2011 09:15:14 GMT
I'm sure that you should try building one large index and converting to
NumericField wherever you can.  I'm convinced that will be faster -
but as ever, the proof will be in the numbers.

On repeated terms, I believe that Lucene will search multiple times.
If so, I'd guess it is just something that has never been optimized.
225 terms is not a very big number, but not very small either.
Complex queries with lots of terms can be expected to be slower than
simple queries with few terms.  If you have a particular problem with
repeated terms perhaps you could dedup them yourself.
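
As a rough illustration of deduplicating terms yourself, here is a minimal
sketch in plain Java. It assumes the query is a simple whitespace-separated
list of field:term clauses with no phrases, boosts or operators. The class
and method names (QueryTermDedup, dedup, dedupQuery) are hypothetical and
not part of Lucene. Repeated clauses normally add to a document's score, so
removing them may change ranking as well as speed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper, not a Lucene class: removes repeated clauses
// from a simple query string before it is handed to a query parser.
final class QueryTermDedup {

    // Returns the clauses with duplicates dropped, keeping first-seen order.
    static List<String> dedup(List<String> clauses) {
        return new ArrayList<String>(new LinkedHashSet<String>(clauses));
    }

    // Assumes whitespace-separated field:term clauses (no quoted phrases).
    static String dedupQuery(String query) {
        String trimmed = query.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        List<String> unique = dedup(Arrays.asList(trimmed.split("\\s+")));
        return String.join(" ", unique);
    }

    public static void main(String[] args) {
        // Prints the query with each distinct clause kept once.
        System.out.println(dedupQuery("title:xyz title:xyz body:abc title:xyz"));
    }
}
```

Timing the original and the deduplicated query against the same index would
show whether the repeats really account for the extra search time.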


--
Ian.

On Wed, May 11, 2011 at 9:10 AM, Samarendra Pratap <samarzone@gmail.com> wrote:
> Hi Tom,
>  The more responses I get in this thread, the more I feel that our
> application needs optimization.
>
> 350 GB and less than 2 seconds!!! That's much more than my expectation :-)
> (in current scenario).
>
> *characteristics of slow queries:*
>  There are a few reasons for the longer search times:
>
> 1. Two of our fields contain decimal values but are not NumericField :( .
> These fields are searched as a range. Whenever the ranges are large and/or
> both fields are used in a search, the search time and server load go
> up. I have already started work to convert them to NumericField - but
> suggestions and experiences are most welcome.
>
> 2. When queries (without the two fields mentioned above) have a lot of
> words/phrases, search time is high. E.g. I took a query with around 80 unique
> terms (not words) in 5 fields. These terms occur repeatedly, for a total of
> 225 (non-unique) terms. This particular query took 4.2 seconds. The 15
> indexes used for this query had a total size of 5 GB.
> Is 225 terms (80 unique) a very big number?
>
> And yes, slow queries are always slow, but obviously high load adds
> to their slowness.
>
>
>
>
> Here I have another curiosity about something I noticed.
> If I have a query like the following:
>
>
> title:xyz title:xyz title:xyz title:xyz title:xyz title:xyz title:xyz
> title:xyz title:xyz title:xyz title:xyz
>
> *Will Lucene search for the term 11 times or will it reuse the results of
> the first term?*
>
> If the latter is true (which I think it is), is there any particular reason,
> or may it be optimized inside Lucene?
>
>
> On Tue, May 10, 2011 at 9:46 PM, Burton-West, Tom <tburtonw@umich.edu> wrote:
>
>> Hi Samar,
>>
>> >>Normal queries go fine under 500 ms but when people start searching
>> >>"anything" some queries take up to > 100 seconds. Don't you think
>> >>distributing smaller indexes on different machines would reduce the
>> >>average search time? (Although I have a feeling that search time for
>> >>smaller queries may be slightly increased)
>>
>> What are the characteristics of your slow queries?  Can you give examples?
>> Are the slow queries always slow or only under heavy load?  Whether
>> splitting into smaller indexes would help depends on just what your
>> bottleneck is. It's not clear that your index is large enough that its
>> size is causing your bottleneck.
>>
>> We run indexes of about 350GB with average response times under 200ms and
>> 99th percentile response times of under 2 seconds. (We have a very low qps
>> rate however).
>>
>>
>> Tom
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
>
> --
> Regards,
> Samar
>
