lucene-java-user mailing list archives

From Shashi Kant <shashi....@gmail.com>
Subject Re: How to improve search time?
Date Tue, 04 Aug 2009 13:17:37 GMT
To add to all these excellent suggestions: I would suggest creating a
"baby index" out of the master index; pull out, say, 1000 docs into a
test index and query it. That helps narrow down the problem.
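A minimal sketch of the first step, picking which documents go into the baby index, assuming you want a reproducible, uniform sample of document numbers from the master index (about 50M docs in this thread). It uses Floyd's sampling algorithm and only the Java standard library. The class name BabyIndexSampler is hypothetical, and copying the chosen documents into the test index (for example by re-indexing them from the original source data) is deliberately left out, since it depends on the Lucene version and on which fields are stored. Deleted documents in the master index would also need to be skipped, which this sketch does not do.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

/**
 * Hypothetical helper: picks k distinct document numbers from [0, maxDoc)
 * so the same test index can be rebuilt from a fixed seed.
 * Floyd's algorithm does O(k) work without materializing all maxDoc numbers.
 */
public class BabyIndexSampler {

    public static int[] sample(int maxDoc, int k, long seed) {
        if (k < 0 || k > maxDoc) {
            throw new IllegalArgumentException("need 0 <= k <= maxDoc");
        }
        Random rnd = new Random(seed);
        Set<Integer> chosen = new HashSet<>();
        for (int j = maxDoc - k; j < maxDoc; j++) {
            int t = rnd.nextInt(j + 1);      // uniform in [0, j]
            if (!chosen.add(t)) {
                chosen.add(j);               // t was already taken, so take j instead
            }
        }
        int[] ids = new int[k];
        int i = 0;
        for (int id : chosen) {
            ids[i++] = id;
        }
        Arrays.sort(ids);                    // copy pass can then walk the index in doc-number order
        return ids;
    }

    public static void main(String[] args) {
        // 50M documents in the master index, 1000 for the baby index.
        int[] ids = sample(50_000_000, 1000, 42L);
        System.out.println(ids.length + " docs, first=" + ids[0] + ", last=" + ids[ids.length - 1]);
    }
}
```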


On Tue, Aug 4, 2009 at 8:55 AM, Matthew Hall <mhall@informatics.jax.org> wrote:
> Also, how long does it take Luke to do a search against the same index?
>
> That way you can remove any of the timing that your application is adding
> into the mix.
>
> If Luke doesn't take the minimum of 8 seconds, then you know it's an issue
> with your app (or at least a large part of it).
>
> Matt
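One way to make the app-versus-Luke comparison fair, sketched under the assumption that the search can be wrapped in a Runnable: run the same query several times and report the first (cold) run separately from the later (warm) runs. With an 87GB index on a 4GB machine, the first run is likely to be dominated by disk reads, so comparing a cold number from one tool with a warm number from the other could mislead. QueryTimer and its placeholder workload are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Hypothetical timing harness: reports the first (cold) run separately
 * from the later (warm) runs. The Runnable stands in for the real search.
 */
public class QueryTimer {

    /** Returns {firstRunMs, warmMinMs, warmMedianMs}; upper median for an even count. */
    public static double[] time(Runnable query, int runs) {
        if (runs < 2) {
            throw new IllegalArgumentException("need at least 2 runs");
        }
        double[] ms = new double[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            query.run();
            ms[i] = (System.nanoTime() - start) / 1_000_000.0;
        }
        // Exclude the first run so cache warm-up does not skew the warm statistics.
        double[] warm = Arrays.copyOfRange(ms, 1, runs);
        Arrays.sort(warm);
        return new double[] {ms[0], warm[0], warm[warm.length / 2]};
    }

    public static void main(String[] args) {
        // Placeholder workload; replace the lambda with the real search call.
        double[] r = time(() -> Arrays.sort(new Random(1).ints(100_000).toArray()), 5);
        System.out.printf("cold=%.2f ms, warm min=%.2f ms, warm median=%.2f ms%n",
                r[0], r[1], r[2]);
    }
}
```

Running the same query through Luke several times and looking at its later timings gives the matching warm number to compare against.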
>
> Ian Lea wrote:
>>
>> Still surprising that your searches are taking so long.
>>
>> Have you worked through everything on
>> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed, suggested by
>> someone earlier in this thread?  Are you sure that the problem is
>> really with lucene? Is it the search itself that takes a long time, or
>> retrieving data for the hits?  What does query.toString() look like?
>> How many hits does a search typically match?  Is a search on document
>> id effectively instant?
>>
>> You have to supply more detail if you want better answers.
>>
>>
>> --
>> Ian.
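Ian's question of whether the time goes into the search itself or into retrieving data for the hits can be answered by timing the two phases separately. The sketch below is tool-agnostic: the search and fetch steps are passed in as functions (hypothetical placeholders, not Lucene calls), so you would plug in your own code that collects the matching document IDs and then loads stored data for each one. If most of the time lands in the second phase, the cost is per hit rather than in the query itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

/**
 * Hypothetical helper: times "find matching doc ids" separately from
 * "load stored data for each hit". The lambdas in main are placeholders.
 */
public class PhaseSplit {

    public record Result(int hits, long searchNanos, long fetchNanos) {}

    public static <T> Result run(Supplier<List<Integer>> search, Function<Integer, T> fetch) {
        long t0 = System.nanoTime();
        List<Integer> docIds = search.get();       // phase 1: the search only
        long t1 = System.nanoTime();
        List<T> rows = new ArrayList<>(docIds.size());
        for (int id : docIds) {
            rows.add(fetch.apply(id));             // phase 2: per-hit retrieval
        }
        long t2 = System.nanoTime();
        return new Result(rows.size(), t1 - t0, t2 - t1);
    }

    public static void main(String[] args) {
        Result r = run(() -> List.of(3, 17, 42), id -> "doc-" + id);
        System.out.printf("hits=%d search=%d us fetch=%d us%n",
                r.hits(), r.searchNanos() / 1000, r.fetchNanos() / 1000);
    }
}
```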
>>
>>
>> On Tue, Aug 4, 2009 at 12:21 PM, prashant ullegaddi <prashullegaddi@gmail.com> wrote:
>>
>>>
>>> Shashi,
>>>
>>> Our queries are free-text queries, but they get expanded into
>>> multi-field Boolean queries.
>>> We are also expanding the original query using Lucene's SynExpand, so a
>>> simple query gets expanded into, say, a page-sized query.
>>>
>>> And we are not storing any other fields except key (document IDs), target
>>> URLs and titles.
>>>
>>> Prashant.
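To see how quickly this kind of expansion inflates a query, here is a rough count under simple assumptions: every term is searched in every field, and every synonym adds one more clause per field. This does not reproduce what SynExpand actually emits (its exact structure depends on the Lucene contrib version); the field names, synonyms and the class ExpansionSize are made up for illustration. With just 2 terms, 3 synonyms each and 3 fields, the query already has 24 clauses instead of 2.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical estimate of query size after synonym and multi-field expansion,
 * rendered as a flat OR in query-parser-like syntax.
 */
public class ExpansionSize {

    /** One clause per (field, term-or-synonym) pair. */
    public static int clauseCount(List<String> terms, Map<String, List<String>> synonyms,
                                  List<String> fields) {
        int perField = 0;
        for (String t : terms) {
            perField += 1 + synonyms.getOrDefault(t, List.of()).size();
        }
        return perField * fields.size();
    }

    public static String render(List<String> terms, Map<String, List<String>> synonyms,
                                List<String> fields) {
        List<String> clauses = new ArrayList<>();
        for (String f : fields) {
            for (String t : terms) {
                clauses.add(f + ":" + t);
                for (String s : synonyms.getOrDefault(t, List.of())) {
                    clauses.add(f + ":" + s);
                }
            }
        }
        return String.join(" OR ", clauses);
    }

    public static void main(String[] args) {
        List<String> terms = List.of("fast", "car");
        Map<String, List<String>> syn = new LinkedHashMap<>();
        syn.put("fast", List.of("quick", "rapid", "speedy"));
        syn.put("car", List.of("auto", "automobile", "machine"));
        List<String> fields = List.of("title", "body", "url");
        System.out.println(clauseCount(terms, syn, fields) + " clauses");
        System.out.println(render(terms, syn, fields));
    }
}
```

Since the count is a product, trimming either the field list or the synonyms per term shrinks the query multiplicatively.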
>>>
>>> On Tue, Aug 4, 2009 at 1:31 PM, Shashi Kant <shashi.mit@gmail.com> wrote:
>>>
>>>
>>>>
>>>> Prashant, I have had better luck with even larger indices on
>>>> similar platforms. Could you elaborate on what types of queries you are
>>>> running: multi-field, Boolean, combinations, etc.? Also, you might want
>>>> to remove unnecessary stored fields from the index and move them to a
>>>> relational DB to squeeze out better performance.
>>>>
>>>>
>>>> Shashi
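Shashi's suggestion amounts to keeping the index lean (ideally storing just the document key) and resolving display data such as URLs and titles from a separate store after the search. A minimal sketch, assuming hits come back as a list of keys in rank order; the HashMap stands in for the relational table (in practice this would be one batched database query), and all names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: hits carry only a key; URL and title come from a
 * side store. The HashMap stands in for a relational table.
 */
public class KeyOnlyHits {

    public record DocRow(String key, String url, String title) {}

    /** Resolves hit keys against the side store, preserving rank order. */
    public static List<DocRow> resolve(List<String> hitKeys, Map<String, DocRow> store) {
        List<DocRow> rows = new ArrayList<>(hitKeys.size());
        for (String key : hitKeys) {
            DocRow row = store.get(key);
            if (row != null) {          // skip keys that are missing from the store
                rows.add(row);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, DocRow> store = new HashMap<>();
        store.put("d1", new DocRow("d1", "http://example.com/a", "Page A"));
        store.put("d2", new DocRow("d2", "http://example.com/b", "Page B"));
        // Pretend the search returned these keys, best match first.
        for (DocRow r : resolve(List.of("d2", "d1", "d9"), store)) {
            System.out.println(r.key() + "  " + r.title() + "  " + r.url());
        }
    }
}
```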
>>>>
>>>>
>>>> On Tue, Aug 4, 2009 at 3:18 AM, prashant ullegaddi <prashullegaddi@gmail.com> wrote:
>>>>
>>>>>
>>>>> I did that as well. Actually, we had 32 indexes initially. We searched
>>>>> them, and it was horrible.
>>>>> After that I merged them into 4 indexes and did the same. No gain!
>>>>>
>>>>> Then I had to merge the 32 indexes into one.
>>>>>
>>>>> On Tue, Aug 4, 2009 at 10:48 AM, Anshum <anshumg@gmail.com> wrote:
>>>>>
>>>>>
>>>>>>
>>>>>> Hi Prashant,
>>>>>> 8 seconds as the minimum time is a little too much, though considering
>>>>>> you're using just 4GB of RAM, it's still OK.
>>>>>> I would advise you to break your index into smaller indexes, perhaps
>>>>>> selectively query the indexes (if that's possible for your
>>>>>> application), and use a ParallelMultiSearcher. It's just something that
>>>>>> you might try and like.
>>>>>> All said and done, parallelizing would only get you a bell-curve-like
>>>>>> performance graph, so you'd have to figure out the sweet spot there.
>>>>>>
>>>>>> --
>>>>>> Anshum Gupta
>>>>>> Naukri Labs!
>>>>>> http://ai-cafe.blogspot.com
>>>>>>
>>>>>> The facts expressed here belong to everybody, the opinions to me. The
>>>>>> distinction is yours to draw............
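The split-and-parallelize idea can be sketched without any Lucene classes: search each shard on its own thread, then merge the per-shard top-k lists into one global top-k by score. This only mirrors the concept behind ParallelMultiSearcher; the Shard interface, Hit record and ShardedSearch class are hypothetical stand-ins. It also assumes scores from different shards are directly comparable, which in a real index depends on consistent term statistics across shards. The threads parameter is where Anshum's "sweet spot" would be tuned, for example by timing a handful of values on the quad-core machine mentioned in this thread.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical fan-out/fan-in search over index shards. */
public class ShardedSearch {

    public record Hit(int shard, int doc, double score) {}

    /** Hypothetical stand-in for one shard's searcher: returns its own top-k hits. */
    public interface Shard {
        List<Hit> search(String query, int k);
    }

    public static List<Hit> search(List<Shard> shards, String query, int k, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Fan out: one task per shard.
            List<Future<List<Hit>>> futures = new ArrayList<>();
            for (Shard s : shards) {
                futures.add(pool.submit(() -> s.search(query, k)));
            }
            // Fan in: a min-heap on score keeps the k best hits across all shards.
            PriorityQueue<Hit> best = new PriorityQueue<>(Comparator.comparingDouble(Hit::score));
            for (Future<List<Hit>> f : futures) {
                List<Hit> shardHits;
                try {
                    shardHits = f.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for a shard", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("shard search failed", e.getCause());
                }
                for (Hit h : shardHits) {
                    best.offer(h);
                    if (best.size() > k) {
                        best.poll();            // drop the current lowest score
                    }
                }
            }
            List<Hit> merged = new ArrayList<>(best);
            merged.sort(Comparator.comparingDouble(Hit::score).reversed()); // best first
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Shard> shards = List.of(
                (q, k) -> List.of(new Hit(0, 1, 0.9), new Hit(0, 2, 0.1)),
                (q, k) -> List.of(new Hit(1, 7, 0.5), new Hit(1, 8, 0.95)));
        for (Hit h : search(shards, "example", 2, 2)) {
            System.out.println(h);
        }
    }
}
```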
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 4, 2009 at 10:08 AM, prashant ullegaddi <
>>>>>> prashullegaddi@gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> I'm running it on a quad-core machine (2.4GHz per core) with 4GB RAM.
>>>>>>>
>>>>>>> Prashant.
>>>>>>>
>>>>>>> On Tue, Aug 4, 2009 at 8:38 AM, Otis Gospodnetic <
>>>>>>> otis_gospodnetic@yahoo.com> wrote:
>>>>>>>
>>>>>>>> With such a large index, be prepared to put it on a server with lots
>>>>>>>> of RAM (even if you follow all the tips from the Wiki).
>>>>>>>> When reporting performance numbers, you really ought to tell us about
>>>>>>>> your hardware, types of queries, etc.
>>>>>>>>
>>>>>>>> Otis
>>>>>>>> --
>>>>>>>> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
>>>>>>>> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
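Otis's point about RAM can be made concrete with the figures reported in this thread: an 87GB index of about 50M documents on a machine with 4GB of RAM. A back-of-the-envelope sketch, using decimal gigabytes and ignoring the JVM heap and other processes (so the real figure is lower): the average document takes roughly 1,740 bytes on disk, and at most about 4.6% of the index could be held in memory at once, so queries that touch many terms would likely have to go to disk. IndexRamEstimate is a hypothetical name.

```java
/**
 * Hypothetical back-of-the-envelope estimate for the index in this thread.
 * Decimal GB; ignores JVM heap and OS overhead, so real cacheable share is smaller.
 */
public class IndexRamEstimate {

    /** Average on-disk bytes per document. */
    public static double bytesPerDoc(double indexGb, long docs) {
        return indexGb * 1e9 / docs;
    }

    /** Upper bound on the fraction of the index that fits in RAM. */
    public static double maxCachedFraction(double ramGb, double indexGb) {
        return Math.min(1.0, ramGb / indexGb);
    }

    public static void main(String[] args) {
        double indexGb = 87;        // from the original post
        long docs = 50_000_000L;    // "around 50M documents"
        double ramGb = 4;           // from the hardware description
        System.out.printf("~%.0f bytes/doc, at most %.1f%% of the index in RAM%n",
                bytesPerDoc(indexGb, docs), 100 * maxCachedFraction(ramGb, indexGb));
    }
}
```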
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ----- Original Message ----
>>>>>>>>
>>>>>>>>>
>>>>>>>>> From: prashant ullegaddi <prashullegaddi@gmail.com>
>>>>>>>>> To: java-user@lucene.apache.org
>>>>>>>>> Sent: Monday, August 3, 2009 12:33:46 AM
>>>>>>>>> Subject: How to improve search time?
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have a single index of size 87GB containing around 50M documents.
>>>>>>>>> When I search for any query, the best search time I observed was
>>>>>>>>> 8 sec. And when the query is expanded with synonyms, search takes
>>>>>>>>> minutes (~2-3 min). Is there a better way to search so that the
>>>>>>>>> overall search time is reduced?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Prashant.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>>
>>>>>>>>
>>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>
>>>>
>>>>
>>>>
>>
>>
>>
>
>
> --
> Matthew Hall
> Software Engineer
> Mouse Genome Informatics
> mhall@informatics.jax.org
> (207) 288-6012
>
>
>
>
>



-- 

Phone# (617) 714-4775
Cell# (617) 642-6745
Google Voice# (617) 575-9264


