lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Grant Ingersoll <gsing...@apache.org>
Subject Re: Fine Tuning Lucene implementation
Date Tue, 24 Jul 2007 22:48:12 GMT
Sorry, I mistyped. I don't mean the getXXXX methods, I mean the  
doTagSearch, doTitleSearch, etc.

As for the stop watch, not really sure what to make of that...  Try  
System.currentTimeMillis()...

You can get just the fields you want when loading a Document by using  
the FieldSelector API on IndexReader, etc.

Perhaps, you can also use some Filters and cache them.

Its really hard to give suggestions when it is not at all obvious  
where the slowness is.  Please try to isolate the Lucene calls from  
the DB calls and look at the timings for both.

On Jul 24, 2007, at 5:28 PM, Askar Zaidi wrote:

> Thanks for the reply.
>
> I am timing the entire search process with a stop watch, a bit  
> ghetto style.
> My getXXX methods are:
>
> Document doc = hits.doc(i);
> String str = doc.get("item");
>
> So you can see that I am retrieving the entire document in a search  
> query.
> Ideally , I'd like to just retrieve the Field object that I want to  
> run the
> search on. I know this will give me a boost as one of my Fields is  
> really
> huge.
>
> My query is selecting the entire user data-set in the database. I'd  
> like to
> do some SQL based search in the query too so that I pick only those  
> items
> where the phrase matches.
>
> Index contains about 650MB of data. Index file size is 14478869 bytes.
>
> thanks,
> AZ
>
>
> On 7/24/07, Grant Ingersoll <gsingers@apache.org> wrote:
>>
>> Where are you getting your numbers from?  That is, where are your
>> timers?  Are you timing the rs.next() loop, or the individual calls
>> to Lucene?  What do the getXXXXX methods look like?  How big are your
>> queries?  How big is your index?
>>
>> Essentially, we need more info to really help you.  From what I can
>> tell, you are generating 3 different Lucene queries for each record
>> in the database.  Frankly, I surprised your slowdown is only linear.
>>
>> On Jul 24, 2007, at 4:31 PM, Askar Zaidi wrote:
>>
>>> I have 512MB RAM allocated to JVM Heap. If I double my system RAM
>>> from 768MB
>>> to say 2GB or so, and give JVM 1.5GB Heap space, will I get quicker
>>> results
>>> ?
>>>
>>> Can I expect results which take 1 minute to be returned in 30
>>> seconds with
>>> more RAM ? Should I also get a more powerful CPU ? A real server  
>>> class
>>> machine ?
>>>
>>> I have also done some of the optimizations that are mentioned on
>>> the Lucene
>>> website.
>>>
>>> thanks,
>>> AZ
>>>
>>> On 7/24/07, Askar Zaidi <askar.zaidi@gmail.com> wrote:
>>>>
>>>> Hey Guys,
>>>>
>>>> I just finished up using Lucene in my application. I have data in a
>>>> database , so while indexing I extract this data from the database
>>>> and pump
>>>> it into the index. Specifically , I have the following data in the
>>>> index:
>>>>
>>>> <itemID> <tags> <title> <summary> <contents>
>>>>
>>>> where itemID is just a number (primary key in the DB)
>>>> tags : text
>>>> titie: text
>>>> summary: text
>>>> contents: Huge text (text extracted from files: pdfs, docs etc).
>>>>
>>>> Now while running a search query I realized that the response time
>>>> increases in a linear fashion as the number of <itemID> increase
>>>> in the DB.
>>>>
>>>> If I have 50 items, its 8 seconds
>>>> 100 items, its 17 seconds.
>>>> 300+ items, its 60 seconds and maybe more.
>>>>
>>>> In a perfect world, I'd like to search on 300+ items within 10-15
>>>> seconds.
>>>> Can anyone give me tips to fine tune lucene ?
>>>>
>>>> Heres a code snippet:
>>>>
>>>> sql query = "SELECT itemID from items where creator = 'askar' ;
>>>>
>>>> --execute query--
>>>>
>>>> while(rs.next()){
>>>>
>>>> score = doTagSearch(askar,text,itemID);
>>>> scoreTitle = doTitleSearch(askar,text,itemID);
>>>> scoreSummary = doSummarySearch(askar,text,itemID);
>>>>
>>>> ----
>>>>
>>>> }
>>>>
>>>> So this code asks Lucene to search for the "text" in the itemID
>>>> passed.
>>>> itemID is already indexed. The while loop will run 300 times if
>>>> there are
>>>> 300 items....that gets slow...what can I do here ??
>>>>
>>>> thanks for the replies,
>>>>
>>>> AZ
>>>>
>>
>> --------------------------
>> Grant Ingersoll
>> Center for Natural Language Processing
>> http://www.cnlp.org/tech/lucene.asp
>>
>> Read the Lucene Java FAQ at http://wiki.apache.org/lucene-java/ 
>> LuceneFAQ
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>





---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message