lucene-java-user mailing list archives

From "Askar Zaidi" <askar.za...@gmail.com>
Subject Re: Fine Tuning Lucene implementation
Date Tue, 24 Jul 2007 23:58:12 GMT
I ran some tests, and the slowness seems to come from the Lucene calls made
in "doBodySearch". If I remove that call, Lucene gives me results in 5
seconds; otherwise it takes about 50 seconds.
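
For anyone who wants to reproduce this kind of comparison, here is a minimal
sketch of per-step timing with System.currentTimeMillis(), as suggested in the
quoted reply below. It is not the code used for the numbers above: the class
name StepTimer, the labels, and the pause() stand-in for the real database and
Lucene calls are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Accumulates wall-clock time per named step so DB and Lucene costs can be compared. */
class StepTimer {
    private final Map<String, Long> totalMillis = new LinkedHashMap<>();

    /** Runs the task and adds its elapsed time to the running total for the label. */
    void time(String label, Runnable task) {
        long start = System.currentTimeMillis();
        try {
            task.run();
        } finally {
            long elapsed = System.currentTimeMillis() - start;
            totalMillis.merge(label, elapsed, Long::sum);
        }
    }

    /** Total milliseconds recorded for the label, or 0 if it was never timed. */
    long total(String label) {
        Long value = totalMillis.get(label);
        return value == null ? 0L : value;
    }

    /** Hypothetical stand-in for real work (a DB fetch or a Lucene search). */
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        StepTimer timer = new StepTimer();
        for (int item = 0; item < 3; item++) {
            timer.time("db", () -> pause(5));           // assumption: stands in for rs.next()
            timer.time("tagSearch", () -> pause(10));   // assumption: stands in for doTagSearch
            timer.time("bodySearch", () -> pause(40));  // assumption: stands in for doBodySearch
        }
        System.out.println("db: " + timer.total("db") + " ms");
        System.out.println("tagSearch: " + timer.total("tagSearch") + " ms");
        System.out.println("bodySearch: " + timer.total("bodySearch") + " ms");
    }
}
```

Wrapping each call separately, rather than timing the whole loop, is what
makes it possible to say which call dominates.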

But I need to do the body search, and that field contains a lot of text. The
field is <contents>. How can I optimize that?

thanks,
Askar



On 7/24/07, Grant Ingersoll <gsingers@apache.org> wrote:
>
> Sorry, I mistyped. I don't mean the getXXXX methods, I mean the
> doTagSearch, doTitleSearch, etc.
>
> As for the stop watch, not really sure what to make of that...  Try
> System.currentTimeMillis()...
>
> You can get just the fields you want when loading a Document by using
> the FieldSelector API on IndexReader, etc.
>
> Perhaps, you can also use some Filters and cache them.
>
> It's really hard to give suggestions when it is not at all obvious
> where the slowness is.  Please try to isolate the Lucene calls from
> the DB calls and look at the timings for both.
>
> On Jul 24, 2007, at 5:28 PM, Askar Zaidi wrote:
>
> > Thanks for the reply.
> >
> > I am timing the entire search process with a stop watch, a bit
> > ghetto style.
> > My getXXX methods are:
> >
> > Document doc = hits.doc(i);
> > String str = doc.get("item");
> >
> > So you can see that I am retrieving the entire document in a search
> > query.
> > Ideally , I'd like to just retrieve the Field object that I want to
> > run the
> > search on. I know this will give me a boost as one of my Fields is
> > really
> > huge.
> >
> > My query is selecting the entire user data-set in the database. I'd
> > like to
> > do some SQL based search in the query too so that I pick only those
> > items
> > where the phrase matches.
> >
> > Index contains about 650MB of data. Index file size is 14478869 bytes.
> >
> > thanks,
> > AZ
> >
> >
> > On 7/24/07, Grant Ingersoll <gsingers@apache.org> wrote:
> >>
> >> Where are you getting your numbers from?  That is, where are your
> >> timers?  Are you timing the rs.next() loop, or the individual calls
> >> to Lucene?  What do the getXXXXX methods look like?  How big are your
> >> queries?  How big is your index?
> >>
> >> Essentially, we need more info to really help you.  From what I can
> >> tell, you are generating 3 different Lucene queries for each record
> >> in the database.  Frankly, I'm surprised your slowdown is only linear.
> >>
> >> On Jul 24, 2007, at 4:31 PM, Askar Zaidi wrote:
> >>
> >>> I have 512MB RAM allocated to JVM Heap. If I double my system RAM
> >>> from 768MB
> >>> to say 2GB or so, and give JVM 1.5GB Heap space, will I get quicker
> >>> results
> >>> ?
> >>>
> >>> Can I expect results which take 1 minute to be returned in 30
> >>> seconds with
> >>> more RAM ? Should I also get a more powerful CPU ? A real server
> >>> class
> >>> machine ?
> >>>
> >>> I have also done some of the optimizations that are mentioned on
> >>> the Lucene
> >>> website.
> >>>
> >>> thanks,
> >>> AZ
> >>>
> >>> On 7/24/07, Askar Zaidi <askar.zaidi@gmail.com> wrote:
> >>>>
> >>>> Hey Guys,
> >>>>
> >>>> I just finished up using Lucene in my application. I have data in a
> >>>> database , so while indexing I extract this data from the database
> >>>> and pump
> >>>> it into the index. Specifically , I have the following data in the
> >>>> index:
> >>>>
> >>>> <itemID> <tags> <title> <summary> <contents>
> >>>>
> >>>> where itemID is just a number (primary key in the DB)
> >>>> tags : text
> >>>> title: text
> >>>> summary: text
> >>>> contents: Huge text (text extracted from files: pdfs, docs etc).
> >>>>
> >>>> Now, while running a search query, I realized that the response time
> >>>> increases in a linear fashion as the number of <itemID> values
> >>>> in the DB increases.
> >>>>
> >>>> If I have 50 items, it's 8 seconds.
> >>>> 100 items, it's 17 seconds.
> >>>> 300+ items, it's 60 seconds and maybe more.
> >>>>
> >>>> In a perfect world, I'd like to search on 300+ items within 10-15
> >>>> seconds.
> >>>> Can anyone give me tips to fine-tune Lucene?
> >>>>
> >>>> Here's a code snippet:
> >>>>
> >>>> String query = "SELECT itemID FROM items WHERE creator = 'askar'";
> >>>>
> >>>> --execute query--
> >>>>
> >>>> while(rs.next()){
> >>>>
> >>>> score = doTagSearch(askar,text,itemID);
> >>>> scoreTitle = doTitleSearch(askar,text,itemID);
> >>>> scoreSummary = doSummarySearch(askar,text,itemID);
> >>>>
> >>>> ----
> >>>>
> >>>> }
> >>>>
> >>>> So this code asks Lucene to search for the "text" in the itemID
> >>>> passed.
> >>>> itemID is already indexed. The while loop will run 300 times if
> >>>> there are
> >>>> 300 items....that gets slow...what can I do here ??
> >>>>
> >>>> thanks for the replies,
> >>>>
> >>>> AZ
> >>>>
> >>
> >> --------------------------
> >> Grant Ingersoll
> >> Center for Natural Language Processing
> >> http://www.cnlp.org/tech/lucene.asp
> >>
> >> Read the Lucene Java FAQ at http://wiki.apache.org/lucene-java/LuceneFAQ
> >>
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>

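The suggestion quoted above to "use some Filters and cache them" can be
pictured with a plain-Java sketch. This is only an illustration of the idea
(compute the set of documents a creator may see once, then reuse it for every
search), not Lucene's Filter API. The class CreatorFilterCache, its methods,
and the loader function are hypothetical, and document numbers are modelled as
bit positions in a java.util.BitSet.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Plain-Java illustration of caching a per-creator document filter; not Lucene's API. */
class CreatorFilterCache {
    private final Map<String, BitSet> cache = new HashMap<>();
    private final Function<String, BitSet> loader;
    private int loads = 0;

    /** The loader is assumed to do the expensive lookup of a creator's documents. */
    CreatorFilterCache(Function<String, BitSet> loader) {
        this.loader = loader;
    }

    /** Returns the allowed-document bits for a creator, computing them only on first use. */
    BitSet bitsFor(String creator) {
        BitSet bits = cache.get(creator);
        if (bits == null) {
            bits = loader.apply(creator);
            loads++;
            cache.put(creator, bits);
        }
        return bits;
    }

    /** Keeps only the hits whose document number is allowed for the creator. */
    BitSet restrict(BitSet hits, String creator) {
        BitSet result = (BitSet) hits.clone();
        result.and(bitsFor(creator));
        return result;
    }

    /** How many times the expensive loader actually ran. */
    int loadCount() {
        return loads;
    }
}
```

With something along these lines, one search over the whole index could be
narrowed to a creator's items in a single pass, instead of issuing separate
queries for each itemID inside the rs.next() loop.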