lucene-java-user mailing list archives

From "Askar Zaidi" <askar.za...@gmail.com>
Subject Re: Fine Tuning Lucene implementation
Date Tue, 24 Jul 2007 21:28:36 GMT
Thanks for the reply.

I am timing the entire search process by hand with a stopwatch, so the
numbers are rough. My getXXX methods are:

Document doc = hits.doc(i);
String str = doc.get("item");

So you can see that I am retrieving the entire document for each hit.
Ideally, I'd like to retrieve just the Field object that I want to run the
search on. I expect this would give me a boost, as one of my Fields is
really huge.

My SQL query selects the user's entire data set from the database. I'd also
like to do some SQL-based filtering in that query, so that I pick only those
items where the phrase matches.
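
One way the per-item loop could be restructured, as a rough sketch rather than a tested fix: run each field query once, collect the scores into maps keyed by itemID, and then look up each itemID that comes back from SQL. The class and method names below are made up for illustration, and the three maps are assumed to have been filled from the Lucene hits elsewhere (that part is not shown).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: joins per-field scores with the item IDs returned by SQL. */
class BatchedScores {

    /** Returns {tag, title, summary} scores for one item; a field with no hit scores 0. */
    static float[] scoresFor(int itemId,
                             Map<Integer, Float> tagScores,
                             Map<Integer, Float> titleScores,
                             Map<Integer, Float> summaryScores) {
        return new float[] {
            tagScores.getOrDefault(itemId, 0f),
            titleScores.getOrDefault(itemId, 0f),
            summaryScores.getOrDefault(itemId, 0f)
        };
    }

    /** Keeps only the SQL-selected items that matched in at least one field. */
    static List<Integer> matching(List<Integer> itemIdsFromSql,
                                  Map<Integer, Float> tagScores,
                                  Map<Integer, Float> titleScores,
                                  Map<Integer, Float> summaryScores) {
        List<Integer> matched = new ArrayList<>();
        for (int id : itemIdsFromSql) {
            if (tagScores.containsKey(id)
                    || titleScores.containsKey(id)
                    || summaryScores.containsKey(id)) {
                matched.add(id);
            }
        }
        return matched;
    }
}
```

With this shape the number of Lucene searches stays at three no matter how many rows the SQL query returns; the loop over rs.next() only does map lookups.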

The index covers about 650MB of data. The index file size is 14478869 bytes.
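
To get away from the stopwatch, something like the following could time the database loop and the Lucene calls separately. PhaseTimer is a hypothetical helper built only on System.nanoTime; it is not part of Lucene, and the phase names are whatever labels the caller picks.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that accumulates elapsed wall-clock time per named phase. */
class PhaseTimer {
    private final Map<String, Long> nanosByPhase = new LinkedHashMap<>();

    /** Runs the task and adds its elapsed time to the named phase, even if the task throws. */
    void time(String phase, Runnable task) {
        long start = System.nanoTime();
        try {
            task.run();
        } finally {
            nanosByPhase.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    /** Total milliseconds recorded for a phase, or 0 if the phase was never timed. */
    long millis(String phase) {
        return nanosByPhase.getOrDefault(phase, 0L) / 1_000_000;
    }

    /** Raw totals in nanoseconds, in the order the phases were first seen. */
    Map<String, Long> nanosByPhase() {
        return nanosByPhase;
    }
}
```

Wrapping each Lucene call in time("lucene", ...) and the rest of the loop body in time("sql", ...) would show which side the 60 seconds goes to. Calls that throw checked exceptions would need to be wrapped, since Runnable cannot throw them.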

thanks,
AZ


On 7/24/07, Grant Ingersoll <gsingers@apache.org> wrote:
>
> Where are you getting your numbers from?  That is, where are your
> timers?  Are you timing the rs.next() loop, or the individual calls
> to Lucene?  What do the getXXXXX methods look like?  How big are your
> queries?  How big is your index?
>
> Essentially, we need more info to really help you.  From what I can
> tell, you are generating 3 different Lucene queries for each record
> in the database.  Frankly, I'm surprised your slowdown is only linear.
>
> On Jul 24, 2007, at 4:31 PM, Askar Zaidi wrote:
>
> > I have 512MB RAM allocated to JVM Heap. If I double my system RAM from
> > 768MB to say 2GB or so, and give JVM 1.5GB Heap space, will I get
> > quicker results?
> >
> > Can I expect results which take 1 minute to be returned in 30 seconds
> > with more RAM? Should I also get a more powerful CPU? A real
> > server-class machine?
> >
> > I have also done some of the optimizations that are mentioned on the
> > Lucene website.
> >
> > thanks,
> > AZ
> >
> > On 7/24/07, Askar Zaidi <askar.zaidi@gmail.com> wrote:
> >>
> >> Hey Guys,
> >>
> >> I just finished up using Lucene in my application. I have data in a
> >> database, so while indexing I extract this data from the database and
> >> pump it into the index. Specifically, I have the following data in the
> >> index:
> >>
> >> <itemID> <tags> <title> <summary> <contents>
> >>
> >> where itemID is just a number (primary key in the DB)
> >> tags : text
> >> title: text
> >> summary: text
> >> contents: Huge text (text extracted from files: pdfs, docs etc).
> >>
> >> Now while running a search query I realized that the response time
> >> increases in a linear fashion as the number of <itemID> entries in the
> >> DB increases.
> >>
> >> If I have 50 items, it's 8 seconds.
> >> 100 items, it's 17 seconds.
> >> 300+ items, it's 60 seconds and maybe more.
> >>
> >> In a perfect world, I'd like to search on 300+ items within 10-15
> >> seconds. Can anyone give me tips to fine-tune Lucene?
> >>
> >> Here's a code snippet:
> >>
> >> String query = "SELECT itemID FROM items WHERE creator = 'askar'";
> >>
> >> --execute query--
> >>
> >> while(rs.next()){
> >>
> >> score = doTagSearch(askar,text,itemID);
> >> scoreTitle = doTitleSearch(askar,text,itemID);
> >> scoreSummary = doSummarySearch(askar,text,itemID);
> >>
> >> ----
> >>
> >> }
> >>
> >> So this code asks Lucene to search for the "text" in the itemID passed.
> >> itemID is already indexed. The while loop will run 300 times if there
> >> are 300 items, and that gets slow. What can I do here?
> >>
> >> thanks for the replies,
> >>
> >> AZ
> >>
>
> --------------------------
> Grant Ingersoll
> Center for Natural Language Processing
> http://www.cnlp.org/tech/lucene.asp
>
> Read the Lucene Java FAQ at http://wiki.apache.org/lucene-java/LuceneFAQ
