lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Anshum <anshum.gu...@naukri.com>
Subject Re: Lucene Search Performance
Date Wed, 27 Feb 2008 04:52:56 GMT
Hi Jamie,

Are you running concurrent searches on the index i.e. spawning multiple
threads and not handling them?
I have been having similar issues and I am planning to try out a
workaround for it using Java's Interface Executor.
http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/Executor.html
This might help as it would control the resource utilization and of what
I know it as, it would pool resources and threads to handle concurrency,
thereby decreasing time.
Let me know in case you come across something better.

--
Anshum



On Wed, 2008-02-27 at 07:37 +0530, h t wrote:
> Hi Michael,
> I guess the hotspot of lucene is
> org.apache.lucene.search.IndexSearcher.search()
> 
> Hi Jamie,
> What's the original text size of a million emails?
> I estimate the size of an email is around 100k, is this true?
> When you doing search, what kind keywords did you input, words or short
> sentence?
> How many results return?
> Did you use filter to shrink the results size?
> 
> 2008/2/27, Michael Stoppelman <stopman@gmail.com>:
> >
> > So you're saying searches are taking 10 seconds on a 5G index? If so that
> > seems ungodly slow.
> > If you're on *nix, have you watched your iostat statistics? Maybe
> > something
> > is hammering your hds.
> > Something seems amiss.
> >
> > What lucene methods were pointed to as hotspots by YourKit?
> >
> >
> > -M
> >
> >
> > On Tue, Feb 26, 2008 at 2:13 PM, Jamie <jamie@stimulussoft.com> wrote:
> >
> > > Hi Michael
> > >
> > > Perhaps this will help. We are using Lucene to index emails and provide
> > > a search interface to search through those emails. Many of our customers
> > > have 3-5 TB's or more of email data. The index size tends to be around 5
> > > GB per million messages. On a 3 GHZ intel core duo with standard 7200 mb
> > > drive, it takes approx. 10 seconds to search across a million emails. We
> > > need sub second search times, especially since, as time progresses, some
> > > of our archives are expected to reach 10-20 TB of data. In future, we
> > > will be recommending the use of SSD drives, but I'd like to know if they
> > > are any other strategies can pursued. One such strategy is to
> > > automatically create a new index after the index gets to a certain size.
> > > Then, when a search is conducted, based on date, search only those
> > > indexes that fall between specified dates. I've run my code through the
> > > YourKit profiler. The time appears to be consumed by Lucene itself and
> > > not by my code.
> > >
> > > Any other ideas?
> > >
> > >
> > > Michael Stoppelman wrote:
> > > > On Tue, Feb 26, 2008 at 10:18 AM, Jamie <jamie@stimulussoft.com>
> > wrote:
> > > >
> > > >
> > > >> Hi
> > > >>
> > > >> I am looking for a way to improve the search performance of my
> > > >> application. I've followed every suggestion in the Lucene Wiki but
> > the
> > > >> search is still too slow with large indexes. I was wondering whether
> > > >>
> > > >
> > > >
> > > > Did you optimize your index yet? That gave me a 2x bump.
> > > >
> > > > Have you put timers around parts of your code? Maybe it's something
> > > > unrelated to lucene.
> > > > You should probably give more details on your setup if you want more
> > > helpful
> > > > advice.
> > > >
> > > >
> > > >
> > > >> there was a way to restrict a search to a specific time period and
in
> > > >> doing so sacrifice the quality of search results? Any other
> > suggestions
> > > >> on how to improve search performance?
> > > >>
> > > >> Much appreciate
> > > >>
> > > >> Jamie
> > > >>
> > > >>
> > > >> ---------------------------------------------------------------------
> > > >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > >> For additional commands, e-mail: java-user-help@lucene.apache.org
> > > >>
> > > >>
> > > >>
> > > >
> > > >
> > >
> > >
> > > --
> > > Stimulus Software - MailArchiva
> > > Email Archiving And Compliance
> > > USA Tel: +1-713-366-8072 ext 3
> > > UK Tel: +44-20-80991035 ext 3
> > > Email: jamie@stimulussoft.com
> > > Web: http://www.mailarchiva.com
> > >
> > > To receive MailArchiva Enterprise Edition product announcements, send a
> > > message to: <mailarchiva-enterprise-edition-subscribe@stimulussoft.com>
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> >


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message