lucene-java-user mailing list archives

From "Mark Florence" <mflore...@airsmail.com>
Subject RE: Running OutOfMemory while optimizing and searching
Date Wed, 30 Jun 2004 13:20:13 GMT
Thanks, Jim, your arithmetic is compelling. The queries
are ad hoc, of course, but an average query might easily
search on 10-15 fields. We avoid wildcards except in a few
special cases, because they can be very slow. For example,
a search for first names beginning with M -- firstName:M* --
takes well over 40 seconds on my index.
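
(My understanding is that Lucene rewrites a prefix query like that
into a BooleanQuery with one clause per matching term, so firstName:M*
has to enumerate every first name starting with M before it can score
a single document. In code it looks something like the sketch below --
the 1.4-era API as best I recall it, with a made-up index path.)

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.PrefixQuery;

    public class PrefixSearch {
        public static void main(String[] args) throws Exception {
            // "/path/to/index" is a placeholder, not our real index.
            IndexSearcher searcher = new IndexSearcher("/path/to/index");

            // firstName:M* -- rewritten at search time into one
            // TermQuery clause per term starting with "M"; the term
            // enumeration is what makes it slow on a large index.
            Hits hits = searcher.search(
                    new PrefixQuery(new Term("firstName", "M")));
            System.out.println(hits.length() + " hits");
            searcher.close();
        }
    }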

Given these numbers, what is your experience of the hardware
needed to get decent performance out of large, highly
interactive Lucene indexes? I'm thinking of Alex McManus's
recent post "Making a case for Lucene", where he posits an
index of over 100M documents -- a scale we plan to reach too.

It seems that all my issues are amenable to a hardware
solution. I guess I just don't know enough to make a
recommendation to my people as to exactly what kind of
iron we need :(

-- Mark



-----Original Message-----
From: James Dunn [mailto:james_h_dunn@yahoo.com]
Sent: Tuesday, June 29, 2004 07:29 pm
To: Lucene Users List
Subject: RE: Running OutOfMemory while optimizing and searching


Mark,

What do your queries look like? The memory required for a query
can be estimated with the following equation:

    1 byte * (number of fields in your query) * (number of docs in your index)

So if your query searches all 50 fields of your 3.5 million
document index, each search would take about 175MB. If your 3-4
searches run concurrently, that's about 525MB to 700MB chewed up
at once.
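
If I remember the internals right, those bytes are the field norms
Lucene caches -- one byte per document for each field a query actually
searches. Plugging your numbers in (a back-of-the-envelope sketch,
plain Java, nothing Lucene-specific):

    public class MemoryEstimate {
        public static void main(String[] args) {
            long docs = 3500000L;   // documents in the index
            int fields = 50;        // fields named in the query
            int searches = 4;       // concurrent searches

            long perSearch = docs * fields;  // one byte per doc per field
            System.out.println(perSearch / 1000000 + " MB per search");
            System.out.println(searches * perSearch / 1000000
                    + " MB for " + searches + " concurrent searches");
        }
    }

That prints 175 MB per search and 700 MB for four concurrent
searches, which lines up with the heap exhaustion you're seeing.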

Also, if your queries use wildcards, the memory
requirements could be much greater.
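
That's because each term matching the wildcard effectively becomes
another clause in the rewritten query. A rough way to gauge the
blow-up is to walk the term dictionary yourself -- a sketch against
the old IndexReader/TermEnum API, with the path and field made up:

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermEnum;

    public class CountExpansions {
        public static void main(String[] args) throws Exception {
            IndexReader reader = IndexReader.open("/path/to/index");

            // terms(t) positions the enumeration at the first term >= t.
            TermEnum terms = reader.terms(new Term("firstName", "M"));
            int count = 0;
            do {
                Term t = terms.term();
                if (t == null || !t.field().equals("firstName")
                        || !t.text().startsWith("M")) {
                    break;  // ran past the field or past the prefix
                }
                count++;
            } while (terms.next());
            terms.close();
            reader.close();

            System.out.println(count + " terms would match firstName:M*");
        }
    }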

Hope that helps,

Jim
--- Mark Florence <mflorence@airsmail.com> wrote:
> Otis, thanks for considering this problem.
>
> I'm using all the default parameters -- and still scratching my head!
>
> I can't see any that would control memory usage. Plus, a 2GB heap is
> quite big. I see others have indexes bigger than mine, so I'm not sure
> how to find out why mine is throwing OutOfMemory -- not only on the
> optimize, but when 3-4 searchers are running, too.
>
> -- Mark
>
> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> Sent: Tuesday, June 29, 2004 01:02 am
> To: Lucene Users List
> Subject: Re: Running OutOfMemory while optimizing and searching
>
>
> Mark,
>
> Tough situation. I hate when things like this happen in production :(.
> You don't mention what you are using for the various IndexWriter
> parameters. You may be able to get this working by tweaking them (see
> http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html#field_summary).
> Hm, now that I think about it, I am not sure whether those are
> considered during index optimization. I'll try checking the sources
> later.
>
> Otis
>
> --- Mark Florence <mflorence@airsmail.com> wrote:
> > Hi, I'm using Lucene to index ~3.5M documents, over about 50
> > fields. The Lucene index itself is ~10.5GB, spread over ~7,000
> > files. Some of these files are "large" -- that is, several PRX
> > files are ~1.5GB.
> >
> > Lucene runs on a dedicated server (Linux on a 1GHz Dell, with 1GB
> > RAM). Clients on other machines use RMI to perform reads / writes.
> > Each night the server automatically performs an optimize.
> >
> > The problem is that the optimize now dies with an OutOfMemory
> > exception, even when the JVM heap size is set to its maximum of
> > 2GB. I need to optimize, because as the number of Lucene files
> > grows, search performance becomes unacceptable.
> >
> > Search performance is also adversely affected because I've had to
> > effectively single-thread reads and writes. I was using a simple
> > read / write lock mechanism, allowing multiple readers to search
> > simultaneously, but now more than 3-4 simultaneous readers will
> > also cause an OutOfMemory condition. Searches can take as long as
> > 30-40 seconds, and with single-threading, that's crippling the
> > main client application.
> >
> > Needless to say, the Lucene index is mission-critical, and must
> > run 24/7.
> >
> > I've seen other posts along this same vein, but no definite
> > consensus. Is my problem simply inadequate hardware? Should I run
> > on a 64-bit platform, where I can allocate a Java heap of > 2GB?
> >
> > Or could there be something fundamentally "wrong" with my index? I
> > should add that I've just spent about a week (!!) rebuilding from
> > scratch, over all 3.5M documents.
> >
> > -- Many thanks for any help! Mark Florence


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

