lucene-general mailing list archives

From "Fredrik Andersson" <fidde.anders...@gmail.com>
Subject Re: Kind of hardware config ?
Date Tue, 29 Aug 2006 08:28:55 GMT
Hey guys.

4 GB of RAM for an index of 2 million documents should really not be a
problem. If you can, consider separating the index from the actual content
(i.e., store only the index data in your index, not the HTML). I am not very
familiar with Lucene's core internals, but even if you store the raw data
alongside the index data, only the index data should be held in memory; the
raw data should be read from disk, with some caching if there's room.
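
To make the separation concrete, here is a minimal sketch in plain Java. It
is not Lucene's API and says nothing about Lucene's real internals; the class
name, method names and file paths are all made up for illustration. It only
shows the idea: the term-to-document postings live in memory, while each
document is represented by nothing more than the location of its HTML on disk.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical illustration only; this is not how Lucene is implemented.
class IndexOnlySketch {
    // In memory: term -> sorted ids of the documents that contain the term.
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    // In memory: document id -> location of the raw HTML (not the HTML itself).
    private final List<String> locations = new ArrayList<>();

    // Tokenizes the text, records postings, and keeps only the location.
    int add(String location, String text) {
        int docId = locations.size();
        locations.add(location);
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<Integer>()).add(docId);
            }
        }
        return docId;
    }

    // Returns the locations of matching documents, in insertion order.
    // The caller would then read the raw content from disk as needed.
    List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        TreeSet<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
        if (ids != null) {
            for (int docId : ids) {
                hits.add(locations.get(docId));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        IndexOnlySketch index = new IndexOnlySketch();
        // The paths below are placeholders, not real files.
        index.add("/data/html/a.html", "Lucene hardware config");
        index.add("/data/html/b.html", "RAID disks and memory");
        System.out.println(index.search("lucene"));
    }
}
```

In Lucene itself, the rough equivalent is to index the HTML field without
storing it (Field.Store.NO) and to keep a small stored field, such as a path
or URL, that points back to the original document.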

With the numbers you mention, James, it sounds like both the raw data and
the index data are held in memory? If you have good insight into the
internals, feel free to correct me on this... I'm also involved in
applications with very large indices, so this is very interesting.
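
For what it's worth, here is the back-of-the-envelope arithmetic behind my
question, as a small Java sketch. The document count, total size and RAM
figure come from this thread; the index-to-raw ratio is purely my assumption
(I have not measured it), and the class and method names are made up.

```java
// Hypothetical sizing helper; the ratio passed in is a guess, not a measurement.
class SizingSketch {
    // Average raw bytes per document.
    static long averageDocBytes(long totalRawBytes, long docCount) {
        return totalRawBytes / docCount;
    }

    // Estimated index size for an assumed index-to-raw ratio.
    static long estimatedIndexBytes(long totalRawBytes, double indexToRawRatio) {
        return Math.round(totalRawBytes * indexToRawRatio);
    }

    // True if the given number of bytes fits in the given amount of RAM.
    static boolean fitsInRam(long bytes, long ramBytes) {
        return bytes <= ramBytes;
    }

    public static void main(String[] args) {
        long raw = 100_000_000_000L;  // about 100 GB of HTML (from the thread)
        long docs = 2_000_000L;       // about 2 million documents (from the thread)
        long ram = 4_000_000_000L;    // 4 GB of RAM (from the thread)
        double assumedRatio = 0.25;   // ASSUMPTION: index is a quarter of the raw size

        System.out.println("avg bytes per doc: " + averageDocBytes(raw, docs));
        System.out.println("raw data fits in RAM: " + fitsInRam(raw, ram));
        System.out.println("index fits in RAM: "
                + fitsInRam(estimatedIndexBytes(raw, assumedRatio), ram));
    }
}
```

The raw data clearly cannot fit in 4 GB; whether the index alone does depends
entirely on the real ratio, which is what I would like to know.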

Thanks,
Fredrik


On 8/28/06, James <james@ryley.com> wrote:
>
> OK, so you aren't going to get it into memory unless you spend a lot on
> servers.  We haven't found memory (or disk access) to be a limiting factor
> anyway -- CPU is the issue.  I'm not sure what you want to spend, but a
> single server with SATA RAID, 4GB RAM and the latest AMD processor will
> search your collection in ~10-20 seconds, depending on the complexity of
> the search.  If you need faster performance or the ability to support many
> hits at once, you are going to have to parallelize the configuration across
> multiple servers using ParallelMultiSearcher.
>
> Keep in mind that Lucene isn't really set up to handle parallel searching
> robustly.  There is a lot of code you are going to have to write for an
> enterprise-ready solution (e.g., checking the status of a given server to
> make sure it isn't down, redundantly storing indexes so that the search
> still functions if one server is down, potentially handling laggards to
> increase speed, etc.).
>
> We have done some of this, and have more to do -- it is a very non-trivial
> task.
>
> Sincerely,
> James Ryley, Ph.D.
>
> > -----Original Message-----
> > From: caribou_surf [mailto:eric@mixad.com]
> > Sent: Monday, August 28, 2006 10:42 AM
> > To: general@lucene.apache.org
> > Subject: RE: Kind of hardware config ?
> >
> >
> > About 100 GB
> >
> >
> >
> > James-10 wrote:
> > >
> > > What's the total document size?
> > >
> > > Sincerely,
> > > James Ryley, Ph.D.
> > >
> > >> -----Original Message-----
> > >> From: caribou_surf [mailto:eric@mixad.com]
> > >> Sent: Monday, August 28, 2006 5:01 AM
> > >> To: general@lucene.apache.org
> > >> Subject: Kind of hardware config ?
> > >>
> > >>
> > >> We want to index about 2 million HTML documents with Lucene.
> > >> Do you have an idea of the best-suited machine configuration
> > >> (dual processor, 2 GB of memory, RAID disks...)?
> > >> --
> > >> View this message in context: http://www.nabble.com/Kind-of-hardware-
> > >> config---tf2176085.html#a6016661
> > >> Sent from the Lucene - General forum at Nabble.com.
> > >
> > >
> > >
> >
> > --
> > View this message in context: http://www.nabble.com/Kind-of-hardware-
> > config---tf2176085.html#a6021457
> > Sent from the Lucene - General forum at Nabble.com.
>
>
