lucene-java-user mailing list archives

From "Cedric Ho" <cedric...@gmail.com>
Subject Re: large term vectors
Date Mon, 11 Feb 2008 06:34:17 GMT
I guess it would be quite different for different apps.

For me, index updates happen on a single machine: each incoming
document is indexed into one of several chunks, chosen by a rule that
keeps the distribution even. I then copy all the updated indexes to
other machines for searching, and each machine reopens its updated index.
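
The distribution rule itself is not spelled out here. As a minimal sketch of one possibility, assuming each document has a stable string ID and the number of chunks is fixed, the chunk could be picked by hashing the ID. ChunkRouter and chunkFor are hypothetical names for illustration, not Lucene classes:

```java
import java.util.Arrays;

// Hypothetical helper, not part of Lucene: picks an index chunk for each document.
// Assumption: hashing a stable document ID is an acceptable "even distribution" rule.
class ChunkRouter {

    // Maps a document ID to a chunk number in the range [0, numChunks).
    // Math.floorMod keeps the result non-negative even when hashCode() is negative.
    static int chunkFor(String docId, int numChunks) {
        if (numChunks <= 0) {
            throw new IllegalArgumentException("numChunks must be positive");
        }
        return Math.floorMod(docId.hashCode(), numChunks);
    }

    public static void main(String[] args) {
        int numChunks = 4;
        int[] counts = new int[numChunks];
        // Route some made-up document IDs and count how many land in each chunk.
        for (int i = 0; i < 10000; i++) {
            counts[chunkFor("doc-" + i, numChunks)]++;
        }
        System.out.println(Arrays.toString(counts));
    }
}
```

Because the same ID always maps to the same chunk, an update to an existing document goes to the chunk that already holds it. Changing the number of chunks would remap most documents, so this sketch only fits a fixed chunk count.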

For searching, you can look at RemoteSearchable + ParallelMultiSearcher.
But if you need redundancy, failover, etc., you will probably need to
build it yourself.

Cedric


On Feb 11, 2008 11:14 AM, Briggs <acidbriggs@gmail.com> wrote:
> So, I have a question about 'splitting indexes'. I see people mention
> this all over, but how have people been handling it? I'm going to
> start a new thread (there probably was one back in the day, but I
> am going to fire it up again). So, how did you do it?
>
>
> On Feb 10, 2008 9:18 PM, Cedric Ho <cedric.ho@gmail.com> wrote:
> > Is it a single index? My index is also in the 200 GB range, but I have
> > never managed to get acceptable performance (in both searching and
> > updating) from a single index larger than 20 GB.
> > So I split my index into chunks of less than 10 GB each.
> >
> > I am curious as to how you manage such a single large index.
> >
> > Cedric
> >
> >
> >
> >
> > On Feb 8, 2008 11:51 PM,  <marc.dumontier@thomson.com> wrote:
> > > > Hi,
> > > >
> > > > I have a large index which is around 275GB. As I search different parts
> > > > of the index, the memory footprint grows with large byte arrays being
> > > > stored. They never seem to get unloaded or GC'ed. Is there any way to
> > > > control this behavior so that I can periodically unload cached
> > > > information?
> > > >
> > > > The nature of the data being indexed doesn't allow me to reduce the
> > > > number of terms per field, although I might be able to reduce the
> > > > overall number of fields (some are not currently being searched).
> > > >
> > > > I've just begun investigating and profiling the problem, so I don't have
> > > > a lot of details at this time. Any support would be extremely welcome.
> > >
> > >
> > >
> > > Thanks,
> > >
> > >
> > >
> > > Marc Dumontier
> > > Manager, Software Development
> > > Thomson Scientific (Canada)
> > > 1 Yonge Street, Suite 1801
> > > Toronto, Ontario M5E 1W7
> > >
> > >
> > >
> > > Direct +1 416 214 3448
> > > Mobile +1 416 454 3147
> > >
> > >
> > >
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>
>
>
> --
> "Conscious decisions by conscious minds are what make reality real"
