lucene-java-user mailing list archives

From Adrien Grand <jpou...@gmail.com>
Subject Re: Help with huge index
Date Thu, 01 Mar 2018 01:34:01 GMT
Thanks. Deleting lots of documents can indeed trigger a lot of work on the
Lucene side. First, Lucene likely needs to rewrite the live docs of all your
segments, and then this might trigger significant merging activity, since
Lucene tries to keep the number of deleted docs reasonable so that disk
space is not mostly spent on deleted docs. I can't think of any settings
that would make it more efficient.

If you call deleteDocuments because you are, e.g., deleting data older than
a given age, it would help to have time-based indices so that you could
remove an entire index at once rather than large portions of one.
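
To make the time-based idea concrete, here is a minimal sketch using only the
JDK. It assumes one index directory per day under a hypothetical naming
scheme ("index-" plus the ISO date); the class and method names are made up
for illustration and are not part of Lucene. Each directory would be a
separate Lucene index with its own IndexWriter, and purging then becomes
deleting whole directories instead of calling deleteDocuments.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper: none of these names come from Lucene.
class TimeBasedIndices {
    // Assumed naming scheme: one directory per day, e.g. "index-2018-03-01".
    static final String PREFIX = "index-";

    // Name of the index directory that receives documents for the given day.
    static String indexNameFor(LocalDate day) {
        return PREFIX + day; // LocalDate.toString() yields the ISO-8601 date
    }

    // Names whose day is strictly older than (today - retentionDays).
    static List<String> expired(List<String> names, LocalDate today, int retentionDays) {
        LocalDate cutoff = today.minusDays(retentionDays);
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (!name.startsWith(PREFIX)) {
                continue; // not one of our index directories
            }
            try {
                LocalDate day = LocalDate.parse(name.substring(PREFIX.length()));
                if (day.isBefore(cutoff)) {
                    result.add(name);
                }
            } catch (DateTimeParseException e) {
                // Suffix is not a date: leave the directory alone.
            }
        }
        return result;
    }

    // Recursively deletes one index directory. Any IndexWriter or reader
    // on it should be closed first.
    static void drop(Path indexDir) throws IOException {
        try (Stream<Path> paths = Files.walk(indexDir)) {
            // Reverse order visits children before their parent directory.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) {
        List<String> names = List.of("index-2018-01-01", "index-2018-02-27", "index-2018-03-01");
        // Lists the directories that a 30-day retention policy would drop.
        System.out.println(expired(names, LocalDate.of(2018, 3, 1), 30));
    }
}
```

Searches that need to span several days could open one reader per directory
and combine them with a MultiReader. Whether daily granularity is right
depends on your retention period and data volume.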

On Thu, Mar 1, 2018 at 01:20, Stuart Goldberg <sgoldberg@fixflyer.com>
wrote:

> I call deleteDocuments
>
> On Feb 28, 2018 8:16 PM, "Adrien Grand" <jpountz@gmail.com> wrote:
>
> > What do you mean by purging? What methods do you call?
> >
> > On Wed, Feb 28, 2018 at 19:34, Stuart Goldberg <sgoldberg@fixflyer.com>
> > wrote:
> >
> > > I have a huge Lucene index. On disk it's about 24 GB.
> > >
> > >
> > >
> > > I have a purging routine that is supposed to run and purge old docs.
> > >
> > >
> > >
> > > There are about 650 million docs in there and through testing I have
> > > determined that about 1/3 of these need to be purged.
> > >
> > >
> > >
> > > During the purge, every so often it's apparently doing some flushing
> > > and applying deletes. This causes the process to appear to hang. I know
> > > it's not hanging, but actually doing work, because I have enabled
> > > infostream and I am getting messages every so often (every 5 minutes).
> > >
> > >
> > >
> > > Is there some trick (index config) I can employ to get this to work
> > > faster?
> > >
> > >
> > >
> > > Stuart M Goldberg
> > >
> > >
> >
>
