lucene-java-user mailing list archives

From Rob Audenaerde <rob.audenae...@gmail.com>
Subject Re: indexing performance 6.6 vs 7.1
Date Mon, 29 Jan 2018 12:08:43 GMT
Hi Uwe,

Thanks for the reply. We commit often. In the benchmark we commit every
60 documents (but we will also run a larger set with fewer commits). The
number of commits we call does not change between 6.6 and 7.1. In our
production systems we commit every 5000 documents.
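
To make the two commit intervals concrete, here is a minimal sketch of
commit-every-N batching. The class BatchCommitter and its two callbacks
are hypothetical stand-ins for the application's own "index one document"
and "commit" steps; this is not Lucene API and not our actual code.

```java
import java.util.function.Consumer;

/** Hypothetical helper: runs a commit action after every `interval` documents. */
final class BatchCommitter<T> {
    private final int interval;
    private final Consumer<T> indexAction; // stand-in for "add one document"
    private final Runnable commitAction;   // stand-in for "commit the index"
    private int pending;                   // documents added since the last commit
    private long commits;                  // total commits issued so far

    BatchCommitter(int interval, Consumer<T> indexAction, Runnable commitAction) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.interval = interval;
        this.indexAction = indexAction;
        this.commitAction = commitAction;
    }

    /** Indexes one document and commits once `interval` documents are pending. */
    void add(T doc) {
        indexAction.accept(doc);
        if (++pending == interval) {
            commit();
        }
    }

    /** Commits any documents left over at the end of the run. */
    void finish() {
        if (pending > 0) {
            commit();
        }
    }

    private void commit() {
        commitAction.run();
        commits++;
        pending = 0;
    }

    long commits() {
        return commits;
    }

    public static void main(String[] args) {
        // Compare the benchmark interval (60) with the production interval (5000).
        for (int interval : new int[] {60, 5000}) {
            BatchCommitter<Integer> c = new BatchCommitter<>(interval, d -> { }, () -> { });
            for (int i = 0; i < 10_000; i++) {
                c.add(i);
            }
            c.finish();
            System.out.println("interval " + interval + ": " + c.commits() + " commits");
        }
    }
}
```

For 10,000 documents this issues 167 commits at an interval of 60 and 2
commits at an interval of 5000, so any per-commit cost weighs far more
heavily in the benchmark than in production.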

We dug deeper into the commit methods, and the main difference we
currently see is in the calls to java.util.zip.Checksum.update(). The
number of calls to that method is around 11M in 6.6 and 21M in 7.1, so
almost twice as many.
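
A call count alone does not say whether more bytes are being checksummed
or the same bytes are arriving in smaller pieces. The sketch below shows
one way to tell the two apart. CountingChecksum is a hypothetical wrapper
written for illustration; it is not how Lucene computes its checksums and
not what the profiler does.

```java
import java.util.zip.CRC32;
import java.util.zip.Checksum;

/** Hypothetical wrapper that counts update() calls and the bytes they cover. */
final class CountingChecksum implements Checksum {
    private final Checksum delegate;
    private long updateCalls;
    private long bytesSeen;

    CountingChecksum(Checksum delegate) {
        this.delegate = delegate;
    }

    @Override
    public void update(int b) {
        updateCalls++;
        bytesSeen++;
        delegate.update(b);
    }

    @Override
    public void update(byte[] b, int off, int len) {
        updateCalls++;
        bytesSeen += len;
        delegate.update(b, off, len);
    }

    @Override
    public long getValue() {
        return delegate.getValue();
    }

    @Override
    public void reset() {
        delegate.reset();
    }

    long updateCalls() {
        return updateCalls;
    }

    long bytesSeen() {
        return bytesSeen;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1024];

        // Feed the same data one byte at a time ...
        CountingChecksum perByte = new CountingChecksum(new CRC32());
        for (byte b : data) {
            perByte.update(b);
        }

        // ... and in a single bulk call.
        CountingChecksum bulk = new CountingChecksum(new CRC32());
        bulk.update(data, 0, data.length);

        System.out.println("per-byte calls: " + perByte.updateCalls());
        System.out.println("bulk calls:     " + bulk.updateCalls());
        System.out.println("same checksum:  " + (perByte.getValue() == bulk.getValue()));
    }
}
```

Both variants see 1024 bytes and produce the same checksum, but the first
makes 1024 calls and the second makes one. Comparing bytes per call between
6.6 and 7.1 would show which of the two effects is behind the doubling.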

-Rob

On Mon, Jan 29, 2018 at 12:18 PM, Uwe Schindler <uwe@thetaphi.de> wrote:

> Hi,
>
> How often do you commit? If you index the data initially (that's the case
> where indexing needs to be fast), one would call commit at the end of the
> whole job, so the actual time it takes is not so important.
>
> If you have a system where the index is updated all the time, then of
> course committing is also something you have to take into account. Systems
> like Solr or Elasticsearch use a transaction log in parallel to indexing,
> so they commit very seldom. If the system crashes, the changes are replayed
> from the transaction log since the last commit.
>
> Uwe
>
> -----
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
> > -----Original Message-----
> > From: Rob Audenaerde [mailto:rob.audenaerde@gmail.com]
> > Sent: Monday, January 29, 2018 11:29 AM
> > To: java-user@lucene.apache.org
> > Subject: Re: indexing performance 6.6 vs 7.1
> >
> > Hi all,
> >
> > Some follow up (sorry for the delay).
> >
> > We built a benchmark in our application, and profiled it (on a smallish
> > data set). What we currently see in the profiler is that in Lucene 7.1
> > the calls to `commit()` take much longer.
> >
> > The self-time committing in 6.6: 3,215 ms
> > The self-time committing in 7.1: 10,187 ms.
> >
> > We will try to run a larger data set and also later with the IW info
> > stream.
> >
> > -Rob
> >
> > On Thu, Jan 18, 2018 at 7:03 PM, Erick Erickson <erickerickson@gmail.com>
> > wrote:
> >
> > > Robert:
> > >
> > > Ah, right. I keep confusing my gmail lists
> > > "lucene dev"
> > > and
> > > "lucene list"....
> > >
> > > Siiigggghhhhh.
> > >
> > >
> > >
> > > On Thu, Jan 18, 2018 at 9:18 AM, Adrien Grand <jpountz@gmail.com> wrote:
> > > > If you have sparse data, I would have expected index time to *decrease*,
> > > > not increase.
> > > >
> > > > Can you enable the IW info stream and share flush + merge times to see
> > > > where indexing time goes?
> > > >
> > > > If you can run with a profiler, this might also give useful information.
> > > >
> > > > On Thu, Jan 18, 2018 at 11:23, Rob Audenaerde <rob.audenaerde@gmail.com>
> > > > wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> We recently upgraded from Lucene 6.6 to 7.1. We see a significant drop
> > > >> in indexing performance.
> > > >>
> > > >> We have an atypical use of Lucene, as we (also) index some database
> > > >> tables and add all the values as AssociatedFacetFields as well. This
> > > >> allows us to create pivot tables on search results really fast.
> > > >>
> > > >> These tables have some overlapping columns, but also disjoint ones.
> > > >>
> > > >> We anticipated a decrease in index size because of the sparse
> > > >> docvalues. We see this happening, with decreases to ~50%-80% of the
> > > >> original index size. But we did not expect a drop in indexing
> > > >> performance (client systems' indexing time increased by 50% to 250%).
> > > >>
> > > >> (Our indexing speed used to be mainly bound by the speed at which the
> > > >> Taxonomy could deliver new ordinals for new values. We are currently
> > > >> investigating whether this is still the case and will report later,
> > > >> when a profiler run has been done.)
> > > >>
> > > >> Does anyone know if this increase in indexing time is to be expected
> > > >> as a result of the sparse docvalues change?
> > > >>
> > > >> Kind regards,
> > > >>
> > > >> Rob Audenaerde
> > > >>
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
>
>
