lucene-java-user mailing list archives

From Rob Audenaerde <rob.audenae...@gmail.com>
Subject Re: indexing performance 6.6 vs 7.1
Date Wed, 31 Jan 2018 09:25:08 GMT
Hi all,

We ran the benchmarks (6.6 vs 7.1) with the IW info stream enabled. Because
attachments cannot be too large, I uploaded the output to Google Drive. It
can be found here:

https://drive.google.com/open?id=1-nAHgpPO3qZ78lnvvlQ0_lF4uHJ-cWLh

Thanks in advance,
-Rob

On Mon, Jan 29, 2018 at 1:08 PM, Rob Audenaerde <rob.audenaerde@gmail.com>
wrote:

> Hi Uwe,
>
> Thanks for the reply. We commit often. Actually, in the benchmark, we
> commit every 60 documents (but we will run a larger set with fewer commits).
> The number of commits we call does not change between 6.6 and 7.1. In our
> production systems we commit every 5000 documents.
>
> We dug deeper into the commit methods, and currently the main difference
> seems to be the number of calls to java.util.zip.Checksum.update(). That
> method is called around 11M times in 6.6 and around 21M times in 7.1, so
> almost twice as often.
>
> -Rob
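
One way to narrow down where extra Checksum.update() calls come from is to
count them directly. The following is a minimal sketch, assuming a
hypothetical CountingChecksum wrapper (it is not part of Lucene or the JDK)
around java.util.zip.CRC32. It only illustrates that the same bytes can yield
very different call counts depending on how they are chunked; it does not
claim that this is what changed between 6.6 and 7.1.

```java
import java.util.zip.CRC32;
import java.util.zip.Checksum;

/** Hypothetical helper: delegates to a real Checksum and counts update() calls. */
final class CountingChecksum implements Checksum {
    private final Checksum delegate;
    private long calls;

    CountingChecksum(Checksum delegate) {
        this.delegate = delegate;
    }

    @Override
    public void update(int b) {
        calls++;
        delegate.update(b);
    }

    @Override
    public void update(byte[] b, int off, int len) {
        calls++;
        delegate.update(b, off, len);
    }

    @Override
    public long getValue() {
        return delegate.getValue();
    }

    @Override
    public void reset() {
        delegate.reset();
    }

    /** Number of update() calls seen so far. */
    long calls() {
        return calls;
    }

    public static void main(String[] args) {
        byte[] data = new byte[8192];

        // One bulk update over the whole buffer.
        CountingChecksum bulk = new CountingChecksum(new CRC32());
        bulk.update(data, 0, data.length);

        // The same bytes, fed in 1 KB chunks.
        CountingChecksum chunked = new CountingChecksum(new CRC32());
        for (int off = 0; off < data.length; off += 1024) {
            chunked.update(data, off, 1024);
        }

        // prints "bulk calls=1, chunked calls=8, same value=true"
        System.out.println("bulk calls=" + bulk.calls()
                + ", chunked calls=" + chunked.calls()
                + ", same value=" + (bulk.getValue() == chunked.getValue()));
    }
}
```

Comparing call counts together with the total number of bytes checksummed
would show whether more data is being written or the same data is being
written in smaller pieces.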
>
> On Mon, Jan 29, 2018 at 12:18 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>
>> Hi,
>>
>> How often do you commit? If you index the data initially (that's the case
>> where indexing needs to be fast), one would call commit at the end of the
>> whole job, so the actual time it takes is not so important.
>>
>> If you have a system where the index is updated all the time, then of
>> course committing is also something you have to take into account. Systems
>> like Solr or Elasticsearch use a transaction log in parallel to indexing,
>> so they commit very seldom. If the system crashes, the changes since the
>> last commit are replayed from the transaction log.
>>
>> Uwe
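
The transaction-log idea described above can be sketched as a small in-memory
model. This is an illustration only: TranslogSketch and all of its methods are
hypothetical names, the "log" is a plain list standing in for a durable file,
and it is not how Solr or Elasticsearch actually implement it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory model of "commit seldom, replay the log after a crash". */
final class TranslogSketch {
    // Documents made durable by the last commit.
    private final List<String> committed = new ArrayList<>();
    // Documents indexed since the last commit; lost if the process crashes.
    private final List<String> buffered = new ArrayList<>();
    // Stand-in for a durable log of operations since the last commit.
    private final List<String> translog = new ArrayList<>();

    /** Record the operation in the log first, then buffer it for indexing. */
    void index(String doc) {
        translog.add(doc);
        buffered.add(doc);
    }

    /** Make buffered documents durable and truncate the log. */
    void commit() {
        committed.addAll(buffered);
        buffered.clear();
        translog.clear();
    }

    /** Simulate a crash: uncommitted in-memory state is gone, the log survives. */
    void crash() {
        buffered.clear();
    }

    /** Replay every logged operation since the last commit. */
    void recover() {
        buffered.addAll(translog);
    }

    int committedCount() {
        return committed.size();
    }

    int bufferedCount() {
        return buffered.size();
    }

    public static void main(String[] args) {
        TranslogSketch index = new TranslogSketch();
        index.index("doc-1");
        index.index("doc-2");
        index.index("doc-3");
        index.commit();          // 3 documents durable

        index.index("doc-4");
        index.index("doc-5");
        index.crash();           // doc-4 and doc-5 are no longer buffered
        index.recover();         // replayed from the log
        index.commit();

        // prints "committed=5"
        System.out.println("committed=" + index.committedCount());
    }
}
```

The point of the design is that appending to a log is cheap compared with a
full commit, so commits can be rare without losing acknowledged changes.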
>>
>> -----
>> Uwe Schindler
>> Achterdiek 19, D-28357 Bremen
>> http://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>>
>> > -----Original Message-----
>> > From: Rob Audenaerde [mailto:rob.audenaerde@gmail.com]
>> > Sent: Monday, January 29, 2018 11:29 AM
>> > To: java-user@lucene.apache.org
>> > Subject: Re: indexing performance 6.6 vs 7.1
>> >
>> > Hi all,
>> >
>> > Some follow up (sorry for the delay).
>> >
>> > We built a benchmark in our application, and profiled it (on a smallish
>> > data set). What we currently see in the profiler is that in Lucene 7.1
>> > the calls to `commit()` take much longer.
>> >
>> > The self-time committing in 6.6: 3,215 ms
>> > The self-time committing in 7.1: 10,187 ms.
>> >
>> > We will try to run a larger data set and also later with the IW info
>> > stream.
>> >
>> > -Rob
>> >
>> > On Thu, Jan 18, 2018 at 7:03 PM, Erick Erickson <erickerickson@gmail.com> wrote:
>> >
>> > > Robert:
>> > >
>> > > Ah, right. I keep confusing my gmail lists
>> > > "lucene dev"
>> > > and
>> > > "lucene list"....
>> > >
>> > > Siiigggghhhhh.
>> > >
>> > >
>> > >
>> > > On Thu, Jan 18, 2018 at 9:18 AM, Adrien Grand <jpountz@gmail.com> wrote:
>> > > > If you have sparse data, I would have expected index time to
>> > > > *decrease*, not increase.
>> > > >
>> > > > Can you enable the IW info stream and share flush + merge times to
>> > > > see where indexing time goes?
>> > > >
>> > > > If you can run with a profiler, this might also give useful
>> > > > information.
>> > > >
>> > > > On Thu, Jan 18, 2018 at 11:23, Rob Audenaerde <rob.audenaerde@gmail.com> wrote:
>> > > >
>> > > >> Hi all,
>> > > >>
>> > > >> We recently upgraded from Lucene 6.6 to 7.1. We see a significant
>> > > >> drop in indexing performance.
>> > > >>
>> > > >> Our use of Lucene is atypical, as we (also) index some database
>> > > >> tables and add all the values as AssociatedFacetFields as well. This
>> > > >> allows us to create pivot tables on search results really fast.
>> > > >>
>> > > >> These tables have some overlapping columns, but also disjoint ones.
>> > > >>
>> > > >> We anticipated a decrease in index size because of the sparse
>> > > >> docvalues. We see this happening, with decreases to ~50%-80% of the
>> > > >> original index size. But we did not expect a drop in indexing
>> > > >> performance (client systems' indexing time increased by 50% to 250%).
>> > > >>
>> > > >> (Our indexing speed used to be mainly bound by the speed at which the
>> > > >> Taxonomy could deliver new ordinals for new values. We are currently
>> > > >> investigating whether this is still the case and will report later,
>> > > >> once a profiler run has been done.)
>> > > >>
>> > > >> Does anyone know if this increase in indexing time is to be expected
>> > > >> as a result of the sparse docvalues change?
>> > > >>
>> > > >> Kind regards,
>> > > >>
>> > > >> Rob Audenaerde
>> > > >>
>> > >
>> > > ---------------------------------------------------------------------
>> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > > For additional commands, e-mail: java-user-help@lucene.apache.org
>> > >
>> > >
>>
>
