lucene-dev mailing list archives

From Adrien Grand <jpou...@gmail.com>
Subject Re: benchmark drop for PrimaryKey
Date Thu, 23 Aug 2018 21:15:32 GMT
The commit that caused this slowdown might be
https://github.com/mikemccand/luceneutil/commit/1d8460f342f269c98047def9f9eb76213acae5d9
.

Indeed, we no longer have anything that performs as well, but I'm not
sure this is a big deal. I suspect that postings format did not have many
users, for two reasons: it was not covered by backward compatibility (like
any codec but the default one), and it used a lot of RAM. In a number of
cases, we try to fold the benefits of alternative codecs into the default
codec. For instance, we used to have a "pulsing" postings format that could
record postings in the terms dictionary in order to save one disk seek, and
we ended up folding this feature into the default postings format by only
enabling it on terms that have a document frequency of 1 and
index_options=DOCS_ONLY, so that it would always be used with primary keys.
For the memory postings format, folding it in didn't really make sense: it
managed to be so much faster by loading much more information into RAM,
which we don't want to do with the default codec.
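
To make the pulsing idea above concrete, here is a minimal toy sketch in
plain Java. It is not Lucene code: the class PulsingSketch, its fields and
its methods are all hypothetical names made up for illustration, and it
ignores index options entirely. It only shows the shape of the trick: a
term with a document frequency of 1 keeps its single doc ID inline in the
terms dictionary entry, so a primary-key lookup never touches the separate
postings store.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names are Lucene APIs.
final class PulsingSketch {

    /** Terms dictionary entry: an inlined doc ID, or an offset into the postings store. */
    static final class TermEntry {
        final int docFreq;
        final int inlinedDocId;   // meaningful only when docFreq == 1
        final int postingsOffset; // meaningful only when docFreq > 1

        TermEntry(int docFreq, int inlinedDocId, int postingsOffset) {
            this.docFreq = docFreq;
            this.inlinedDocId = inlinedDocId;
            this.postingsOffset = postingsOffset;
        }
    }

    private final Map<String, TermEntry> termsDict = new HashMap<>();
    // Stands in for the separate postings file that would cost an extra disk seek.
    private final List<int[]> postingsStore = new ArrayList<>();
    // Counts simulated reads of the postings store.
    int postingsReads = 0;

    void addTerm(String term, int... docIds) {
        if (docIds.length == 0) {
            throw new IllegalArgumentException("a term needs at least one doc ID");
        }
        if (docIds.length == 1) {
            // Pulsing case: record the single posting directly in the terms dictionary.
            termsDict.put(term, new TermEntry(1, docIds[0], -1));
        } else {
            postingsStore.add(docIds.clone());
            termsDict.put(term, new TermEntry(docIds.length, -1, postingsStore.size() - 1));
        }
    }

    int[] lookup(String term) {
        TermEntry entry = termsDict.get(term);
        if (entry == null) {
            return new int[0];
        }
        if (entry.docFreq == 1) {
            return new int[] { entry.inlinedDocId }; // answered from the dictionary alone
        }
        postingsReads++; // the extra "seek" that pulsing avoids for unique terms
        return postingsStore.get(entry.postingsOffset);
    }

    public static void main(String[] args) {
        PulsingSketch index = new PulsingSketch();
        index.addTerm("id:42", 7);             // primary key: exactly one document
        index.addTerm("body:lucene", 1, 7, 9); // ordinary term: several documents
        System.out.println("doc=" + index.lookup("id:42")[0]
            + " postingsReads=" + index.postingsReads);
        System.out.println("hits=" + index.lookup("body:lucene").length
            + " postingsReads=" + index.postingsReads);
    }
}
```

The memory postings format went further than this by keeping everything in
RAM, which is the part that does not carry over to the default codec.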

On Thu, Aug 23, 2018 at 22:40, Michael Sokolov <msokolov@gmail.com> wrote:

> I happened to stumble across this chart
> https://home.apache.org/~mikemccand/lucenebench/PKLookup.html showing a
> pretty drastic drop in this benchmark on 5/13. I looked at the commits
> between the previous run and this one and did some investigation, trying to
> do some git bisect to find the problem using benchmarks as a test, but it
> proved to be quite difficult due to a breaking change re: MemoryCodec that
> also required corresponding changes in benchmark code.
>
> In the end, I think removing MemoryCodec is what caused the drop in perf
> here, based on this comment in benchmark code:
>
> '2011-06-26'
>    Switched to MemoryCodec for the primary-key 'id' field so that lookups
> (either for PKLookup test or for deletions during reopen in the NRT test)
> are fast, with no IO.  Also switched to NRTCachingDirectory for the NRT
> test, so that small new segments are written only in RAM.
>
> I don't really understand the implications here beyond benchmarks, but it
> does seem that perhaps some essential high-performing capability has been
> lost?  Is there some equivalent thing remaining after MemoryCodec's removal
> that can be used for primary keys?
>
> -Mike
>
