lucene-java-user mailing list archives

From "Wu,Yunfeng" <wuyunfen...@baidu.com>
Subject Re: Impact and WAND
Date Thu, 11 Jul 2019 03:21:10 GMT

@Adrien Grand <jpountz@gmail.com>. Thanks for your reply.

The explanation about skipping low-scoring matches is very helpful. I looked through some
docs and inspected the related code.

I noticed that `block-max WAND` only takes effect when `ScoreMode.TOP_SCORES` is used. Is that
right? (A basic `TermQuery` creates an `ImpactsDISI` when the score mode is `TOP_SCORES`.)

Lucene computes the maximum score per block and caches it in `MaxScoreCache`. This means we
can skip low-scoring blocks (currently one block holds 128 doc IDs), while in a competitive
block we still need to score every doc ID we see. I am confused by
`MaxScoreCache#getMaxScoreForLevel(int level)`: what does the level mean? Is it the skip
level? (Some call sites invoke this method with an integer `upTo` argument.)
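
To make the block-skipping idea above concrete, here is a toy sketch in plain Java. It is not Lucene code: the class and method names are made up for illustration, per-document scores are precomputed instead of being derived from freq and norm, the block size is 4 rather than 128, and it only tracks the single best hit. It does not model skip levels at all.

```java
import java.util.Arrays;

/** Toy model of block-max skipping for a top-1 search; hypothetical names, not Lucene code. */
final class BlockMaxSkipSketch {

    /** "Index time": record the maximum score of each block of blockSize documents. */
    static float[] blockMaxima(float[] scores, int blockSize) {
        float[] maxima = new float[(scores.length + blockSize - 1) / blockSize];
        Arrays.fill(maxima, Float.NEGATIVE_INFINITY);
        for (int doc = 0; doc < scores.length; doc++) {
            int block = doc / blockSize;
            maxima[block] = Math.max(maxima[block], scores[doc]);
        }
        return maxima;
    }

    /** "Search time": returns {bestDoc, numberOfDocsActuallyScored}. */
    static int[] topOne(float[] scores, float[] blockMax, int blockSize) {
        int bestDoc = -1;
        float best = Float.NEGATIVE_INFINITY;
        int scored = 0;
        for (int block = 0; block < blockMax.length; block++) {
            // No document in this block can beat the current best hit: skip the whole block.
            if (blockMax[block] <= best) {
                continue;
            }
            int start = block * blockSize;
            int end = Math.min(start + blockSize, scores.length);
            // Inside a competitive block, every document is still scored.
            for (int doc = start; doc < end; doc++) {
                scored++;
                if (scores[doc] > best) {
                    best = scores[doc];
                    bestDoc = doc;
                }
            }
        }
        return new int[] {bestDoc, scored};
    }

    public static void main(String[] args) {
        float[] scores = {1f, 2f, 1f, 1f, 0.5f, 0.5f, 0.5f, 0.5f, 1f, 3f, 1f, 1f};
        int[] result = topOne(scores, blockMaxima(scores, 4), 4);
        System.out.println("best doc " + result[0] + ", scored " + result[1] + " of " + scores.length);
    }
}
```

In this toy run the middle block is never touched, because its maximum cannot beat the best score already collected. The last block is still visited, which matches the point that competitive hits may sit at the very end of the index.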

Thanks, Lucene team.


On Jul 10, 2019, at 10:52 PM, Adrien Grand <jpountz@gmail.com> wrote:

To clarify, the scoring process is not accelerated because we
terminate early but because we can skip low-scoring matches (there
might be competitive hits at the very end of the index).

CompetitiveImpactAccumulator is indeed related to WAND. It helps store
the maximum score impacts per block of documents in postings lists.
Then this information is leveraged by block-max WAND in order to skip
low-scoring blocks.
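
As a rough illustration of what accumulating competitive (freq, norm) pairs could look like, here is a toy sketch in plain Java. It is not the actual `CompetitiveImpactAccumulator`: the class name, methods, and list-based storage are hypothetical. It assumes that a higher freq never lowers the score and that a larger norm value (a longer field) never raises it, so a pair is dropped as soon as another pair has a freq at least as high and a norm at most as large.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy accumulator of competitive (freq, norm) pairs; hypothetical, not Lucene's implementation. */
final class CompetitivePairsSketch {

    static final class Impact {
        final int freq;
        final long norm;

        Impact(int freq, long norm) {
            this.freq = freq;
            this.norm = norm;
        }
    }

    private final List<Impact> pairs = new ArrayList<>();

    /** Adds a pair unless an existing pair already scores at least as well. */
    void add(int freq, long norm) {
        for (Impact p : pairs) {
            // Assumption: higher freq and lower norm can only help the score,
            // so p makes the new pair redundant.
            if (p.freq >= freq && p.norm <= norm) {
                return;
            }
        }
        // The new pair makes every pair it dominates redundant.
        pairs.removeIf(p -> freq >= p.freq && norm <= p.norm);
        pairs.add(new Impact(freq, norm));
    }

    /** The pairs that may still produce the maximum score for the block. */
    List<Impact> competitivePairs() {
        return pairs;
    }
}
```

Under these assumptions, the maximum score of a block can be bounded by scoring only the surviving pairs, without looking at each document's own freq and norm.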

This does indeed help avoid reading norms, but also document IDs and
term frequencies.

On Wed, Jul 10, 2019 at 4:10 PM Wu,Yunfeng <wuyunfeng01@baidu.com> wrote:

Hi,

This continues a discussion from https://github.com/apache/lucene-solr/pull/595. Atri Sharma
proposed discussing it on the java dev list.


The impact `frequency` and `norm` exist just to accelerate the scoring process by terminating
early.

In impact mode, `CompetitiveImpactAccumulator` records (freq, norm) pairs, which are stored
at the index level. I also noticed that `CompetitiveImpactAccumulator` is documented with
`This class accumulates the (freq, norm) pairs that may produce competitive scores`. Is this
related to `WAND`?


The norm values are produced and consumed by `Lucene80NormsFormat`.

With impacts, can we avoid reading norms from `Lucene80NormsProducer`, which may generate
extra IO (Lucene stores the norm value twice), and take full advantage of the WAND
method?



--
Adrien
