lucene-solr-user mailing list archives

From Stephen Weiss <Steve.We...@wgsn.com>
Subject Re: How to use BitDocSet within a PostFilter
Date Mon, 03 Aug 2015 16:52:25 GMT
Yes that was it.  Had no idea this was an issue!

On Monday, August 3, 2015, Roman Chyla <roman.chyla@gmail.com> wrote:
Hi,
inStockSkusBitSet.get(currentChildDocNumber)

Is that child doc number a Lucene doc ID? If yes, does it include the segment
offset? Every index segment starts at a different base, but within a segment
docs are numbered from zero. So to check them against the full index bitset,
I'd be doing
Bitset.exists(indexBase + docid)

Just one thing to check

Roman
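
A minimal sketch of the offset issue Roman describes, using java.util.BitSet as a stand-in for Lucene's FixedBitSet so that it runs without Solr on the classpath. The class name, method name, and the two-segment layout are hypothetical and chosen only for illustration. In a real DelegatingCollector, the base of the current segment should be available as LeafReaderContext.docBase (as far as I know, DelegatingCollector also keeps it in a protected docBase field), but check that against your Solr version.

```java
import java.util.BitSet;

/**
 * Illustrates segment-local vs. index-wide doc IDs.
 * java.util.BitSet stands in for Lucene's FixedBitSet; all names are hypothetical.
 */
class SegmentOffsetSketch {

    /** Tests a segment-local doc ID against an index-wide bitset by adding the segment base. */
    static boolean isInStock(BitSet inStockGlobal, int docBase, int segmentDocId) {
        return inStockGlobal.get(docBase + segmentDocId);
    }

    public static void main(String[] args) {
        // Assumed layout: segment 0 holds global docs 0..4, segment 1 holds global docs 5..9.
        int[] docBases = {0, 5};

        // Index-wide bitset, as a top-level query would produce: global docs 1 and 7 match.
        BitSet inStockGlobal = new BitSet(10);
        inStockGlobal.set(1);
        inStockGlobal.set(7);

        // Segment 1, local doc 2 is global doc 7 (in stock).
        System.out.println(inStockGlobal.get(2));                         // false: wrongly excluded
        System.out.println(isInStock(inStockGlobal, docBases[1], 2));     // true

        // Segment 1, local doc 1 is global doc 6 (not in stock).
        System.out.println(inStockGlobal.get(1));                         // true: wrongly included
        System.out.println(isInStock(inStockGlobal, docBases[1], 1));     // false
    }
}
```

Without the offset, both kinds of error appear at once: some matching documents are dropped and some non-matching ones slip through, which is consistent with the symptoms described in the original message below.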
On Aug 3, 2015 1:24 AM, "Stephen Weiss" <Steve.Weiss@wgsn.com> wrote:

> Hi everyone,
>
> I'm trying to write a PostFilter for Solr 5.1.0, which is meant to crawl
> through grandchild documents during a search through the parents and filter
> out documents based on statistics gathered from aggregating the
> grandchildren together.  I've been successful in getting the logic correct,
> but it does not perform so well - I'm grabbing too many documents from the
> index along the way.  I'm trying to filter out grandchild documents which
> are not relevant to the statistics I'm collecting, in order to reduce the
> number of document objects pulled from the IndexReader.
>
> I've implemented the following code in my DelegatingCollector.collect:
>
> if (inStockSkusBitSet == null) {
>     // Cast from IndexSearcher to expose getDocSet.
>     SolrIndexSearcher SidxS = (SolrIndexSearcher) idxS;
>     inStockSkusDocSet = SidxS.getDocSet(inStockSkusQuery);
>     // Cast from DocSet to expose getBits.
>     inStockSkusBitDocSet = (BitDocSet) inStockSkusDocSet;
>     inStockSkusBitSet = inStockSkusBitDocSet.getBits();
> }
>
>
> My BitDocSet reports a size which matches a standard query for the more
> limited set of grandchildren, and the FixedBitSet (inStockSkusBitSet) also
> reports this same cardinality.  Based on that fact, it seems that the
> getDocSet call itself must be working properly, and returning the right
> number of documents.  However, when I try to filter out grandchild
> documents using either BitDocSet.exists or BitSet.get (skipping any
> grandchild document which doesn't exist in the BitDocSet or doesn't return
> true from the bitset), I get about a third fewer results than I'm supposed
> to.  It seems many documents that should match the filter are being
> excluded, and documents which should not match the filter are being
> included.
>
> I'm trying to use it either of these ways:
>
> if (!inStockSkusBitSet.get(currentChildDocNumber)) continue;
> if (!inStockSkusBitDocSet.exists(currentChildDocNumber)) continue;
>
> The currentChildDocNumber is simply the docNumber which is passed to
> DelegatingCollector.collect, decremented until I hit a document that
> doesn't belong to the parent document.
>
> I can't seem to figure out a way to actually use the BitDocSet (or its
> derivatives) to quickly eliminate document IDs.  It seems like this is how
> it's supposed to be used.  What am I getting wrong?
>
> Sorry if this is a newbie question, I've never written a PostFilter
> before, and frankly, the documentation out there is a little sketchy
> (mostly for version 4) - so many classes have changed names and so many of
> the more well-documented techniques are deprecated or removed now, it's
> tough to follow what the current best practice actually is.  I'm using the
> block join functionality heavily so I'm trying to keep more current than
> that.  I would be happy to send along the full source privately if it would
> help figure this out, and plan to write up some more elaborate instructions
> (updated for Solr 5) for the next person who decides to write a PostFilter
> and work with block joins, if I ever manage to get this performing well
> enough.
>
> Thanks for any pointers!  Totally open to doing this an entirely different
> way.  I read DocValues might be a more elegant approach but currently that
> would require reindexing, so trying to avoid that.
>
> Also, I've been wondering if the query above would read from the filter
> cache or not.  The query is constructed like this:
>
>
>     private Term inStockTrueTerm = new Term("sku_history.is_in_stock", "T");
>     private Term objectTypeSkuHistoryTerm = new Term("object_type", "sku_history");
> ...
>
> inStockTrueTermQuery = new TermQuery(inStockTrueTerm);
> objectTypeSkuHistoryTermQuery = new TermQuery(objectTypeSkuHistoryTerm);
> inStockSkusQuery = new BooleanQuery();
> inStockSkusQuery.add(inStockTrueTermQuery, BooleanClause.Occur.MUST);
> inStockSkusQuery.add(objectTypeSkuHistoryTermQuery, BooleanClause.Occur.MUST);
> --
> Steve
>
> ________________________________
>
> WGSN is a global foresight business. Our experts provide deep insight and
> analysis of consumer, fashion and design trends. We inspire our clients to
> plan and trade their range with unparalleled confidence and accuracy.
> Together, we Create Tomorrow.
>
> WGSN<http://www.wgsn.com/> is part of WGSN Limited, comprising of
> market-leading products including WGSN.com<http://www.wgsn.com>, WGSN
> Lifestyle & Interiors<http://www.wgsn.com/en/lifestyle-interiors>, WGSN
> INstock<http://www.wgsninstock.com/>, WGSN StyleTrial<
> http://www.wgsn.com/en/styletrial/> and WGSN Mindset<
> http://www.wgsn.com/en/services/consultancy/>, our bespoke consultancy
> services.
>
> The information in or attached to this email is confidential and may be
> legally privileged. If you are not the intended recipient of this message,
> any use, disclosure, copying, distribution or any action taken in reliance
> on it is prohibited and may be unlawful. If you have received this message
> in error, please notify the sender immediately by return email and delete
> this message and any copies from your computer and network. WGSN does not
> warrant that this email and any attachments are free from viruses and
> accepts no liability for any loss resulting from infected email
> transmissions.
>
> WGSN reserves the right to monitor all email through its networks. Any
> views expressed may be those of the originator and not necessarily of WGSN.
> WGSN is powered by Top Right Group<http://www.topright-group.com>, which
> transforms knowledge businesses to deliver exceptional performance.
>
> Please be advised all phone calls may be recorded for training and quality
> purposes and by accepting and/or making calls from and/or to us you
> acknowledge and agree to calls being recorded.
>
> WGSN Limited, Company number 4858491
>
> registered address:
>
> Top Right Group Limited, The Prow, 1 Wilder Walk, London W1B 5AP
>
> WGSN Inc., tax ID 04-3851246, registered office c/o National Registered
> Agents, Inc., 160 Greentree Drive, Suite 101, Dover DE 19904, United States
>
> 4C Serviços de Informação Ltda., CNPJ/MF (Taxpayer's Register):
> 15.536.968/0001-04, Address: Avenida Nove de Julho, 5966, Loja, CEP
> 01406-200, Jardim Europa, São Paulo
>
> 4C Business Information Consulting (Shanghai) Co., Ltd, 富新商务信息咨询(上海)有限公司,
> registered address Unit 4810/4811, 48/F Tower 1, Grand Gateway, 1 Hong Qiao
> Road, Xuhui District, Shanghai
>

