lucene-issues mailing list archives

From "Joel Bernstein (Jira)" <j...@apache.org>
Subject [jira] [Comment Edited] (SOLR-13892) Add postfilter support to {!join} queries
Date Tue, 05 Nov 2019 21:24:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-13892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967880#comment-16967880 ]

Joel Bernstein edited comment on SOLR-13892 at 11/5/19 9:23 PM:
----------------------------------------------------------------

I read through *JoinUtil* and *TermsQuery*:
{code:java}
package org.apache.lucene.search.join;

import java.io.IOException;
import java.util.Objects;

import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.MultiTermQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryVisitor;
import org.apache.lucene.util.Accountable;
import org.apache.lucene.util.AttributeSource;
import org.apache.lucene.util.BytesRefHash;
import org.apache.lucene.util.RamUsageEstimator;

/**
 * A query that has an array of terms from a specific field. This query will match documents
 * have one or more terms in the specified field that match with the terms specified in the array.
 *
 * @lucene.experimental
 */
class TermsQuery extends MultiTermQuery implements Accountable {
  // ... class body omitted from this excerpt
}
{code}

The join in this patch uses a very different strategy from JoinUtil / TermsQuery.

The postfilter in this patch is not the main optimization. The main optimization is in how
the terms are collected from the *from* index and matched against the *to* index. Take a close
look at the implementation of how terms are collected, sorted, and matched between the two
indexes. Notice that the terms come out sorted without an explicit sort. The sorted BytesRefs
are then merged against the sorted term list on the *to* side, and each binary search into the
DocValues term index shortens the search space for the next one. This approach is extremely fast.
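A minimal, self-contained Java sketch of the collect/merge idea described above follows. It is a toy model under stated assumptions, not the patch's code: term dictionaries are plain sorted `String[]` arrays standing in for DocValues term indexes, each document stores the ordinal of its term (as Lucene's `SortedDocValues` does), and the names `SortedJoinSketch`, `collectFromOrdinals`, and `matchToOrdinals` are hypothetical. Marking matched ordinals in a `java.util.BitSet` and walking them with `nextSetBit` yields the *from* terms in sorted order without a sort call, and each `Arrays.binarySearch` on the *to* side starts where the previous one stopped.

```java
import java.util.Arrays;
import java.util.BitSet;

/**
 * Hypothetical toy model of the join strategy: each field has a sorted term
 * dictionary, and every document stores the ordinal of its term.
 */
public class SortedJoinSketch {

    /** Marks the ordinals of the from-side documents that matched the from query. */
    static BitSet collectFromOrdinals(int[] fromDocOrds, int[] matchingFromDocs) {
        BitSet ords = new BitSet();
        for (int doc : matchingFromDocs) {
            ords.set(fromDocOrds[doc]);
        }
        return ords;
    }

    /**
     * Walks the from ordinals in increasing order. Because ordinals follow term sort
     * order, this visits the from terms already sorted. Each term is binary-searched
     * in the to dictionary, and the next search begins after the last position found.
     */
    static BitSet matchToOrdinals(BitSet fromOrds, String[] fromDict, String[] toDict) {
        BitSet toOrds = new BitSet(toDict.length);
        int lo = 0; // lower bound of the remaining to-side search space
        for (int ord = fromOrds.nextSetBit(0); ord >= 0 && lo < toDict.length;
                ord = fromOrds.nextSetBit(ord + 1)) {
            int idx = Arrays.binarySearch(toDict, lo, toDict.length, fromDict[ord]);
            if (idx >= 0) {
                toOrds.set(idx);   // term exists on the to side: record its ordinal
                lo = idx + 1;
            } else {
                lo = -idx - 1;     // insertion point: everything before it is smaller
            }
        }
        return toOrds;
    }

    public static void main(String[] args) {
        String[] fromDict = {"apple", "banana", "cherry", "kiwi"};
        int[] fromDocOrds = {3, 0, 2, 0, 1};
        int[] matchingFromDocs = {0, 1, 2}; // these docs carry ordinals 3, 0, 2
        String[] toDict = {"apple", "cherry", "date", "kiwi", "plum"};

        BitSet fromOrds = collectFromOrdinals(fromDocOrds, matchingFromDocs);
        BitSet toOrds = matchToOrdinals(fromOrds, fromDict, toDict);
        System.out.println(toOrds); // prints {0, 1, 3}
    }
}
```

Java's `String.compareTo` orders by UTF-16 code units, whereas Lucene compares `BytesRef`s as unsigned bytes; the two agree for the ASCII terms used here but can differ when supplementary characters are involved.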


> Add postfilter support to {!join} queries
> -----------------------------------------
>
>                 Key: SOLR-13892
>                 URL: https://issues.apache.org/jira/browse/SOLR-13892
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: query parsers
>    Affects Versions: master (9.0)
>            Reporter: Jason Gerlowski
>            Priority: Major
>         Attachments: SOLR-13892.patch
>
>
> The JoinQParserPlugin would be a lot more performant in many use-cases if it could operate
> as a post-filter, especially when doc-values for the involved fields are available.
> With this issue, I'd like to propose a post-filter implementation for the {{join}} qparser.
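The post-filter idea in the issue description can be sketched in the same toy style. This is a hypothetical, Solr-free model, not Solr's `PostFilter`/`DelegatingCollector` API and not the patch: documents matching the main query are streamed in doc-id order through a filter that forwards a document only when its *to*-field ordinal is in the set of joined ordinals. `DocCollector`, `JoinFilterCollector`, and the per-document ordinal array are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/**
 * Hypothetical model of a join post-filter: main-query matches are streamed through
 * a filter that keeps only docs whose to-field ordinal was produced by the join.
 */
public class JoinPostFilterSketch {

    /** Minimal stand-in for a collector that receives matching doc ids in order. */
    interface DocCollector {
        void collect(int doc);
    }

    /** Forwards a doc to the delegate only when its to-side ordinal was joined. */
    static final class JoinFilterCollector implements DocCollector {
        private final int[] toDocOrds;    // to-field ordinal per doc, -1 if the doc has no value
        private final BitSet joinedOrds;  // to-side ordinals matched from the from side
        private final DocCollector delegate;

        JoinFilterCollector(int[] toDocOrds, BitSet joinedOrds, DocCollector delegate) {
            this.toDocOrds = toDocOrds;
            this.joinedOrds = joinedOrds;
            this.delegate = delegate;
        }

        @Override
        public void collect(int doc) {
            int ord = toDocOrds[doc];
            // One array lookup and one bit test per candidate document.
            if (ord >= 0 && joinedOrds.get(ord)) {
                delegate.collect(doc);
            }
        }
    }

    public static void main(String[] args) {
        int[] toDocOrds = {0, 2, 1, -1, 3};
        BitSet joined = new BitSet();
        joined.set(0);
        joined.set(1);
        joined.set(3);

        List<Integer> kept = new ArrayList<>();
        DocCollector filter = new JoinFilterCollector(toDocOrds, joined, doc -> kept.add(doc));
        for (int doc = 0; doc < toDocOrds.length; doc++) {
            filter.collect(doc); // pretend every doc matched the main query
        }
        System.out.println(kept); // prints [0, 2, 4]
    }
}
```

In this model the per-candidate cost is constant, so the filtering work grows with the number of documents the main query matches rather than with the size of the *to* index.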



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


