lucene-dev mailing list archives

From "Martijn van Groningen (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3602) Add join query to Lucene
Date Mon, 12 Dec 2011 16:13:30 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167571#comment-13167571 ]

Martijn van Groningen commented on LUCENE-3602:
-----------------------------------------------

bq. Maybe rename actualQuery to fromQuery?
Yes, fromQuery makes more sense than actualQuery.

{quote}
Why preComputedFromDocs...? Like if you were to cache something,
wouldn't you want to cache the toSearcher's bitset instead?
{quote}
This is for the case where your from query is cached and your toSearcher's
bitset isn't, which is a likely scenario.
Caching the toSearcher's bitset is better, of course, when
possible. But this should happen outside the JoinQuery, right?

bq. Maybe rename JoinQueryWeight.joinResult to topLevelJoinResult,
I agree, that is a much more descriptive name.

{quote}
I wonder if we could make this a Filter instead, somehow? Ie, at
its core it converts a top-level bitset in the fromSearcher doc
space into the joined bitset in the toSearcher doc space. It
could even maybe just be a static method taking in fromBitset and
returning toBitset, which could operate per-segment on the
toSearcher side? (Separately: I wonder if JoinQuery should do
something with the scores of the fromQuery....? Not right now but
maybe later...).
{quote}
It just matches docs from the from side to the to side. That is all... So a static method / filter
should be able to do the job.
I'm not sure, but if it stays a query it might one day be possible to encapsulate the joining in
the Lucene query language?
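
To make the static-method idea concrete, here is a minimal sketch of a fromBitset to toBitset conversion. It is not the code in the attached patch and it uses no Lucene API: the class name BitSetJoinSketch, the method join, and the two String arrays (which stand in for real per-document field lookups) are all assumptions made for illustration.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch; not the LUCENE-3602 patch and not a Lucene API. */
final class BitSetJoinSketch {

    /**
     * Maps matches in the "from" doc space to matches in the "to" doc space.
     * fromValues[d] is the join-field value of from-doc d, toValues[d] the
     * value of to-doc d. Both arrays are stand-ins for real index lookups.
     */
    static BitSet join(BitSet fromBits, String[] fromValues, String[] toValues) {
        // Step 1: collect the join values of every matching from-doc.
        Set<String> joinValues = new HashSet<String>();
        for (int doc = fromBits.nextSetBit(0); doc >= 0; doc = fromBits.nextSetBit(doc + 1)) {
            if (fromValues[doc] != null) {
                joinValues.add(fromValues[doc]);
            }
        }
        // Step 2: mark every to-doc whose join value was collected in step 1.
        BitSet toBits = new BitSet(toValues.length);
        for (int doc = 0; doc < toValues.length; doc++) {
            if (toValues[doc] != null && joinValues.contains(toValues[doc])) {
                toBits.set(doc);
            }
        }
        return toBits;
    }

    public static void main(String[] args) {
        String[] fromValues = {"a", "b", "c"};
        String[] toValues = {"b", "x", "a", "b"};
        BitSet fromBits = new BitSet();
        fromBits.set(0);
        fromBits.set(1);
        System.out.println(join(fromBits, fromValues, toValues)); // prints {0, 2, 3}
    }
}
```

Step 2 only reads to-side values, so in principle it could run once per segment on the toSearcher side, as suggested in the quote above.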

bq. Maybe reword that to state that all joined to/from docs must reside in the same shard?
+1

{quote}
I wonder if we could use DocTermOrds instead? (Or,
FieldCache.DocTermsIndex or DocValues.BYTES_*, if we know
fromSearcher.fromField is
single-valued). This way we uninvert once (on init), and then doing
the join should be much faster since for each fromDocID we can lookup
the term(s) to join on.
{quote}
I really like that idea! This already crossed my mind a few days ago
as an improvement to speed up the joining. It would be nice if the user could
choose between a faster variant that uses more RAM and a slower variant that uses less.
I think we can just make two concrete JoinQuery impls that each have a different
joinResult(...) impl.
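
A rough sketch of how the two variants could differ, under stated assumptions: the abstract class JoinTermCollectorSketch, its collectTerms method, and the Map-based postings are hypothetical stand-ins, not the patch's joinResult(...) and not DocTermOrds or any other Lucene API. The slower variant is assumed to walk every term's postings on each call; the faster one uninverts once on init and then looks up the term(s) per fromDocID.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch of the RAM/speed trade-off; none of these types are Lucene APIs. */
abstract class JoinTermCollectorSketch {

    /** Returns the join terms carried by the from-docs that are set in fromBits. */
    abstract Set<String> collectTerms(BitSet fromBits);

    /** Less RAM, slower: walks every term's postings on each call. */
    static final class PostingsScan extends JoinTermCollectorSketch {
        private final Map<String, int[]> postings; // term -> doc IDs (stand-in for the inverted index)

        PostingsScan(Map<String, int[]> postings) {
            this.postings = postings;
        }

        @Override
        Set<String> collectTerms(BitSet fromBits) {
            Set<String> terms = new TreeSet<String>();
            for (Map.Entry<String, int[]> e : postings.entrySet()) {
                for (int doc : e.getValue()) {
                    if (fromBits.get(doc)) { // one matching doc is enough for this term
                        terms.add(e.getKey());
                        break;
                    }
                }
            }
            return terms;
        }
    }

    /** More RAM, faster: uninverts once on init, then does one lookup per from-doc. */
    static final class Uninverted extends JoinTermCollectorSketch {
        private final List<List<String>> termsByDoc = new ArrayList<List<String>>(); // doc ID -> terms

        Uninverted(Map<String, int[]> postings, int maxDoc) {
            for (int i = 0; i < maxDoc; i++) {
                termsByDoc.add(new ArrayList<String>());
            }
            for (Map.Entry<String, int[]> e : postings.entrySet()) {
                for (int doc : e.getValue()) {
                    termsByDoc.get(doc).add(e.getKey());
                }
            }
        }

        @Override
        Set<String> collectTerms(BitSet fromBits) {
            Set<String> terms = new TreeSet<String>();
            for (int doc = fromBits.nextSetBit(0); doc >= 0; doc = fromBits.nextSetBit(doc + 1)) {
                terms.addAll(termsByDoc.get(doc));
            }
            return terms;
        }
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new TreeMap<String, int[]>();
        postings.put("a", new int[] {0, 2});
        postings.put("b", new int[] {1});
        postings.put("c", new int[] {2});
        BitSet fromBits = new BitSet();
        fromBits.set(2);
        System.out.println(new PostingsScan(postings).collectTerms(fromBits));  // prints [a, c]
        System.out.println(new Uninverted(postings, 3).collectTerms(fromBits)); // prints [a, c]
    }
}
```

Both variants return the same terms; they differ only in what they keep in memory and how much work each call repeats.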
                
> Add join query to Lucene
> ------------------------
>
>                 Key: LUCENE-3602
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3602
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: modules/join
>            Reporter: Martijn van Groningen
>         Attachments: LUCENE-3602.patch, LUCENE-3602.patch
>
>
> Solr has had a (pseudo) join query for a while now. I think this should also be available in Lucene.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

