lucene-dev mailing list archives

From "Dennis Gove (JIRA)" <j...@apache.org>
Subject [jira] [Closed] (SOLR-8188) Add hash style joins to the Streaming API and Streaming Expressions
Date Thu, 12 Nov 2015 00:23:11 GMT

     [ https://issues.apache.org/jira/browse/SOLR-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Gove closed SOLR-8188.
-----------------------------
    Resolution: Implemented

Still closed

> Add hash style joins to the Streaming API and Streaming Expressions
> -------------------------------------------------------------------
>
>                 Key: SOLR-8188
>                 URL: https://issues.apache.org/jira/browse/SOLR-8188
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrJ
>            Reporter: Dennis Gove
>            Assignee: Dennis Gove
>            Priority: Minor
>             Fix For: Trunk
>
>         Attachments: SOLR-8188.patch, SOLR-8188.patch, SOLR-8188.patch
>
>
> Add HashJoinStream and OuterHashJoinStream to the Streaming API to allow for optimized
joining between sub-streams.
> HashJoinStream is similar to an InnerJoinStream except that it does not require its input
streams to be in any particular order. It reads all values from the stream being hashed
(hashStream) when open() is called. During read() it returns the next tuple from the stream
not being hashed (fullStream) that has at least one matching record in hashStream, merged
with that matching tuple. If the tuple from the fullStream matches more than one tuple from
the hashStream, then each subsequent call to read() returns the merge with the next matching
tuple. The order of the resulting stream is the order of the fullStream.
> OuterHashJoinStream behaves like HashJoinStream except that, as with LeftOuterJoinStream,
a tuple from fullStream is returned even if it doesn't have a matching record in hashStream.
All other behavior is identical.
> In expression form
> {code}
> hashJoin(
>   search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...),
>   hashed=search(collection2, q=*:*, fl="fieldA, fieldB, fieldE", ...),
>   on="fieldA, fieldB"
> )
> {code}
> {code}
> outerHashJoin(
>   search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...),
>   hashed=search(collection2, q=*:*, fl="fieldA, fieldB, fieldE", ...),
>   on="fieldA, fieldB"
> )
> {code}
> As you can see, the hashStream is passed as a named parameter (hashed=), which makes it
very clear which stream should be hashed.
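
The join semantics described above can be illustrated with a minimal sketch in plain Python. This is not the Solr implementation or the SolrJ API: the function name hash_join, the helper _key, and the outer flag are hypothetical, tuples are modelled as dicts, and the rule that fullStream fields win when both tuples carry the same field name is an assumption the description does not state.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, object]


def _key(record: Record, on: List[str]) -> Tuple[object, ...]:
    """Build the join key from the fields named in `on` (hypothetical helper)."""
    return tuple(record.get(field) for field in on)


def hash_join(full_stream: Iterable[Record],
              hash_stream: Iterable[Record],
              on: List[str],
              outer: bool = False) -> Iterator[Record]:
    """Sketch of the hashJoin / outerHashJoin semantics (illustrative only)."""
    # Read the hashed stream completely before producing any result,
    # grouping its records by join key (the issue does this in open()).
    table: Dict[Tuple[object, ...], List[Record]] = defaultdict(list)
    for rec in hash_stream:
        table[_key(rec, on)].append(rec)

    # Walk the full stream in its own order, so the output keeps that order.
    for rec in full_stream:
        matches = table.get(_key(rec, on), [])
        if not matches:
            if outer:
                # Outer variant: emit the unmatched full-stream record as-is.
                yield dict(rec)
            continue
        # One merged record per matching hashed record.
        for match in matches:
            merged = dict(match)
            merged.update(rec)  # assumption: full-stream fields win on conflict
            yield merged


full = [
    {"fieldA": 1, "fieldB": "x", "fieldC": "c1"},
    {"fieldA": 2, "fieldB": "y", "fieldC": "c2"},
]
hashed = [
    {"fieldA": 1, "fieldB": "x", "fieldE": "e1"},
    {"fieldA": 1, "fieldB": "x", "fieldE": "e2"},
]

for row in hash_join(full, hashed, on=["fieldA", "fieldB"], outer=True):
    print(row)
```

With outer=False the record carrying fieldC "c2" is dropped because nothing in the hashed input shares its key; with outer=True it is emitted without a fieldE value. Only the hashed input is held in memory, which is why it usually makes sense to hash the smaller of the two streams.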



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

