lucene-solr-user mailing list archives

From Zheng Lin Edwin Yeo <edwinye...@gmail.com>
Subject Re: Joining more than 2 collections
Date Fri, 05 May 2017 16:16:30 GMT
Thanks for the explanation.

Yes, all my join keys are the same, so I think both should be ok too.

All three of my collections have a lot of records, but from the last collection I'm
only extracting a few fields (about 5) for display.

So does this count as having three very large joins?

Regards,
Edwin



On 5 May 2017 at 23:37, Joel Bernstein <joelsolr@gmail.com> wrote:

> *:* queries will work fine for the innerJoin, which is a merge join that
> never runs out of memory. The hashJoin reads the entire "hashed" query into
> memory, though, so the size of the hashed result set is limited by memory.
>
> So if you have three very large joins that require *:*, then the hashJoin
> approach will be problematic. In that case you could use fetch() around the
> innerJoin to do the third join.
>
> parallel(fetch(innerJoin(search(), search())))
>
> Or if the hashJoin uses the same key as the innerJoin you can do the
> hashJoin in parallel as well and partition the "hashed" search across the
> workers:
>
> parallel(hashJoin(innerJoin(search(), search()), hashed=search()))
>
> In this case the "hashed" search partitionKeys would be the same as the
> innerJoin searches' partitionKeys. But the join keys must be the same for
> this scenario to work.
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
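Joel's point about partitioning the "hashed" search across the workers can be illustrated outside Solr. The Python below is a simulation under stated assumptions, not Solr code. `partition`, `join_on` and `parallel_join` are hypothetical helpers, and `zlib.crc32` stands in for whatever hash Solr actually applies to partitionKeys. The sketch shows why every stream must be partitioned on the join key: tuples that share a key value land on the same worker, so the per-worker joins together add up to the full join.

```python
import zlib
from collections import defaultdict


def partition(tuples, key, workers):
    """Split tuples into per-worker buckets by hashing the join key.

    Simulates partitionKeys: equal key values always map to the same
    worker. crc32 is used because Python's built-in hash() of strings is
    salted per process.
    """
    buckets = defaultdict(list)
    for t in tuples:
        worker = zlib.crc32(str(t[key]).encode("utf-8")) % workers
        buckets[worker].append(t)
    return buckets


def join_on(left, right, key):
    """Plain in-memory inner join, used as the per-worker join."""
    index = defaultdict(list)
    for rt in right:
        index[rt[key]].append(rt)
    return [{**lt, **rt} for lt in left for rt in index.get(lt[key], [])]


def parallel_join(left, right, key, workers):
    """Partition both sides on `key`, join each partition, then concatenate."""
    lp = partition(left, key, workers)
    rp = partition(right, key, workers)
    out = []
    for w in range(workers):
        out.extend(join_on(lp[w], rp[w], key))
    return out


left = [{"id": i, "year_i": 42} for i in range(6)]
right = [{"id": i, "day_i": 7} for i in range(0, 6, 2)]
print(sorted(t["id"] for t in parallel_join(left, right, "id", 3)))  # [0, 2, 4]
```

If one side were partitioned on a different field, matching tuples could end up on different workers and silently drop out of the result. That is the reason the join keys must be the same for the parallel hashJoin to work.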
>
> On Fri, May 5, 2017 at 11:17 AM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
>
> > I found that using *:* returns the entire result set and causes the
> > result of the join query to blow up.
> >
> > For example, if the query matches 2 results in collection1 and 3 results
> > in collection2, I found that the join query (using hashJoin or innerJoin)
> > could return 6 results.
> >
> > Is that correct?
> >
> > Regards,
> > Edwin
> >
> >
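The 2 × 3 = 6 behaviour described here is what an inner join does when several records on each side share the same key value: every matching pair produces one output tuple. The minimal Python sketch below shows the effect with a sort-merge inner join, the strategy Joel describes for innerJoin, which expects both streams to be sorted on the join key. The function name and the sample records are hypothetical, and the code works on in-memory lists rather than Solr streams.

```python
def merge_inner_join(left, right, key):
    """Inner-join two lists that are already sorted ascending on `key`.

    A streaming merge join only needs to hold the current run of equal
    keys; this list-based version walks both inputs with two indexes.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right-hand tuples with this key...
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # ...and pair every left-hand tuple with that key against it.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    out.append({**left[i], **r})
                i += 1
            j = j_end
    return out


c1 = [{"a_s": "x", "f_s": "f1"}, {"a_s": "x", "f_s": "f2"}]
c2 = [{"a_s": "x", "b_s": "b1"}, {"a_s": "x", "b_s": "b2"}, {"a_s": "x", "b_s": "b3"}]
print(len(merge_inner_join(c1, c2, "a_s")))  # 6
```

With unique keys on both sides, the same function returns at most one tuple per key. The blow-up only appears when keys repeat, which is more likely when one side is an unrestricted *:* query.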
> > On 5 May 2017 at 07:17, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
> >
> > > Hi Joel,
> > >
> > > Yes, /export works after I removed the /export handler from
> > > solrconfig.xml. Thanks for the advice.
> > >
> > > With *:*, results are returned when using /export.
> > > But if one of the queries is *:*, does that mean the result set will
> > > contain all the records from the query that has *:*?
> > >
> > > Regards,
> > > Edwin
> > >
> > >
> > > On 5 May 2017 at 01:46, Joel Bernstein <joelsolr@gmail.com> wrote:
> > >
> > >> No, *:* will simply return all the results from that query. It
> > >> should still join properly. If you are using the /select handler, joins
> > >> will not work properly.
> > >>
> > >>
> > >> This example worked properly for me:
> > >>
> > >> hashJoin(parallel(collection2,
> > >>                   workers=3,
> > >>                   sort="id asc",
> > >>                   innerJoin(search(collection2, q="*:*", fl="id",
> > >>                                    sort="id asc", qt="/export", partitionKeys="id"),
> > >>                             search(collection2, q="year_i:42", fl="id, year_i",
> > >>                                    sort="id asc", qt="/export", partitionKeys="id"),
> > >>                             on="id")),
> > >>          hashed=search(collection2, q="day_i:7", fl="id, day_i",
> > >>                        sort="id asc", qt="/export"),
> > >>          on="id")
> > >>
> > >>
> > >>
> > >>
> > >> Joel Bernstein
> > >> http://joelsolr.blogspot.com/
> > >>
> > >> On Thu, May 4, 2017 at 12:28 PM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
> > >>
> > >> > Hi Joel,
> > >> >
> > >> > For the join queries, is it true that if we use q=*:* as the query for
> > >> > one side of the join, no results will be returned?
> > >> >
> > >> > That is what I currently find when I just put q=*:*.
> > >> >
> > >> > Regards,
> > >> > Edwin
> > >> >
> > >> >
> > >> > On 4 May 2017 at 23:38, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
> > >> >
> > >> > > Hi Joel,
> > >> > >
> > >> > > I think that might be one of the reasons.
> > >> > > This is what I have for the /export handler in my solrconfig.xml:
> > >> > >
> > >> > > <requestHandler name="/export" class="solr.SearchHandler">
> > >> > >   <lst name="invariants">
> > >> > >     <str name="rq">{!xport}</str>
> > >> > >     <str name="wt">xsort</str>
> > >> > >     <str name="distrib">false</str>
> > >> > >   </lst>
> > >> > >   <arr name="components">
> > >> > >     <str>query</str>
> > >> > >   </arr>
> > >> > > </requestHandler>
> > >> > >
> > >> > > This is the error message that I get when I use the /export handler:
> > >> > >
> > >> > > java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: --> http://localhost:8983/solr/collection1_shard1_replica1/: An exception has occurred on the server, refer to server log for details.
> > >> > > at org.apache.solr.client.solrj.io.stream.CloudSolrStream.openStreams(CloudSolrStream.java:451)
> > >> > > at org.apache.solr.client.solrj.io.stream.CloudSolrStream.open(CloudSolrStream.java:308)
> > >> > > at org.apache.solr.client.solrj.io.stream.PushBackStream.open(PushBackStream.java:70)
> > >> > > at org.apache.solr.client.solrj.io.stream.JoinStream.open(JoinStream.java:147)
> > >> > > at org.apache.solr.client.solrj.io.stream.ExceptionStream.open(ExceptionStream.java:51)
> > >> > > at org.apache.solr.handler.StreamHandler$TimerStream.open(StreamHandler.java:457)
> > >> > > at org.apache.solr.client.solrj.io.stream.TupleStream.writeMap(TupleStream.java:63)
> > >> > > at org.apache.solr.response.JSONWriter.writeMap(JSONResponseWriter.java:547)
> > >> > > at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:193)
> > >> > > at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:209)
> > >> > > at org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:325)
> > >> > > at org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:120)
> > >> > > at org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:71)
> > >> > > at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
> > >> > > at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:732)
> > >> > > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:473)
> > >> > > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
> > >> > > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)
> > >> > > at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
> > >> > > at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
> > >> > > at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> > >> > > at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> > >> > > at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
> > >> > > at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
> > >> > > at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
> > >> > > at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> > >> > > at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
> > >> > > at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> > >> > > at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
> > >> > > at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
> > >> > > at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
> > >> > > at org.eclipse.jetty.server.Server.handle(Server.java:534)
> > >> > > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
> > >> > > at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
> > >> > > at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
> > >> > > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
> > >> > > at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
> > >> > > at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
> > >> > > at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
> > >> > > at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
> > >> > > at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
> > >> > > at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
> > >> > > at java.lang.Thread.run(Thread.java:745)
> > >> > > Caused by: java.util.concurrent.ExecutionException: java.io.IOException: --> http://localhost:8983/solr/collection1_shard1_replica1/: An exception has occurred on the server, refer to server log for details.
> > >> > > at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> > >> > > at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> > >> > > at org.apache.solr.client.solrj.io.stream.CloudSolrStream.openStreams(CloudSolrStream.java:445)
> > >> > > ... 42 more
> > >> > > Caused by: java.io.IOException: --> http://localhost:8983/solr/collection1_shard1_replica1/: An exception has occurred on the server, refer to server log for details.
> > >> > > at org.apache.solr.client.solrj.io.stream.SolrStream.read(SolrStream.java:238)
> > >> > > at org.apache.solr.client.solrj.io.stream.CloudSolrStream$TupleWrapper.next(CloudSolrStream.java:541)
> > >> > > at org.apache.solr.client.solrj.io.stream.CloudSolrStream$StreamOpener.call(CloudSolrStream.java:564)
> > >> > > at org.apache.solr.client.solrj.io.stream.CloudSolrStream$StreamOpener.call(CloudSolrStream.java:551)
> > >> > > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > >> > > at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
> > >> > > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> > >> > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> > >> > > ... 1 more
> > >> > > Caused by: org.noggit.JSONParser$ParseException: JSON Parse Error: char=<,position=0 BEFORE='<' AFTER='?xml version="1.0" encoding="UTF-8"?> <'
> > >> > > at org.noggit.JSONParser.err(JSONParser.java:356)
> > >> > > at org.noggit.JSONParser.handleNonDoubleQuoteString(JSONParser.java:712)
> > >> > > at org.noggit.JSONParser.next(JSONParser.java:886)
> > >> > > at org.noggit.JSONParser.nextEvent(JSONParser.java:930)
> > >> > > at org.apache.solr.client.solrj.io.stream.JSONTupleStream.expect(JSONTupleStream.java:97)
> > >> > > at org.apache.solr.client.solrj.io.stream.JSONTupleStream.advanceToDocs(JSONTupleStream.java:179)
> > >> > > at org.apache.solr.client.solrj.io.stream.JSONTupleStream.next(JSONTupleStream.java:77)
> > >> > > at org.apache.solr.client.solrj.io.stream.SolrStream.read(SolrStream.java:207)
> > >> > > ... 8 more
> > >> > >
> > >> > >
> > >> > > Regards,
> > >> > > Edwin
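The innermost cause in the trace above is noggit's JSON parser failing at position 0. The body it received starts with `<?xml version="1.0" encoding="UTF-8"?>`, which means the shard answered the streaming client with XML instead of JSON. When you check a handler by hand, it helps to confirm quickly what kind of body came back. The minimal Python sketch below assumes you already have the response body as a string; fetching it is left out. `classify_body` is a hypothetical helper, and the first-character check is a heuristic, not a parser.

```python
import json


def classify_body(body: str) -> str:
    """Roughly classify an HTTP response body as 'xml', 'json' or 'other'.

    Heuristic only: a leading '<' is taken to mean XML (for example an
    '<?xml ...' prolog); otherwise the whole body must parse as JSON.
    """
    head = body.lstrip()
    if head.startswith("<"):
        return "xml"
    try:
        json.loads(body)
        return "json"
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return "other"


print(classify_body('<?xml version="1.0" encoding="UTF-8"?>\n<response/>'))  # xml
```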
> > >> > >
> > >> > >
> > >> > > On 4 May 2017 at 22:54, Joel Bernstein <joelsolr@gmail.com> wrote:
> > >> > >
> > >> > >> I suspect that there is something not quite right about how the
> > >> > >> /export handler is configured. Straight out of the box in Solr 6.4.2,
> > >> > >> /export is configured automatically. Are you using a Solr instance that
> > >> > >> has been upgraded in the past and doesn't have the standard 6.4.2 configs?
> > >> > >>
> > >> > >> To really do joins properly you'll have to use the /export handler,
> > >> > >> because /select will not stream entire result sets (unless they are
> > >> > >> pretty small). So your results may be missing data.
> > >> > >>
> > >> > >> I would take a close look at the logs and see what all the exceptions
> > >> > >> are when you run a search using qt=/export. If you can post all the
> > >> > >> stack traces that get generated when you run the search, we'll probably
> > >> > >> be able to spot the issue.
> > >> > >>
> > >> > >> About field ordering: there is support for field ordering in the
> > >> > >> Streaming classes, but only a few places actually enforce the order.
> > >> > >> The 6.5 SQL interface does keep the fields in order, as does the new
> > >> > >> Tuple expression in Solr 6.6. But the expressions you are currently
> > >> > >> working with don't enforce field ordering.
> > >> > >>
> > >> > >>
> > >> > >>
> > >> > >>
> > >> > >> Joel Bernstein
> > >> > >> http://joelsolr.blogspot.com/
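Because these expressions don't enforce field order, one workaround is to reorder the fields on the client after parsing the response. The sketch below assumes the /stream response has the `{"result-set": {"docs": [...]}}` shape, which you should check against your Solr version. `reorder_docs` is a hypothetical helper. It relies on Python dicts preserving insertion order.

```python
import json


def reorder_docs(response_text, field_order):
    """Return the tuples of a /stream JSON response with keys reordered.

    Fields named in `field_order` come first, in that order. Any other
    fields (including the final EOF tuple) keep their original order.
    """
    docs = json.loads(response_text)["result-set"]["docs"]
    out = []
    for doc in docs:
        ordered = {f: doc[f] for f in field_order if f in doc}
        ordered.update((k, v) for k, v in doc.items() if k not in ordered)
        out.append(ordered)
    return out


resp = '{"result-set":{"docs":[{"k_s":"k","a_s":"x","b_s":"b"},{"EOF":true,"RESPONSE_TIME":5}]}}'
print(list(reorder_docs(resp, ["a_s", "b_s", "k_s"])[0]))  # ['a_s', 'b_s', 'k_s']
```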
> > >> > >>
> > >> > >> On Thu, May 4, 2017 at 2:41 AM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
> > >> > >>
> > >> > >> > Hi Joel,
> > >> > >> >
> > >> > >> > I have managed to get the join to work, but so far it only works when
> > >> > >> > I use qt="/select". It does not work when I use qt="/export".
> > >> > >> >
> > >> > >> > For the display of the fields, is there a way to list them in the
> > >> > >> > order I want? Currently the order is quite random: I can get a field
> > >> > >> > from collection1, followed by a field from collection3, then
> > >> > >> > collection1 again, and then collection2.
> > >> > >> >
> > >> > >> > It would be good if we could arrange the fields to display in the
> > >> > >> > order that we want.
> > >> > >> >
> > >> > >> > Regards,
> > >> > >> > Edwin
> > >> > >> >
> > >> > >> >
> > >> > >> >
> > >> > >> > On 4 May 2017 at 09:56, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
> > >> > >> >
> > >> > >> > > Hi Joel,
> > >> > >> > >
> > >> > >> > > It works when I start off with just that one expression.
> > >> > >> > >
> > >> > >> > > Could it be that the data size is too big for export after the join,
> > >> > >> > > which causes the error?
> > >> > >> > >
> > >> > >> > > Regards,
> > >> > >> > > Edwin
> > >> > >> > >
> > >> > >> > > On 4 May 2017 at 02:53, Joel Bernstein <joelsolr@gmail.com> wrote:
> > >> > >> > >
> > >> > >> > >> I was just testing with the query below and it worked for me. Some of
> > >> > >> > >> the error messages I was getting for syntax problems were not what I
> > >> > >> > >> was expecting, though, so I'll look into the error handling. But the
> > >> > >> > >> joins do work when the syntax is correct. The query below joins to the
> > >> > >> > >> same collection three times, but the mechanics are exactly the same
> > >> > >> > >> when joining three different collections. In this example each join
> > >> > >> > >> narrows down the result set.
> > >> > >> > >>
> > >> > >> > >> hashJoin(parallel(collection2,
> > >> > >> > >>                   workers=3,
> > >> > >> > >>                   sort="id asc",
> > >> > >> > >>                   innerJoin(search(collection2, q="*:*", fl="id",
> > >> > >> > >>                                    sort="id asc", qt="/export", partitionKeys="id"),
> > >> > >> > >>                             search(collection2, q="year_i:42", fl="id, year_i",
> > >> > >> > >>                                    sort="id asc", qt="/export", partitionKeys="id"),
> > >> > >> > >>                             on="id")),
> > >> > >> > >>          hashed=search(collection2, q="day_i:7", fl="id, day_i",
> > >> > >> > >>                        sort="id asc", qt="/export"),
> > >> > >> > >>          on="id")
> > >> > >> > >>
> > >> > >> > >> Joel Bernstein
> > >> > >> > >> http://joelsolr.blogspot.com/
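The narrowing in Joel's example can be made concrete with a small Python model of a hash join. The model assumes in-memory lists in place of the three searches on collection2, and the records are made up. It is a sketch, not Solr's implementation. As Joel describes for hashJoin, the hashed side is read completely into a dict and the other side is streamed past it. For brevity the first join also uses the hash approach here; Solr's innerJoin would do a sorted merge instead.

```python
def hash_join(stream, hashed, key):
    """Read `hashed` fully into a dict, then stream the other side past it.

    Memory grows with the size of `hashed`, so it helps for the hashed
    side to be the smaller result set.
    """
    table = {}
    for t in hashed:
        table.setdefault(t[key], []).append(t)
    for t in stream:
        for h in table.get(t[key], []):
            yield {**t, **h}


# Hypothetical records standing in for q=*:*, q=year_i:42 and q=day_i:7.
all_ids = [{"id": i} for i in range(10)]
year_42 = [{"id": i, "year_i": 42} for i in (1, 3, 5, 7)]
day_7 = [{"id": i, "day_i": 7} for i in (3, 7, 9)]

# The first join also uses hash_join here for brevity (Solr's innerJoin
# is a sorted merge join); each step narrows the result.
step1 = list(hash_join(all_ids, year_42, "id"))
step2 = list(hash_join(step1, day_7, "id"))
print([t["id"] for t in step2])  # [3, 7]
```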
> > >> > >> > >>
> > >> > >> > >> On Wed, May 3, 2017 at 1:29 PM, Joel Bernstein <joelsolr@gmail.com> wrote:
> > >> > >> > >>
> > >> > >> > >> > Start off with just this expression:
> > >> > >> > >> >
> > >> > >> > >> > search(collection2,
> > >> > >> > >> >             q=*:*,
> > >> > >> > >> >             fl="a_s,b_s,c_s,d_s,e_s",
> > >> > >> > >> >             sort="a_s asc",
> > >> > >> > >> >             qt="/export")
> > >> > >> > >> >
> > >> > >> > >> > And then check the logs for exceptions.
> > >> > >> > >> >
> > >> > >> > >> > Joel Bernstein
> > >> > >> > >> > http://joelsolr.blogspot.com/
> > >> > >> > >> >
> > >> > >> > >> > On Wed, May 3, 2017 at 12:35 PM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
> > >> > >> > >> >
> > >> > >> > >> >> Hi Joel,
> > >> > >> > >> >>
> > >> > >> > >> >> I am getting this error after I added qt=/export and removed the rows
> > >> > >> > >> >> param. Do you know what could be the reason?
> > >> > >> > >> >>
> > >> > >> > >> >> {
> > >> > >> > >> >>   "error":{
> > >> > >> > >> >>     "metadata":[
> > >> > >> > >> >>       "error-class","org.apache.solr.common.SolrException",
> > >> > >> > >> >>       "root-error-class","org.apache.http.MalformedChunkCodingException"],
> > >> > >> > >> >>     "msg":"org.apache.http.MalformedChunkCodingException: CRLF expected at end of chunk",
> > >> > >> > >> >>     "trace":"org.apache.solr.common.SolrException: org.apache.http.MalformedChunkCodingException: CRLF expected at end of chunk\r\n\tat
> > >> > >> > >> >> org.apache.solr.client.solrj.io.stream.TupleStream.lambda$writeMap$0(TupleStream.java:79)\r\n\tat
> > >> > >> > >> >> org.apache.solr.response.JSONWriter.writeIterator(JSONResponseWriter.java:523)\r\n\tat
> > >> > >> > >> >> org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:175)\r\n\tat
> > >> > >> > >> >> org.apache.solr.response.JSONWriter$2.put(JSONResponseWriter.java:559)\r\n\tat
> > >> > >> > >> >> org.apache.solr.client.solrj.io.stream.TupleStream.writeMap(TupleStream.java:64)\r\n\tat
> > >> > >> > >> >> org.apache.solr.response.JSONWriter.writeMap(JSONResponseWriter.java:547)\r\n\tat
> > >> > >> > >> >> org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:193)\r\n\tat
> > >> > >> > >> >> org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:209)\r\n\tat
> > >> > >> > >> >> org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:325)\r\n\tat
> > >> > >> > >> >> org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:120)\r\n\tat
> > >> > >> > >> >> org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:71)\r\n\tat
> > >> > >> > >> >> org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)\r\n\tat
> > >> > >> > >> >> org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:732)\r\n\tat
> > >> > >> > >> >> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:473)\r\n\tat
> > >> > >> > >> >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)\r\n\tat
> > >> > >> > >> >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)\r\n\tat
> > >> > >> > >> >> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)\r\n\tat
> > >> > >> > >> >> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)\r\n\tat
> > >> > >> > >> >> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\r\n\tat
> > >> > >> > >> >> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\r\n\tat
> > >> > >> > >> >> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\r\n\tat
> > >> > >> > >> >> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)\r\n\tat
> > >> > >> > >> >> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)\r\n\tat
> > >> > >> > >> >> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\r\n\tat
> > >> > >> > >> >> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)\r\n\tat
> > >> > >> > >> >> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\r\n\tat
> > >> > >> > >> >> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\r\n\tat
> > >> > >> > >> >> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\r\n\tat
> > >> > >> > >> >> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\r\n\tat
> > >> > >> > >> >> org.eclipse.jetty.server.Server.handle(Server.java:534)\r\n\tat
> > >> > >> > >> >> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)\r\n\tat
> > >> > >> > >> >> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)\r\n\tat
> > >> > >> > >> >> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\r\n\tat
> > >> > >> > >> >> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)\r\n\tat
> > >> > >> > >> >> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\r\n\tat
> > >> > >> > >> >> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)\r\n\tat
> > >> > >> > >> >> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)\r\n\tat
> > >> > >> > >> >> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)\r\n\tat
> > >> > >> > >> >> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)\r\n\tat
> > >> > >> > >> >> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)\r\n\tat
> > >> > >> > >> >> java.lang.Thread.run(Thread.java:745)\r\nCaused by:
> > >> > >> > >> >> org.apache.http.MalformedChunkCodingException: CRLF expected at end of chunk\r\n\tat
> > >> > >> > >> >> org.apache.http.impl.io.ChunkedInputStream.getChunkSize(ChunkedInputStream.java:255)\r\n\tat
> > >> > >> > >> >> org.apache.http.impl.io.ChunkedInputStream.nextChunk(ChunkedInputStream.java:227)\r\n\tat
> > >> > >> > >> >> org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:186)\r\n\tat
> > >> > >> > >> >> org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:215)\r\n\tat
> > >> > >> > >> >> org.apache.http.impl.io.ChunkedInputStream.close(ChunkedInputStream.java:316)\r\n\tat
> > >> > >> > >> >> org.apache.http.conn.BasicManagedEntity.streamClosed(BasicManagedEntity.java:164)\r\n\tat
> > >> > >> > >> >> org.apache.http.conn.EofSensorInputStream.checkClose(EofSensorInputStream.java:228)\r\n\tat
> > >> > >> > >> >> org.apache.http.conn.EofSensorInputStream.close(EofSensorInputStream.java:174)\r\n\tat
> > >> > >> > >> >> sun.nio.cs.StreamDecoder.implClose(StreamDecoder.java:378)\r\n\tat
> > >> > >> > >> >> sun.nio.cs.StreamDecoder.close(StreamDecoder.java:193)\r\n\tat
> > >> > >> > >> >> java.io.InputStreamReader.close(InputStreamReader.java:199)\r\n\tat
> > >> > >> > >> >> org.apache.solr.client.solrj.io.stream.JSONTupleStream.close(JSONTupleStream.java:92)\r\n\tat
> > >> > >> > >> >> org.apache.solr.client.solrj.io.stream.SolrStream.close(SolrStream.java:193)\r\n\tat
> > >> > >> > >> >> org.apache.solr.client.solrj.io.stream.CloudSolrStream.close(CloudSolrStream.java:464)\r\n\tat
> > >> > >> > >> >> org.apache.solr.client.solrj.io.stream.HashJoinStream.close(HashJoinStream.java:231)\r\n\tat
> > >> > >> > >> >> org.apache.solr.client.solrj.io.stream.ExceptionStream.close(ExceptionStream.java:93)\r\n\tat
> > >> > >> > >> >> org.apache.solr.handler.StreamHandler$TimerStream.close(StreamHandler.java:452)\r\n\tat
> > >> > >> > >> >> org.apache.solr.client.solrj.io.stream.TupleStream.lambda$writeMap$0(TupleStream.java:71)\r\n\t... 40 more\r\n",
> > >> > >> > >> >>     "code":500}}
> > >> > >> > >> >>
> > >> > >> > >> >>
> > >> > >> > >> >> Regards,
> > >> > >> > >> >> Edwin
> > >> > >> > >> >>
> > >> > >> > >> >>
> > >> > >> > >> >> On 4 May 2017 at 00:00, Joel Bernstein <joelsolr@gmail.com> wrote:
> > >> > >> > >> >>
> > >> > >> > >> >> > I've reformatted the expression below and made a few changes. You have
> > >> > >> > >> >> > put things together properly, but these are MapReduce joins that require
> > >> > >> > >> >> > exporting the entire result sets. So you will need to add qt=/export to
> > >> > >> > >> >> > all the searches and remove the rows param. In Solr 6.6 there is a new
> > >> > >> > >> >> > "shuffle" expression that does this automatically.
> > >> > >> > >> >> >
> > >> > >> > >> >> > To test things, you'll want to break down each expression and make sure
> > >> > >> > >> >> > it's behaving as expected.
> > >> > >> > >> >> >
> > >> > >> > >> >> > For example, first run each search. Then run the innerJoin, not in
> > >> > >> > >> >> > parallel mode. Then run it in parallel mode. Then try the whole thing.
> > >> > >> > >> >> >
> > >> > >> > >> >> > hashJoin(parallel(collection2,
> > >> > >> > >> >> >                   innerJoin(search(collection2,
> > >> > >> > >> >> >                                    q=*:*,
> > >> > >> > >> >> >                                    fl="a_s,b_s,c_s,d_s,e_s",
> > >> > >> > >> >> >                                    sort="a_s asc",
> > >> > >> > >> >> >                                    partitionKeys="a_s",
> > >> > >> > >> >> >                                    qt="/export"),
> > >> > >> > >> >> >                             search(collection1,
> > >> > >> > >> >> >                                    q=*:*,
> > >> > >> > >> >> >                                    fl="a_s,f_s,g_s,h_s,i_s,j_s",
> > >> > >> > >> >> >                                    sort="a_s asc",
> > >> > >> > >> >> >                                    partitionKeys="a_s",
> > >> > >> > >> >> >                                    qt="/export"),
> > >> > >> > >> >> >                             on="a_s"),
> > >> > >> > >> >> >                   workers="2",
> > >> > >> > >> >> >                   sort="a_s asc"),
> > >> > >> > >> >> >          hashed=search(collection3,
> > >> > >> > >> >> >                        q=*:*,
> > >> > >> > >> >> >                        fl="a_s,k_s,l_s",
> > >> > >> > >> >> >                        sort="a_s asc",
> > >> > >> > >> >> >                        qt="/export"),
> > >> > >> > >> >> >          on="a_s")
> > >> > >> > >> >> >
> > >> > >> > >> >> > Joel Bernstein
> > >> > >> > >> >> > http://joelsolr.blogspot.com/
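Joel's advice to test each piece separately is easier to follow if the sub-expressions are built once and reused. The hypothetical Python sketch below only assembles strings and does not talk to Solr. The helper names are invented, and the parameters are copied from the reformatted expression above. One assumption: every parameter value is double-quoted, whereas the thread writes `q=*:*` both quoted and unquoted.

```python
def search_expr(collection, **params):
    """Render search(collection, k="v", ...); all values are double-quoted."""
    args = ", ".join(f'{k}="{v}"' for k, v in params.items())
    return f"search({collection}, {args})"


def inner_join_expr(left, right, on):
    return f'innerJoin({left}, {right}, on="{on}")'


def parallel_expr(collection, inner, workers, sort):
    return f'parallel({collection}, {inner}, workers="{workers}", sort="{sort}")'


def hash_join_expr(stream, hashed, on):
    return f'hashJoin({stream}, hashed={hashed}, on="{on}")'


s2 = search_expr("collection2", q="*:*", fl="a_s,b_s,c_s,d_s,e_s",
                 sort="a_s asc", partitionKeys="a_s", qt="/export")
s1 = search_expr("collection1", q="*:*", fl="a_s,f_s,g_s,h_s,i_s,j_s",
                 sort="a_s asc", partitionKeys="a_s", qt="/export")
s3 = search_expr("collection3", q="*:*", fl="a_s,k_s,l_s",
                 sort="a_s asc", qt="/export")

joined = inner_join_expr(s2, s1, "a_s")
par = parallel_expr("collection2", joined, 2, "a_s asc")

# One expression per testing step, in the order suggested above:
# each search, the plain innerJoin, the parallel innerJoin, the whole thing.
steps = [s2, s1, s3, joined, par, hash_join_expr(par, s3, "a_s")]
for expr in steps:
    print(expr)
```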
> > >> > >> > >> >> >
> > >> > >> > >> >> > On Wed, May 3, 2017 at 11:26 AM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
> > >> > >> > >> >> >
> > >> > >> > >> >> > > Hi Joel,
> > >> > >> > >> >> > >
> > >> > >> > >> >> > > Thanks for the clarification.
> > >> > >> > >> >> > >
> > >> > >> > >> >> > > I would like to check: is this the correct way to do the join? Currently
> > >> > >> > >> >> > > I can't get any results after adding the hashJoin for the third, smaller
> > >> > >> > >> >> > > collection (collection3).
> > >> > >> > >> >> > >
> > >> > >> > >> >> > > http://localhost:8983/solr/collection1/stream?expr=
> > >> > >> > >> >> > > hashJoin(parallel(collection2,
> > >> > >> > >> >> > >                   innerJoin(search(collection2,
> > >> > >> > >> >> > >                                    q=*:*,
> > >> > >> > >> >> > >                                    fl="a_s,b_s,c_s,d_s,e_s",
> > >> > >> > >> >> > >                                    sort="a_s asc",
> > >> > >> > >> >> > >                                    partitionKeys="a_s",
> > >> > >> > >> >> > >                                    rows=200),
> > >> > >> > >> >> > >                             search(collection1,
> > >> > >> > >> >> > >                                    q=*:*,
> > >> > >> > >> >> > >                                    fl="a_s,f_s,g_s,h_s,i_s,j_s",
> > >> > >> > >> >> > >                                    sort="a_s asc",
> > >> > >> > >> >> > >                                    partitionKeys="a_s",
> > >> > >> > >> >> > >                                    rows=200),
> > >> > >> > >> >> > >                             on="a_s"),
> > >> > >> > >> >> > >                   workers="2",
> > >> > >> > >> >> > >                   sort="a_s asc"),
> > >> > >> > >> >> > >          hashed=search(collection3,
> > >> > >> > >> >> > >                        q=*:*,
> > >> > >> > >> >> > >                        fl="a_s,k_s,l_s",
> > >> > >> > >> >> > >                        sort="a_s asc",
> > >> > >> > >> >> > >                        rows=200),
> > >> > >> > >> >> > >          on="a_s")
> > >> > >> > >> >> > > &indent=true
> > >> > >> > >> >> > >
> > >> > >> > >> >> > >
> > >> > >> > >> >> > > Regards,
> > >> > >> > >> >> > > Edwin
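The request above puts the expression into `expr=` as raw text. When the request is built in code rather than typed into a browser, the expression should be percent-encoded so that quotes, spaces, `*` and `:` arrive intact. The minimal sketch below uses only Python's standard library. `stream_url` is a hypothetical helper, and the host and collection names are simply the ones used in this thread.

```python
from urllib.parse import urlencode


def stream_url(base_url, collection, expr):
    """Build a GET URL for a collection's /stream handler.

    urlencode percent-encodes the expression (spaces become '+'), so
    parentheses, quotes, '*' and ':' are transmitted safely.
    """
    query = urlencode({"expr": expr, "indent": "true"})
    return f"{base_url}/{collection}/stream?{query}"


expr = 'search(collection2, q="*:*", fl="a_s", sort="a_s asc", qt="/export")'
print(stream_url("http://localhost:8983/solr", "collection1", expr))
```

For long expressions, sending the same `expr` parameter in a form-encoded POST body avoids URL length limits. Whether that suits your setup is worth checking.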
> > >> > >> > >> >> > >
> > >> > >> > >> >> > >
> > >> > >> > >> >> > > On 3 May 2017 at 20:59, Joel Bernstein <joelsolr@gmail.com> wrote:
> > >> > >> > >> >> > >
> > >> > >> > >> >> > > > Sorry, it's just called hashJoin
> > >> > >> > >> >> > > >
> > >> > >> > >> >> > > > Joel Bernstein
> > >> > >> > >> >> > > > http://joelsolr.blogspot.com/
> > >> > >> > >> >> > > >
> > >> > >> > >> >> > > > On Wed, May 3, 2017 at 2:45 AM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
> > >> > >> > >> >> > > >
> > >> > >> > >> >> > > > > Hi Joel,
> > >> > >> > >> >> > > > >
> > >> > >> > >> >> > > > > I am getting this error when I use innerHashJoin:
> > >> > >> > >> >> > > > >
> > >> > >> > >> >> > > > >  "EXCEPTION":"Invalid stream expression innerHashJoin(parallel(innerJoin
> > >> > >> > >> >> > > > >
> > >> > >> > >> >> > > > > I also can't find any documentation on innerHashJoin among the
> > >> > >> > >> >> > > > > Streaming Expressions.
> > >> > >> > >> >> > > > >
> > >> > >> > >> >> > > > > Are you referring to hashJoin?
> > >> > >> > >> >> > > > >
> > >> > >> > >> >> > > > > Regards,
> > >> > >> > >> >> > > > > Edwin
> > >> > >> > >> >> > > > >
> > >> > >> > >> >> > > > >
> > >> > >> > >> >> > > > > On 3 May 2017 at 13:20, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
> > >> > >> > >> >> > > > >
> > >> > >> > >> >> > > > > > Hi Joel,
> > >> > >> > >> >> > > > > >
> > >> > >> > >> >> > > > > > Thanks for the info.
> > >> > >> > >> >> > > > > >
> > >> > >> > >> >> > > > > > Regards,
> > >> > >> > >> >> > > > > > Edwin
> > >> > >> > >> >> > > > > >
> > >> > >> > >> >> > > > > >
> > >> > >> > >> >> > > > > > On 3 May 2017 at 02:04, Joel Bernstein <joelsolr@gmail.com> wrote:
> > >> > >> > >> >> > > > > >
> > >> > >> > >> >> > > > > >> Also take a look at the documentation for the "fetch" streaming
> > >> > >> > >> >> > > > > >> expression.
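The fetch expression mentioned here takes a different route from a third full join: it walks an existing stream and looks up extra fields for each tuple from another collection, a batch of keys at a time. A minimal Python sketch of that idea, assuming in-memory lists stand in for the stream and for collection3; `fetch_sketch`, its parameters, and the pass-through of unmatched tuples are illustrative assumptions, not Solr's implementation or its exact parameter syntax.

```python
from typing import Dict, Iterable, Iterator, List, Set

Row = Dict[str, str]


def fetch_sketch(stream: Iterable[Row], collection: List[Row], on: str,
                 fl: List[str], batch_size: int = 50) -> Iterator[Row]:
    """Hypothetical model of fetch(): add the fields in `fl` from `collection`
    to each tuple of `stream`, matching on field `on`, one batch at a time."""

    def lookup(keys: Set[str]) -> Dict[str, Row]:
        # Stands in for one query against the collection per batch of keys.
        return {r[on]: r for r in collection if r[on] in keys}

    def merge(batch: List[Row]) -> Iterator[Row]:
        found = lookup({t[on] for t in batch})
        for tup in batch:
            out = dict(tup)
            match = found.get(tup[on])
            if match is not None:
                # Copy only the requested fields, like a short fl list.
                out.update({f: match[f] for f in fl if f in match})
            yield out  # assumption: unmatched tuples pass through unchanged

    batch: List[Row] = []
    for tup in stream:
        batch.append(tup)
        if len(batch) == batch_size:
            yield from merge(batch)
            batch = []
    if batch:
        yield from merge(batch)


# Usage: enrich an already-joined stream with two fields from a third collection.
joined = [{"a_s": "1", "b_s": "x"}, {"a_s": "2", "b_s": "y"}, {"a_s": "3", "b_s": "z"}]
collection3 = [{"a_s": "1", "k_s": "k1", "l_s": "l1"},
               {"a_s": "3", "k_s": "k3", "l_s": "l3"}]
enriched = list(fetch_sketch(joined, collection3, on="a_s",
                             fl=["k_s", "l_s"], batch_size=2))
print(enriched)
```

Because only the keys of the current batch are looked up, memory use is bounded by the batch size rather than by the size of the third collection, which is why this style suits a large collection from which only a few fields are needed.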
> > >> > >> > >> >> > > > > >>
> > >> > >> > >> >> > > > > >> Joel Bernstein
> > >> > >> > >> >> > > > > >> http://joelsolr.blogspot.com/
> > >> > >> > >> >> > > > > >>
> > >> > >> > >> >> > > > > >> On Tue, May 2, 2017 at 2:03 PM, Joel Bernstein <joelsolr@gmail.com> wrote:
> > >> > >> > >> >> > > > > >>
> > >> > >> > >> >> > > > > >> > Yes, you can join more than one collection with Streaming Expressions.
> > >> > >> > >> >> > > > > >> > Here are a few things to keep in mind:
> > >> > >> > >> >> > > > > >> >
> > >> > >> > >> >> > > > > >> > * You'll likely want to use the parallel function around the largest join.
> > >> > >> > >> >> > > > > >> >   You'll need to use the join keys as the partitionKeys.
> > >> > >> > >> >> > > > > >> > * innerJoin: requires that the streams be sorted on the join keys.
> > >> > >> > >> >> > > > > >> > * innerHashJoin: has no sorting requirement.
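The bullets above tie together three ideas: innerJoin walks two streams in key order (so both must be sorted), and running it under parallel needs both sides partitioned on the join key so matching keys land on the same worker. A minimal Python sketch under stated assumptions: lists of dicts stand in for Solr streams, Python's `hash()` stands in for whatever hash Solr uses to route partitionKeys to workers, and `merge_join` and `parallel_merge_join` are hypothetical names. It illustrates the general merge-join technique, not Solr's implementation.

```python
from typing import Dict, List

Row = Dict[str, str]


def merge_join(left: List[Row], right: List[Row], on: str) -> List[Row]:
    """Inner merge join; both inputs must already be sorted ascending on `on`."""
    for side in (left, right):
        if any(side[i][on] > side[i + 1][on] for i in range(len(side) - 1)):
            raise ValueError("merge join needs inputs sorted on the join key")
    out: List[Row] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][on], right[j][on]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right side...
            j_end = j
            while j_end < len(right) and right[j_end][on] == lk:
                j_end += 1
            # ...and pair it with every left row carrying the same key.
            while i < len(left) and left[i][on] == lk:
                out.extend({**left[i], **r} for r in right[j:j_end])
                i += 1
            j = j_end
    return out


def parallel_merge_join(left: List[Row], right: List[Row], on: str,
                        workers: int = 2) -> List[Row]:
    """Split both sides by a hash of the join key (the partitionKeys idea) so
    equal keys meet on the same worker, merge-join each partition, then merge."""
    parts_l: List[List[Row]] = [[] for _ in range(workers)]
    parts_r: List[List[Row]] = [[] for _ in range(workers)]
    for rows, parts in ((left, parts_l), (right, parts_r)):
        for row in rows:
            # Appending in input order keeps each partition sorted.
            parts[hash(row[on]) % workers].append(row)
    results: List[Row] = []
    for w in range(workers):
        results.extend(merge_join(parts_l[w], parts_r[w], on))
    # Re-sort the combined output on the key, like parallel(..., sort="a_s asc").
    return sorted(results, key=lambda r: r[on])


collection2 = [{"a_s": "1", "b_s": "b1"}, {"a_s": "2", "b_s": "b2"}, {"a_s": "4", "b_s": "b4"}]
collection1 = [{"a_s": "1", "f_s": "f1"}, {"a_s": "1", "f_s": "f1b"}, {"a_s": "4", "f_s": "f4"}]
print(parallel_merge_join(collection2, collection1, on="a_s"))
```

The merge join only ever holds the current run of equal keys, which is why it scales to very large inputs; the price is the sort requirement, enforced here by the up-front check. Note that one key on one side matching two rows on the other yields two output tuples.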
> > >> > >> > >> >> > > > > >> >
> > >> > >> > >> >> > > > > >> > So a strategy for a three-collection join might look like this:
> > >> > >> > >> >> > > > > >> >
> > >> > >> > >> >> > > > > >> > innerHashJoin(parallel(innerJoin(bigStream, bigStream)), smallerStream)
> > >> > >> > >> >> > > > > >> >
> > >> > >> > >> >> > > > > >> > The largest join can be done in parallel using an innerJoin. You can then
> > >> > >> > >> >> > > > > >> > wrap the stream coming out of the parallel function in an innerHashJoin to
> > >> > >> > >> >> > > > > >> > join it to another stream.
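The outer step of this strategy, joining the output of the parallel innerJoin to a smaller third stream, can be modelled as a classic hash join. In Solr the expression is named hashJoin, as the later reply in this thread notes. A minimal Python sketch, assuming lists stand in for streams and that `big_join` is what the parallel innerJoin already produced; `hash_join` is a hypothetical helper illustrating the technique, not Solr's code.

```python
from typing import Dict, List

Row = Dict[str, str]


def hash_join(stream: List[Row], hashed: List[Row], on: str) -> List[Row]:
    """Inner hash join: every tuple of `hashed` is held in memory, keyed on `on`;
    neither input needs to be sorted."""
    table: Dict[str, List[Row]] = {}
    for row in hashed:
        table.setdefault(row[on], []).append(row)
    out: List[Row] = []
    for row in stream:
        # Each streamed tuple is paired with every hashed tuple sharing its key.
        for match in table.get(row[on], []):
            out.append({**row, **match})
    return out


# Stand-in for the tuples coming out of parallel(innerJoin(...)).
big_join = [{"a_s": "1", "b_s": "b1", "f_s": "f1"},
            {"a_s": "4", "b_s": "b4", "f_s": "f4"}]
# The smaller third collection, deliberately unsorted.
small = [{"a_s": "4", "k_s": "k4"}, {"a_s": "9", "k_s": "k9"}, {"a_s": "1", "k_s": "k1"}]
three_way = hash_join(big_join, small, on="a_s")
print(three_way)
```

The design trade-off is memory: the whole hashed side is loaded into the table, so it should be the smaller stream, while the larger joined stream is only iterated once.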
> > >> > >> > >> >> > > > > >> >
> > >> > >> > >> >> > > > > >> >
> > >> > >> > >> >> > > > > >> > Joel Bernstein
> > >> > >> > >> >> > > > > >> > http://joelsolr.blogspot.com/
> > >> > >> > >> >> > > > > >> >
> > >> > >> > >> >> > > > > >> > On Mon, May 1, 2017 at 9:42 PM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
> > >> > >> > >> >> > > > > >> >
> > >> > >> > >> >> > > > > >> >> Hi,
> > >> > >> > >> >> > > > > >> >>
> > >> > >> > >> >> > > > > >> >> Is it possible to join more than 2 collections using one of the streaming
> > >> > >> > >> >> > > > > >> >> expressions (e.g. innerJoin)? If not, are there other ways we can do it?
> > >> > >> > >> >> > > > > >> >>
> > >> > >> > >> >> > > > > >> >> Currently, I may need to join 3 or 4 collections together and output
> > >> > >> > >> >> > > > > >> >> selected fields from all of these collections.
> > >> > >> > >> >> > > > > >> >>
> > >> > >> > >> >> > > > > >> >> I'm using Solr 6.4.2.
> > >> > >> > >> >> > > > > >> >>
> > >> > >> > >> >> > > > > >> >> Regards,
> > >> > >> > >> >> > > > > >> >> Edwin
> > >> > >> > >> >> > > > > >> >>
> > >> > >> > >> >> > > > > >> >
> > >> > >> > >> >> > > > > >> >
