lucene-solr-user mailing list archives

From Zheng Lin Edwin Yeo <edwinye...@gmail.com>
Subject Re: Joining more than 2 collections
Date Thu, 04 May 2017 06:41:21 GMT
Hi Joel,

I have managed to get the join to work, but so far it only works when
I use qt="/select". It does not work when I use qt="/export".

For the display of the fields, is there a way to list them in the order
that I want? Currently the order looks random: I can get a field from
collection1, followed by a field from collection3, then collection1 again,
and then collection2.

It would be good if we could arrange the fields to display in the order
that we want.

Regards,
Edwin
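
One possible client-side workaround for the field order, sketched in Python.
Each tuple comes back as a JSON object, and JSON does not promise any key
order, so a client can re-project the tuples into a fixed order after reading
them. The function name order_fields and the sample tuple are hypothetical;
the field names are borrowed from the expressions quoted further down.

```python
from collections import OrderedDict


def order_fields(tuples, field_order):
    """Yield each tuple with its keys arranged as listed in field_order.

    Fields not named in field_order are appended afterwards, sorted by name.
    Fields missing from a tuple are simply skipped.
    """
    for t in tuples:
        ordered = OrderedDict()
        for name in field_order:
            if name in t:
                ordered[name] = t[name]
        for name in sorted(t):
            if name not in ordered:
                ordered[name] = t[name]
        yield ordered


# Hypothetical tuple, shaped like one row of the joined stream.
docs = [{"k_s": "k1", "a_s": "1", "f_s": "f1", "b_s": "b1"}]
wanted = ["a_s", "b_s", "f_s", "k_s"]
result = [list(d.keys()) for d in order_fields(docs, wanted)]
```

This only changes how the client presents the fields; it makes no claim about
what order the stream handler itself emits.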



On 4 May 2017 at 09:56, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:

> Hi Joel,
>
> It works when I started off with just one expression.
>
> Could it be that the data size is too big for export after the join, which
> causes the error?
>
> Regards,
> Edwin
>
> On 4 May 2017 at 02:53, Joel Bernstein <joelsolr@gmail.com> wrote:
>
>> I was just testing with the query below and it worked for me. Some of the
>> error messages I was getting with the syntax were not what I was expecting,
>> though, so I'll look into the error handling. But the joins do work when
>> the syntax is correct. The query below joins to the same collection three
>> times, but the mechanics are exactly the same when joining three different
>> tables. In this example each join narrows down the result set.
>>
>> hashJoin(parallel(collection2,
>>                   workers=3,
>>                   sort="id asc",
>>                   innerJoin(search(collection2, q="*:*", fl="id", sort="id asc", qt="/export", partitionKeys="id"),
>>                             search(collection2, q="year_i:42", fl="id, year_i", sort="id asc", qt="/export", partitionKeys="id"),
>>                             on="id")),
>>          hashed=search(collection2, q="day_i:7", fl="id, day_i", sort="id asc", qt="/export"),
>>          on="id")
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Wed, May 3, 2017 at 1:29 PM, Joel Bernstein <joelsolr@gmail.com>
>> wrote:
>>
>> > Start off with just this expression:
>> >
>> > search(collection2,
>> >             q=*:*,
>> >             fl="a_s,b_s,c_s,d_s,e_s",
>> >             sort="a_s asc",
>> >             qt="/export")
>> >
>> > And then check the logs for exceptions.
>> >
>> > Joel Bernstein
>> > http://joelsolr.blogspot.com/
>> >
>> > On Wed, May 3, 2017 at 12:35 PM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
>> >
>> >> Hi Joel,
>> >>
>> >> I am getting this error after I added qt=/export and removed the rows
>> >> param. Do you know what could be the reason?
>> >>
>> >> {
>> >>   "error":{
>> >>     "metadata":[
>> >>       "error-class","org.apache.solr.common.SolrException",
>> >>       "root-error-class","org.apache.http.MalformedChunkCodingException"],
>> >>     "msg":"org.apache.http.MalformedChunkCodingException: CRLF expected at end of chunk",
>> >>     "trace":"org.apache.solr.common.SolrException: org.apache.http.MalformedChunkCodingException: CRLF expected at end of chunk\r\n\tat
>> >> org.apache.solr.client.solrj.io.stream.TupleStream.lambda$writeMap$0(TupleStream.java:79)\r\n\tat
>> >> org.apache.solr.response.JSONWriter.writeIterator(JSONResponseWriter.java:523)\r\n\tat
>> >> org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:175)\r\n\tat
>> >> org.apache.solr.response.JSONWriter$2.put(JSONResponseWriter.java:559)\r\n\tat
>> >> org.apache.solr.client.solrj.io.stream.TupleStream.writeMap(TupleStream.java:64)\r\n\tat
>> >> org.apache.solr.response.JSONWriter.writeMap(JSONResponseWriter.java:547)\r\n\tat
>> >> org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:193)\r\n\tat
>> >> org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:209)\r\n\tat
>> >> org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:325)\r\n\tat
>> >> org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:120)\r\n\tat
>> >> org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:71)\r\n\tat
>> >> org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)\r\n\tat
>> >> org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:732)\r\n\tat
>> >> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:473)\r\n\tat
>> >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)\r\n\tat
>> >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)\r\n\tat
>> >> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)\r\n\tat
>> >> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)\r\n\tat
>> >> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\r\n\tat
>> >> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\r\n\tat
>> >> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\r\n\tat
>> >> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)\r\n\tat
>> >> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)\r\n\tat
>> >> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\r\n\tat
>> >> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)\r\n\tat
>> >> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\r\n\tat
>> >> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\r\n\tat
>> >> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\r\n\tat
>> >> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\r\n\tat
>> >> org.eclipse.jetty.server.Server.handle(Server.java:534)\r\n\tat
>> >> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)\r\n\tat
>> >> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)\r\n\tat
>> >> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\r\n\tat
>> >> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)\r\n\tat
>> >> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\r\n\tat
>> >> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)\r\n\tat
>> >> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)\r\n\tat
>> >> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)\r\n\tat
>> >> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)\r\n\tat
>> >> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)\r\n\tat
>> >> java.lang.Thread.run(Thread.java:745)\r\nCaused by: org.apache.http.MalformedChunkCodingException: CRLF expected at end of chunk\r\n\tat
>> >> org.apache.http.impl.io.ChunkedInputStream.getChunkSize(ChunkedInputStream.java:255)\r\n\tat
>> >> org.apache.http.impl.io.ChunkedInputStream.nextChunk(ChunkedInputStream.java:227)\r\n\tat
>> >> org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:186)\r\n\tat
>> >> org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:215)\r\n\tat
>> >> org.apache.http.impl.io.ChunkedInputStream.close(ChunkedInputStream.java:316)\r\n\tat
>> >> org.apache.http.conn.BasicManagedEntity.streamClosed(BasicManagedEntity.java:164)\r\n\tat
>> >> org.apache.http.conn.EofSensorInputStream.checkClose(EofSensorInputStream.java:228)\r\n\tat
>> >> org.apache.http.conn.EofSensorInputStream.close(EofSensorInputStream.java:174)\r\n\tat
>> >> sun.nio.cs.StreamDecoder.implClose(StreamDecoder.java:378)\r\n\tat
>> >> sun.nio.cs.StreamDecoder.close(StreamDecoder.java:193)\r\n\tat
>> >> java.io.InputStreamReader.close(InputStreamReader.java:199)\r\n\tat
>> >> org.apache.solr.client.solrj.io.stream.JSONTupleStream.close(JSONTupleStream.java:92)\r\n\tat
>> >> org.apache.solr.client.solrj.io.stream.SolrStream.close(SolrStream.java:193)\r\n\tat
>> >> org.apache.solr.client.solrj.io.stream.CloudSolrStream.close(CloudSolrStream.java:464)\r\n\tat
>> >> org.apache.solr.client.solrj.io.stream.HashJoinStream.close(HashJoinStream.java:231)\r\n\tat
>> >> org.apache.solr.client.solrj.io.stream.ExceptionStream.close(ExceptionStream.java:93)\r\n\tat
>> >> org.apache.solr.handler.StreamHandler$TimerStream.close(StreamHandler.java:452)\r\n\tat
>> >> org.apache.solr.client.solrj.io.stream.TupleStream.lambda$writeMap$0(TupleStream.java:71)\r\n\t... 40 more\r\n",
>> >>     "code":500}}
>> >>
>> >>
>> >> Regards,
>> >> Edwin
>> >>
>> >>
>> >> On 4 May 2017 at 00:00, Joel Bernstein <joelsolr@gmail.com> wrote:
>> >>
>> >> > I've reformatted the expression below and made a few changes. You have put
>> >> > things together properly. But these are MapReduce joins that require
>> >> > exporting the entire result sets. So you will need to add qt=/export to all
>> >> > the searches and remove the rows param. In Solr 6.6 there is a new
>> >> > "shuffle" expression that does this automatically.
>> >> >
>> >> > To test things you'll want to break down each expression and make sure it's
>> >> > behaving as expected.
>> >> >
>> >> > For example, first run each search. Then run the innerJoin, not in parallel
>> >> > mode. Then run it in parallel mode. Then try the whole thing.
>> >> >
>> >> > hashJoin(parallel(collection2,
>> >> >                   innerJoin(search(collection2,
>> >> >                                    q=*:*,
>> >> >                                    fl="a_s,b_s,c_s,d_s,e_s",
>> >> >                                    sort="a_s asc",
>> >> >                                    partitionKeys="a_s",
>> >> >                                    qt="/export"),
>> >> >                             search(collection1,
>> >> >                                    q=*:*,
>> >> >                                    fl="a_s,f_s,g_s,h_s,i_s,j_s",
>> >> >                                    sort="a_s asc",
>> >> >                                    partitionKeys="a_s",
>> >> >                                    qt="/export"),
>> >> >                             on="a_s"),
>> >> >                   workers="2",
>> >> >                   sort="a_s asc"),
>> >> >          hashed=search(collection3,
>> >> >                        q=*:*,
>> >> >                        fl="a_s,k_s,l_s",
>> >> >                        sort="a_s asc",
>> >> >                        qt="/export"),
>> >> >          on="a_s")
>> >> >
>> >> > Joel Bernstein
>> >> > http://joelsolr.blogspot.com/
>> >> >
>> >> > On Wed, May 3, 2017 at 11:26 AM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
>> >> >
>> >> > > Hi Joel,
>> >> > >
>> >> > > Thanks for the clarification.
>> >> > >
>> >> > > I would like to check: is this the correct way to do the join? Currently, I
>> >> > > could not get any results after putting in the hashJoin for the 3rd,
>> >> > > smallerStream collection (collection3).
>> >> > >
>> >> > > http://localhost:8983/solr/collection1/stream?expr=
>> >> > > hashJoin(parallel(collection2,
>> >> > >                   innerJoin(search(collection2,
>> >> > >                                    q=*:*,
>> >> > >                                    fl="a_s,b_s,c_s,d_s,e_s",
>> >> > >                                    sort="a_s asc",
>> >> > >                                    partitionKeys="a_s",
>> >> > >                                    rows=200),
>> >> > >                             search(collection1,
>> >> > >                                    q=*:*,
>> >> > >                                    fl="a_s,f_s,g_s,h_s,i_s,j_s",
>> >> > >                                    sort="a_s asc",
>> >> > >                                    partitionKeys="a_s",
>> >> > >                                    rows=200),
>> >> > >                             on="a_s"),
>> >> > >                   workers="2",
>> >> > >                   sort="a_s asc"),
>> >> > >          hashed=search(collection3,
>> >> > >                        q=*:*,
>> >> > >                        fl="a_s,k_s,l_s",
>> >> > >                        sort="a_s asc",
>> >> > >                        rows=200),
>> >> > >          on="a_s")
>> >> > > &indent=true
>> >> > >
>> >> > >
>> >> > > Regards,
>> >> > > Edwin
>> >> > >
>> >> > >
>> >> > > On 3 May 2017 at 20:59, Joel Bernstein <joelsolr@gmail.com> wrote:
>> >> > >
>> >> > > > Sorry, it's just called hashJoin
>> >> > > >
>> >> > > > Joel Bernstein
>> >> > > > http://joelsolr.blogspot.com/
>> >> > > >
>> >> > > > On Wed, May 3, 2017 at 2:45 AM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
>> >> > > >
>> >> > > > > Hi Joel,
>> >> > > > >
>> >> > > > > I am getting this error when I use innerHashJoin:
>> >> > > > >
>> >> > > > >  "EXCEPTION":"Invalid stream expression innerHashJoin(parallel(innerJoin
>> >> > > > >
>> >> > > > > I also can't find any documentation on innerHashJoin for the Streaming
>> >> > > > > Expressions.
>> >> > > > >
>> >> > > > > Are you referring to hashJoin?
>> >> > > > >
>> >> > > > > Regards,
>> >> > > > > Edwin
>> >> > > > >
>> >> > > > >
>> >> > > > > On 3 May 2017 at 13:20, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
>> >> > > > >
>> >> > > > > > Hi Joel,
>> >> > > > > >
>> >> > > > > > Thanks for the info.
>> >> > > > > >
>> >> > > > > > Regards,
>> >> > > > > > Edwin
>> >> > > > > >
>> >> > > > > >
>> >> > > > > > On 3 May 2017 at 02:04, Joel Bernstein <joelsolr@gmail.com> wrote:
>> >> > > > > >
>> >> > > > > >> Also take a look at the documentation for the "fetch" streaming
>> >> > > > > >> expression.
>> >> > > > > >>
>> >> > > > > >> Joel Bernstein
>> >> > > > > >> http://joelsolr.blogspot.com/
>> >> > > > > >>
>> >> > > > > >> On Tue, May 2, 2017 at 2:03 PM, Joel Bernstein <joelsolr@gmail.com> wrote:
>> >> > > > > >>
>> >> > > > > >> > Yes, you can join more than one collection with Streaming Expressions.
>> >> > > > > >> > Here are a few things to keep in mind.
>> >> > > > > >> >
>> >> > > > > >> > * You'll likely want to use the parallel function around the largest join.
>> >> > > > > >> >   You'll need to use the join keys as the partitionKeys.
>> >> > > > > >> > * innerJoin: requires that the streams be sorted on the join keys.
>> >> > > > > >> > * innerHashJoin: has no sorting requirement.
>> >> > > > > >> >
>> >> > > > > >> > So a strategy for a three collection join might look like this:
>> >> > > > > >> >
>> >> > > > > >> > innerHashJoin(parallel(innerJoin(bigStream, bigStream)), smallerStream)
>> >> > > > > >> >
>> >> > > > > >> > The largest join can be done in parallel using an innerJoin. You can then
>> >> > > > > >> > wrap the stream coming out of the parallel function in an innerHashJoin to
>> >> > > > > >> > join it to another stream.
>> >> > > > > >> >
>> >> > > > > >> > Joel Bernstein
>> >> > > > > >> > http://joelsolr.blogspot.com/
>> >> > > > > >> >
>> >> > > > > >> > On Mon, May 1, 2017 at 9:42 PM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
>> >> > > > > >> >
>> >> > > > > >> >> Hi,
>> >> > > > > >> >>
>> >> > > > > >> >> Is it possible to join more than 2 collections using one of the streaming
>> >> > > > > >> >> expressions (e.g. innerJoin)? If not, are there other ways we can do it?
>> >> > > > > >> >>
>> >> > > > > >> >> Currently, I may need to join 3 or 4 collections together, and to output
>> >> > > > > >> >> selected fields from all these collections together.
>> >> > > > > >> >>
>> >> > > > > >> >> I'm using Solr 6.4.2.
>> >> > > > > >> >>
>> >> > > > > >> >> Regards,
>> >> > > > > >> >> Edwin
>> >> > > > > >> >>
>> >> > > > > >> >
>> >> > > > > >> >
>> >> > > > > >>
>> >> > > > > >
>> >> > > > > >
>> >> > > > >
>> >> > > >
>> >> > >
>> >> >
>> >>
>> >
>> >
>>
>
>
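
The two join styles discussed in this thread can be illustrated with a small
in-memory Python sketch. It is not Solr's implementation, only a model under
simplifying assumptions: merge_join mimics innerJoin, which needs both inputs
sorted on the join key, and hash_join mimics hashJoin, which loads the
"hashed" stream into a map and so needs no sort order. The function names and
the sample tuples are hypothetical.

```python
from collections import defaultdict


def merge_join(left, right, key):
    """Inner join of two lists of dicts, both sorted ascending on key."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right-hand tuples that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left-hand tuple with this key against that run.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    # On a field-name clash the right-hand value wins;
                    # an arbitrary choice made for this sketch.
                    out.append({**left[i], **r})
                i += 1
            j = j_end
    return out


def hash_join(stream, hashed, key):
    """Inner join that needs no sort order: hashed is read fully into a map."""
    table = defaultdict(list)
    for h in hashed:
        table[h[key]].append(h)
    out = []
    for t in stream:
        for h in table.get(t[key], []):
            out.append({**t, **h})
    return out


# Hypothetical data standing in for the three collections.
c2 = [{"a_s": "1", "b_s": "x"}, {"a_s": "2", "b_s": "y"}, {"a_s": "3", "b_s": "z"}]
c1 = [{"a_s": "1", "f_s": "p"}, {"a_s": "3", "f_s": "q"}]
c3 = [{"a_s": "3", "k_s": "m"}, {"a_s": "1", "k_s": "n"}]  # deliberately unsorted

joined = hash_join(merge_join(c2, c1, "a_s"), c3, "a_s")
```

The nesting mirrors hashJoin(parallel(innerJoin(...)), hashed=...): the sorted
merge handles the two large inputs, and the smaller third input is held in
memory. The sketch leaves out the parallel/partitionKeys step entirely.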
