lucene-solr-user mailing list archives

From Joel Bernstein <joels...@gmail.com>
Subject Re: Joining more than 2 collections
Date Thu, 04 May 2017 14:54:58 GMT
I suspect that there is something not quite right about how the /export
handler is configured. Straight out of the box in Solr 6.4.2, /export will
be automatically configured. Are you using a Solr instance that has been
upgraded in the past and doesn't have standard 6.4.2 configs?

To really do joins properly you'll have to use the /export handler because
/select will not stream entire result sets (unless they are pretty small).
So your results may be missing data.
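
To make the missing-data point concrete, here is a toy sketch in Python. It is
not Solr code. It assumes both streams are sorted on the join key with unique
keys, and it models /select as a hypothetical cap of 10 rows per search. The
names merge_join, left and right, and the sample data, are made up for
illustration.

```python
def merge_join(left, right, key):
    """Inner-join two lists of dicts, both sorted ascending on `key`.

    Assumes each key appears at most once per side (a simplification).
    """
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1          # left key has no partner yet, advance left
        elif lk > rk:
            j += 1          # right key has no partner yet, advance right
        else:
            out.append({**left[i], **right[j]})  # keys match, merge fields
            i += 1
            j += 1
    return out


# Two sorted "collections" whose keys overlap only in k050..k099.
left = [{"a_s": f"k{n:03d}", "b_s": f"b{n}"} for n in range(100)]
right = [{"a_s": f"k{n:03d}", "f_s": f"f{n}"} for n in range(50, 150)]

ROWS = 10  # hypothetical row cap standing in for a /select page

full = merge_join(left, right, "a_s")                    # whole result sets
truncated = merge_join(left[:ROWS], right[:ROWS], "a_s")  # capped result sets

print(len(full), len(truncated))
```

With the full streams the join finds every overlapping key. With each side cut
to its first 10 rows the two slices no longer overlap, so the join returns
nothing even though matching data exists.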

I would take a close look at the logs and see what all the exceptions are
when you run a search using qt=/export. If you can post all the stack
traces that get generated when you run the search we'll probably be able to
spot the issue.
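
One way to narrow this down is to request the same data two ways and compare
what shows up in the logs. The Python sketch below only builds the two URLs. It
assumes the localhost:8983 address used earlier in this thread and the
collection2 fields from the earlier single-search test. As far as I know,
/export also needs docValues on the fields named in fl and sort, so treat that
as something to verify in your schema.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # assumed: same host/port as in this thread

# The single search expression suggested earlier in the thread.
expr = ('search(collection2, q=*:*, fl="a_s,b_s,c_s,d_s,e_s", '
        'sort="a_s asc", qt="/export")')

# 1) The expression sent through the /stream handler.
stream_url = f"{SOLR}/collection2/stream?" + urlencode({"expr": expr})

# 2) The same query sent straight to /export, bypassing the streaming layer.
export_url = f"{SOLR}/collection2/export?" + urlencode(
    {"q": "*:*", "fl": "a_s,b_s,c_s,d_s,e_s", "sort": "a_s asc"}
)

print(stream_url)
print(export_url)
```

If the direct /export request already fails, the problem is in the handler
configuration or the schema rather than in the join expression.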

About the field ordering: there is support for field ordering in the
Streaming classes, but only a few places actually enforce the order. The 6.5
SQL interface does keep the fields in order, as does the new Tuple
expression in Solr 6.6. But the expressions you are working with currently
don't enforce field ordering.
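
Until then, one workaround is to reorder the fields on the client after the
tuples come back. The Python sketch below assumes the tuples have already been
parsed into dicts. The helper name order_fields and the sample tuple are
hypothetical, not part of Solr or SolrJ.

```python
def order_fields(tuples, field_order):
    """Return copies of `tuples` with keys arranged as listed in `field_order`.

    Fields not named in `field_order` are kept and appended afterwards in
    their original relative order. Relies on dicts preserving insertion
    order (Python 3.7+).
    """
    ordered = []
    for t in tuples:
        # Preferred fields first, skipping any the tuple does not carry.
        row = {name: t[name] for name in field_order if name in t}
        # Then whatever is left over, so no data is dropped.
        for name, value in t.items():
            if name not in row:
                row[name] = value
        ordered.append(row)
    return ordered


# A joined tuple whose fields arrived in an arbitrary order.
docs = [{"k_s": "k1", "a_s": "a1", "f_s": "f1", "b_s": "b1"}]

for row in order_fields(docs, ["a_s", "b_s", "f_s", "k_s"]):
    print(list(row.keys()))
```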




Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, May 4, 2017 at 2:41 AM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>
wrote:

> Hi Joel,
>
> I have managed to get the Join to work, but so far it is only working when
> I use qt="/select". It is not working when I use qt="/export".
>
> For the display of the field, is there a way to allow it to list them in
> the order which I want?
> Currently, the display is quite random, and I can get a field in
> collection1, followed by a field in collection3, then collection1 again,
> and then collection2.
>
> It will be good if we can arrange the field to display in the order that we
> want.
>
> Regards,
> Edwin
>
>
>
> On 4 May 2017 at 09:56, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
>
> > Hi Joel,
> >
> > It works when I started off with just one expression.
> >
> > Could it be that the data size is too big for export after the join, which
> > causes the error?
> >
> > Regards,
> > Edwin
> >
> > On 4 May 2017 at 02:53, Joel Bernstein <joelsolr@gmail.com> wrote:
> >
> >> I was just testing with the query below and it worked for me. Some of the
> >> error messages I was getting with the syntax were not what I was expecting
> >> though, so I'll look into the error handling. But the joins do work when
> >> the syntax is correct. The query below is joining to the same collection
> >> three times, but the mechanics are exactly the same as joining three
> >> different tables. In this example each join narrows down the result set.
> >>
> >> hashJoin(parallel(collection2,
> >>                   workers=3,
> >>                   sort="id asc",
> >>                   innerJoin(search(collection2, q="*:*", fl="id",
> >>                                    sort="id asc", qt="/export",
> >>                                    partitionKeys="id"),
> >>                             search(collection2, q="year_i:42",
> >>                                    fl="id, year_i", sort="id asc",
> >>                                    qt="/export", partitionKeys="id"),
> >>                             on="id")),
> >>          hashed=search(collection2, q="day_i:7", fl="id, day_i",
> >>                        sort="id asc", qt="/export"),
> >>          on="id")
> >>
> >> Joel Bernstein
> >> http://joelsolr.blogspot.com/
> >>
> >> On Wed, May 3, 2017 at 1:29 PM, Joel Bernstein <joelsolr@gmail.com>
> >> wrote:
> >>
> >> > Start off with just this expression:
> >> >
> >> > search(collection2,
> >> >             q=*:*,
> >> >             fl="a_s,b_s,c_s,d_s,e_s",
> >> >             sort="a_s asc",
> >> >             qt="/export")
> >> >
> >> > And then check the logs for exceptions.
> >> >
> >> > Joel Bernstein
> >> > http://joelsolr.blogspot.com/
> >> >
> >> > On Wed, May 3, 2017 at 12:35 PM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
> >> >
> >> >> Hi Joel,
> >> >>
> >> >> I am getting this error after I added qt=/export and removed the rows
> >> >> param. Do you know what could be the reason?
> >> >>
> >> >> {
> >> >>   "error":{
> >> >>     "metadata":[
> >> >>       "error-class","org.apache.solr.common.SolrException",
> >> >>       "root-error-class","org.apache.http.MalformedChunkCodingExce
> >> >> ption"],
> >> >>     "msg":"org.apache.http.MalformedChunkCodingException: CRLF
> >> expected
> >> >> at
> >> >> end of chunk",
> >> >>     "trace":"org.apache.solr.common.SolrException:
> >> >> org.apache.http.MalformedChunkCodingException: CRLF expected at end
> of
> >> >> chunk\r\n\tat
> >> >> org.apache.solr.client.solrj.io.stream.TupleStream.lambda$wr
> >> >> iteMap$0(TupleStream.java:79)\r\n\tat
> >> >> org.apache.solr.response.JSONWriter.writeIterator(JSONRespon
> >> >> seWriter.java:523)\r\n\tat
> >> >> org.apache.solr.response.TextResponseWriter.writeVal(TextRes
> >> >> ponseWriter.java:175)\r\n\tat
> >> >> org.apache.solr.response.JSONWriter$2.put(JSONResponseWriter
> >> >> .java:559)\r\n\tat
> >> >> org.apache.solr.client.solrj.io.stream.TupleStream.writeMap(
> >> >> TupleStream.java:64)\r\n\tat
> >> >> org.apache.solr.response.JSONWriter.writeMap(JSONResponseWri
> >> >> ter.java:547)\r\n\tat
> >> >> org.apache.solr.response.TextResponseWriter.writeVal(TextRes
> >> >> ponseWriter.java:193)\r\n\tat
> >> >> org.apache.solr.response.JSONWriter.writeNamedListAsMapWithD
> >> >> ups(JSONResponseWriter.java:209)\r\n\tat
> >> >> org.apache.solr.response.JSONWriter.writeNamedList(JSONRespo
> >> >> nseWriter.java:325)\r\n\tat
> >> >> org.apache.solr.response.JSONWriter.writeResponse(JSONRespon
> >> >> seWriter.java:120)\r\n\tat
> >> >> org.apache.solr.response.JSONResponseWriter.write(JSONRespon
> >> >> seWriter.java:71)\r\n\tat
> >> >> org.apache.solr.response.QueryResponseWriterUtil.writeQueryR
> >> >> esponse(QueryResponseWriterUtil.java:65)\r\n\tat
> >> >> org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrC
> >> >> all.java:732)\r\n\tat
> >> >> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:
> >> 473)\r\n\tat
> >> >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDisp
> >> >> atchFilter.java:345)\r\n\tat
> >> >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDisp
> >> >> atchFilter.java:296)\r\n\tat
> >> >> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilte
> >> >> r(ServletHandler.java:1691)\r\n\tat
> >> >> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHan
> >> >> dler.java:582)\r\n\tat
> >> >> org.eclipse.jetty.server.handler.ScopedHandler.handle(Scoped
> >> >> Handler.java:143)\r\n\tat
> >> >> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHa
> >> >> ndler.java:548)\r\n\tat
> >> >> org.eclipse.jetty.server.session.SessionHandler.doHandle(
> >> >> SessionHandler.java:226)\r\n\tat
> >> >> org.eclipse.jetty.server.handler.ContextHandler.doHandle(
> >> >> ContextHandler.java:1180)\r\n\tat
> >> >> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHand
> >> >> ler.java:512)\r\n\tat
> >> >> org.eclipse.jetty.server.session.SessionHandler.doScope(
> >> >> SessionHandler.java:185)\r\n\tat
> >> >> org.eclipse.jetty.server.handler.ContextHandler.doScope(
> >> >> ContextHandler.java:1112)\r\n\tat
> >> >> org.eclipse.jetty.server.handler.ScopedHandler.handle(Scoped
> >> >> Handler.java:141)\r\n\tat
> >> >> org.eclipse.jetty.server.handler.ContextHandlerCollection.ha
> >> >> ndle(ContextHandlerCollection.java:213)\r\n\tat
> >> >> org.eclipse.jetty.server.handler.HandlerCollection.handle(
> >> >> HandlerCollection.java:119)\r\n\tat
> >> >> org.eclipse.jetty.server.handler.HandlerWrapper.handle(Handl
> >> >> erWrapper.java:134)\r\n\tat
> >> >> org.eclipse.jetty.server.Server.handle(Server.java:534)\r\n\tat
> >> >> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.
> >> java:320)\r\n\tat
> >> >> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConne
> >> >> ction.java:251)\r\n\tat
> >> >> org.eclipse.jetty.io.AbstractConnection$ReadCallback.
> >> >> succeeded(AbstractConnection.java:273)\r\n\tat
> >> >> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.
> >> java:95)\r\n\tat
> >> >> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChann
> >> >> elEndPoint.java:93)\r\n\tat
> >> >> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume
> >> >> .executeProduceConsume(ExecuteProduceConsume.java:303)\r\n\tat
> >> >> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume
> >> >> .produceConsume(ExecuteProduceConsume.java:148)\r\n\tat
> >> >> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume
> >> >> .run(ExecuteProduceConsume.java:136)\r\n\tat
> >> >> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(Queued
> >> >> ThreadPool.java:671)\r\n\tat
> >> >> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedT
> >> >> hreadPool.java:589)\r\n\tat
> >> >> java.lang.Thread.run(Thread.java:745)\r\nCaused by:
> >> >> org.apache.http.MalformedChunkCodingException: CRLF expected at end
> of
> >> >> chunk\r\n\tat
> >> >> org.apache.http.impl.io.ChunkedInputStream.getChunkSize(Chun
> >> >> kedInputStream.java:255)\r\n\tat
> >> >> org.apache.http.impl.io.ChunkedInputStream.nextChunk(Chunked
> >> >> InputStream.java:227)\r\n\tat
> >> >> org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInput
> >> >> Stream.java:186)\r\n\tat
> >> >> org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInput
> >> >> Stream.java:215)\r\n\tat
> >> >> org.apache.http.impl.io.ChunkedInputStream.close(ChunkedInpu
> >> >> tStream.java:316)\r\n\tat
> >> >> org.apache.http.conn.BasicManagedEntity.streamClosed(BasicMa
> >> >> nagedEntity.java:164)\r\n\tat
> >> >> org.apache.http.conn.EofSensorInputStream.checkClose(EofSens
> >> >> orInputStream.java:228)\r\n\tat
> >> >> org.apache.http.conn.EofSensorInputStream.close(EofSensorInp
> >> >> utStream.java:174)\r\n\tat
> >> >> sun.nio.cs.StreamDecoder.implClose(StreamDecoder.java:378)\r\n\tat
> >> >> sun.nio.cs.StreamDecoder.close(StreamDecoder.java:193)\r\n\tat
> >> >> java.io.InputStreamReader.close(InputStreamReader.java:199)\r\n\tat
> >> >> org.apache.solr.client.solrj.io.stream.JSONTupleStream.close
> >> >> (JSONTupleStream.java:92)\r\n\tat
> >> >> org.apache.solr.client.solrj.io.stream.SolrStream.close(Solr
> >> >> Stream.java:193)\r\n\tat
> >> >> org.apache.solr.client.solrj.io.stream.CloudSolrStream.close
> >> >> (CloudSolrStream.java:464)\r\n\tat
> >> >> org.apache.solr.client.solrj.io.stream.HashJoinStream.close(
> >> >> HashJoinStream.java:231)\r\n\tat
> >> >> org.apache.solr.client.solrj.io.stream.ExceptionStream.close
> >> >> (ExceptionStream.java:93)\r\n\tat
> >> >> org.apache.solr.handler.StreamHandler$TimerStream.close(
> >> >> StreamHandler.java:452)\r\n\tat
> >> >> org.apache.solr.client.solrj.io.stream.TupleStream.lambda$wr
> >> >> iteMap$0(TupleStream.java:71)\r\n\t...
> >> >> 40 more\r\n",
> >> >>     "code":500}}
> >> >>
> >> >>
> >> >> Regards,
> >> >> Edwin
> >> >>
> >> >>
> >> >> On 4 May 2017 at 00:00, Joel Bernstein <joelsolr@gmail.com> wrote:
> >> >>
> >> >> > I've reformatted the expression below and made a few changes. You have put
> >> >> > things together properly. But these are MapReduce joins that require
> >> >> > exporting the entire result sets. So you will need to add qt=/export to all
> >> >> > the searches and remove the rows param. In Solr 6.6 there is a new
> >> >> > "shuffle" expression that does this automatically.
> >> >> >
> >> >> > To test things you'll want to break down each expression and make sure it's
> >> >> > behaving as expected.
> >> >> >
> >> >> > For example, first run each search. Then run the innerJoin, not in parallel
> >> >> > mode. Then run it in parallel mode. Then try the whole thing.
> >> >> >
> >> >> > hashJoin(parallel(collection2,
> >> >> >                   innerJoin(search(collection2,
> >> >> >                                    q=*:*,
> >> >> >                                    fl="a_s,b_s,c_s,d_s,e_s",
> >> >> >                                    sort="a_s asc",
> >> >> >                                    partitionKeys="a_s",
> >> >> >                                    qt="/export"),
> >> >> >                             search(collection1,
> >> >> >                                    q=*:*,
> >> >> >                                    fl="a_s,f_s,g_s,h_s,i_s,j_s",
> >> >> >                                    sort="a_s asc",
> >> >> >                                    partitionKeys="a_s",
> >> >> >                                    qt="/export"),
> >> >> >                             on="a_s"),
> >> >> >                   workers="2",
> >> >> >                   sort="a_s asc"),
> >> >> >          hashed=search(collection3,
> >> >> >                        q=*:*,
> >> >> >                        fl="a_s,k_s,l_s",
> >> >> >                        sort="a_s asc",
> >> >> >                        qt="/export"),
> >> >> >          on="a_s")
> >> >> >
> >> >> > Joel Bernstein
> >> >> > http://joelsolr.blogspot.com/
> >> >> >
> >> >> > On Wed, May 3, 2017 at 11:26 AM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
> >> >> >
> >> >> > > Hi Joel,
> >> >> > >
> >> >> > > Thanks for the clarification.
> >> >> > >
> >> >> > > I would like to check: is this the correct way to do the join? Currently, I
> >> >> > > could not get any results after putting in the hashJoin for the 3rd,
> >> >> > > smallerStream collection (collection3).
> >> >> > >
> >> >> > > http://localhost:8983/solr/collection1/stream?expr=
> >> >> > > hashJoin(parallel(collection2
> >> >> > > ,
> >> >> > > innerJoin(
> >> >> > >  search(collection2,
> >> >> > > q=*:*,
> >> >> > > fl="a_s,b_s,c_s,d_s,e_s",
> >> >> > >              sort="a_s asc",
> >> >> > > partitionKeys="a_s",
> >> >> > > rows=200),
> >> >> > >  search(collection1,
> >> >> > > q=*:*,
> >> >> > > fl="a_s,f_s,g_s,h_s,i_s,j_s",
> >> >> > >              sort="a_s asc",
> >> >> > > partitionKeys="a_s",
> >> >> > > rows=200),
> >> >> > >          on="a_s"),
> >> >> > > workers="2",
> >> >> > >                  sort="a_s asc"),
> >> >> > >          hashed=search(collection3,
> >> >> > > q=*:*,
> >> >> > > fl="a_s,k_s,l_s",
> >> >> > > sort="a_s asc",
> >> >> > > rows=200),
> >> >> > > on="a_s")
> >> >> > > &indent=true
> >> >> > >
> >> >> > >
> >> >> > > Regards,
> >> >> > > Edwin
> >> >> > >
> >> >> > >
> >> >> > > On 3 May 2017 at 20:59, Joel Bernstein <joelsolr@gmail.com> wrote:
> >> >> > >
> >> >> > > > Sorry, it's just called hashJoin
> >> >> > > >
> >> >> > > > Joel Bernstein
> >> >> > > > http://joelsolr.blogspot.com/
> >> >> > > >
> >> >> > > > On Wed, May 3, 2017 at 2:45 AM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
> >> >> > > >
> >> >> > > > > Hi Joel,
> >> >> > > > >
> >> >> > > > > I am getting this error when I used the innerHashJoin.
> >> >> > > > >
> >> >> > > > >  "EXCEPTION":"Invalid stream expression
> innerHashJoin(parallel(
> >> >> > > innerJoin
> >> >> > > > >
> >> >> > > > > I also can't find the documentation on innerHashJoin for the Streaming
> >> >> > > > > Expressions.
> >> >> > > > >
> >> >> > > > > Are you referring to hashJoin?
> >> >> > > > >
> >> >> > > > > Regards,
> >> >> > > > > Edwin
> >> >> > > > >
> >> >> > > > >
> >> >> > > > > On 3 May 2017 at 13:20, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
> >> >> > > > >
> >> >> > > > > > Hi Joel,
> >> >> > > > > >
> >> >> > > > > > Thanks for the info.
> >> >> > > > > >
> >> >> > > > > > Regards,
> >> >> > > > > > Edwin
> >> >> > > > > >
> >> >> > > > > >
> >> >> > > > > > On 3 May 2017 at 02:04, Joel Bernstein <joelsolr@gmail.com> wrote:
> >> >> > > > > >
> >> >> > > > > >> Also take a look at the documentation for the "fetch" streaming
> >> >> > > > > >> expression.
> >> >> > > > > >>
> >> >> > > > > >> Joel Bernstein
> >> >> > > > > >> http://joelsolr.blogspot.com/
> >> >> > > > > >>
> >> >> > > > > >> On Tue, May 2, 2017 at 2:03 PM, Joel Bernstein <joelsolr@gmail.com> wrote:
> >> >> > > > > >>
> >> >> > > > > >> > Yes, you can join more than one collection with Streaming Expressions. Here are
> >> >> > > > > >> > a few things to keep in mind.
> >> >> > > > > >> >
> >> >> > > > > >> > * You'll likely want to use the parallel function around the largest join.
> >> >> > > > > >> > You'll need to use the join keys as the partitionKeys.
> >> >> > > > > >> > * innerJoin: requires that the streams be sorted on the join keys.
> >> >> > > > > >> > * innerHashJoin: has no sorting requirement.
> >> >> > > > > >> >
> >> >> > > > > >> > So a strategy for a three collection join might look like this:
> >> >> > > > > >> >
> >> >> > > > > >> > innerHashJoin(parallel(innerJoin(bigStream, bigStream)), smallerStream)
> >> >> > > > > >> >
> >> >> > > > > >> > The largest join can be done in parallel using an innerJoin. You can then
> >> >> > > > > >> > wrap the stream coming out of the parallel function in an innerHashJoin to
> >> >> > > > > >> > join it to another stream.
> >> >> > > > > >> >
> >> >> > > > > >> > Joel Bernstein
> >> >> > > > > >> > http://joelsolr.blogspot.com/
> >> >> > > > > >> >
> >> >> > > > > >> > On Mon, May 1, 2017 at 9:42 PM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
> >> >> > > > > >> >
> >> >> > > > > >> >> Hi,
> >> >> > > > > >> >>
> >> >> > > > > >> >> Is it possible to join more than 2 collections using one of the streaming
> >> >> > > > > >> >> expressions (Eg: innerJoin)? If not, are there other ways we can do it?
> >> >> > > > > >> >>
> >> >> > > > > >> >> Currently, I may need to join 3 or 4 collections together, and to output
> >> >> > > > > >> >> selected fields from all these collections together.
> >> >> > > > > >> >>
> >> >> > > > > >> >> I'm using Solr 6.4.2.
> >> >> > > > > >> >>
> >> >> > > > > >> >> Regards,
> >> >> > > > > >> >> Edwin
> >> >> > > > > >> >>
> >> >> > > > > >> >
> >> >> > > > > >> >
> >> >> > > > > >>
> >> >> > > > > >
> >> >> > > > > >
> >> >> > > > >
> >> >> > > >
> >> >> > >
> >> >> >
> >> >>
> >> >
> >> >
> >>
> >
> >
>
