lucene-solr-user mailing list archives

From Zheng Lin Edwin Yeo <edwinye...@gmail.com>
Subject Re: Multiple hashJoin or innerJoin
Date Mon, 19 Jun 2017 03:32:48 GMT
Hi Joel,

Yes, I have tried the hashJoin. This didn't give the timeout. If I use the
/select handler, I get 1000 records returned, and if I use the /export
handler, there are too many records for the browser to display. The current
count is 280,000.

As all the collections being joined have many fields (more than 100 each),
could that be the reason the join causes the number of records to "blow
up"? I am only expecting fewer than 100 records to be returned, based on
the filter on the ID. This also happens when I do only a single hashJoin
joining 2 collections.
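
Since only a handful of records per ID are expected, one option worth testing is to push the ID filter into each search() before joining, so the join never sees the full 280,000 rows. A minimal sketch only: it assumes the pets key is really petsId (the thread's fl writes "pertsId", which looks like a typo), the fq value 12345 is a placeholder, and qt="/export" makes each search use the export handler:

```
innerJoin(
  search(people, q=*:*, fq="personId:12345", fl="personId,name",
         sort="personId asc", qt="/export"),
  search(pets, q=type:cat, fq="petsId:12345", fl="petsId,petName",
         sort="petsId asc", qt="/export"),
  on="personId=petsId"
)
```

With both sides pre-filtered, the joined stream should stay close to the expected record count instead of blowing up.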

Regards,
Edwin


On 19 June 2017 at 08:15, Joel Bernstein <joelsolr@gmail.com> wrote:

> About the timeout error: one thing to look at is the inner join below:
>
> innerJoin(
>   search(people, q=*:*, fl="personId,name", sort="personId asc"),
>   search(pets, q=type:cat, fl="pertsId,petName", sort="personId asc"),
>   on="personId=petsId"
> )
>
> If the left side of the join is very large and the right side is much
> smaller, the right side of the join can time out. This is because the
> right side of the join may spend a significant amount of time blocked,
> waiting for the left side of the join to stream its records.
>
> Try using a hashJoin in this scenario. In general the hashJoin is always
> the right choice when one side can fit in memory.
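
Joel's hashJoin suggestion might look like the sketch below, assuming the same collections and fields as the innerJoin example (with petsId in place of the apparent "pertsId" typo). The hashed= stream is the side read fully into memory, and qt="/export" makes each search use the export handler:

```
hashJoin(
  search(people, q=*:*, fl="personId,name", sort="personId asc", qt="/export"),
  hashed=search(pets, q=type:cat, fl="petsId,petName", sort="petsId asc", qt="/export"),
  on="personId=petsId"
)
```

Unlike innerJoin, hashJoin does not require the two streams to share a sort order, so the smaller side is read up front rather than blocking on the larger one.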
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Sun, Jun 18, 2017 at 5:16 PM, Joel Bernstein <joelsolr@gmail.com>
> wrote:
>
> > The search expressions don't appear to be using the /export handler.
> > Streaming joins require the export handler because all the results that
> > match the query need to be considered in the join.
> >
> > When debugging these types of multi-collection joins you need to build up
> > the expression piece by piece. First simply run the searches individually
> > and see how long they take to fully export. You can do this by using curl
> > to run the search expressions and save all the records to a file.
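
A hedged sketch of that curl step, using the /stream handler's standard expr parameter (the hostname, port, and output file name here are assumptions):

```
curl --data-urlencode 'expr=search(people, q=*:*, fl="personId,name", sort="personId asc", qt="/export")' \
  "http://localhost:8983/solr/people/stream" > people-export.json
```

Timing each such export individually shows which search is the slow part before any join is layered on top.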
> >
> > Then run a single join using curl and save the records to file. Once you
> > get that working try the three joins together.
> >
> > You'll use this same approach when improving performance of the join.
> > Look at the performance of each part of the expression and improve
> > performance in the places where the expression is slow.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Sat, Jun 17, 2017 at 12:00 AM, Zheng Lin Edwin Yeo <
> > edwinyeozl@gmail.com> wrote:
> >
> >> This is the full error message from the node for the second example,
> >> i.e. the following query that gets stuck.
> >>
> >> innerJoin(innerJoin(
> >>   search(people, q=*:*, fl="personId,name", sort="personId asc"),
> >>   search(pets, q=type:cat, fl="pertsId,petName", sort="personId asc"),
> >>   on="personId=petsId"
> >> ),
> >>   search(collection1, q=*:*, fl="collectionId,collectionName",
> >> sort="collectionId asc"),
> >> )on="personId=collectionId"
> >>
> >>
> >> ------------------------------------------------------------------------
> >> Full error message:
> >>
> >> java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout expired: 50000/50000 ms
> >>         at org.eclipse.jetty.util.SharedBlockingCallback$Blocker.block(SharedBlockingCallback.java:219)
> >>         at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:220)
> >>         at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:496)
> >>         at org.apache.commons.io.output.ProxyOutputStream.write(ProxyOutputStream.java:90)
> >>         at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
> >>         at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
> >>         at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
> >>         at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207)
> >>         at org.apache.solr.util.FastWriter.flush(FastWriter.java:140)
> >>         at org.apache.solr.util.FastWriter.write(FastWriter.java:54)
> >>         at org.apache.solr.response.JSONWriter.writeStr(JSONResponseWriter.java:482)
> >>         at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:132)
> >>         at org.apache.solr.response.JSONWriter$2.put(JSONResponseWriter.java:559)
> >>         at org.apache.solr.handler.ExportWriter$StringFieldWriter.write(ExportWriter.java:1445)
> >>         at org.apache.solr.handler.ExportWriter.writeDoc(ExportWriter.java:302)
> >>         at org.apache.solr.handler.ExportWriter.lambda$writeDocs$4(ExportWriter.java:268)
> >>         at org.apache.solr.response.JSONWriter.writeMap(JSONResponseWriter.java:547)
> >>         at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:193)
> >>         at org.apache.solr.response.JSONWriter$1.add(JSONResponseWriter.java:532)
> >>         at org.apache.solr.handler.ExportWriter.writeDocs(ExportWriter.java:267)
> >>         at org.apache.solr.handler.ExportWriter.lambda$null$1(ExportWriter.java:219)
> >>         at org.apache.solr.response.JSONWriter.writeIterator(JSONResponseWriter.java:523)
> >>         at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:175)
> >>         at org.apache.solr.response.JSONWriter$2.put(JSONResponseWriter.java:559)
> >>         at org.apache.solr.handler.ExportWriter.lambda$null$2(ExportWriter.java:219)
> >>         at org.apache.solr.response.JSONWriter.writeMap(JSONResponseWriter.java:547)
> >>         at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:193)
> >>         at org.apache.solr.response.JSONWriter$2.put(JSONResponseWriter.java:559)
> >>         at org.apache.solr.handler.ExportWriter.lambda$write$3(ExportWriter.java:217)
> >>         at org.apache.solr.response.JSONWriter.writeMap(JSONResponseWriter.java:547)
> >>         at org.apache.solr.handler.ExportWriter.write(ExportWriter.java:215)
> >>         at org.apache.solr.core.SolrCore$3.write(SolrCore.java:2564)
> >>         at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:49)
> >>         at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:809)
> >>         at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:538)
> >>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:347)
> >>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:298)
> >>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
> >>         at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
> >>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> >>         at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> >>         at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
> >>         at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
> >>         at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
> >>         at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> >>         at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
> >>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> >>         at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
> >>         at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
> >>         at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
> >>         at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
> >>         at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
> >>         at org.eclipse.jetty.server.Server.handle(Server.java:534)
> >>         at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
> >>         at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
> >>         at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
> >>         at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
> >>         at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
> >>         at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
> >>         at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
> >>         at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
> >>         at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
> >>         at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
> >>         at java.lang.Thread.run(Thread.java:745)
> >> Caused by: java.util.concurrent.TimeoutException: Idle timeout expired: 50000/50000 ms
> >>         at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:166)
> >>         at org.eclipse.jetty.io.IdleTimeout$1.run(IdleTimeout.java:50)
> >>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> >>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> >>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> >>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >>         ... 1 more
> >>
> >> Regards,
> >> Edwin
> >>
> >>
> >> On 17 June 2017 at 11:53, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>
> >> wrote:
> >>
> >> > Hi Joel,
> >> >
> >> > Below are the results which I am getting.
> >> >
> >> > If I use this query;
> >> >
> >> > innerJoin(innerJoin(
> >> >   search(people, q=*:*, fl="personId,name", sort="personId asc"),
> >> >   search(pets, q=type:cat, fl="pertsId,petName", sort="personId asc"),
> >> >   on="personId=petsId"
> >> > ),
> >> >   search(collection1, q=*:*, fl="collectionId,collectionName",
> >> > sort="collectionId asc"),
> >> > )on="petsId=collectionId"
> >> >
> >> > I will get this exception error.
> >> >
> >> > {"result-set":{"docs":[{"EXCEPTION":"Invalid JoinStream - all incoming
> >> > stream comparators (sort) must be a superset of this stream's
> >> > equalitor.","EOF":true}]}}
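
For reference, the "Invalid JoinStream" message is innerJoin's sort check: each incoming stream must already be sorted on its side of the on clause (the equalitor). In the failing expression, the outer join's left input (the inner-join result) is ordered by personId while the equalitor asks for petsId. A sketch that should satisfy the check — assuming the pets key is really petsId (the fl above writes "pertsId", which looks like a typo) and adding qt="/export" per Joel's earlier advice; note that SOLR-10512, mentioned elsewhere in the thread, may still affect on="a=b" joins in 6.5.1:

```
innerJoin(
  innerJoin(
    search(people, q=*:*, fl="personId,name", sort="personId asc", qt="/export"),
    search(pets, q=type:cat, fl="petsId,petName", sort="petsId asc", qt="/export"),
    on="personId=petsId"
  ),
  search(collection1, q=*:*, fl="collectionId,collectionName",
         sort="collectionId asc", qt="/export"),
  on="personId=collectionId"
)
```

Here every stream is sorted on exactly the field it is joined on, so the comparators are a superset of each equalitor.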
> >> >
> >> >
> >> >
> >> > But if I use this query:
> >> >
> >> > innerJoin(innerJoin(
> >> >   search(people, q=*:*, fl="personId,name", sort="personId asc"),
> >> >   search(pets, q=type:cat, fl="pertsId,petName", sort="personId asc"),
> >> >   on="personId=petsId"
> >> > ),
> >> >   search(collection1, q=*:*, fl="collectionId,collectionName",
> >> > sort="collectionId asc"),
> >> > )on="personId=collectionId"
> >> >
> >> > The query gets stuck until I get the message below. After that, the
> >> > whole Solr instance hangs, and I have to restart Solr to get it
> >> > working again. This is in Solr 6.5.1.
> >> >
> >> > 2017-06-17 03:16:00.916 WARN  (zkCallback-8-thread-4-processing-n:192.168.0.1:8983_solr
> >> > x:collection1_shard1_replica1 s:shard1 c:collection1 r:core_node1-EventThread)
> >> > [c:collection1 s:shard1 r:core_node1 x:collection1_shard1_replica1]
> >> > o.a.s.c.c.ConnectionManager Our previous ZooKeeper session was expired.
> >> > Attempting to reconnect to recover relationship with ZooKeeper...
> >> >
> >> >
> >> > Regards,
> >> > Edwin
> >> >
> >> >
> >> > On 15 June 2017 at 23:36, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>
> >> > wrote:
> >> >
> >> >> Hi Joel,
> >> >>
> >> >> Yes, I got this error:
> >> >>
> >> >> {"result-set":{"docs":[{"EXCEPTION":"Invalid JoinStream - all
> >> >> incoming stream comparators (sort) must be a superset of this
> >> >> stream's equalitor.","EOF":true}]}}
> >> >>
> >> >>
> >> >> Ok, will try out the work around first.
> >> >>
> >> >> Regards,
> >> >> Edwin
> >> >>
> >> >>
> >> >> On 15 June 2017 at 20:16, Joel Bernstein <joelsolr@gmail.com> wrote:
> >> >>
> >> >>> It looks like you are running into this bug:
> >> >>> https://issues.apache.org/jira/browse/SOLR-10512. This has not been
> >> >>> resolved yet, but I believe there is a workaround described in the
> >> >>> ticket.
> >> >>>
> >> >>> Joel Bernstein
> >> >>> http://joelsolr.blogspot.com/
> >> >>>
> >> >>> On Wed, Jun 14, 2017 at 10:09 PM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
> >> >>>
> >> >>> > I have found that this is possible, but currently I have problems
> >> >>> > when the field to join on has a different name in each of the 3
> >> >>> > collections.
> >> >>> >
> >> >>> > For example, in the "people" collection it is called personId, in
> >> >>> > the "pets" collection it is called petsId, and in "collection1" it
> >> >>> > is called collectionId. But it won't work when I place it this way
> >> >>> > below. Any suggestions on how I can handle this?
> >> >>> >
> >> >>> > innerJoin(innerJoin(
> >> >>> >   search(people, q=*:*, fl="personId,name", sort="personId asc"),
> >> >>> >   search(pets, q=type:cat, fl="pertsId,petName", sort="personId asc"),
> >> >>> >   on="personId=petsId"
> >> >>> > ),
> >> >>> >   search(collection1, q=*:*, fl="collectionId,collectionName",
> >> >>> > sort="personId asc"),
> >> >>> > )on="personId=collectionId"
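
One generic way to cope with a join key that has a different name in each collection is to rename fields with select() before joining, so both sides expose the same key. This is a sketch only, not necessarily the SOLR-10512 workaround: select's "field as alias" form is standard streaming-expression syntax, but whether the join's sort check accepts the renamed field can depend on the Solr version:

```
innerJoin(
  search(people, q=*:*, fl="personId,name", sort="personId asc", qt="/export"),
  select(
    search(pets, q=type:cat, fl="petsId,petName", sort="petsId asc", qt="/export"),
    petsId as personId,
    petName
  ),
  on="personId"
)
```

After the rename, both streams carry a personId field, so the simple single-field on="personId" form is enough.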
> >> >>> >
> >> >>> >
> >> >>> > Regards,
> >> >>> > Edwin
> >> >>> >
> >> >>> > On 14 June 2017 at 23:13, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
> >> >>> >
> >> >>> > > Hi,
> >> >>> > >
> >> >>> > > I'm using Solr 6.5.1.
> >> >>> > >
> >> >>> > > Is it possible to have multiple hashJoins or innerJoins in the
> >> >>> > > query?
> >> >>> > >
> >> >>> > > An example would be something like this for innerJoin:
> >> >>> > >
> >> >>> > > innerJoin(innerJoin(
> >> >>> > >   search(people, q=*:*, fl="personId,name", sort="personId asc"),
> >> >>> > >   search(pets, q=type:cat, fl="personId,petName", sort="personId asc"),
> >> >>> > >   on="personId"
> >> >>> > > ),
> >> >>> > >   search(collection1, q=*:*, fl="personId,personName", sort="personId asc"),
> >> >>> > > )
> >> >>> > >
> >> >>> > > Regards,
> >> >>> > > Edwin
> >> >>> > >
> >> >>> >
> >> >>>
> >> >>
> >> >>
> >> >
> >>
> >
> >
>
