lucene-solr-user mailing list archives

From Joel Bernstein <joels...@gmail.com>
Subject Re: solr /export handler - behavior during close()
Date Mon, 15 May 2017 13:47:06 GMT
OK, please do report any issues you run into. This is quite a good bug
report.

I reviewed the code and I believe I see the problem. The output code in the
/stream handler does not properly account for client disconnects by closing
the underlying stream. What I see in the code is that exceptions coming from
read() in the stream do automatically close the underlying stream, but
exceptions from writing the stream do not. This needs to be fixed.
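
To illustrate the kind of fix described above, here is a minimal sketch in plain Java. It is not the actual /stream handler code: TupleSource, DisconnectingOutput, and writeAll are hypothetical names invented for this example, and the "Broken pipe" failure is simulated. The only point is that a finally block closes the underlying source whether the failure comes from the read side or the write side.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class WriteCloseSketch {

    /** Hypothetical stand-in for the underlying tuple stream; not a Solr class. */
    static class TupleSource implements Closeable {
        private final Iterator<String> tuples;
        boolean closed = false;

        TupleSource(List<String> tuples) {
            this.tuples = tuples.iterator();
        }

        /** Returns the next tuple, or null at end of stream. */
        String read() {
            return tuples.hasNext() ? tuples.next() : null;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Simulates a client that disconnects after a fixed number of writes. */
    static class DisconnectingOutput extends OutputStream {
        private int writesLeft;

        DisconnectingOutput(int writesLeft) {
            this.writesLeft = writesLeft;
        }

        @Override
        public void write(int b) throws IOException {
            write(new byte[] {(byte) b}, 0, 1);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            if (writesLeft <= 0) {
                throw new IOException("Broken pipe");
            }
            writesLeft--;
        }
    }

    /** Writes every tuple to out; closes the source even if a write fails. */
    static int writeAll(TupleSource source, OutputStream out) throws IOException {
        int written = 0;
        try {
            String tuple;
            while ((tuple = source.read()) != null) {
                out.write(tuple.getBytes(StandardCharsets.UTF_8));
                written++;
            }
        } finally {
            // Runs for read failures and write failures alike.
            source.close();
        }
        return written;
    }

    public static void main(String[] args) {
        TupleSource source = new TupleSource(Arrays.asList("t1", "t2", "t3"));
        try {
            writeAll(source, new DisconnectingOutput(1));
        } catch (IOException e) {
            System.out.println("write failed: " + e.getMessage());
        }
        System.out.println("source closed: " + source.closed);
    }
}
```

In this sketch the second write fails, the exception propagates to the caller, and the source is still closed.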

A few questions about your streaming implementation:

1) Are you sending requests to the /stream handler? Or are you embedding
CloudSolrStream in your application and bypassing the /stream handler?

2) If you're sending Streaming Expressions to the /stream handler, are you
using SolrStream or CloudSolrStream to send the expression?

3) What version of Solr are you using?

4) Have you implemented any custom streams?


#2 is an important question. If you're sending expressions to the /stream
handler using CloudSolrStream, the collection running the expression has to
be set up in a specific way: it must be a *single shard collection*. You can
have as many replicas as you want, but only one shard. That's because
CloudSolrStream picks one replica in each shard to forward the request to,
then merges the results from the shards. So if you send in an expression
using CloudSolrStream, that expression will be sent to each shard to be run,
and each shard will duplicate the work and return duplicate results.
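
As a toy model of the duplication described above (plain Java, no Solr classes): runOnEachShard is a hypothetical helper that assumes every shard runs the same full expression and returns the same tuples, which the client then merges. It only counts the merged tuples and does not model how CloudSolrStream orders them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ShardFanOutSketch {

    /**
     * Hypothetical model: the same expression result comes back once per
     * shard, and the client merges all of the copies.
     */
    static List<String> runOnEachShard(List<String> expressionResult, int numShards) {
        List<String> merged = new ArrayList<>();
        for (int shard = 0; shard < numShards; shard++) {
            merged.addAll(expressionResult);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> result = Arrays.asList("doc1", "doc2");
        System.out.println("1 shard:  " + runOnEachShard(result, 1).size() + " tuples");
        System.out.println("3 shards: " + runOnEachShard(result, 3).size() + " tuples");
    }
}
```

With one shard the client sees each tuple once; with three shards it sees every tuple three times, and the expression has been run three times.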

Joel Bernstein
http://joelsolr.blogspot.com/

On Sat, May 13, 2017 at 7:03 PM, Susmit Shukla <shukla.susmit@gmail.com>
wrote:

> Thanks Joel
> Streaming is awesome, just had a huge implementation in my project. I found
> out a couple more issues with streaming and did local hacks for them, would
> raise them too.
>
> On Sat, May 13, 2017 at 2:09 PM, Joel Bernstein <joelsolr@gmail.com>
> wrote:
>
> > Ah, then this is unexpected behavior. Can you open a ticket for this?
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Sat, May 13, 2017 at 2:51 PM, Susmit Shukla <shukla.susmit@gmail.com>
> > wrote:
> >
> > > Hi Joel,
> > >
> > > I was using CloudSolrStream for the above test. Below is the call stack.
> > >
> > > at org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:215)
> > > at org.apache.http.impl.io.ChunkedInputStream.close(ChunkedInputStream.java:316)
> > > at org.apache.http.impl.execchain.ResponseEntityProxy.streamClosed(ResponseEntityProxy.java:128)
> > > at org.apache.http.conn.EofSensorInputStream.checkClose(EofSensorInputStream.java:228)
> > > at org.apache.http.conn.EofSensorInputStream.close(EofSensorInputStream.java:174)
> > > at sun.nio.cs.StreamDecoder.implClose(StreamDecoder.java:378)
> > > at sun.nio.cs.StreamDecoder.close(StreamDecoder.java:193)
> > > at java.io.InputStreamReader.close(InputStreamReader.java:199)
> > > at org.apache.solr.client.solrj.io.stream.JSONTupleStream.close(JSONTupleStream.java:91)
> > > at org.apache.solr.client.solrj.io.stream.SolrStream.close(SolrStream.java:186)
> > >
> > > Thanks,
> > > Susmit
> > >
> > > On Sat, May 13, 2017 at 10:48 AM, Joel Bernstein <joelsolr@gmail.com>
> > > wrote:
> > >
> > > > I was just reading the Java docs on the ChunkedInputStream.
> > > >
> > > > "Note that this class NEVER closes the underlying stream"
> > > >
> > > > In that scenario the /export handler would indeed continue to send
> > > > data. I think we can consider this an anti-pattern for the /export
> > > > handler currently.
> > > >
> > > > I would suggest using one of the Streaming Clients to connect to the
> > > > export handler. Both CloudSolrStream and SolrStream interact with the
> > > > /export handler in the way that it expects.
> > > >
> > > >
> > > > Joel Bernstein
> > > > http://joelsolr.blogspot.com/
> > > >
> > > > On Sat, May 13, 2017 at 12:28 PM, Susmit Shukla <shukla.susmit@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Joel,
> > > > >
> > > > > I did not observe that. On calling close() on the stream, it cycled
> > > > > through all the hits that the /export handler calculated.
> > > > > E.g. with a *:* query and the export handler on a 100k document index,
> > > > > I could see the 100kth record printed in the HTTP wire debug log,
> > > > > although close() was called after reading the 1st tuple. The time taken
> > > > > for the operation with the close() call was the same as if I had read
> > > > > all the 100k tuples.
> > > > > As I have pointed out, close() on the underlying ChunkedInputStream
> > > > > calls read(), and the Solr server probably has no way to distinguish it
> > > > > from a read() happening during regular tuple reads.
> > > > > I think there should be an abort() API for Solr streams that hooks into
> > > > > httpmethod.abort(). That would enable the client to disconnect early,
> > > > > and it would probably disconnect the underlying socket so there would
> > > > > be no leaks.
> > > > >
> > > > > Thanks,
> > > > > Susmit
> > > > >
> > > > >
> > > > > > On Sat, May 13, 2017 at 7:42 AM, Joel Bernstein <joelsolr@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > If the client closes the connection to the export handler then this
> > > > > > > exception will occur automatically on the server.
> > > > > >
> > > > > > Joel Bernstein
> > > > > > http://joelsolr.blogspot.com/
> > > > > >
> > > > > > > On Sat, May 13, 2017 at 1:46 AM, Susmit Shukla <shukla.susmit@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Joel,
> > > > > > > >
> > > > > > > > Thanks for the insight. How can this exception be thrown/forced
> > > > > > > > from the client side? The client can't do a System.exit() as it is
> > > > > > > > running as a webapp.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Susmit
> > > > > > >
> > > > > > > > On Fri, May 12, 2017 at 4:44 PM, Joel Bernstein <joelsolr@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > In this scenario the /export handler continues to export results
> > > > > > > > > until it encounters a "Broken Pipe" exception. This exception is
> > > > > > > > > trapped and ignored rather than logged, as it's not considered an
> > > > > > > > > exception if the client disconnects early.
> > > > > > > >
> > > > > > > > Joel Bernstein
> > > > > > > > http://joelsolr.blogspot.com/
> > > > > > > >
> > > > > > > > > On Fri, May 12, 2017 at 2:10 PM, Susmit Shukla <shukla.susmit@gmail.com>
> > > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > > I have a question regarding the Solr /export handler. Here is the
> > > > > > > > > > scenario -
> > > > > > > > > > I want to use the /export handler - I only need sorted data and this
> > > > > > > > > > is the fastest way to get it. I am doing multiple-level joins using
> > > > > > > > > > streams on top of the /export handler. I know the number of
> > > > > > > > > > top-level records to be retrieved, but not the number for each
> > > > > > > > > > individual stream rolling up to the final result.
> > > > > > > > > > I observed that calling close() on an /export stream is too
> > > > > > > > > > expensive. It reads the stream to the very end of the hits. Assuming
> > > > > > > > > > there are 100 million hits for each stream, the first 1k records were
> > > > > > > > > > found after the joins, and we call close() after that, it would take
> > > > > > > > > > many minutes or hours to finish.
> > > > > > > > > > Currently I have put the close() call in a different thread -
> > > > > > > > > > basically fire and forget. But the cluster is very strained because
> > > > > > > > > > of the unnecessary reads.
> > > > > > > > >
> > > > > > > > > > Internally, streaming uses HttpClient's ChunkedInputStream, and it
> > > > > > > > > > has to be drained in the close() call. But from the server's point
> > > > > > > > > > of view, it should stop sending more data once close() has been
> > > > > > > > > > issued.
> > > > > > > > > > There is a read() call in the close() method of ChunkedInputStream
> > > > > > > > > > that is indistinguishable from a real read(). If the /export handler
> > > > > > > > > > stopped sending more data after close(), it would be very useful.
> > > > > > > > >
> > > > > > > > > > Another option would be to use the /select handler and get into the
> > > > > > > > > > business of managing a custom cursor mark that is based on the
> > > > > > > > > > stream sort and is reset until it fetches the required records at
> > > > > > > > > > the topmost level.
> > > > > > > > > >
> > > > > > > > > > Any thoughts?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Susmit
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
