manifoldcf-dev mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: [VOTE] Release ManifoldCF 2.11, RC1
Date Fri, 21 Sep 2018 15:41:03 GMT
I posted to dev@lucene.apache.org describing the problem, and committed a
fix that allowed the integration test to pass. Basically, if the required
URL would be longer than 4000 characters, the connector now uses a multipart
POST; otherwise it does whatever SolrJ does by default.
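A minimal sketch of the length-based choice described above, assuming the decision is made by building the full query URL and comparing its length to the 4000-character limit. The class and method names (UrlLengthPolicy, buildUrl, useMultipart) and the example parameters are hypothetical; this is not the committed ManifoldCF code. It needs Java 11 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: choose between a plain request (parameters in the URL)
// and a multipart POST, based on the length of the URL that would be built.
final class UrlLengthPolicy {
    // Limit quoted in the message; the constant in the actual fix is not shown here.
    static final int MAX_URL_LENGTH = 4000;

    // Builds "baseUrl?k1=v1&k2=v2" with URL-encoded keys and values.
    static String buildUrl(String baseUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl);
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep)
              .append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
            sep = '&';
        }
        return sb.toString();
    }

    // True when the encoded URL would exceed the limit, i.e. the parameters
    // should travel in a multipart body instead of the query string.
    static boolean useMultipart(String baseUrl, Map<String, String> params) {
        return buildUrl(baseUrl, params).length() > MAX_URL_LENGTH;
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr/FileShare/update/extract";
        Map<String, String> params = new LinkedHashMap<>();
        params.put("literal.id", "file://///localhost/OCR/subfolder/test_file.txt");
        System.out.println(useMultipart(base, params)); // short URL: false
        params.put("literal.acl", "x".repeat(5000));      // hypothetical oversized field
        System.out.println(useMultipart(base, params)); // over 4000 chars: true
    }
}
```

Measuring the encoded URL rather than the raw values matters, because percent-encoding can make a value considerably longer than it looks.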

I am still very concerned about the number of fixes we have had to add
to SolrJ to make it work with our setup. One case I know will not work
is the multipart form's name field, which cannot be transmitted to Solr
Cell through a standard URL, so my hack is going to break this
functionality. I expect it will impact folks like Shinichiro Abe
<https://issues.apache.org/jira/secure/ViewProfile.jspa?name=shinichiro+abe>,
because they have relied on this in the past. Unfortunately I know of no
other workaround at this time, so the release will be postponed further
until we find one.
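For reference, in a multipart/form-data body (RFC 7578) each part's Content-Disposition header carries the form-field name and, optionally, a filename; a plain query-string URL has no slot for that per-part metadata. The sketch below only builds such a part header. The boundary, field name and file name are hypothetical, and exactly which of these values Solr Cell reads is an assumption the message does not spell out.

```java
// Hypothetical sketch: the header block of one multipart/form-data part (RFC 7578).
// Quotes or CR/LF inside the names are NOT escaped here; a real client must handle them.
final class MultipartPartSketch {
    static String partHeader(String boundary, String fieldName, String fileName, String contentType) {
        return "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + fieldName
                + "\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: " + contentType + "\r\n"
                + "\r\n"; // blank line separates the part headers from the part body
    }

    public static void main(String[] args) {
        System.out.print(partHeader("boundary42", "file", "test_file.txt", "text/plain"));
    }
}
```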


Hopefully the Solr team will be willing to address this promptly. In case
they are not, I've loaded another version of MCF 2.11 onto the release area
in the interim so that people may use it if they wish, given that we may
need to wait several more months before we can get 2.11 out the door.


Karl



On Fri, Sep 21, 2018 at 9:35 AM Julien Massiera <julien.massiera@francelabs.com> wrote:

> Hi Karl,
>
> I understand that the piece of code involved is exactly the same as the
> one in the SolrJ API, which is the "reference" way of coding.
>
> Let me walk through the steps of my test again:
>
> 1) I configured a job to crawl a Windows share repository containing 3 files
> and ingest them into a Solr 7.4.0 instance.
>
> 2) The job ran and ended with a 'Done' status and the number of
> processed documents was correct.
>
> 3) I checked the number of documents in my Solr instance and noticed
> that it was 0.
>
> 4) I checked the Simple History report in MCF and found the following error
> for each of my 3 documents:
>
> 09-21-2018 11:49:09.362         document ingest (Solr)
> file://///localhost/OCR/subfolder/test_file.txt
>         400     61      118749  Error from server at
> http://localhost:8983/solr/FileShare: missing content stream
>
>
> 5) I then checked the Solr logs and found the following error for
> each document ingestion:
>
> ERROR 2018-09-21T11:51:04,100 (qtp952486988-21) - Solr|Solr|solr.handler.RequestHandlerBase|[c:FileShare s:shard1 r:core_node2 x:FileShare_shard1_replica_n1] o.a.s.h.RequestHandlerBase
> org.apache.solr.common.SolrException: missing content stream
>      at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:63)
>      at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
>      at org.apache.solr.core.SolrCore.execute(SolrCore.java:2539)
>      at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:709)
>      at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:515)
>      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
>      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
>      at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
>      at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
>      at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
>      at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>      at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>      at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
>      at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
>      at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
>      at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
>      at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
>      at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
>      at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
>      at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
>      at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
>      at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
>      at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
>      at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
>      at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>      at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
>      at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>      at org.eclipse.jetty.server.Server.handle(Server.java:531)
>      at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
>      at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
>      at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
>      at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
>      at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
>      at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
>      at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
>      at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
>      at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
>      at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
>      at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:760)
>      at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:678)
>      at java.lang.Thread.run(Thread.java:748)
>
> 6) I ran a new crawl to debug the code and found that, after the
> following lines (in
> org.apache.manifoldcf.agents.output.solr.ModifiedHttpSolrClient:108):
>
>      SolrParams params = request.getParams();
>      RequestWriter.ContentWriter contentWriter = requestWriter.getContentWriter(request);
>      Collection<ContentStream> streams = contentWriter == null ? requestWriter.getContentStreams(request) : null;
>
>      the 'streams' variable is null.
>
>      I then checked the value of the contentWriter object and found that
> it was not null. That explains why the conditional expression assigned
> null to 'streams' instead of the result of
> requestWriter.getContentStreams(request), which, when I checked it, does
> correctly return a ContentStream collection containing the input stream
> of the incoming file.
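The control flow described above can be reduced to a small sketch: with a non-null ContentWriter the conditional yields null for the stream collection, so any later code that builds the request body only from that collection has nothing to send, which would be consistent with Solr's "missing content stream" error. ContentWriterStub and ContentStreamStub are hypothetical stand-ins for SolrJ's RequestWriter.ContentWriter and ContentStream, not the real classes.

```java
import java.util.Collection;
import java.util.List;

// Hypothetical stand-ins, reduced to what is needed to show the control flow.
interface ContentWriterStub {}

interface ContentStreamStub {
    String getName();
}

final class StreamSelectionSketch {
    // Mirrors the quoted line:
    //   streams = contentWriter == null ? requestWriter.getContentStreams(request) : null;
    static Collection<ContentStreamStub> selectStreams(ContentWriterStub contentWriter,
                                                       Collection<ContentStreamStub> available) {
        return contentWriter == null ? available : null;
    }

    public static void main(String[] args) {
        ContentStreamStub file = () -> "test_file.txt";
        List<ContentStreamStub> available = List.of(file);
        System.out.println(selectStreams(null, available).size());                      // 1
        System.out.println(selectStreams(new ContentWriterStub() {}, available) == null); // true
    }
}
```

In other words, once a ContentWriter is present the body is expected to come from the writer; a client path that only looks at the stream collection silently drops the file.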
>
>
> In conclusion, I am as confused as you are and, since you used the
> same piece of code as the SolrJ API, I wonder whether we should
> ask them for an explanation.
>
> Julien
>
> On 21/09/2018 15:04, Karl Wright wrote:
> > Hi Julien,
> >
> > I verified that the integration test in question confirms the following:
> > (a) that the right number of documents were processed, and that (b) there
> > were no errors reported during the processing.  So unless the failure is
> > indeed a silent one, and documents are simply not getting transmitted to
> > Solr at all, that test should be valid.
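A sketch of a stronger check that would catch a silent failure like the one reported: compare the number of processed documents with the count Solr itself reports. The class and method names are hypothetical, and extracting numFound from a JSON select response with a regular expression is a simplification; with SolrJ one would more naturally read QueryResponse.getResults().getNumFound().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical extra integration-test check: besides "job finished" and
// "no errors reported", verify that Solr actually holds the processed documents.
final class IndexedCountCheck {
    private static final Pattern NUM_FOUND = Pattern.compile("\"numFound\"\\s*:\\s*(\\d+)");

    // Extracts numFound from a Solr JSON select response body; -1 if absent.
    static long numFound(String solrJson) {
        Matcher m = NUM_FOUND.matcher(solrJson);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    // Throws when documents were "processed" but not found in the index.
    static void assertIndexed(long processed, String solrJson) {
        long found = numFound(solrJson);
        if (found != processed) {
            throw new IllegalStateException(
                    "processed " + processed + " documents but Solr reports numFound=" + found);
        }
    }

    public static void main(String[] args) {
        String response = "{\"response\":{\"numFound\":0,\"start\":0,\"docs\":[]}}";
        try {
            assertIndexed(3, response);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```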
> >
> > Can you describe the actual failure that you are seeing please?
> >
> > Karl
> >
> >
> > On Fri, Sep 21, 2018 at 8:52 AM Karl Wright <daddywri@gmail.com> wrote:
> >
> >> Julien,
> >>
> >> Integration tests do cover indexing via SolrJ, and they do succeed.
> >> (That's how I found the deletion bug FWIW.) I therefore need more
> >> information about the specific failure symptom you are seeing before I'll
> >> withdraw the candidate. If it's a silent failure that's one thing, but if
> >> you are seeing a ManifoldCF exception then something is different
> >> between your setup and mine.
> >>
> >> Karl
> >>
> >>
> >> On Fri, Sep 21, 2018 at 8:09 AM Julien Massiera <julien.massiera@francelabs.com> wrote:
> >>
> >>> -1 ref : https://issues.apache.org/jira/browse/CONNECTORS-1533
> >>>
> >>> Julien
> >>>
> >>>
> >>> On 20/09/2018 10:38, Karl Wright wrote:
> >>>> All tests pass, artifacts look good.
> >>>>
> >>>> +1 from me.
> >>>>
> >>>> Karl
> >>>>
> >>>>
> >>>> On Wed, Sep 19, 2018 at 9:57 PM Karl Wright <daddywri@gmail.com> wrote:
> >>>>
> >>>>> Please vote on whether to release ManifoldCF 2.11, RC1. This release
> >>>>> contains a number of fixes/improvements/additions, described in the
> >>>>> CHANGES.txt file. In addition, it includes Tika 1.19, which has a number
> >>>>> of fixes for classpath issues specifically requested by ManifoldCF.
> >>>>>
> >>>>> This fixes a SolrJ-related problem with the Solr Connector found in RC1.
> >>>>> All tests pass.
> >>>>>
> >>>>> The release artifact can be found at:
> >>>>>
> >>>>> https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.11
> >>>>>
> >>>>> There is also a tag at:
> >>>>>
> >>>>> https://svn.apache.org/repos/asf/manifoldcf/tags/release-2.11-RC1
> >>>>>
> >>>>> Thanks again,
> >>>>> Karl Wright
> >>>>>
> >>>>>
> >>>
> >>>
>
> --
> Julien MASSIERA
> Product Development Director
> France Labs – The Search Experts
> Meet us at the Enterprise Search & Discovery Summit in Washington DC
> www.francelabs.com
>
>
