manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Internal server error (500) causing a crawl interruption
Date Mon, 20 Oct 2014 16:59:32 GMT
Hi Kamil,

Yes, of course Tika has bugs.  But ManifoldCF uses Tika in a different way
than Solr, as I explained -- and in this case I expect that if you just use
ManifoldCF's Tika extractor instead, you will not see a problem.

Basically, it's processing text:

        at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:88)

... and appending it, in memory, to a buffer:

        at java.lang.StringBuilder.append(Unknown Source)
        at
org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:303)

How is this NOT a Solr problem?  (Remember that it is the size of the
extracted content that matters, not the raw size of the document before
extraction.)
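
To make the buffering point concrete, here is a minimal sketch using only the JDK's SAX classes. It is not Solr's or ManifoldCF's actual code: the class names BufferingHandler, CappedHandler and HandlerDemo, and the character limits, are made up for illustration. A handler that appends every characters() callback to a StringBuilder grows with the extracted text, whereas a capped handler gives up once a limit is reached.

```java
import java.util.Arrays;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: accumulates all extracted text in memory. */
final class BufferingHandler extends DefaultHandler {
    final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Every callback makes the buffer larger; nothing bounds it.
        text.append(ch, start, length);
    }
}

/** Hypothetical handler: refuses to buffer more than maxChars characters. */
final class CappedHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final int maxChars;

    CappedHandler(int maxChars) {
        this.maxChars = maxChars;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (text.length() + length > maxChars) {
            // Abort the parse instead of growing the buffer further.
            throw new SAXException("extracted text exceeds " + maxChars + " characters");
        }
        text.append(ch, start, length);
    }
}

final class HandlerDemo {
    /** Feeds `chunks` blocks of `chunkSize` characters; true if the cap was hit. */
    static boolean hitsCap(int maxChars, int chunks, int chunkSize) {
        CappedHandler handler = new CappedHandler(maxChars);
        char[] chunk = new char[chunkSize];
        Arrays.fill(chunk, 'x');
        try {
            for (int i = 0; i < chunks; i++) {
                handler.characters(chunk, 0, chunkSize);
            }
            return false;
        } catch (SAXException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(hitsCap(1000, 5, 100));   // 500 characters, under the cap
        System.out.println(hitsCap(1000, 20, 100));  // 2000 characters, over the cap
    }
}
```

The cap is checked against extracted characters, not against the size of the input file, which is the distinction made above.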

My recommendation is therefore the following:

-- Use the Tika extractor in ManifoldCF
-- Limit the number of bytes that your Solr connection will accept to
something that can fit in your Solr instance's memory comfortably, even
when accounting for ALL simultaneous indexing connections
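
As a rough illustration of the second point, the sizing can be done as back-of-the-envelope arithmetic. This is only a sketch, not a ManifoldCF or Solr API: the helper name maxDocumentBytes, the heap fraction, and the expansion factor are all assumptions you would need to tune for your own installation. The idea is to take the share of the Solr heap you are willing to spend on in-flight documents, divide it by the number of simultaneous indexing connections, and divide again by a safety factor for how much larger the in-memory form can be than the raw bytes.

```java
final class SolrSizeBudget {
    /**
     * Hypothetical helper: largest document size, in bytes, to accept per
     * connection so that all connections together stay within the budget.
     *
     * @param heapBytes       total heap available to the Solr JVM
     * @param heapFraction    share of the heap reserved for in-flight documents (0..1)
     * @param connections     number of simultaneous indexing connections
     * @param expansionFactor assumed ratio of in-memory size to raw document size
     */
    static long maxDocumentBytes(long heapBytes, double heapFraction,
                                 int connections, double expansionFactor) {
        if (connections <= 0 || expansionFactor <= 0.0) {
            throw new IllegalArgumentException(
                "connections and expansionFactor must be positive");
        }
        double budget = heapBytes * heapFraction;
        return (long) (budget / (connections * expansionFactor));
    }

    public static void main(String[] args) {
        long heap = 4L * 1024 * 1024 * 1024; // assumed 4 GiB Solr heap
        // Assumed: a quarter of the heap, 10 connections, 4x expansion.
        System.out.println(maxDocumentBytes(heap, 0.25, 10, 4.0));
    }
}
```

With those assumed numbers the limit comes out at roughly 25 MiB per document; the right values depend on what else the Solr heap is used for.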

Karl


On Mon, Oct 20, 2014 at 12:45 PM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:

> Limiting document size isn't a solution. It happens for a 200MB file, while
> Solr extracts larger files fine. Solr blames Tika, and Tika has bugs:
> https://issues.apache.org/jira/browse/TIKA-1388
> Solr and ManifoldCF use Tika 1.5; new versions will use Tika 1.6.
> Every version of Tika will have bugs. In my opinion ManifoldCF should
> handle Solr errors so that the job ends in a reasonable time.
>
> Regards,
> KŻ
>
>
> On Mon, Oct 20, 2014 at 12:27:16PM -0400, Karl Wright wrote:
> >    Well, that's clear enough:
> >    ERROR - 2014-10-20 10:54:00.355; org.apache.solr.common.SolrException;
> >    null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Requested
> >    array size exceeds VM limit
> >    OutOfMemoryErrors are never fully survivable, because in a
> >    multithreaded environment, other threads may suffer too from a memory
> >    restriction of this kind.
> >    The solution: Either address the problem on the Solr side, or you can
> >    limit the maximum size of documents being indexed by using a Document
> >    Filter transformer.  Another solution is, as you suggest, using the
> >    Tika Extractor in ManifoldCF, which is set up to stream data through
> >    Tika rather than put it all in memory.  But you will still need a
> >    maximum document size limit even then, because when you aren't using
> >    the SolrCell (ExtractingRequestHandler) approach for Solr, SolrJ loads
> >    the entire document into memory on the ManifoldCF side.  So you are
> >    probably better off just establishing the limit.
> >    Thanks,
> >    Karl
> >    On Mon, Oct 20, 2014 at 12:19 PM, Kamil Żyta <kamil.zyta@pwr.edu.pl>
> >    wrote:
> >
> >      http://pastebin.com/AWkgVeUh
> >
> >      K
> >      On Mon, Oct 20, 2014 at 12:13:42PM -0400, Karl Wright wrote:
> >      >    Can you provide the Solr exception from the Solr log?
> >      >    Karl
> >      >    On Mon, Oct 20, 2014 at 12:11 PM, Kamil Żyta
> >      >    <kamil.zyta@pwr.edu.pl> wrote:
> >      >
> >      >      Hi,
> >      >      I have some bad files too and get 500 errors from Solr; tested on
> >      >      Solr stable and trunk (Tika 1.5, 1.6). The ManifoldCF job hangs
> >      >      and never ends.
> >      >      ManifoldCF has 'Transformation Connections', where I added the
> >      >      Tika extractor.
> >      >      How does this work? Is it only metadata extraction or MIME
> >      >      detection?
> >      >      If ManifoldCF had complete Tika extraction, it could handle Tika
> >      >      errors better.
> >      >
> >      >      Regards,
> >      >      KŻ
> >      >      On Mon, Oct 20, 2014 at 06:15:52AM -0400, Karl Wright wrote:
> >      >      >    Hi Luca,
> >      >      >    I am sorry, but we only get back a 500 error from Solr, and
> >      >      >    that is not enough information to determine that Tika failed.
> >      >      >    Having a general policy of ignoring 500 errors, which occur
> >      >      >    when *any* Solr exception is thrown, seems like a bad idea to
> >      >      >    me.  Indeed, I am concerned that it is not a Tika failure that
> >      >      >    you are seeing, but rather something like Solr running out of
> >      >      >    memory, which should definitely never be ignored.
> >      >      >    You can tell by looking at the actual exception Solr logs to
> >      >      >    determine what the underlying cause is.
> >      >      >    Thanks,
> >      >      >    Karl
> >      >      >    On Mon, Oct 20, 2014 at 5:00 AM, Basso Luca
> >      >      >    <LBasso@regione.emilia-romagna.it> wrote:
> >      >      >
> >      >      >      Hi Shinichiro,
> >      >      >      we found the right configuration just before your
> >      >      >      suggestion. Thank you!
> >      >      >
> >      >      >      Nevertheless, applying "ignoreTikaException" reduces the
> >      >      >      problem somewhat but doesn't resolve it completely.
> >      >      >      Specifically, the problem still persists for some PDF files
> >      >      >      (not only for scanned PDFs and/or PDFs converted from
> >      >      >      MS Office documents).
> >      >      >      Given that the Tika project is not resolving this issue, we
> >      >      >      suggest that the problem could be bypassed at the MCF job or
> >      >      >      output connector level, by means of a specific flag telling
> >      >      >      the MCF web crawler to skip "non ok status: 500, message:
> >      >      >      Internal Server Error" and keep on crawling.
> >      >      >
> >      >      >      Dear Karl, can you add this option in the next MCF release?
> >      >      >      Thanks a lot, as ever.
> >      >      >
> >      >      >      Luca
> >      >      >
> >      >      >      -----Original Message-----
> >      >      >      From: Shinichiro Abe [mailto:shinichiro.abe.1@gmail.com]
> >      >      >      Sent: Tuesday, 7 October 2014 03:21
> >      >      >      To: user@manifoldcf.apache.org
> >      >      >      Cc: user@manifoldcf.apache.org
> >      >      >      Subject: Re: Internal server error (500) causing a crawl
> >      >      >      interruption
> >      >      >      Hi Luca,
> >      >      >
> >      >      >      Please try to configure ignoreTikaException=true.
> >      >      >
> >      >      >        <requestHandler name="/update/extract"
> >      >      >            class="org.apache.solr.handler.extraction.ExtractingRequestHandler"
> >      >      >            startup="lazy">
> >      >      >          <lst name="defaults">
> >      >      >            <str name="fmap.content">text</str>
> >      >      >            <str name="lowernames">true</str>
> >      >      >            <bool name="ignoreTikaException">true</bool>
> >      >      >            <str name="uprefix">ignored_</str>
> >      >      >            <str name="captureAttr">true</str>
> >      >      >          </lst>
> >      >      >        </requestHandler>
> >      >      >
> >      >      >      Regards,
> >      >      >      Shinichiro Abe
> >      >      >
> >      >      >      On 2014/10/06, at 20:15, Karl Wright <daddywri@gmail.com>
> >      >      >      wrote:
> >      >      >
> >      >      >      > Hi Luca,
> >      >      >      >
> >      >      >      > There is a Solr setting which configures Solr Cell to
> >      >      >      > ignore Tika errors.  I don't remember what it is offhand,
> >      >      >      > but you will want to set it properly to disable Tika
> >      >      >      > errors.
> >      >      >      >
> >      >      >      > Thanks,
> >      >      >      > Karl
> >      >      >      >
> >      >      >      >
> >      >      >      > On Mon, Oct 6, 2014 at 7:08 AM, Basso Luca
> >      >      >      > <LBasso@regione.emilia-romagna.it> wrote:
> >      >      >      > Hi Karl,
> >      >      >      >
> >      >      >      > we’re using the Web repository connector in conjunction
> >      >      >      > with the Solr output connector to crawl a number of web
> >      >      >      > portals (MCF vers. 1.6.1). Unfortunately the crawl job
> >      >      >      > often stops, giving the following error:
> >      >      >      >
> >      >      >      > “Repeated service interruptions – failure processing
> >      >      >      > documents: Server at http://vm97lnx:9474/solr/rerweb5
> >      >      >      > returned non ok status: 500, message: Internal Server
> >      >      >      > Error”.
> >      >      >      >
> >      >      >      > From the MCF and Solr logs (reported below) it seems that
> >      >      >      > the problem arises from Tika and applies to various types
> >      >      >      > of documents (.rtf, .pdf, etc.).
> >      >      >      >
> >      >      >      > How can we fix it?
> >      >      >      >
> >      >      >      > Thank you.
> >      >      >      >
> >      >      >      >
> >      >      >      >
> >      >      >      > Best regards,
> >      >      >      >
> >      >      >      > Luca
> >      >      >      >
> >      >      >      >
> >      >      >      >
> >      >      >      > MCF log:
> >      >      >      >
> >      >      >      >
> >      >      >      >
> >      >      >      > WARN 2014-10-03 17:00:53,982 (Worker thread '37') - Solr exception during indexing
> >      >      >      >     http://www.regione.emilia-romagna.it/entra-in-regione/polo-archivistico-regionale/archivio-storico/per-approfondire/BolognaArchivioTerritoriale.rtf/at_download/file/BolognaArchivioTerritoriale.rtf
> >      >      >      >     (500): Server at http://vm97lnx:9474/solr/rerweb5 returned non ok status:500, message:Internal Server Error
> >      >      >      >
> >      >      >      > org.apache.solr.common.SolrException: Server at http://vm97lnx:9474/solr/rerweb5 returned non ok status:500, message:Internal Server Error
> >      >      >      >
> >      >      >      > WARN 2014-10-03 17:00:53,985 (Worker thread '37') - Service interruption reported for job 1412340881687 connection 'Webcrawler': Solr exception during indexing
> >      >      >      >     http://www.regione.emilia-romagna.it/entra-in-regione/polo-archivistico-regionale/archivio-storico/per-approfondire/BolognaArchivioTerritoriale.rtf/at_download/file/BolognaArchivioTerritoriale.rtf
> >      >      >      >     (500): Server at http://vm97lnx:9474/solr/rerweb5 returned non ok status:500, message:Internal Server Error
> >      >      >      >
> >      >      >      > ERROR 2014-10-03 17:00:53,998 (Worker thread '37') - Exception tossed: Repeated service interruptions - failure processing document: Server at http://vm97lnx:9474/solr/rerweb5 returned non ok status:500, message:Internal Server Error
> >      >      >      >
> >      >      >      > org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: Server at http://vm97lnx:9474/solr/rerweb5 returned non ok status:500, message:Internal Server Error
> >      >      >      >
> >      >      >      > Caused by: org.apache.solr.common.SolrException: Server at http://vm97lnx:9474/solr/rerweb5 returned non ok status:500, message:Internal Server Error
> >      >      >      >
> >      >      >      >
> >      >      >      > WARN 2014-10-03 18:05:22,636 (Worker thread '0') - Solr exception during indexing
> >      >      >      >     http://territorio.regione.emilia-romagna.it/codice-territorio/semplificazione-edilizia/non-rue/dm_9_5_2001.pdf/at_download/file/dm_9_5_2001.pdf
> >      >      >      >     (500): Server at http://vm97lnx:9474/solr/rerweb5 returned non ok status:500, message:Internal Server Error
> >      >      >      >
> >      >      >      > org.apache.solr.common.SolrException: Server at http://vm97lnx:9474/solr/rerweb5 returned non ok status:500, message:Internal Server Error
> >      >      >      >
> >      >      >      > WARN 2014-10-03 18:05:22,638 (Worker thread '0') - Service interruption reported for job 1412252016695 connection 'Webcrawler': Solr exception during indexing
> >      >      >      >     http://territorio.regione.emilia-romagna.it/codice-territorio/semplificazione-edilizia/non-rue/dm_9_5_2001.pdf/at_download/file/dm_9_5_2001.pdf
> >      >      >      >     (500): Server at http://vm97lnx:9474/solr/rerweb5 returned non ok status:500, message:Internal Server Error
> >      >      >      >
> >      >      >      > ERROR 2014-10-03 18:05:22,649 (Worker thread '0') - Exception tossed: Repeated service interruptions - failure processing document: Server at http://vm97lnx:9474/solr/rerweb5 returned non ok status:500, message:Internal Server Error
> >      >      >      >
> >      >      >      > org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: Server at http://vm97lnx:9474/solr/rerweb5 returned non ok status:500, message:Internal Server Error
> >      >      >      >
> >      >      >      > Caused by: org.apache.solr.common.SolrException: Server at http://vm97lnx:9474/solr/rerweb5 returned non ok status:500, message:Internal Server Error
> >      >      >      >
> >      >      >      >
> >      >      >      > SOLR log:
> >      >      >      >
> >      >      >      >
> >      >      >      >
> >      >      >      > 8:05:10,908 ERROR [org.apache.solr.servlet.SolrDispatchFilter] (http-/10.10.80.97:9474-2) null:org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.pdf.PDFParser@6533a82a
> >      >      >      >     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:225)
> >      >      >      >     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> >      >      >      >     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> >      >      >      >     at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
> >      >      >      >     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
> >      >      >      >     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:768)
> >      >      >      >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415)
> >      >      >      >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:205)
> >      >      >      >     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:280)
> >      >      >      >     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:248)
> >      >      >      >     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:275)
> >      >      >      >     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:161)
> >      >      >      >     at org.jboss.as.web.security.SecurityContextAssociationValve.invoke(SecurityContextAssociationValve.java:165)
> >      >      >      >     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:155)
> >      >      >      >     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> >      >      >      >     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> >      >      >      >     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:372)
> >      >      >      >     at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:877)
> >      >      >      >     at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:679)
> >      >      >      >     at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:931)
> >      >      >      >     at java.lang.Thread.run(Thread.java:745)
> >      >      >      > Caused by: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.pdf.PDFParser@6533a82a
> >      >      >      >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:248)
> >      >      >      >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> >      >      >      >     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> >      >      >      >     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
> >      >      >      >     ... 20 more
> >      >      >      > Caused by: org.apache.pdfbox.exceptions.WrappedIOException
> >      >      >      >     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:244)
> >      >      >      >     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1206)
> >      >      >      >     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1171)
> >      >      >      >     at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:124)
> >      >      >      >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> >      >      >      >     ... 23 more
> >      >      >      > Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: 2047
> >      >      >      >     at java.lang.AbstractStringBuilder.deleteCharAt(AbstractStringBuilder.java:762)
> >      >      >      >     at java.lang.StringBuilder.deleteCharAt(StringBuilder.java:258)
> >      >      >      >     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSHexString(BaseParser.java:1000)
> >      >      >      >     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSString(BaseParser.java:808)
> >      >      >      >     at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:1241)
> >      >      >      >     at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:558)
> >      >      >      >     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:188)
> >      >      >      >     ... 27 more
> >      >      >      >
> >      >      >      >
> >      >      >      >
> >      >      >      > 17:00:42,273 ERROR [org.apache.solr.servlet.SolrDispatchFilter] (http-/10.10.80.97:9474-2) null:org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.rtf.RTFParser@73361285
> >      >      >      >     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:225)
> >      >      >      >     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> >      >      >      >     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> >      >      >      >     at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
> >      >      >      >     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
> >      >      >      >     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:768)
> >      >      >      >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415)
> >      >      >      >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:205)
> >      >      >      >     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:280)
> >      >      >      >     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:248)
> >      >      >      >     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:275)
> >      >      >      >     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:161)
> >      >      >      >     at org.jboss.as.web.security.SecurityContextAssociationValve.invoke(SecurityContextAssociationValve.java:165)
> >      >      >      >     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:155)
> >      >      >      >     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> >      >      >      >     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> >      >      >      >     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:372)
> >      >      >      >     at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:877)
> >      >      >      >     at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:679)
> >      >      >      >     at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:931)
> >      >      >      >     at java.lang.Thread.run(Thread.java:745)
> >      >      >      > Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.rtf.RTFParser@73361285
> >      >      >      >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
> >      >      >      >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> >      >      >      >     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> >      >      >      >     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
> >      >      >      >     ... 20 more
> >      >      >      > Caused by: java.lang.ArrayIndexOutOfBoundsException: 9
> >      >      >      >     at org.apache.tika.parser.rtf.TextExtractor.processControlWord(TextExtractor.java:872)
> >      >      >      >     at org.apache.tika.parser.rtf.TextExtractor.parseControlWord(TextExtractor.java:566)
> >      >      >      >     at org.apache.tika.parser.rtf.TextExtractor.parseControlToken(TextExtractor.java:492)
> >      >      >      >     at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:459)
> >      >      >      >     at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:448)
> >      >      >      >     at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:56)
> >      >      >      >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> >      >      >      >     ... 23 more
> >      >      >      >
> >      >      >      >
> >      >      >
> >      >      > References
> >      >      >
> >      >      >    Visible links
> >      >      >    1. mailto:[25][27]LBasso@regione.emilia-romagna.it
> >      >      >    2. mailto:[26][28]shinichiro.abe.1@gmail.com
> >      >      >    3. mailto:[27][29]user@manifoldcf.apache.org
> >      >      >    4. mailto:[28][30]user@manifoldcf.apache.org
> >      >      >    5. mailto:[29][31]daddywri@gmail.com
> >      >      >    6. mailto:[30][32]LBasso@regione.emilia-romagna.it
> >      >      >    7. [31][33]http://vm97lnx:9474/solr/rerweb5
> >      >      >    8.
> >      >
> >      [32][34]
> http://www.regione.emilia-romagna.it/entra-in-regione/polo-archivistico-regionale/archivio-storico/per-approfondire/BolognaArchivioTerritoriale.rtf/at_download/file/BolognaArchivioTerritoriale.rtf
> >      >      >    9. [33][35]http://vm97lnx:9474/solr/rerweb5
> >      >      >   10. [34][36]http://vm97lnx:9474/solr/rerweb5
> >      >      >   11.
> >      >
> >      [35][37]
> http://www.regione.emilia-romagna.it/entra-in-regione/polo-archivistico-regionale/archivio-storico/per-approfondire/BolognaArchivioTerritoriale.rtf/at_download/file/BolognaArchivioTerritoriale.rtf
> >      >      >   12. [36][38]http://vm97lnx:9474/solr/rerweb5
> >      >      >   13. [37][39]http://vm97lnx:9474/solr/rerweb5
> >      >      >   14. [38][40]http://vm97lnx:9474/solr/rerweb5
> >      >      >   15. [39][41]http://vm97lnx:9474/solr/rerweb5
> >      >      >   16.
> >      >
> >      [40][42]
> http://territorio.regione.emilia-romagna.it/codice-territorio/semplificazione-edilizia/non-rue/dm_9_5_2001.pdf/at_download/file/dm_9_5_2001.pdf
> >      >      >   17. [41][43]http://vm97lnx:9474/solr/rerweb5
> >      >      >   18. [42][44]http://vm97lnx:9474/solr/rerweb5
> >      >      >   19.
> >      >
> >      [43][45]
> http://territorio.regione.emilia-romagna.it/codice-territorio/semplificazione-edilizia/non-rue/dm_9_5_2001.pdf/at_download/file/dm_9_5_2001.pdf
> >      >      >   20. [44][46]http://vm97lnx:9474/solr/rerweb5
> >      >      >   21. [45][47]http://vm97lnx:9474/solr/rerweb5
> >      >      >   22. [46][48]http://vm97lnx:9474/solr/rerweb5
> >      >      >   23. [47][49]http://vm97lnx:9474/solr/rerweb5
> >      >
> >      > References
> >      >
> >      >    Visible links
> >      >    1. mailto:[50]kamil.zyta@pwr.edu.pl
> >      >    2. mailto:[51]LBasso@regione.emilia-romagna.it
> >      >    3. mailto:[52]shinichiro.abe.1@gmail.com
> >      >    4. mailto:[53]user@manifoldcf.apache.org
> >      >    5. mailto:[54]user@manifoldcf.apache.org
> >      >    6. mailto:[55]daddywri@gmail.com
> >      >    7. mailto:[56]LBasso@regione.emilia-romagna.it
> >      >    8. [57]http://vm97lnx:9474/solr/rerweb5
> >      >    9. [58]http://www.regione.emilia-romagna.it/entra-in-regione/polo-archivistico-regionale/archivio-storico/per-approfondire/BolognaArchivioTerritoriale.rtf/at_download/file/BolognaArchivioTerritoriale.rtf
> >      >   10. [59]http://vm97lnx:9474/solr/rerweb5
> >      >   11. [60]http://vm97lnx:9474/solr/rerweb5
> >      >   12. [61]http://www.regione.emilia-romagna.it/entra-in-regione/polo-archivistico-regionale/archivio-storico/per-approfondire/BolognaArchivioTerritoriale.rtf/at_download/file/BolognaArchivioTerritoriale.rtf
> >      >   13. [62]http://vm97lnx:9474/solr/rerweb5
> >      >   14. [63]http://vm97lnx:9474/solr/rerweb5
> >      >   15. [64]http://vm97lnx:9474/solr/rerweb5
> >      >   16. [65]http://vm97lnx:9474/solr/rerweb5
> >      >   17. [66]http://territorio.regione.emilia-romagna.it/codice-territorio/semplificazione-edilizia/non-rue/dm_9_5_2001.pdf/at_download/file/dm_9_5_2001.pdf
> >      >   18. [67]http://vm97lnx:9474/solr/rerweb5
> >      >   19. [68]http://vm97lnx:9474/solr/rerweb5
> >      >   20. [69]http://territorio.regione.emilia-romagna.it/codice-territorio/semplificazione-edilizia/non-rue/dm_9_5_2001.pdf/at_download/file/dm_9_5_2001.pdf
> >      >   21. [70]http://vm97lnx:9474/solr/rerweb5
> >      >   22. [71]http://vm97lnx:9474/solr/rerweb5
> >      >   23. [72]http://vm97lnx:9474/solr/rerweb5
> >      >   24. [73]http://vm97lnx:9474/solr/rerweb5
> >      >   25. mailto:[74]LBasso@regione.emilia-romagna.it
> >      >   26. mailto:[75]shinichiro.abe.1@gmail.com
> >      >   27. mailto:[76]user@manifoldcf.apache.org
> >      >   28. mailto:[77]user@manifoldcf.apache.org
> >      >   29. mailto:[78]daddywri@gmail.com
> >      >   30. mailto:[79]LBasso@regione.emilia-romagna.it
> >      >   31. [80]http://vm97lnx:9474/solr/rerweb5
> >      >   32. [81]http://www.regione.emilia-romagna.it/entra-in-regione/polo-archivistico-regionale/archivio-storico/per-approfondire/BolognaArchivioTerritoriale.rtf/at_download/file/BolognaArchivioTerritoriale.rtf
> >      >   33. [82]http://vm97lnx:9474/solr/rerweb5
> >      >   34. [83]http://vm97lnx:9474/solr/rerweb5
> >      >   35. [84]http://www.regione.emilia-romagna.it/entra-in-regione/polo-archivistico-regionale/archivio-storico/per-approfondire/BolognaArchivioTerritoriale.rtf/at_download/file/BolognaArchivioTerritoriale.rtf
> >      >   36. [85]http://vm97lnx:9474/solr/rerweb5
> >      >   37. [86]http://vm97lnx:9474/solr/rerweb5
> >      >   38. [87]http://vm97lnx:9474/solr/rerweb5
> >      >   39. [88]http://vm97lnx:9474/solr/rerweb5
> >      >   40. [89]http://territorio.regione.emilia-romagna.it/codice-territorio/semplificazione-edilizia/non-rue/dm_9_5_2001.pdf/at_download/file/dm_9_5_2001.pdf
> >      >   41. [90]http://vm97lnx:9474/solr/rerweb5
> >      >   42. [91]http://vm97lnx:9474/solr/rerweb5
> >      >   43. [92]http://territorio.regione.emilia-romagna.it/codice-territorio/semplificazione-edilizia/non-rue/dm_9_5_2001.pdf/at_download/file/dm_9_5_2001.pdf
> >      >   44. [93]http://vm97lnx:9474/solr/rerweb5
> >      >   45. [94]http://vm97lnx:9474/solr/rerweb5
> >      >   46. [95]http://vm97lnx:9474/solr/rerweb5
> >      >   47. [96]http://vm97lnx:9474/solr/rerweb5
> >
> > References
> >
> >    Visible links
> >    1. mailto:kamil.zyta@pwr.edu.pl
> >    2. http://pastebin.com/AWkgVeUh
> >    3. mailto:kamil.zyta@pwr.edu.pl
> >    4. mailto:LBasso@regione.emilia-romagna.it
> >    5. mailto:shinichiro.abe.1@gmail.com
> >    6. mailto:user@manifoldcf.apache.org
> >    7. mailto:user@manifoldcf.apache.org
> >    8. mailto:daddywri@gmail.com
> >    9. mailto:LBasso@regione.emilia-romagna.it
> >   10. http://vm97lnx:9474/solr/rerweb5
> >   11. http://www.regione.emilia-romagna.it/entra-in-regione/polo-archivistico-regionale/archivio-storico/per-approfondire/BolognaArchivioTerritoriale.rtf/at_download/file/BolognaArchivioTerritoriale.rtf
> >   12. http://vm97lnx:9474/solr/rerweb5
> >   13. http://vm97lnx:9474/solr/rerweb5
> >   14. http://www.regione.emilia-romagna.it/entra-in-regione/polo-archivistico-regionale/archivio-storico/per-approfondire/BolognaArchivioTerritoriale.rtf/at_download/file/BolognaArchivioTerritoriale.rtf
> >   15. http://vm97lnx:9474/solr/rerweb5
> >   16. http://vm97lnx:9474/solr/rerweb5
> >   17. http://vm97lnx:9474/solr/rerweb5
> >   18. http://vm97lnx:9474/solr/rerweb5
> >   19. http://territorio.regione.emilia-romagna.it/codice-territorio/semplificazione-edilizia/non-rue/dm_9_5_2001.pdf/at_download/file/dm_9_5_2001.pdf
> >   20. http://vm97lnx:9474/solr/rerweb5
> >   21. http://vm97lnx:9474/solr/rerweb5
> >   22. http://territorio.regione.emilia-romagna.it/codice-territorio/semplificazione-edilizia/non-rue/dm_9_5_2001.pdf/at_download/file/dm_9_5_2001.pdf
> >   23. http://vm97lnx:9474/solr/rerweb5
> >   24. http://vm97lnx:9474/solr/rerweb5
> >   25. http://vm97lnx:9474/solr/rerweb5
> >   26. http://vm97lnx:9474/solr/rerweb5
> >   27. mailto:LBasso@regione.emilia-romagna.it
> >   28. mailto:shinichiro.abe.1@gmail.com
> >   29. mailto:user@manifoldcf.apache.org
> >   30. mailto:user@manifoldcf.apache.org
> >   31. mailto:daddywri@gmail.com
> >   32. mailto:LBasso@regione.emilia-romagna.it
> >   33. http://vm97lnx:9474/solr/rerweb5
> >   34. http://www.regione.emilia-romagna.it/entra-in-regione/polo-archivistico-regionale/archivio-storico/per-approfondire/BolognaArchivioTerritoriale.rtf/at_download/file/BolognaArchivioTerritoriale.rtf
> >   35. http://vm97lnx:9474/solr/rerweb5
> >   36. http://vm97lnx:9474/solr/rerweb5
> >   37. http://www.regione.emilia-romagna.it/entra-in-regione/polo-archivistico-regionale/archivio-storico/per-approfondire/BolognaArchivioTerritoriale.rtf/at_download/file/BolognaArchivioTerritoriale.rtf
> >   38. http://vm97lnx:9474/solr/rerweb5
> >   39. http://vm97lnx:9474/solr/rerweb5
> >   40. http://vm97lnx:9474/solr/rerweb5
> >   41. http://vm97lnx:9474/solr/rerweb5
> >   42. http://territorio.regione.emilia-romagna.it/codice-territorio/semplificazione-edilizia/non-rue/dm_9_5_2001.pdf/at_download/file/dm_9_5_2001.pdf
> >   43. http://vm97lnx:9474/solr/rerweb5
> >   44. http://vm97lnx:9474/solr/rerweb5
> >   45. http://territorio.regione.emilia-romagna.it/codice-territorio/semplificazione-edilizia/non-rue/dm_9_5_2001.pdf/at_download/file/dm_9_5_2001.pdf
> >   46. http://vm97lnx:9474/solr/rerweb5
> >   47. http://vm97lnx:9474/solr/rerweb5
> >   48. http://vm97lnx:9474/solr/rerweb5
> >   49. http://vm97lnx:9474/solr/rerweb5
> >   50. mailto:kamil.zyta@pwr.edu.pl
> >   51. mailto:LBasso@regione.emilia-romagna.it
> >   52. mailto:shinichiro.abe.1@gmail.com
> >   53. mailto:user@manifoldcf.apache.org
> >   54. mailto:user@manifoldcf.apache.org
> >   55. mailto:daddywri@gmail.com
> >   56. mailto:LBasso@regione.emilia-romagna.it
> >   57. http://vm97lnx:9474/solr/rerweb5
> >   58. http://www.regione.emilia-romagna.it/entra-in-regione/polo-archivistico-regionale/archivio-storico/per-approfondire/BolognaArchivioTerritoriale.rtf/at_download/file/BolognaArchivioTerritoriale.rtf
> >   59. http://vm97lnx:9474/solr/rerweb5
> >   60. http://vm97lnx:9474/solr/rerweb5
> >   61. http://www.regione.emilia-romagna.it/entra-in-regione/polo-archivistico-regionale/archivio-storico/per-approfondire/BolognaArchivioTerritoriale.rtf/at_download/file/BolognaArchivioTerritoriale.rtf
> >   62. http://vm97lnx:9474/solr/rerweb5
> >   63. http://vm97lnx:9474/solr/rerweb5
> >   64. http://vm97lnx:9474/solr/rerweb5
> >   65. http://vm97lnx:9474/solr/rerweb5
> >   66. http://territorio.regione.emilia-romagna.it/codice-territorio/semplificazione-edilizia/non-rue/dm_9_5_2001.pdf/at_download/file/dm_9_5_2001.pdf
> >   67. http://vm97lnx:9474/solr/rerweb5
> >   68. http://vm97lnx:9474/solr/rerweb5
> >   69. http://territorio.regione.emilia-romagna.it/codice-territorio/semplificazione-edilizia/non-rue/dm_9_5_2001.pdf/at_download/file/dm_9_5_2001.pdf
> >   70. http://vm97lnx:9474/solr/rerweb5
> >   71. http://vm97lnx:9474/solr/rerweb5
> >   72. http://vm97lnx:9474/solr/rerweb5
> >   73. http://vm97lnx:9474/solr/rerweb5
> >   74. mailto:LBasso@regione.emilia-romagna.it
> >   75. mailto:shinichiro.abe.1@gmail.com
> >   76. mailto:user@manifoldcf.apache.org
> >   77. mailto:user@manifoldcf.apache.org
> >   78. mailto:daddywri@gmail.com
> >   79. mailto:LBasso@regione.emilia-romagna.it
> >   80. http://vm97lnx:9474/solr/rerweb5
> >   81. http://www.regione.emilia-romagna.it/entra-in-regione/polo-archivistico-regionale/archivio-storico/per-approfondire/BolognaArchivioTerritoriale.rtf/at_download/file/BolognaArchivioTerritoriale.rtf
> >   82. http://vm97lnx:9474/solr/rerweb5
> >   83. http://vm97lnx:9474/solr/rerweb5
> >   84. http://www.regione.emilia-romagna.it/entra-in-regione/polo-archivistico-regionale/archivio-storico/per-approfondire/BolognaArchivioTerritoriale.rtf/at_download/file/BolognaArchivioTerritoriale.rtf
> >   85. http://vm97lnx:9474/solr/rerweb5
> >   86. http://vm97lnx:9474/solr/rerweb5
> >   87. http://vm97lnx:9474/solr/rerweb5
> >   88. http://vm97lnx:9474/solr/rerweb5
> >   89. http://territorio.regione.emilia-romagna.it/codice-territorio/semplificazione-edilizia/non-rue/dm_9_5_2001.pdf/at_download/file/dm_9_5_2001.pdf
> >   90. http://vm97lnx:9474/solr/rerweb5
> >   91. http://vm97lnx:9474/solr/rerweb5
> >   92. http://territorio.regione.emilia-romagna.it/codice-territorio/semplificazione-edilizia/non-rue/dm_9_5_2001.pdf/at_download/file/dm_9_5_2001.pdf
> >   93. http://vm97lnx:9474/solr/rerweb5
> >   94. http://vm97lnx:9474/solr/rerweb5
> >   95. http://vm97lnx:9474/solr/rerweb5
> >   96. http://vm97lnx:9474/solr/rerweb5
>
