manifoldcf-user mailing list archives

From Kamil Żyta <kamil.z...@pwr.edu.pl>
Subject Re: Internal server error (500) causing a crawl interruption
Date Mon, 20 Oct 2014 16:45:57 GMT
Limiting document size isn't a solution. The error happens for a 200 MB
file, while Solr extracts larger files without problems. Solr blames
Tika, and Tika has bugs:
https://issues.apache.org/jira/browse/TIKA-1388
Solr and ManifoldCF use Tika 1.5; new versions will use Tika 1.6. Every
version of Tika will have bugs. In my opinion, ManifoldCF should handle
Solr errors so that the job ends in a reasonable time.

Regards,
KŻ
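Karl's point in the quoted replies below is that an HTTP 500 from Solr says nothing about *why* the request failed: a Tika parse error on one bad file is a per-document problem, while an OutOfMemoryError is a server problem that should never be ignored. A minimal sketch of that distinction, in Python for illustration only: it assumes the crawler can see the Solr log entry for the failed request (per Karl, ManifoldCF itself only gets back the 500 status), and `classify_solr_error` and `Action` are hypothetical names, not ManifoldCF or SolrJ API. The exception class names it matches are the ones that appear in the Solr logs quoted in this thread.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical outcomes for a document whose Solr indexing request failed."""
    SKIP_DOCUMENT = "skip"  # bad input file: record it and keep crawling
    ABORT_JOB = "abort"     # server-side trouble: do not ignore it


# Exception class names as they appear in the Solr logs quoted in this thread.
FATAL_MARKERS = ("java.lang.OutOfMemoryError",)
PARSE_FAILURE_MARKERS = ("org.apache.tika.exception.TikaException",)


def classify_solr_error(solr_log_entry: str) -> Action:
    """Map the Solr log entry for a failed request to an Action.

    Fatal markers are checked first: an OutOfMemoryError can be wrapped in
    another exception, and because it may affect other threads in the same
    JVM it must win over any parse failure mentioned in the same entry.
    """
    if any(marker in solr_log_entry for marker in FATAL_MARKERS):
        return Action.ABORT_JOB
    if any(marker in solr_log_entry for marker in PARSE_FAILURE_MARKERS):
        return Action.SKIP_DOCUMENT
    # Unknown cause: be conservative instead of silently dropping the document.
    return Action.ABORT_JOB


if __name__ == "__main__":
    pdf_failure = (
        "null:org.apache.solr.common.SolrException: "
        "org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException "
        "from org.apache.tika.parser.pdf.PDFParser@6533a82a"
    )
    oom_failure = (
        "null:java.lang.RuntimeException: java.lang.OutOfMemoryError: "
        "Requested array size exceeds VM limit"
    )
    print(classify_solr_error(pdf_failure))  # Action.SKIP_DOCUMENT
    print(classify_solr_error(oom_failure))  # Action.ABORT_JOB
```

Matching class names in log text is deliberately crude. The parts that matter are the ordering (fatal conditions first) and the conservative default for anything unrecognized: a single unparseable file can be skipped, as Kamil and Luca ask, without adopting the blanket "ignore all 500s" policy Karl objects to.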

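Karl's other suggestion, in the first quoted reply below, is a maximum document size, because without Solr Cell the whole document is loaded into memory on the ManifoldCF side; Kamil's objection above is that size alone does not predict which files Tika fails on. The size cap itself is easy to sketch: read the document in chunks and stop as soon as the limit is exceeded, so an oversized file is rejected before it is fully buffered. This is a generic Python illustration under that assumption, not how ManifoldCF's Document Filter transformer is implemented; `read_bounded`, `DocumentTooLarge`, and the 64 KiB default chunk size are hypothetical.

```python
import io
from typing import BinaryIO


class DocumentTooLarge(Exception):
    """Raised when a document is larger than the configured limit."""


def read_bounded(stream: BinaryIO, max_bytes: int, chunk_size: int = 64 * 1024) -> bytes:
    """Return the whole content of `stream`, or raise DocumentTooLarge.

    Reading stops as soon as the total exceeds max_bytes, so at most
    max_bytes + chunk_size bytes are ever held in memory, however large
    the underlying document is.
    """
    buffer = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # end of stream: the document fits within the limit
            return bytes(buffer)
        buffer.extend(chunk)
        if len(buffer) > max_bytes:
            raise DocumentTooLarge(f"document exceeds {max_bytes} bytes")


if __name__ == "__main__":
    limit = 10
    print(read_bounded(io.BytesIO(b"small"), limit))  # b'small'
    try:
        read_bounded(io.BytesIO(b"x" * 1000), limit, chunk_size=16)
    except DocumentTooLarge as exc:
        print("skipped:", exc)  # skipped: document exceeds 10 bytes
```

Given Kamil's report (a 200 MB file fails while larger ones extract), a cap like this bounds memory use but does not by itself avoid parser bugs such as TIKA-1388; it complements per-document error handling rather than replacing it.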

On Mon, Oct 20, 2014 at 12:27:16PM -0400, Karl Wright wrote:
> Well, that's clear enough:
>
> ERROR - 2014-10-20 10:54:00.355; org.apache.solr.common.SolrException; null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>
> OutOfMemoryErrors are never fully survivable, because in a
> multithreaded environment other threads may also suffer from a memory
> restriction of this kind.
>
> The solution: either address the problem on the Solr side, or limit
> the maximum size of documents being indexed by using a Document Filter
> transformer. Another solution is, as you suggest, using the Tika
> Extractor in ManifoldCF, which is set up to stream data through Tika
> rather than put it all in memory. But you will still need a maximum
> document size limit even then, because when you aren't using the
> SolrCell (ExtractingRequestHandler) approach for Solr, SolrJ loads the
> entire document into memory on the ManifoldCF side. So you are
> probably better off just establishing the limit.
>
> Thanks,
> Karl
>
> On Mon, Oct 20, 2014 at 12:19 PM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
>
> > http://pastebin.com/AWkgVeUh
> >
> > K
> > On Mon, Oct 20, 2014 at 12:13:42PM -0400, Karl Wright wrote:
> > > Can you provide the Solr exception from the Solr log?
> > > Karl
> > >
> > > On Mon, Oct 20, 2014 at 12:11 PM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
> > >
> > > > Hi,
> > > >
> > > > I also have some bad files and get 500 errors from Solr; I tested
> > > > on Solr stable and trunk (Tika 1.5 and 1.6). The ManifoldCF job
> > > > hangs and never ends.
> > > >
> > > > ManifoldCF has 'Transformation Connections', where I added the Tika
> > > > extractor. How does this work? Is it only metadata extraction or
> > > > MIME type detection? If ManifoldCF did complete Tika extraction, it
> > > > could handle Tika errors better.
> > > >
> > > > Regards,
> > > > KŻ
> > > > On Mon, Oct 20, 2014 at 06:15:52AM -0400, Karl Wright wrote:
> > > > > Hi Luca,
> > > > >
> > > > > I am sorry, but we only get back a 500 error from Solr, and that
> > > > > is not enough information to determine that Tika failed. Having a
> > > > > general policy of ignoring 500 errors, which occur when *any* Solr
> > > > > exception is thrown, seems like a bad idea to me. Indeed, I am
> > > > > concerned that what you are seeing is not a Tika failure but
> > > > > something like Solr running out of memory, which should definitely
> > > > > never be ignored.
> > > > >
> > > > > You can determine the underlying cause by looking at the actual
> > > > > exception in the Solr log.
> > > > >
> > > > > Thanks,
> > > > > Karl
> > > > >
> > > > > On Mon, Oct 20, 2014 at 5:00 AM, Basso Luca
> > > > > <LBasso@regione.emilia-romagna.it> wrote:
> > > > >
> > > > > > Hi Shinichiro,
> > > > > > we found the right configuration just before your suggestion.
> > > > > > Thank you!
> > > > > >
> > > > > > Nevertheless, applying "ignoreTikaException" reduces the problem
> > > > > > somewhat but doesn't resolve it completely. Specifically, the
> > > > > > problem still persists for some PDF files (not only scanned PDFs
> > > > > > and/or PDFs converted from MS Office documents).
> > > > > > Given that the Tika project is not resolving this issue, we
> > > > > > suggest that the problem could be bypassed at the MCF job or
> > > > > > output connector level, by means of a specific flag telling the
> > > > > > MCF webcrawler to skip "non ok status: 500, message: Internal
> > > > > > Server Error" and keep on crawling.
> > > > > >
> > > > > > Dear Karl, could you add this option in the next MCF release?
> > > > > > Thanks a lot, as ever.
> > > > > >
> > > > > > Luca
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: Shinichiro Abe [mailto:shinichiro.abe.1@gmail.com]
> > > > > > Sent: Tuesday, 7 October 2014 03:21
> > > > > > To: user@manifoldcf.apache.org
> > > > > > Cc: user@manifoldcf.apache.org
> > > > > > Subject: Re: Internal server error (500) causing a crawl interruption
> > > > > > Hi Luca,
> > > > > >
> > > > > > Please try to configure ignoreTikaException=true.
> > > > > >
> > > > > >   <requestHandler name="/update/extract"
> > > > > >                   class="org.apache.solr.handler.extraction.ExtractingRequestHandler"
> > > > > >                   startup="lazy">
> > > > > >     <lst name="defaults">
> > > > > >       <str name="fmap.content">text</str>
> > > > > >       <str name="lowernames">true</str>
> > > > > >       <bool name="ignoreTikaException">true</bool>
> > > > > >       <str name="uprefix">ignored_</str>
> > > > > >       <str name="captureAttr">true</str>
> > > > > >     </lst>
> > > > > >   </requestHandler>
> > > > > >
> > > > > > Regards,
> > > > > > Shinichiro Abe
> > > > > >
> > > > > > On 2014/10/06, at 20:15, Karl Wright <daddywri@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Luca,
> > > > > > >
> > > > > > > There is a Solr setting that configures Solr Cell to ignore
> > > > > > > Tika errors. I don't remember what it is offhand, but you will
> > > > > > > want to set it properly to disable Tika errors.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Karl
> > > > > > >
> > > > > > > On Mon, Oct 6, 2014 at 7:08 AM, Basso Luca
> > > > > > > <LBasso@regione.emilia-romagna.it> wrote:
> > > > > > >
> > > > > > > Hi Karl,
> > > > > > >
> > > > > > > we're using the Web repository connector in conjunction with
> > > > > > > the Solr output connector to crawl a number of web portals
> > > > > > > (MCF version 1.6.1). Unfortunately, the crawl job often stops
> > > > > > > with the following error:
> > > > > > >
> > > > > > > "Repeated service interruptions - failure processing documents:
> > > > > > > Server at http://vm97lnx:9474/solr/rerweb5 returned non ok
> > > > > > > status: 500, message: Internal Server Error".
> > > > > > >
> > > > > > > From the MCF and Solr logs (which we report below), it seems
> > > > > > > that the problem arises from Tika and applies to various types
> > > > > > > of documents (.rtf, .pdf, etc.).
> > > > > > >
> > > > > > > How can we fix it?
> > > > > > >
> > > > > > > Thank you.
> > > > > > >
> > > > > > > Best regards,
> > > > > > >
> > > > > > > Luca
> > > > > > >
> > > > > > > MCF log:
> > > > > > >
> > > > > > > WARN 2014-10-03 17:00:53,982 (Worker thread '37') - Solr exception during indexing http://www.regione.emilia-romagna.it/entra-in-regione/polo-archivistico-regionale/archivio-storico/per-approfondire/BolognaArchivioTerritoriale.rtf/at_download/file/BolognaArchivioTerritoriale.rtf (500): Server at http://vm97lnx:9474/solr/rerweb5 returned non ok status:500, message:Internal Server Error
> > > > > > > org.apache.solr.common.SolrException: Server at http://vm97lnx:9474/solr/rerweb5 returned non ok status:500, message:Internal Server Error
> > > > > > >
> > > > > > > WARN 2014-10-03 17:00:53,985 (Worker thread '37') - Service interruption reported for job 1412340881687 connection 'Webcrawler': Solr exception during indexing http://www.regione.emilia-romagna.it/entra-in-regione/polo-archivistico-regionale/archivio-storico/per-approfondire/BolognaArchivioTerritoriale.rtf/at_download/file/BolognaArchivioTerritoriale.rtf (500): Server at http://vm97lnx:9474/solr/rerweb5 returned non ok status:500, message:Internal Server Error
> > > > > > >
> > > > > > > ERROR 2014-10-03 17:00:53,998 (Worker thread '37') - Exception tossed: Repeated service interruptions - failure processing document: Server at http://vm97lnx:9474/solr/rerweb5 returned non ok status:500, message:Internal Server Error
> > > > > > > org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: Server at http://vm97lnx:9474/solr/rerweb5 returned non ok status:500, message:Internal Server Error
> > > > > > > Caused by: org.apache.solr.common.SolrException: Server at http://vm97lnx:9474/solr/rerweb5 returned non ok status:500, message:Internal Server Error
> > > > > > >
> > > > > > > WARN 2014-10-03 18:05:22,636 (Worker thread '0') - Solr exception during indexing http://territorio.regione.emilia-romagna.it/codice-territorio/semplificazione-edilizia/non-rue/dm_9_5_2001.pdf/at_download/file/dm_9_5_2001.pdf (500): Server at http://vm97lnx:9474/solr/rerweb5 returned non ok status:500, message:Internal Server Error
> > > > > > > org.apache.solr.common.SolrException: Server at http://vm97lnx:9474/solr/rerweb5 returned non ok status:500, message:Internal Server Error
> > > > > > >
> > > > > > > WARN 2014-10-03 18:05:22,638 (Worker thread '0') - Service interruption reported for job 1412252016695 connection 'Webcrawler': Solr exception during indexing http://territorio.regione.emilia-romagna.it/codice-territorio/semplificazione-edilizia/non-rue/dm_9_5_2001.pdf/at_download/file/dm_9_5_2001.pdf (500): Server at http://vm97lnx:9474/solr/rerweb5 returned non ok status:500, message:Internal Server Error
> > > > > > >
> > > > > > > ERROR 2014-10-03 18:05:22,649 (Worker thread '0') - Exception tossed: Repeated service interruptions - failure processing document: Server at http://vm97lnx:9474/solr/rerweb5 returned non ok status:500, message:Internal Server Error
> > > > > > > org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: Server at http://vm97lnx:9474/solr/rerweb5 returned non ok status:500, message:Internal Server Error
> > > > > > > Caused by: org.apache.solr.common.SolrException: Server at http://vm97lnx:9474/solr/rerweb5 returned non ok status:500, message:Internal Server Error
> > > > > > >
> > > > > > > Solr log:
> > > > > > >
> > > > > > > 8:05:10,908 ERROR [org.apache.solr.servlet.SolrDispatchFilter] (http-/10.10.80.97:9474-2) null:org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.pdf.PDFParser@6533a82a
> > > > > > >         at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:225)
> > > > > > >         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> > > > > > >         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> > > > > > >         at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
> > > > > > >         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
> > > > > > >         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:768)
> > > > > > >         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415)
> > > > > > >         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:205)
> > > > > > >         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:280)
> > > > > > >         at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:248)
> > > > > > >         at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:275)
> > > > > > >         at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:161)
> > > > > > >         at org.jboss.as.web.security.SecurityContextAssociationValve.invoke(SecurityContextAssociationValve.java:165)
> > > > > > >         at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:155)
> > > > > > >         at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> > > > > > >         at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> > > > > > >         at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:372)
> > > > > > >         at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:877)
> > > > > > >         at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:679)
> > > > > > >         at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:931)
> > > > > > >         at java.lang.Thread.run(Thread.java:745)
> > > > > > > Caused by: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.pdf.PDFParser@6533a82a
> > > > > > >         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:248)
> > > > > > >         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> > > > > > >         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> > > > > > >         at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
> > > > > > >         ... 20 more
> > > > > > > Caused by: org.apache.pdfbox.exceptions.WrappedIOException
> > > > > > >         at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:244)
> > > > > > >         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1206)
> > > > > > >         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1171)
> > > > > > >         at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:124)
> > > > > > >         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> > > > > > >         ... 23 more
> > > > > > > Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: 2047
> > > > > > >         at java.lang.AbstractStringBuilder.deleteCharAt(AbstractStringBuilder.java:762)
> > > > > > >         at java.lang.StringBuilder.deleteCharAt(StringBuilder.java:258)
> > > > > > >         at org.apache.pdfbox.pdfparser.BaseParser.parseCOSHexString(BaseParser.java:1000)
> > > > > > >         at org.apache.pdfbox.pdfparser.BaseParser.parseCOSString(BaseParser.java:808)
> > > > > > >         at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:1241)
> > > > > > >         at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:558)
> > > > > > >         at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:188)
> > > > > > >         ... 27 more
> > > > > > >
> > > > > > > 17:00:42,273 ERROR [org.apache.solr.servlet.SolrDispatchFilter] (http-/10.10.80.97:9474-2) null:org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.rtf.RTFParser@73361285
> > > > > > >         at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:225)
> > > > > > >         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> > > > > > >         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> > > > > > >         at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
> > > > > > >         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
> > > > > > >         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:768)
> > > > > > >         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415)
> > > > > > >         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:205)
> > > > > > >         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:280)
> > > > > > >         at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:248)
> > > > > > >         at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:275)
> > > > > > >         at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:161)
> > > > > > >         at org.jboss.as.web.security.SecurityContextAssociationValve.invoke(SecurityContextAssociationValve.java:165)
> > > > > > >         at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:155)
> > > > > > >         at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> > > > > > >         at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> > > > > > >         at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:372)
> > > > > > >         at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:877)
> > > > > > >         at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:679)
> > > > > > >         at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:931)
> > > > > > >         at java.lang.Thread.run(Thread.java:745)
> > > > > > > Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.rtf.RTFParser@73361285
> > > > > > >         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
> > > > > > >         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> > > > > > >         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> > > > > > >         at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
> > > > > > >         ... 20 more
> > > > > > > Caused by: java.lang.ArrayIndexOutOfBoundsException: 9
> > > > > > >         at org.apache.tika.parser.rtf.TextExtractor.processControlWord(TextExtractor.java:872)
> > > > > > >         at org.apache.tika.parser.rtf.TextExtractor.parseControlWord(TextExtractor.java:566)
> > > > > > >         at org.apache.tika.parser.rtf.TextExtractor.parseControlToken(TextExtractor.java:492)
> > > > > > >         at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:459)
> > > > > > >         at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:448)
> > > > > > >         at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:56)
> > > > > > >         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> > > > > > >         ... 23 more
>      >      >
>      >      > References
>      >      >
>      >      >    Visible links
>      >      >    1. mailto:[25][27]LBasso@regione.emilia-romagna.it
>      >      >    2. mailto:[26][28]shinichiro.abe.1@gmail.com
>      >      >    3. mailto:[27][29]user@manifoldcf.apache.org
>      >      >    4. mailto:[28][30]user@manifoldcf.apache.org
>      >      >    5. mailto:[29][31]daddywri@gmail.com
>      >      >    6. mailto:[30][32]LBasso@regione.emilia-romagna.it
>      >      >    7. [31][33]http://vm97lnx:9474/solr/rerweb5
>      >      >    8.
>      >     
>      [32][34]http://www.regione.emilia-romagna.it/entra-in-regione/polo-archivistico-regionale/archivio-storico/per-approfondire/BolognaArchivioTerritoriale.rtf/at_download/file/BolognaArchivioTerritoriale.rtf
>      >      >    9. [33][35]http://vm97lnx:9474/solr/rerweb5
>      >      >   10. [34][36]http://vm97lnx:9474/solr/rerweb5
>      >      >   11.
>      >     
>      [35][37]http://www.regione.emilia-romagna.it/entra-in-regione/polo-archivistico-regionale/archivio-storico/per-approfondire/BolognaArchivioTerritoriale.rtf/at_download/file/BolognaArchivioTerritoriale.rtf
>      >      >   12. [36][38]http://vm97lnx:9474/solr/rerweb5
>      >      >   13. [37][39]http://vm97lnx:9474/solr/rerweb5
>      >      >   14. [38][40]http://vm97lnx:9474/solr/rerweb5
>      >      >   15. [39][41]http://vm97lnx:9474/solr/rerweb5
>      >      >   16.
>      >     
>      [40][42]http://territorio.regione.emilia-romagna.it/codice-territorio/semplificazione-edilizia/non-rue/dm_9_5_2001.pdf/at_download/file/dm_9_5_2001.pdf
>      >      >   17. [41][43]http://vm97lnx:9474/solr/rerweb5
>      >      >   18. [42][44]http://vm97lnx:9474/solr/rerweb5
>      >      >   19.
>      >     
>      [43][45]http://territorio.regione.emilia-romagna.it/codice-territorio/semplificazione-edilizia/non-rue/dm_9_5_2001.pdf/at_download/file/dm_9_5_2001.pdf
>      >      >   20. [44][46]http://vm97lnx:9474/solr/rerweb5
>      >      >   21. [45][47]http://vm97lnx:9474/solr/rerweb5
>      >      >   22. [46][48]http://vm97lnx:9474/solr/rerweb5
>      >      >   23. [47][49]http://vm97lnx:9474/solr/rerweb5
>      >
>      > References
>      >
>      >    Visible links
>      >    1. mailto:[50]kamil.zyta@pwr.edu.pl
>      >    2. mailto:[51]LBasso@regione.emilia-romagna.it
>      >    3. mailto:[52]shinichiro.abe.1@gmail.com
>      >    4. mailto:[53]user@manifoldcf.apache.org
>      >    5. mailto:[54]user@manifoldcf.apache.org
>      >    6. mailto:[55]daddywri@gmail.com
>      >    7. mailto:[56]LBasso@regione.emilia-romagna.it
>      >    8. [57]http://vm97lnx:9474/solr/rerweb5
>      >    9.
>      [58]http://www.regione.emilia-romagna.it/entra-in-regione/polo-archivistico-regionale/archivio-storico/per-approfondire/BolognaArchivioTerritoriale.rtf/at_download/file/BolognaArchivioTerritoriale.rtf
>      >   10. [59]http://vm97lnx:9474/solr/rerweb5
>      >   11. [60]http://vm97lnx:9474/solr/rerweb5
>      >   12.
>      [61]http://www.regione.emilia-romagna.it/entra-in-regione/polo-archivistico-regionale/archivio-storico/per-approfondire/BolognaArchivioTerritoriale.rtf/at_download/file/BolognaArchivioTerritoriale.rtf
>      >   13. [62]http://vm97lnx:9474/solr/rerweb5
>      >   14. [63]http://vm97lnx:9474/solr/rerweb5
>      >   15. [64]http://vm97lnx:9474/solr/rerweb5
>      >   16. [65]http://vm97lnx:9474/solr/rerweb5
>      >   17.
>      [66]http://territorio.regione.emilia-romagna.it/codice-territorio/semplificazione-edilizia/non-rue/dm_9_5_2001.pdf/at_download/file/dm_9_5_2001.pdf
>      >   18. [67]http://vm97lnx:9474/solr/rerweb5
>      >   19. [68]http://vm97lnx:9474/solr/rerweb5
>      >   20.
>      [69]http://territorio.regione.emilia-romagna.it/codice-territorio/semplificazione-edilizia/non-rue/dm_9_5_2001.pdf/at_download/file/dm_9_5_2001.pdf
>      >   21. [70]http://vm97lnx:9474/solr/rerweb5
>      >   22. [71]http://vm97lnx:9474/solr/rerweb5
>      >   23. [72]http://vm97lnx:9474/solr/rerweb5
>      >   24. [73]http://vm97lnx:9474/solr/rerweb5
>      >   25. mailto:[74]LBasso@regione.emilia-romagna.it
>      >   26. mailto:[75]shinichiro.abe.1@gmail.com
>      >   27. mailto:[76]user@manifoldcf.apache.org
>      >   28. mailto:[77]user@manifoldcf.apache.org
>      >   29. mailto:[78]daddywri@gmail.com
>      >   30. mailto:[79]LBasso@regione.emilia-romagna.it
>      >   31. [80]http://vm97lnx:9474/solr/rerweb5
>      >   32. [81]http://www.regione.emilia-romagna.it/entra-in-regione/polo-archivistico-regionale/archivio-storico/per-approfondire/BolognaArchivioTerritoriale.rtf/at_download/file/BolognaArchivioTerritoriale.rtf
>      >   33. [82]http://vm97lnx:9474/solr/rerweb5
>      >   34. [83]http://vm97lnx:9474/solr/rerweb5
>      >   35. [84]http://www.regione.emilia-romagna.it/entra-in-regione/polo-archivistico-regionale/archivio-storico/per-approfondire/BolognaArchivioTerritoriale.rtf/at_download/file/BolognaArchivioTerritoriale.rtf
>      >   36. [85]http://vm97lnx:9474/solr/rerweb5
>      >   37. [86]http://vm97lnx:9474/solr/rerweb5
>      >   38. [87]http://vm97lnx:9474/solr/rerweb5
>      >   39. [88]http://vm97lnx:9474/solr/rerweb5
>      >   40. [89]http://territorio.regione.emilia-romagna.it/codice-territorio/semplificazione-edilizia/non-rue/dm_9_5_2001.pdf/at_download/file/dm_9_5_2001.pdf
>      >   41. [90]http://vm97lnx:9474/solr/rerweb5
>      >   42. [91]http://vm97lnx:9474/solr/rerweb5
>      >   43. [92]http://territorio.regione.emilia-romagna.it/codice-territorio/semplificazione-edilizia/non-rue/dm_9_5_2001.pdf/at_download/file/dm_9_5_2001.pdf
>      >   44. [93]http://vm97lnx:9474/solr/rerweb5
>      >   45. [94]http://vm97lnx:9474/solr/rerweb5
>      >   46. [95]http://vm97lnx:9474/solr/rerweb5
>      >   47. [96]http://vm97lnx:9474/solr/rerweb5
> 
> References
> 
>    Visible links
>    1. mailto:kamil.zyta@pwr.edu.pl
>    2. http://pastebin.com/AWkgVeUh
>    3. mailto:kamil.zyta@pwr.edu.pl
>    4. mailto:LBasso@regione.emilia-romagna.it
>    5. mailto:shinichiro.abe.1@gmail.com
>    6. mailto:user@manifoldcf.apache.org
>    7. mailto:user@manifoldcf.apache.org
>    8. mailto:daddywri@gmail.com
>    9. mailto:LBasso@regione.emilia-romagna.it
>   10. http://vm97lnx:9474/solr/rerweb5
>   11. http://www.regione.emilia-romagna.it/entra-in-regione/polo-archivistico-regionale/archivio-storico/per-approfondire/BolognaArchivioTerritoriale.rtf/at_download/file/BolognaArchivioTerritoriale.rtf
>   12. http://vm97lnx:9474/solr/rerweb5
>   13. http://vm97lnx:9474/solr/rerweb5
>   14. http://www.regione.emilia-romagna.it/entra-in-regione/polo-archivistico-regionale/archivio-storico/per-approfondire/BolognaArchivioTerritoriale.rtf/at_download/file/BolognaArchivioTerritoriale.rtf
>   15. http://vm97lnx:9474/solr/rerweb5
>   16. http://vm97lnx:9474/solr/rerweb5
>   17. http://vm97lnx:9474/solr/rerweb5
>   18. http://vm97lnx:9474/solr/rerweb5
>   19. http://territorio.regione.emilia-romagna.it/codice-territorio/semplificazione-edilizia/non-rue/dm_9_5_2001.pdf/at_download/file/dm_9_5_2001.pdf
>   20. http://vm97lnx:9474/solr/rerweb5
>   21. http://vm97lnx:9474/solr/rerweb5
>   22. http://territorio.regione.emilia-romagna.it/codice-territorio/semplificazione-edilizia/non-rue/dm_9_5_2001.pdf/at_download/file/dm_9_5_2001.pdf
>   23. http://vm97lnx:9474/solr/rerweb5
>   24. http://vm97lnx:9474/solr/rerweb5
>   25. http://vm97lnx:9474/solr/rerweb5
>   26. http://vm97lnx:9474/solr/rerweb5
>   27. mailto:LBasso@regione.emilia-romagna.it
>   28. mailto:shinichiro.abe.1@gmail.com
>   29. mailto:user@manifoldcf.apache.org
>   30. mailto:user@manifoldcf.apache.org
>   31. mailto:daddywri@gmail.com
>   32. mailto:LBasso@regione.emilia-romagna.it
>   33. http://vm97lnx:9474/solr/rerweb5
>   34. http://www.regione.emilia-romagna.it/entra-in-regione/polo-archivistico-regionale/archivio-storico/per-approfondire/BolognaArchivioTerritoriale.rtf/at_download/file/BolognaArchivioTerritoriale.rtf
>   35. http://vm97lnx:9474/solr/rerweb5
>   36. http://vm97lnx:9474/solr/rerweb5
>   37. http://www.regione.emilia-romagna.it/entra-in-regione/polo-archivistico-regionale/archivio-storico/per-approfondire/BolognaArchivioTerritoriale.rtf/at_download/file/BolognaArchivioTerritoriale.rtf
>   38. http://vm97lnx:9474/solr/rerweb5
>   39. http://vm97lnx:9474/solr/rerweb5
>   40. http://vm97lnx:9474/solr/rerweb5
>   41. http://vm97lnx:9474/solr/rerweb5
>   42. http://territorio.regione.emilia-romagna.it/codice-territorio/semplificazione-edilizia/non-rue/dm_9_5_2001.pdf/at_download/file/dm_9_5_2001.pdf
>   43. http://vm97lnx:9474/solr/rerweb5
>   44. http://vm97lnx:9474/solr/rerweb5
>   45. http://territorio.regione.emilia-romagna.it/codice-territorio/semplificazione-edilizia/non-rue/dm_9_5_2001.pdf/at_download/file/dm_9_5_2001.pdf
>   46. http://vm97lnx:9474/solr/rerweb5
>   47. http://vm97lnx:9474/solr/rerweb5
>   48. http://vm97lnx:9474/solr/rerweb5
>   49. http://vm97lnx:9474/solr/rerweb5
>   50. mailto:kamil.zyta@pwr.edu.pl
>   51. mailto:LBasso@regione.emilia-romagna.it
>   52. mailto:shinichiro.abe.1@gmail.com
>   53. mailto:user@manifoldcf.apache.org
>   54. mailto:user@manifoldcf.apache.org
>   55. mailto:daddywri@gmail.com
>   56. mailto:LBasso@regione.emilia-romagna.it
>   57. http://vm97lnx:9474/solr/rerweb5
>   58. http://www.regione.emilia-romagna.it/entra-in-regione/polo-archivistico-regionale/archivio-storico/per-approfondire/BolognaArchivioTerritoriale.rtf/at_download/file/BolognaArchivioTerritoriale.rtf
>   59. http://vm97lnx:9474/solr/rerweb5
>   60. http://vm97lnx:9474/solr/rerweb5
>   61. http://www.regione.emilia-romagna.it/entra-in-regione/polo-archivistico-regionale/archivio-storico/per-approfondire/BolognaArchivioTerritoriale.rtf/at_download/file/BolognaArchivioTerritoriale.rtf
>   62. http://vm97lnx:9474/solr/rerweb5
>   63. http://vm97lnx:9474/solr/rerweb5
>   64. http://vm97lnx:9474/solr/rerweb5
>   65. http://vm97lnx:9474/solr/rerweb5
>   66. http://territorio.regione.emilia-romagna.it/codice-territorio/semplificazione-edilizia/non-rue/dm_9_5_2001.pdf/at_download/file/dm_9_5_2001.pdf
>   67. http://vm97lnx:9474/solr/rerweb5
>   68. http://vm97lnx:9474/solr/rerweb5
>   69. http://territorio.regione.emilia-romagna.it/codice-territorio/semplificazione-edilizia/non-rue/dm_9_5_2001.pdf/at_download/file/dm_9_5_2001.pdf
>   70. http://vm97lnx:9474/solr/rerweb5
>   71. http://vm97lnx:9474/solr/rerweb5
>   72. http://vm97lnx:9474/solr/rerweb5
>   73. http://vm97lnx:9474/solr/rerweb5
>   74. mailto:LBasso@regione.emilia-romagna.it
>   75. mailto:shinichiro.abe.1@gmail.com
>   76. mailto:user@manifoldcf.apache.org
>   77. mailto:user@manifoldcf.apache.org
>   78. mailto:daddywri@gmail.com
>   79. mailto:LBasso@regione.emilia-romagna.it
>   80. http://vm97lnx:9474/solr/rerweb5
>   81. http://www.regione.emilia-romagna.it/entra-in-regione/polo-archivistico-regionale/archivio-storico/per-approfondire/BolognaArchivioTerritoriale.rtf/at_download/file/BolognaArchivioTerritoriale.rtf
>   82. http://vm97lnx:9474/solr/rerweb5
>   83. http://vm97lnx:9474/solr/rerweb5
>   84. http://www.regione.emilia-romagna.it/entra-in-regione/polo-archivistico-regionale/archivio-storico/per-approfondire/BolognaArchivioTerritoriale.rtf/at_download/file/BolognaArchivioTerritoriale.rtf
>   85. http://vm97lnx:9474/solr/rerweb5
>   86. http://vm97lnx:9474/solr/rerweb5
>   87. http://vm97lnx:9474/solr/rerweb5
>   88. http://vm97lnx:9474/solr/rerweb5
>   89. http://territorio.regione.emilia-romagna.it/codice-territorio/semplificazione-edilizia/non-rue/dm_9_5_2001.pdf/at_download/file/dm_9_5_2001.pdf
>   90. http://vm97lnx:9474/solr/rerweb5
>   91. http://vm97lnx:9474/solr/rerweb5
>   92. http://territorio.regione.emilia-romagna.it/codice-territorio/semplificazione-edilizia/non-rue/dm_9_5_2001.pdf/at_download/file/dm_9_5_2001.pdf
>   93. http://vm97lnx:9474/solr/rerweb5
>   94. http://vm97lnx:9474/solr/rerweb5
>   95. http://vm97lnx:9474/solr/rerweb5
>   96. http://vm97lnx:9474/solr/rerweb5
