lucene-solr-user mailing list archives

From Shalin Shekhar Mangar <shalinman...@gmail.com>
Subject Re: Trouble Indexing HTML Files
Date Fri, 11 Sep 2009 08:08:44 GMT
On Fri, Sep 11, 2009 at 1:22 AM, Daniel Cohen <
daniel.michael.cohen@gmail.com> wrote:

> Hi there,
>
> I'm trying to get the DataImportHandler working to recursively parse the
> content of a root directory, which contains several other directories
> beneath it. The indexing seems to encounter errors with the DOCTYPE tag in my
> source files.
>
> Stack trace:
>
> java.lang.RuntimeException: com.ctc.wstx.exc.WstxIOException: Server returned
> HTTP response code: 503 for URL:
> http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
>  at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:85)
>  at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:226)
>  at
>

In trunk, DataImportHandler ignores DTDs [1]. If you are using Solr 1.3, then
unfortunately there is no workaround except removing the DTD declarations
from the files before indexing through DIH.

[1] - See https://issues.apache.org/jira/browse/SOLR-964
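
For the Solr 1.3 case, removing the declarations could be scripted as a one-off preprocessing pass. The following is only a rough sketch, not part of Solr or DIH: the function names, the `*.html` glob, and the UTF-8 encoding are all assumptions, and the regular expression handles the common single-declaration case (it does not cope with a `>` inside a quoted identifier). It rewrites files in place, so run it on a copy of the directory.

```python
import re
from pathlib import Path

# Matches one DOCTYPE declaration, with an optional internal subset in [...].
# Assumption: no '>' appears inside the quoted public/system identifiers.
DOCTYPE_RE = re.compile(
    r"<!DOCTYPE\s[^>\[]*(\[.*?\]\s*)?>",
    re.IGNORECASE | re.DOTALL,
)


def strip_doctype(text: str) -> str:
    """Return text with the first DOCTYPE declaration removed."""
    return DOCTYPE_RE.sub("", text, count=1)


def strip_doctypes_in_tree(root: str, pattern: str = "*.html") -> int:
    """Rewrite matching files under root in place; return how many changed.

    Hypothetical helper: 'pattern' and the UTF-8 encoding are assumptions
    about the source files, adjust them to match your data.
    """
    changed = 0
    for path in Path(root).rglob(pattern):
        original = path.read_text(encoding="utf-8")
        cleaned = strip_doctype(original)
        if cleaned != original:
            path.write_text(cleaned, encoding="utf-8")
            changed += 1
    return changed
```

For example, `strip_doctypes_in_tree("/path/to/copy/of/root")` would walk the directory recursively before you point DIH at it. Note that files relying on named entities defined in the DTD may then fail to parse, since the parser no longer has the DTD to resolve them.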

-- 
Regards,
Shalin Shekhar Mangar.
