lucene-solr-user mailing list archives

From: "Musshorn, Kris T CTR USARMY RDECOM ARL (US)" <kris.t.musshorn....@mail.mil>
Subject: SimplePostTool error (UNCLASSIFIED)
Date: Fri, 15 Jul 2016 16:01:14 GMT
CLASSIFICATION: UNCLASSIFIED

How do I correct this error when running SimplePostTool against a website?
The tool indexed successfully for about 30 minutes before throwing this error and terminating.

[Fatal Error] :642:15: XML document structures must start and end within the same entity.
Exception in thread "main" java.lang.RuntimeException: org.xml.sax.SAXParseException; lineNumber: 642; columnNumber: 15; XML document structures must start and end within the same entity.
        at org.apache.solr.util.SimplePostTool$PageFetcher.getLinksFromWebPage(SimplePostTool.java:1219)
        at org.apache.solr.util.SimplePostTool.webCrawl(SimplePostTool.java:601)
        at org.apache.solr.util.SimplePostTool.webCrawl(SimplePostTool.java:618)
        at org.apache.solr.util.SimplePostTool.webCrawl(SimplePostTool.java:618)
        at org.apache.solr.util.SimplePostTool.webCrawl(SimplePostTool.java:618)
        at org.apache.solr.util.SimplePostTool.webCrawl(SimplePostTool.java:618)
        at org.apache.solr.util.SimplePostTool.postWebPages(SimplePostTool.java:548)
        at org.apache.solr.util.SimplePostTool.doWebMode(SimplePostTool.java:351)
        at org.apache.solr.util.SimplePostTool.execute(SimplePostTool.java:182)
        at org.apache.solr.util.SimplePostTool.main(SimplePostTool.java:167)
Caused by: org.xml.sax.SAXParseException; lineNumber: 642; columnNumber: 15; XML document structures must start and end within the same entity.
        at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
        at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
        at org.apache.solr.util.SimplePostTool.makeDom(SimplePostTool.java:1028)
        at org.apache.solr.util.SimplePostTool$PageFetcher.getLinksFromWebPage(SimplePostTool.java:1201)
        ... 9 more
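
For reference, the trace shows the exception coming from javax.xml.parsers.DocumentBuilder.parse, called by SimplePostTool.makeDom. That parser message usually means the input ended while elements were still open. The sketch below reproduces the same kind of failure outside Solr. It assumes the crawled page was truncated or was not well-formed XML, which the trace alone does not confirm. The class name TruncatedXmlDemo, the method parseError, and the sample strings are hypothetical and not part of SimplePostTool.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;

// Hypothetical demo class: feeds markup to the JDK's strict XML parser,
// the same kind of parser that appears in the stack trace above.
class TruncatedXmlDemo {

    /** Returns a description of the parse error, or null if the input is well-formed. */
    static String parseError(String markup) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new ByteArrayInputStream(
                markup.getBytes(StandardCharsets.UTF_8)));
            return null; // parsed without error
        } catch (SAXParseException e) {
            // Report where the parser gave up and why.
            return "line " + e.getLineNumber()
                + ", column " + e.getColumnNumber()
                + ": " + e.getMessage();
        } catch (Exception e) {
            // Parser configuration or I/O problems are not expected here.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Well-formed input: every element is closed.
        System.out.println(parseError("<html><body><p>ok</p></body></html>"));
        // Input that stops before <p>, <body> and <html> are closed.
        System.out.println(parseError("<html><body><p>cut off"));
    }
}
```

If the second call reports the same message as the trace, the likely trigger is a single page whose markup a strict XML parser rejects, rather than a problem with the Solr index itself.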

Thanks,
Kris

~~~~~~~~~~~~~~~~~~~~~~~~~~
Kris T. Musshorn
FileMaker Developer - Contractor - Catapult Technology Inc.      
US Army Research Lab 
Aberdeen Proving Ground 
Application Management & Development Branch 
410-278-7251
kris.t.musshorn.ctr@mail.mil
~~~~~~~~~~~~~~~~~~~~~~~~~~


