manifoldcf-user mailing list archives

From "Fuad Efendi" <f...@efendi.ca>
Subject RE: SOLR
Date Mon, 14 Mar 2011 22:24:09 GMT

Default setting in ManifoldCF: /update/extract
http://localhost:8080/solr/update/extract?commit=true

And in a browser, I see that SOLR responds to it with malformed HTML containing an
unclosed <HR>...

Fix:
Update handler:  /update
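
For reference, the URL tested above is just the server base, the update handler path, and any arguments joined together. Below is a minimal Java sketch of that composition. The class and method names (SolrUpdateUrl, build) are made up for illustration and are not ManifoldCF's actual code, and URLEncoder.encode(String, Charset) assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of ManifoldCF: joins base URL, handler path and arguments.
final class SolrUpdateUrl {

    /** Appends the handler path to the base URL, then the arguments as a query string. */
    static String build(String base, String handler, Map<String, String> args) {
        StringBuilder sb = new StringBuilder(base).append(handler);
        char separator = '?';
        for (Map.Entry<String, String> e : args.entrySet()) {
            sb.append(separator)
              .append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
            separator = '&';
        }
        return sb.toString();
    }

    public static void main(String[] argv) {
        Map<String, String> args = new LinkedHashMap<>();
        args.put("commit", "true");
        // The two handler paths discussed in this thread.
        System.out.println(build("http://localhost:8080/solr", "/update/extract", args));
        System.out.println(build("http://localhost:8080/solr", "/update", args));
    }
}
```

Opening either printed URL in a browser is a quick way to see which handler actually answers with XML.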


-Fuad


-----Original Message-----
From: Fuad Efendi [mailto:fuad@efendi.ca] 
Sent: March-14-11 6:17 PM
To: connectors-user@incubator.apache.org
Subject: RE: SOLR

Hi Karl,

I verified (via browser),
http://localhost:8080/solr/update?commit=true

And response from SOLR:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int
name="QTime">15</int></lst> </response>
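
The same check can be done in code instead of a browser. This is only a sketch using the JDK's own DOM parser: it accepts a body only if it parses as XML with a <response> root, which an HTML error page with an unclosed <HR> will not. The class and method names (SolrResponseCheck, looksLikeSolrXml) and the sample HTML body are illustrative assumptions, not ManifoldCF code or real server output.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of ManifoldCF.
final class SolrResponseCheck {

    /** Returns true if the body is well-formed XML whose root element is <response>. */
    static boolean looksLikeSolrXml(String body) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing "[Fatal Error]" to stderr.
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(
                new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8)));
            return "response".equals(doc.getDocumentElement().getNodeName());
        } catch (Exception e) {
            // Not well-formed XML (for example, HTML with an unclosed <HR>).
            return false;
        }
    }

    public static void main(String[] argv) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<response><lst name=\"responseHeader\"><int name=\"status\">0</int>"
            + "<int name=\"QTime\">15</int></lst></response>";
        // Made-up HTML error page, only to illustrate the failure case.
        String html = "<html><body><h1>HTTP Status 404</h1><HR></body></html>";
        System.out.println(looksLikeSolrXml(xml));
        System.out.println(looksLikeSolrXml(html));
    }
}
```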

The root of the problem is
org.apache.manifoldcf.agents.output.solr.HttpPoster$CommitThread.run(HttpPoster.java:1658)


Everything is fine, except I can't understand why we get "HR" from SOLR. Do
we have any multithreading issues? I believe I am connecting to SOLR; port 8080 is
configured via the console... maybe somewhere else?

I believe the default setting for "Update handler:" on the Connector screen is
incorrect; it is /update/extract




-----Original Message-----
From: Karl Wright [mailto:daddywri@gmail.com] 
Sent: March-14-11 6:00 PM
To: connectors-user@incubator.apache.org
Subject: Re: SOLR

This is because your solr setup is incorrect.  The post to "solr" is
returning HTML, not XML, so you are not actually communicating with Solr at
all.

In order for the Solr connector to work, you need to have the solr
extracting update request handler present and configured.  I am told that
the latest release of Solr makes the jar with this code optional
- it's a contrib jar that you have to separately download.  If you are
building solr off of trunk, then this should not be a problem.

Karl

On Mon, Mar 14, 2011 at 5:40 PM, Fuad Efendi <fuad@efendi.ca> wrote:
> Here is the exception. The XML contains encoded HTML, and this doesn't 
> happen with the standard Java 6 StAX parser:
>
> [Fatal Error] :124:120: The element type "HR" must be terminated by the matching end-tag "</HR>".
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: XML parsing error: The element type "HR" must be terminated by the matching end-tag "</HR>".
>        at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:369)
>        at org.apache.manifoldcf.core.common.XMLDoc.<init>(XMLDoc.java:317)
>        at org.apache.manifoldcf.agents.output.solr.HttpPoster.getResponse(HttpPoster.java:619)
>        at org.apache.manifoldcf.agents.output.solr.HttpPoster$CommitThread.run(HttpPoster.java:1658)
> Caused by: org.xml.sax.SAXParseException: The element type "HR" must be terminated by the matching end-tag "</HR>".
>        at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>        at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
>        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
>        at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:365)
>        ... 3 more
>
>
>
>
>
>
> -----Original Message-----
> From: Fuad Efendi [mailto:fuad@efendi.ca]
> Sent: March-14-11 5:37 PM
> To: connectors-user@incubator.apache.org
> Subject: RE: SOLR
>
> Thank you very much Karl,
>
> And I have a first problem:
> Starting crawler...
> [Fatal Error] :124:120: The element type "HR" must be terminated by the matching end-tag "</HR>".
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: XML parsing error: The element type "HR" must be terminated by the matching end-tag "</HR>".
>        at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:369)
>        at org.apache.manifoldcf.core.common.XMLDoc.<init>(XMLDoc.java:317)
>
> I am using the RSS connector to crawl specific XML (containing XML-encoded 
> &lt;HR&gt; and other HTML tags). It doesn't happen with the standard 
> StAX parser (Java 6)...
>
>
> Regarding (2), do you mean this interface method?
>  /** View specification.
>  * This method is called in the body section of a job's view page.  Its purpose is to present the output specification information to the user.
>  * The coder can presume that the HTML that is output from this configuration will be within appropriate <html> and <body> tags.
>  *@param out is the output to which any HTML should be sent.
>  *@param os is the current output specification for this job.
>  */
>  public void viewSpecification(IHTTPOutput out, OutputSpecification os)
>    throws ManifoldCFException, IOException
>
>
>
> Thanks!
>
>
>
>
>
> -----Original Message-----
> From: Karl Wright [mailto:daddywri@gmail.com]
> Sent: March-14-11 5:21 PM
> To: connectors-user@incubator.apache.org
> Subject: Re: SOLR
>
> Hi Fuad,
>
> (1) "Arguments" are indeed optional key/value pairs, which are sent to 
> solr as part of the URL.
> (2) ManifoldCF presents tabs for a job of three kinds: (a) tabs that 
> all jobs have; (b) tabs related to the repository connector's 
> management of the document specification information; and (c) tabs 
> related to the output connector's output specification information.
> The Solr output connector's output specification information includes 
> the metadata to solr mapping, so those tabs come from the Solr connector.
>
> Karl
>
>
> On Mon, Mar 14, 2011 at 4:51 PM, Fuad Efendi <fuad@efendi.ca> wrote:
>> Hi, any sample of how to use SOLR connector?
>>
>> http://incubator.apache.org/connectors/end-user-documentation.html#solroutputconnector
>>
>>
>>
>> Some questions:
>>
>>
>>
>> 1. Argument. Are these optional key=value pairs which can be sent 
>> to SOLR as part of the HTTP GET/POST request?
>>
>> 2. I see the code for “Connector”, and I see how to configure a SOLR 
>> Output Connection. But how does a “Job” know about the <metadata> to 
>> <solr> mapping? Is it generic (without a dependency on SOLR)?
>>
>>
>>
>> Thanks,
>>
>> Fuad
>
>

