manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: SOLR
Date Mon, 14 Mar 2011 23:58:29 GMT
The trunk version of Solr may have changed how the extracting
update request handler works.  Trunk changes daily, so there is no way I
can keep up with it.  It would probably be better to go back and use a
known quantity.

Thanks,
Karl


On Mon, Mar 14, 2011 at 6:24 PM, Fuad Efendi <fuad@efendi.ca> wrote:
>
> Default setting for ManifoldCF: /update/extract
> http://localhost:8080/solr/update/extract?commit=true
>
> Using a browser, I see that SOLR responds with malformed HTML containing an
> unclosed <HR>...
>
> Fix:
> Update handler:  /update
>
>
> -Fuad
>
>
> -----Original Message-----
> From: Fuad Efendi [mailto:fuad@efendi.ca]
> Sent: March-14-11 6:17 PM
> To: connectors-user@incubator.apache.org
> Subject: RE: SOLR
>
> Hi Karl,
>
> I verified via a browser:
> http://localhost:8080/solr/update?commit=true
>
> And the response from SOLR:
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader"><int name="status">0</int><int name="QTime">15</int></lst>
> </response>
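The reply above is the small XML document a working Solr update handler returns. As a sketch only (ManifoldCF does its own parsing via `HttpPoster.getResponse`, per the stack traces in this thread), the status value can be read from such a body with the JDK's DOM parser. The class and method names are hypothetical, and the element layout is taken solely from the response shown above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: reads the "status" value from a Solr XML response header.
public class SolrStatusCheck {

    /** Returns the integer in <lst name="responseHeader"><int name="status">, or -1 if it is missing. */
    public static int responseStatus(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Not well-formed XML (for example an HTML error page): rethrow unchecked.
            throw new IllegalArgumentException("Response is not well-formed XML", e);
        }
        NodeList lists = doc.getElementsByTagName("lst");
        for (int i = 0; i < lists.getLength(); i++) {
            Element lst = (Element) lists.item(i);
            if (!"responseHeader".equals(lst.getAttribute("name"))) {
                continue;
            }
            NodeList ints = lst.getElementsByTagName("int");
            for (int j = 0; j < ints.getLength(); j++) {
                Element field = (Element) ints.item(j);
                if ("status".equals(field.getAttribute("name"))) {
                    return Integer.parseInt(field.getTextContent().trim());
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Same content as the reply from /solr/update?commit=true quoted above.
        String body = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<response><lst name=\"responseHeader\">"
                + "<int name=\"status\">0</int><int name=\"QTime\">15</int>"
                + "</lst></response>";
        System.out.println(responseStatus(body)); // prints 0
    }
}
```

A status of 0 means Solr accepted the request; a body that is not well-formed XML ends up in the catch branch, which is the situation behind the exceptions in this thread.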
>
> The root of the problem is at
> org.apache.manifoldcf.agents.output.solr.HttpPoster$CommitThread.run(HttpPoster.java:1658)
>
>
> Everything is fine, except that I can't understand why we get "HR" from SOLR.
> Do we have a multithreading issue? I believe I connect to SOLR on port 8080,
> as configured via the console... maybe it is configured somewhere else too?
>
> I believe the default setting for "Update handler:" on the Connector screen,
> /update/extract, is incorrect.
>
>
>
>
> -----Original Message-----
> From: Karl Wright [mailto:daddywri@gmail.com]
> Sent: March-14-11 6:00 PM
> To: connectors-user@incubator.apache.org
> Subject: Re: SOLR
>
> This is because your Solr setup is incorrect.  The post to "Solr" is
> returning HTML, not XML, so you are not actually communicating with Solr at
> all.
>
> In order for the Solr connector to work, you need to have the Solr
> extracting update request handler present and configured.  I am told that
> the latest release of Solr makes the jar with this code optional: it's a
> contrib jar that you have to download separately.  If you are
> building Solr off of trunk, then this should not be a problem.
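Karl's diagnosis can be checked directly: an HTML page with a bare `<HR>` is not well-formed XML, so a DOM parser rejects it with the same "must be terminated by the matching end-tag" message that appears in the stack traces below. A minimal sketch using only the JDK; the class name and the sample HTML are made up for illustration, and this is not how ManifoldCF itself handles the response.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical diagnostic: does a response body parse as XML at all?
public class SolrResponseSniffer {

    public static boolean isWellFormedXml(String body) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors but, unlike the parser's built-in
            // handler, does not print a "[Fatal Error] ..." line to stderr first.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(body)));
            return true;
        } catch (SAXException e) {
            // e.g. The element type "HR" must be terminated by the matching end-tag "</HR>".
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String htmlPage = "<html><body><h1>Error</h1><HR><p>Not found</p></body></html>";
        String solrReply = "<response><lst name=\"responseHeader\">"
                + "<int name=\"status\">0</int></lst></response>";
        System.out.println(isWellFormedXml(htmlPage));  // prints false
        System.out.println(isWellFormedXml(solrReply)); // prints true
    }
}
```

If a check like this returns false for what the connector receives, the request is reaching something other than a Solr handler that answers in XML, which is consistent with the handler-path issue discussed in this thread.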
>
> Karl
>
> On Mon, Mar 14, 2011 at 5:40 PM, Fuad Efendi <fuad@efendi.ca> wrote:
>> Here is the exception. The XML contains encoded HTML, and this doesn't happen
>> with the standard Java 6 StAX parser:
>>
>> [Fatal Error] :124:120: The element type "HR" must be terminated by the matching end-tag "</HR>".
>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: XML parsing error: The element type "HR" must be terminated by the matching end-tag "</HR>".
>>        at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:369)
>>        at org.apache.manifoldcf.core.common.XMLDoc.<init>(XMLDoc.java:317)
>>        at org.apache.manifoldcf.agents.output.solr.HttpPoster.getResponse(HttpPoster.java:619)
>>        at org.apache.manifoldcf.agents.output.solr.HttpPoster$CommitThread.run(HttpPoster.java:1658)
>> Caused by: org.xml.sax.SAXParseException: The element type "HR" must be terminated by the matching end-tag "</HR>".
>>        at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>>        at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
>>        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
>>        at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:365)
>>        ... 3 more
>>
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: Fuad Efendi [mailto:fuad@efendi.ca]
>> Sent: March-14-11 5:37 PM
>> To: connectors-user@incubator.apache.org
>> Subject: RE: SOLR
>>
>> Thank you very much, Karl.
>>
>> I have hit my first problem:
>> Starting crawler...
>> [Fatal Error] :124:120: The element type "HR" must be terminated by the matching end-tag "</HR>".
>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: XML parsing error: The element type "HR" must be terminated by the matching end-tag "</HR>".
>>        at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:369)
>>        at org.apache.manifoldcf.core.common.XMLDoc.<init>(XMLDoc.java:317)
>>
>> I am using the RSS connector to crawl a specific XML feed (containing XML-encoded
>> &lt;HR&gt; and other HTML tags). This doesn't happen with the standard
>> StAX parser (Java 6)...
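For comparison, XML-encoded markup such as `&lt;HR&gt;` inside a feed is ordinary character data: a DOM parser decodes it to the text `<HR>` and never treats it as an element. Since the stack traces point to `HttpPoster.getResponse`, the failing parse appears to be of Solr's reply rather than of the feed. A small sketch with the JDK parser; the class and method names are hypothetical and the feed snippet is made up.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo: escaped HTML inside an XML element is just text.
public class EscapedMarkupDemo {

    /** Returns the decoded text of the first <description> element, or null if there is none. */
    public static String descriptionText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node description = doc.getElementsByTagName("description").item(0);
            // getTextContent() joins the element's text; &lt; and &gt; are already expanded.
            return description == null ? null : description.getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String item = "<item><description>Part one &lt;HR&gt; part two</description></item>";
        System.out.println(descriptionText(item)); // prints: Part one <HR> part two
    }
}
```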
>>
>>
>> Regarding (2), do you mean this interface method?
>>  /** View specification.
>>  * This method is called in the body section of a job's view page.
>>  * Its purpose is to present the output specification information to the user.
>>  * The coder can presume that the HTML that is output from this
>>  * configuration will be within appropriate <html> and <body> tags.
>>  *@param out is the output to which any HTML should be sent.
>>  *@param os is the current output specification for this job.
>>  */
>>  public void viewSpecification(IHTTPOutput out, OutputSpecification os)
>>    throws ManifoldCFException, IOException
>>
>>
>>
>> Thanks!
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: Karl Wright [mailto:daddywri@gmail.com]
>> Sent: March-14-11 5:21 PM
>> To: connectors-user@incubator.apache.org
>> Subject: Re: SOLR
>>
>> Hi Fuad,
>>
>> (1) "Arguments" are indeed optional key/value pairs, which are sent to
>> Solr as part of the URL.
>> (2) ManifoldCF presents three kinds of tabs for a job: (a) tabs that
>> all jobs have; (b) tabs related to the repository connector's
>> management of the document specification information; and (c) tabs
>> related to the output connector's output specification information.
>> The Solr output connector's output specification information includes
>> the metadata-to-Solr mapping, so those tabs come from the Solr connector.
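Point (1) amounts to plain URL construction: the connection's base URL, then the update handler path, then each argument appended as an encoded `key=value` pair. The sketch below is an assumption about the general shape, not the Solr connector's actual code; `SolrUrlBuilder` and `buildUrl` are hypothetical names, and the base URL and handler paths are the ones quoted in this thread.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: base URL + handler path + URL-encoded key=value arguments.
public class SolrUrlBuilder {

    public static String buildUrl(String baseUrl, String handlerPath, Map<String, String> args) {
        StringBuilder url = new StringBuilder(baseUrl).append(handlerPath);
        char separator = '?';
        for (Map.Entry<String, String> arg : args.entrySet()) {
            // URLEncoder.encode(String, Charset) requires Java 10 or later.
            url.append(separator)
               .append(URLEncoder.encode(arg.getKey(), StandardCharsets.UTF_8))
               .append('=')
               .append(URLEncoder.encode(arg.getValue(), StandardCharsets.UTF_8));
            separator = '&';
        }
        return url.toString();
    }

    public static void main(String[] args) {
        Map<String, String> arguments = new LinkedHashMap<>(); // keeps insertion order
        arguments.put("commit", "true");
        System.out.println(buildUrl("http://localhost:8080/solr", "/update/extract", arguments));
        // prints http://localhost:8080/solr/update/extract?commit=true
    }
}
```

Switching the update handler from `/update/extract` to `/update`, as Fuad did, only changes the second argument.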
>>
>> Karl
>>
>>
>> On Mon, Mar 14, 2011 at 4:51 PM, Fuad Efendi <fuad@efendi.ca> wrote:
>>> Hi, is there a sample of how to use the SOLR connector?
>>>
>>> http://incubator.apache.org/connectors/end-user-documentation.html#solroutputconnector
>>>
>>>
>>>
>>> Some questions:
>>>
>>>
>>>
>>> 1.       Arguments: are these optional key=value pairs that can be sent
>>> to SOLR as part of the HTTP GET/POST request?
>>>
>>> 2.       I see the code for “Connector”, and I see how to configure a SOLR
>>> Output Connection. But how does a “Job” know about the <metadata> to
>>> <solr> mapping? Is it generic (without a dependency on SOLR)?
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Fuad
>>
>>
>
>
