manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: SOLR
Date Tue, 15 Mar 2011 07:02:51 GMT
The trace indicates that the commit operation is failing with a
non-XML response, probably a 500 error with HTML.
You can see exactly what came back by using the "Simple History"
report; it should all be there.
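
For anyone hitting the same trace, here is a minimal sketch (not ManifoldCF code) of checking whether a response body is well-formed XML before treating it as a Solr reply. The class name `SolrResponseCheck` and both sample strings are made up for illustration; the HTML sample only imitates a servlet-container error page with an unclosed <HR>.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only; not part of ManifoldCF or Solr.
final class SolrResponseCheck {

  /** Returns true if the body parses as well-formed XML, false otherwise. */
  static boolean isWellFormedXml(String body) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      // Reject DOCTYPE declarations so an HTML error page cannot trigger a DTD fetch.
      factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
      DocumentBuilder builder = factory.newDocumentBuilder();
      // DefaultHandler rethrows fatal errors and keeps the parser from printing to stderr.
      builder.setErrorHandler(new DefaultHandler());
      builder.parse(new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8)));
      return true;
    } catch (SAXException | IOException | ParserConfigurationException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // Assumed sample of a normal Solr XML reply.
    String xmlReply = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<response><lst name=\"responseHeader\">"
        + "<int name=\"status\">0</int></lst></response>";
    // Assumed sample of an HTML error page with an unclosed <HR>.
    String htmlError = "<html><body><h1>HTTP Status 500</h1><HR>"
        + "<p>error</p></body></html>";

    System.out.println("XML reply well-formed:  " + isWellFormedXml(xmlReply));
    System.out.println("HTML error well-formed: " + isWellFormedXml(htmlError));
  }
}
```

A check like this only tells you that the body is not XML; the "Simple History" report is still the place to see what actually came back.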

Karl

On Mon, Mar 14, 2011 at 10:34 PM, Fuad Efendi <fuad@efendi.ca> wrote:
>
> It's not the trunk version; I use (different) trunk versions on a few production
> sites... in SOLR, the path "/update" is defined in solrconfig.xml (and usually
> the user will copy it from the "example" schema and maybe modify it):
>
>  <requestHandler name="/update"
>                  class="solr.XmlUpdateRequestHandler">
>
>
> And which kind of "update" handler does ManifoldCF expect?!
>
> That's why I suggest using the SolrJ API instead... I noticed a lot of
> low-level coding...
>
>
>
> What kind of SOLR protocol is expected? It is definitely not POST of XML
> content:
>
>
>  /** Write a field */
>  protected static void writeField(OutputStream out, String fieldName, String fieldValue)
>    throws IOException
>  {
>    writePreamble(out);
>    writeBoundary(out,"text/plain; charset=UTF-8",fieldName,null);
>
>    byte[] tmp = fieldValue.getBytes("UTF-8");
>    out.write(tmp, 0, tmp.length);
>    writePostamble(out);
>  }
>
>
>
> Do you expect a "binary" handler on SOLR?
>  <!-- Binary Update Request Handler
>       http://wiki.apache.org/solr/javabin
>    -->
>  <requestHandler name="/update/javabin"
>                  class="solr.BinaryUpdateRequestHandler" />
>
>
>
>
>
>
> -----Original Message-----
> From: Karl Wright [mailto:daddywri@gmail.com]
> Sent: March-14-11 7:58 PM
> To: connectors-user@incubator.apache.org
> Subject: Re: SOLR
>
> The trunk version of Solr may have changed around how the extracting update
> request handler works.  It changes daily, so there is no way I can keep up
> with it.  Maybe it would be better to go back and use a known quantity.
>
> Thanks,
> Karl
>
>
> On Mon, Mar 14, 2011 at 6:24 PM, Fuad Efendi <fuad@efendi.ca> wrote:
>>
>> Default settings for ManifoldCF: /update/extract
>> http://localhost:8080/solr/update/extract?commit=true
>>
>> And using a browser, I see that SOLR responds with malformed HTML containing
>> an unclosed <HR>...
>>
>> Fix:
>> Update handler:  /update
>>
>>
>> -Fuad
>>
>>
>> -----Original Message-----
>> From: Fuad Efendi [mailto:fuad@efendi.ca]
>> Sent: March-14-11 6:17 PM
>> To: connectors-user@incubator.apache.org
>> Subject: RE: SOLR
>>
>> Hi Karl,
>>
>> I verified (via browser),
>> http://localhost:8080/solr/update?commit=true
>>
>> And response from SOLR:
>> <?xml version="1.0" encoding="UTF-8"?> <response> <lst
>> name="responseHeader"><int name="status">0</int><int
>> name="QTime">15</int></lst> </response>
>>
>> The root of the problem is
>> org.apache.manifoldcf.agents.output.solr.HttpPoster$CommitThread.run(HttpPoster.java:1658)
>>
>>
>> Everything is fine except I can't understand why we get "HR" from
>> SOLR; do we have any multithreading issues? I believe I connect to
>> SOLR, port 8080 is configured via the console... maybe somewhere else?
>>
>> I believe the default setting for "Update handler:" on the Connector screen
>> is incorrect; it is /update/extract
>>
>>
>>
>>
>> -----Original Message-----
>> From: Karl Wright [mailto:daddywri@gmail.com]
>> Sent: March-14-11 6:00 PM
>> To: connectors-user@incubator.apache.org
>> Subject: Re: SOLR
>>
>> This is because your solr setup is incorrect.  The post to "solr" is
>> returning HTML, not XML, so you are not actually communicating with
>> Solr at all.
>>
>> In order for the Solr connector to work, you need to have the solr
>> extracting update request handler present and configured.  I am told
>> that the latest release of Solr makes the jar with this code optional
>> - it's a contrib jar that you have to separately download.  If you are
>> building solr off of trunk, then this should not be a problem.
>>
>> Karl
>>
>> On Mon, Mar 14, 2011 at 5:40 PM, Fuad Efendi <fuad@efendi.ca> wrote:
>>> This is the exception. The XML contains encoded HTML, and this doesn't happen
>>> with the standard Java 6 StAX parser:
>>>
>>> [Fatal Error] :124:120: The element type "HR" must be terminated by the matching end-tag "</HR>".
>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: XML parsing error: The element type "HR" must be terminated by the matching end-tag "</HR>".
>>>        at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:369)
>>>        at org.apache.manifoldcf.core.common.XMLDoc.<init>(XMLDoc.java:317)
>>>        at org.apache.manifoldcf.agents.output.solr.HttpPoster.getResponse(HttpPoster.java:619)
>>>        at org.apache.manifoldcf.agents.output.solr.HttpPoster$CommitThread.run(HttpPoster.java:1658)
>>> Caused by: org.xml.sax.SAXParseException: The element type "HR" must be terminated by the matching end-tag "</HR>".
>>>        at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>>>        at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
>>>        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
>>>        at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:365)
>>>        ... 3 more
>>>
>>>
>>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Fuad Efendi [mailto:fuad@efendi.ca]
>>> Sent: March-14-11 5:37 PM
>>> To: connectors-user@incubator.apache.org
>>> Subject: RE: SOLR
>>>
>>> Thank you very much Karl,
>>>
>>> And I have my first problem:
>>> Starting crawler...
>>> [Fatal Error] :124:120: The element type "HR" must be terminated by the matching end-tag "</HR>".
>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: XML parsing error: The element type "HR" must be terminated by the matching end-tag "</HR>".
>>>        at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:369)
>>>        at org.apache.manifoldcf.core.common.XMLDoc.<init>(XMLDoc.java:317)
>>>
>>> I am using the RSS connector to crawl specific XML (containing
>>> XML-encoded &lt;HR&gt; and other HTML tags). It doesn't happen with the
>>> standard StAX parser (Java 6)...
>>>
>>>
>>> Regarding (2), do you mean this interface method?
>>>  /** View specification.
>>>  * This method is called in the body section of a job's view page.  Its purpose is to present the output specification information to the user.
>>>  * The coder can presume that the HTML that is output from this configuration will be within appropriate <html> and <body> tags.
>>>  *@param out is the output to which any HTML should be sent.
>>>  *@param os is the current output specification for this job.
>>>  */
>>>  public void viewSpecification(IHTTPOutput out, OutputSpecification os)
>>>    throws ManifoldCFException, IOException
>>>
>>>
>>>
>>> Thanks!
>>>
>>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Karl Wright [mailto:daddywri@gmail.com]
>>> Sent: March-14-11 5:21 PM
>>> To: connectors-user@incubator.apache.org
>>> Subject: Re: SOLR
>>>
>>> Hi Fuad,
>>>
>>> (1) "Arguments" are indeed optional key/value pairs, which are sent
>>> to solr as part of the URL.
>>> (2) ManifoldCF presents tabs for a job of three kinds: (a) tabs that
>>> all jobs have; (b) tabs related to the repository connector's
>>> management of the document specification information; and (c) tabs
>>> related to the output connector's output specification information.
>>> The Solr output connector's output specification information includes
>>> the metadata to solr mapping, so those tabs come from the Solr connector.
>>>
>>> Karl
>>>
>>>
>>> On Mon, Mar 14, 2011 at 4:51 PM, Fuad Efendi <fuad@efendi.ca> wrote:
>>>> Hi, any sample of how to use SOLR connector?
>>>>
>>>> http://incubator.apache.org/connectors/end-user-documentation.html#solroutputconnector
>>>>
>>>>
>>>>
>>>> Some questions:
>>>>
>>>>
>>>>
>>>> 1.       Argument. Is it optional key=value pairs which can be sent
>>>> to SOLR as part of an HTTP GET/POST request?
>>>>
>>>> 2.       I see the code for “Connector”, and I see how to configure the SOLR
>>>> Output Connection. But how does the “Job” know about the <metadata> to
>>>> <solr> mapping; is it generic (without a dependency on SOLR)?
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Fuad
>>>
>>>
>>
>>
>
>
