manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: SOLR
Date Mon, 14 Mar 2011 23:56:52 GMT
The Solr connector has been tested and debugged with a huge variety of
documents.  Your assessment that RSS is causing this problem is not
likely to be correct, since the RSS feed itself is not even ingested.

Once again, I think you've got your Solr misconfigured.  Can you try
using an older version?  Just plop Solr down with its default
configuration and see whether ManifoldCF can connect to it and index
documents.  If you can, then you have something misconfigured on your
Solr instance and you can compare and contrast.  I recommend Solr 1.4.
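Karl's suggestion can be turned into a quick check that does not involve ManifoldCF at all. This is a minimal sketch, assuming the stock Solr 1.4 example server (started with "java -jar start.jar" and listening on http://localhost:8983/solr) and its example schema with "id" and "name" fields; the class and method names are hypothetical. It posts one hand-written document to the plain XML update handler and prints the HTTP status. A 200 means the default configuration indexes documents.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical smoke test: post one document to a stock Solr instance.
 * Assumes the Solr 1.4 example setup (port 8983, "id" and "name" fields).
 */
final class SolrSmokeTest {

    /** Builds a Solr XML update message; values are assumed to be XML-safe already. */
    static String addMessage(String id, String name) {
        return "<add><doc>"
                + "<field name=\"id\">" + id + "</field>"
                + "<field name=\"name\">" + name + "</field>"
                + "</doc></add>";
    }

    /** POSTs the message to the given update URL and returns the HTTP status code. */
    static int post(String updateUrl, String xml) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(updateUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }

    public static void main(String[] args) throws IOException {
        // commit=true makes the document searchable immediately.
        String url = "http://localhost:8983/solr/update?commit=true";
        int status = post(url, addMessage("smoke-1", "hello"));
        System.out.println("Solr answered HTTP " + status);
    }
}
```

If this succeeds but ManifoldCF still fails against the same instance, the difference lies in the handler or configuration ManifoldCF is pointed at.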

Thanks,
Karl


On Mon, Mar 14, 2011 at 7:28 PM, Fuad Efendi <fuad@efendi.ca> wrote:
>
> I was playing with Yahoo RSS:
> http://rss.news.yahoo.com/rss/topstories
>
> It is extremely hard to debug what is inside the OutputStream... I believe
> something has to be XML-escaped before it is submitted to SOLR (for instance,
> RSS may contain HTML-formatted snippets)...
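If Fuad's guess is right and markup from an RSS item reaches Solr unescaped, the standard remedy is to escape each field value before embedding it in an XML update message. Karl disputes that this is the cause, so treat the following as a minimal sketch of XML escaping in general, not as a description of what ManifoldCF does; "SolrXmlEscape", "escapeXml" and "field" are hypothetical names.

```java
/**
 * Hypothetical helper: escape arbitrary text (e.g. an HTML snippet from an
 * RSS item) so it can be embedded in a Solr XML update message.
 */
final class SolrXmlEscape {

    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:
                    // Drop control characters that XML 1.0 forbids (tab, LF and CR are allowed).
                    if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') {
                        break;
                    }
                    sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Wraps an escaped value in a Solr <field> element. */
    static String field(String name, String value) {
        return "<field name=\"" + escapeXml(name) + "\">" + escapeXml(value) + "</field>";
    }

    public static void main(String[] args) {
        String snippet = "<p>Markets & \"stocks\"</p>";
        System.out.println(field("description", snippet));
    }
}
```

Running main prints the snippet with every markup character escaped, so the XML parser on the Solr side sees it as text rather than as nested elements.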
>
> Any plans to use the SOLRJ client library? SOLRJ can use the (default) binary
> protocol instead of XML. Much easier... but it has dependencies such as
> HttpClient, etc.
>
>
>
> -Fuad
>
> -----Original Message-----
> From: Fuad Efendi [mailto:fuad@efendi.ca]
> Sent: March-14-11 7:06 PM
> To: connectors-user@incubator.apache.org
> Subject: RE: SOLR
>
> Hi Karl,
>
> I have a new problem now, with "inject":
> expected closing '>' after ENTITY declaration  at [row,col,system-id]:
> [81,5,"http://www.w3.org/TR/html4/strict.dtd"]
>
>
> Full response from SOLR:
>
> <html><head><title>Apache Tomcat/7.0.11 - Error report</title><style><!--H1
> {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;}
> H2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;}
> H3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;}
> BODY {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;}
> B {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;}
> P {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}
> A {color : black;}A.name {color : black;}HR {color : #525D76;}--></style>
> </head><body><h1>HTTP Status 400 - Unexpected character '-' (code 45) in
> external DTD subset; expected closing '&gt;' after ENTITY declaration  at
> [row,col,system-id]: [81,5,&quot;http://www.w3.org/TR/html4/strict.dtd&quot;]
>  from [row,col {unknown-source}]: [1,1]</h1><HR size="1"
> noshade="noshade"><p><b>type</b> Status report</p><p><b>message</b>
> <u>Unexpected character '-' (code 45) in external DTD subset; expected
> closing '&gt;' after ENTITY declaration  at [row,col,system-id]:
> [81,5,&quot;http://www.w3.org/TR/html4/strict.dtd&quot;]
>  from [row,col {unknown-source}]: [1,1]</u></p><p><b>description</b>
> <u>The request sent by the client was syntactically incorrect (Unexpected
> character '-' (code 45) in external DTD subset; expected closing '&gt;' after
> ENTITY declaration  at [row,col,system-id]:
> [81,5,&quot;http://www.w3.org/TR/html4/strict.dtd&quot;]
> 2011-03-14 19:03:12.786:INFO::Shutdown hook executing
> 2011-03-14 19:03:12.787:INFO::Stopped SocketConnector@0.0.0.0:8345
> 2011-03-14 19:03:12.970:INFO::Shutdown hook complete
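An error like this is much easier to diagnose when the request body is visible, which Fuad notes above is hard to see inside the OutputStream. One common debugging approach is to wrap the stream so it keeps a copy of the first bytes written. The class below is a hypothetical sketch, not a ManifoldCF class; it assumes the body is UTF-8 text and caps the copy at maxBytes so large documents are not buffered in full.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical debugging wrapper: forwards everything to the real stream
 * and keeps a copy of at most maxBytes for logging.
 */
final class CapturingOutputStream extends FilterOutputStream {
    private final ByteArrayOutputStream copy = new ByteArrayOutputStream();
    private final int maxBytes;

    CapturingOutputStream(OutputStream target, int maxBytes) {
        super(target);
        this.maxBytes = maxBytes;
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        if (copy.size() < maxBytes) {
            copy.write(b);
        }
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // Forward the whole chunk instead of FilterOutputStream's byte-by-byte default.
        out.write(b, off, len);
        int room = Math.min(len, maxBytes - copy.size());
        if (room > 0) {
            copy.write(b, off, room);
        }
    }

    /** Returns the captured prefix decoded as UTF-8. */
    String captured() {
        return new String(copy.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // A ByteArrayOutputStream stands in for the HTTP connection's stream.
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        CapturingOutputStream tee = new CapturingOutputStream(wire, 64);
        tee.write("<html><body>Hello</body></html>".getBytes(StandardCharsets.UTF_8));
        tee.flush();
        System.out.println("first bytes sent: " + tee.captured());
    }
}
```

Logging captured() when the server answers with a 400 would show whether the body started with an XML update message or with something else, such as an HTML DOCTYPE.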
>
>
> -----Original Message-----
> From: Fuad Efendi [mailto:fuad@efendi.ca]
> Sent: March-14-11 6:24 PM
> To: connectors-user@incubator.apache.org
> Subject: RE: SOLR
>
>
> Default settings for ManifoldCF: /update/extract
> http://localhost:8080/solr/update/extract?commit=true
>
> And using a browser, I see that SOLR responds with malformed HTML containing
> an unclosed <HR>...
>
> Fix:
> Update handler:  /update
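The two handlers mentioned here expect different request bodies. The extracting handler at /update/extract (Solr Cell) receives the raw document bytes and extracts the text itself, with literal.<field> parameters supplying extra field values. The plain /update handler accepts only Solr XML update messages such as <add>, <delete> and <commit>. A raw HTML page posted to /update would therefore be parsed as XML, which may explain why the error above mentions the HTML 4 strict DTD; that reading is an inference, not something confirmed in the thread. The sketch below only builds the two URLs; the class name and the use of literal.id are assumptions for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper contrasting the two Solr update endpoints.
 * /update/extract: raw document bytes in the body, fields via literal.* parameters.
 * /update: Solr XML update messages only.
 */
final class SolrEndpoints {

    /** URL for posting a raw document to Solr Cell, with its id passed as literal.id. */
    static String extractUrl(String solrBase, String id) {
        return solrBase + "/update/extract?literal.id="
                + URLEncoder.encode(id, StandardCharsets.UTF_8) + "&commit=true";
    }

    /** URL for posting an <add>...</add> XML message to the plain update handler. */
    static String xmlUpdateUrl(String solrBase) {
        return solrBase + "/update?commit=true";
    }

    public static void main(String[] args) {
        String base = "http://localhost:8080/solr";
        System.out.println(extractUrl(base, "http://rss.news.yahoo.com/rss/topstories"));
        System.out.println(xmlUpdateUrl(base));
    }
}
```

Switching a client from one URL to the other without also changing the body format is enough to produce a 400, whichever handler is the right one for a given Solr setup.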
>
>
> -Fuad
>
>
>
>
