lucene-solr-user mailing list archives

From Walter Underwood <wunderw...@netflix.com>
Subject Re: wana use CJKAnalyzer
Date Mon, 25 Sep 2006 15:44:09 GMT
This document has two problems. First, the document is not well-formed XML.
Open it in Firefox and you will see this error:

   XML Parsing Error: mismatched tag. Expected: </doc>.
   Location: file:///Users/wunderwood/Desktop/jl.xml
   Line Number 15, Column 3:

Second, even after I fix that, it is still not legal UTF-8.
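
Both problems can be checked before posting a file. The following is a minimal sketch in Python, assuming the file is meant to be UTF-8 as discussed in this thread; the function name check_xml_bytes is hypothetical and not part of Solr or its example scripts.

```python
import xml.etree.ElementTree as ET
from typing import List


def check_xml_bytes(data: bytes) -> List[str]:
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []

    # Check 1: is the byte stream legal UTF-8? (Assumes UTF-8 is the intended encoding.)
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        problems.append(f"invalid UTF-8 at byte offset {err.start}: {err.reason}")

    # Check 2: is the document well-formed XML?
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        line, column = err.position
        problems.append(f"not well-formed at line {line}, column {column}: {err}")

    return problems
```

Reading the file in binary mode, for example open("jl.xml", "rb").read(), and passing the bytes to the function keeps the encoding check independent of the editor's or platform's default encoding.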

Does Solr report parsing errors? It really should, perhaps as a 400 Bad Request
response with a text/plain body showing the error message.
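
To illustrate the kind of response meant here, this is a rough sketch in Python, not Solr's actual behavior or code. The handler name handle_update and the tuple it returns are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def handle_update(body: bytes) -> Tuple[int, str, str]:
    """Hypothetical handler: returns (status code, content type, response body)."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError as err:
        # Pass the parser's message (it includes line and column) back to the client.
        return 400, "text/plain; charset=utf-8", f"XML parsing error: {err}\n"
    return 200, "text/plain; charset=utf-8", f"accepted <{root.tag}>\n"
```

The point of the design is that the client sees the parser's own message, including the line and column, instead of a silent failure.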

wunder


On 9/22/06 6:24 PM, "James liu" <liuping.james@gmail.com> wrote:
> 
> 2006/9/23, Walter Underwood <wunderwood@netflix.com>:
>> On 9/21/06 5:37 PM, "James liu" <liuping.james@gmail.com> wrote:
>> 
>>> > Yes, it is working. The root of my problem is that the XML must be encoded
>>> > in UTF-8. If you use PHP, it is not about the web browser; just note that
>>> > the curl header information must specify UTF-8.
>>> > If you use post.sh, the XML must be encoded in UTF-8 (my EditPlus default
>>> > encoding is ANSI).
>> 
>> This might be a Solr bug. Solr should be able to accept XML in any
>> of the required encodings (ASCII, Latin 1, UTF-8, and UTF-16).
>> Getting XML content types exactly right is tricky; see RFC 3023.
>> 
>> What curl command line was used?
> 
> I use no special curl command, just post.sh from
> solr-nightly/example/exampledocs.
> But my jl.xml is encoded in UTF-8 (I use EditPlus; I tried setting the XML
> encoding to UTF-8, but it had no effect).
> In solrphp I use curl like this:
>   $header = array("Content-Type: text/xml;charset=utf-8");
>   curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
> This is PHP.
> 
>> What encoding is the XML?
>> 
>> Can you give a sample XML file?
> 
> See the attachments. If you need anything else, mail me.
> 
>> wunder
>> --
>> Walter Underwood
>> Search Guru, Netflix
>> 
> 
> 


