lucene-dev mailing list archives

From Jan Høydahl <jan....@cominvent.com>
Subject Re: Unsupported encoding GB18030
Date Mon, 04 Apr 2011 16:25:24 GMT
Makes sense.

Question is, do we want to require a full JDK to index exampledocs? Most developers will have
a JDK, but the occasional semi-tech manager just wanting to test out Solr may get burnt and
think "Open Source sucks, just as I thought" :)

I added a note to http://wiki.apache.org/solr/SolrInstall about the need for a JDK for international
charsets.
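
For anyone who wants to check a given Java install before indexing, here is a minimal sketch that asks the JVM whether a charset is available. The class name CharsetCheck and the default list of names are made up for illustration; the only real API used is java.nio.charset.Charset.isSupported.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical helper, not part of Solr: reports whether the running
// JVM can handle the named charsets.
public class CharsetCheck {

    /** Returns true if this JVM supports the named charset. */
    public static boolean isAvailable(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            // Syntactically invalid names are treated as unavailable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Default names are an assumption: UTF-8 is always present,
        // GB18030 is the encoding discussed in this thread.
        String[] names = args.length > 0 ? args : new String[] {"UTF-8", "GB18030"};
        for (String name : names) {
            String status = isAvailable(name) ? "supported" : "NOT supported";
            System.out.println(name + ": " + status);
        }
    }
}
```

Run it with the same java binary that starts Solr. If GB18030 shows up as not supported, gb18030-example.xml can be expected to fail on that install.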

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 4. apr. 2011, at 17.06, Uwe Schindler wrote:

> To come back to the original issue:
> If you are using a pure JRE installed in your operating system using the
> standard mechanism "browser automatically installs Java Plugin methods" or
> similar, the following applies:
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6329080
> 
> To reduce the size of downloads, the JRE-only installation does not contain the
> full charsets.jar, so the problem is expected. In fact, those JREs only
> contain the basic charsets, as Robert said, and the ones needed for your area
> (the installer analyzes your environment and chooses between western,
> eastern and possibly others to download only the corresponding
> charsets.jar).
> 
> We should maybe add a note to Solr, that you should in all cases use a full
> locale JRE installation or better a JDK, else the full international
> functionality of Solr cannot be used.
> 
> Uwe
> 
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
> 
> 
>> -----Original Message-----
>> From: Jan Høydahl [mailto:jan.asf@cominvent.com]
>> Sent: Monday, April 04, 2011 1:37 PM
>> To: dev@lucene.apache.org
>> Subject: Re: Unsupported encoding GB18030
>> 
>>>>> : I don't see the reason why "exampledocs" should contain docs with
>>>>> narrow charsets not guaranteed to be supported.
>>>> personally i would like to see us add a lot more exampledocs in a lot
>>>> more esoteric encodings, precisely to help end users sanity test this
>>>> sort of thing. we frequently get questions from people about character
>>>> encoding wonkiness, and things like test_utf8.sh, utf8-example.xml,
>>>> and now gb18030-example.xml can help us narrow down the problem:
>>>> their client code, their servlet container, or solr?
>>> 
>>> Same here. In my opinion, an example set of files should also contain
>>> "more complicated" ones to show what Solr can do. If some of them
>>> don't work, it's not really a problem. Maybe we should simply add a
>>> "tag" to the filename to mark them as not working in every
>>> configuration.
>> 
>> Positive to more example docs!
>> 
>> My concern was that since indexing exampledocs/*.xml is perhaps THE most
>> common action any new Solr user will do, it should just work, and it's a
>> benefit if the results revolve around the same theme, a set of products
>> with category and prices. We definitely want to show off more advanced
>> features, and we should add more example documents for that. Plain test
>> docs could be placed in a subfolder "exampledocs/extras" or something.
>> 
>> Regarding the Windows XP VMware image I was using, it had a Sun JRE (not JDK)
>> which was auto-updated from 1.5 to 1.6.
>> After completely uninstalling Java and re-installing
>> jdk-6u24-windows-i586.exe the GB18030 encoding is supported.
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For additional
>> commands, e-mail: dev-help@lucene.apache.org