lucene-dev mailing list archives

From Jan Høydahl <>
Subject Re: Unsupported encoding GB18030
Date Mon, 04 Apr 2011 16:25:24 GMT
Makes sense.

The question is, do we want to require a full JDK to index exampledocs? Most developers will have
a JDK, but the occasional semi-tech manager just wanting to test out Solr may get burnt and
think "Open Source sucks, just as I thought" :)

I added a note to about the need for JDK for international
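
For anyone who wants to check their own install before indexing, here is a minimal sketch of a charset check. It only uses java.nio.charset.Charset from the standard library; the class name CharsetCheck and the helper isAvailable are made up for illustration and are not part of Solr.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of Solr: reports whether the running
// JRE/JDK can handle the named charsets.
class CharsetCheck {

    // Returns true if this JVM supports the named charset.
    // Illegal or null names are treated as "not available".
    static boolean isAvailable(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException as well as a null name.
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the encodings discussed in this thread.
        String[] names = args.length > 0 ? args : new String[] {"UTF-8", "GB18030"};
        for (String name : names) {
            System.out.println(name + ": "
                + (isAvailable(name) ? "supported" : "NOT supported"));
        }
    }
}
```

If GB18030 shows up as not supported, the install is probably a reduced JRE without the extended charsets, as Uwe describes below.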

Jan Høydahl, search solution architect
Cominvent AS -

On 4. apr. 2011, at 17.06, Uwe Schindler wrote:

> To come back to the original issue:
> If you are using a pure JRE installed in your operating system using the
> standard mechanism "browser automatically installs Java Plugin methods" or
> similar, the following applies:
> To reduce download size, the JRE-only installation does not contain the
> full charsets.jar, so the problem is expected. In fact, those JREs only
> contain the basic charsets, as Robert said, and the ones needed for your area
> (the installer analyzes your environment and chooses between western,
> eastern and possibly others to download only the corresponding
> charsets.jar).
> We should maybe add a note to Solr, that you should in all cases use a full
> locale JRE installation or better a JDK, else the full international
> functionality of Solr cannot be used.
> Uwe
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> eMail:
>> -----Original Message-----
>> From: Jan Høydahl []
>> Sent: Monday, April 04, 2011 1:37 PM
>> To:
>> Subject: Re: Unsupported encoding GB18030
>>>>> : I don't see the reason why "exampledocs" should contain docs with
>>>>> narrow charsets not guaranteed to be supported.
>>>> personally i would like to see us add a lot more exampledocs in a lot
>>>> more esoteric encodings, precisely to help end users sanity test this
>>>> sort of thing. we frequently get questions from people about character
>>>> encoding wonkiness, and things like utf8-example.xml,
>>>> and now gb18030-example.xml can help us narrow down the problem:
>>>> their client code, their servlet container, or solr?
>>> Same here. In my opinion, an example set of files should also contain
>>> "more complicated" ones to show what Solr can do. If some of them
>>> don't work, it's not really a problem. Maybe we should simply add a
>>> "tag" to the filename to mark them as not working in every
>>> configuration.
>> Positive to more example docs!
>> My concern was that since indexing exampledocs/*.xml is perhaps THE most
>> common action any new Solr user will do, it should just work, and it's a
>> benefit if the results revolve around the same theme, a set of products
>> with category and prices. We definitely want to show off more advanced
>> features, and we should add more example documents for that. Plain test
>> docs could be placed in a subfolder "exampledocs/extras" or something.
>> Regarding the Windows XP VMware image I was using, it had a Sun JRE (not JDK)
>> which was auto-updated from 1.5 to 1.6.
>> After completely uninstalling Java and re-installing
>> jdk-6u24-windows-i586.exe, the GB18030 encoding is supported.
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS -
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: For additional
>> commands, e-mail: