lucene-dev mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Unsupported encoding GB18030
Date Mon, 04 Apr 2011 16:52:02 GMT
From my tests, this only affects Windows XP and earlier.

*Nix and OS X always use the full charsets.jar. Windows Vista and Windows 7 by
default "support" all languages and report this back through
http://msdn.microsoft.com/en-us/library/dd317827(v=vs.85).aspx , so the
"testing code" in the installer gets back true for all language groups and
is forced to install the full charsets.jar. This is described in the Sun issue,
and I verified it on at least Vista and 7 Ultimate: the installer seems to
install full language support even on German Windows, in contrast to XP, which
installs no charsets.jar (jre/lib folder).
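
To check a given installation, something like the following could be used.
It is only a minimal sketch, not code from Solr or from the installer: the
class name CharsetCheck and the helper missing() are made up for
illustration. It relies solely on the standard
java.nio.charset.Charset.isSupported call to ask the running JVM which of
the named charsets it can actually provide.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports which of the requested charsets the
// running JVM does not support.
class CharsetCheck {

    // Returns the subset of 'names' for which Charset.isSupported is false.
    // Note: a syntactically illegal name makes isSupported throw
    // IllegalCharsetNameException; this sketch does not catch it.
    static List<String> missing(String... names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (!Charset.isSupported(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // UTF-8 is guaranteed on every Java platform; GB18030 is the
        // extended charset discussed in this thread.
        List<String> absent = missing("UTF-8", "GB18030");
        if (absent.isEmpty()) {
            System.out.println("All requested charsets are supported.");
        } else {
            System.out.println("Unsupported charsets: " + absent);
        }
    }
}
```

If GB18030 is reported as unsupported, the JVM was presumably installed
without the extended charsets described above.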

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Jan Høydahl [mailto:jan.asf@cominvent.com]
> Sent: Monday, April 04, 2011 6:25 PM
> To: dev@lucene.apache.org
> Subject: Re: Unsupported encoding GB18030
> 
> Makes sense.
> 
> Question is, do we want to require a full JDK to index exampledocs? Most
> developers will have a JDK, but the occasional semi-tech manager just
> wanting to test out Solr may get burnt and think "Open Source sucks, just
> as I thought" :)
> 
> I added a note to http://wiki.apache.org/solr/SolrInstall about the need
> for a JDK for international charsets.
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> 
> On 4. apr. 2011, at 17.06, Uwe Schindler wrote:
> 
> > To come back to the original issue:
> > If you are using a pure JRE installed in your operating system via
> > the standard mechanism (the browser automatically installing the
> > Java Plugin, or similar), the following applies:
> > http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6329080
> >
> > To reduce download size, the JRE-only installation does not
> > contain the full charsets.jar, so the problem is expected. In fact,
> > those JREs contain only the basic charsets, as Robert said, plus the
> > ones needed for your region (the installer analyzes your environment
> > and chooses between western, eastern and possibly others, downloading
> > only the corresponding charsets.jar).
> >
> > We should maybe add a note to Solr that you should always use a
> > full-locale JRE installation or, better, a JDK; otherwise the full
> > international functionality of Solr cannot be used.
> >
> > Uwe
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe@thetaphi.de
> >
> >
> >> -----Original Message-----
> >> From: Jan Høydahl [mailto:jan.asf@cominvent.com]
> >> Sent: Monday, April 04, 2011 1:37 PM
> >> To: dev@lucene.apache.org
> >> Subject: Re: Unsupported encoding GB18030
> >>
> >>>>> : I don't see the reason why "exampledocs" should contain docs
> >>>>> with narrow charsets not guaranteed to be supported.
> >>>> personally i would like to see us add a lot more exampledocs in a
> >>>> lot more esoteric encodings, precisely to help end users sanity
> >>>> test this sort of thing. we frequently get questions from people
> >>>> about character encoding wonkiness, and things like test_utf8.sh,
> >>>> utf8-example.xml, and now gb18030-example.xml can help us narrow
> >>>> down the problem:
> >>>> their client code, their servlet container, or solr?
> >>>
> >>> Same here. In my opinion, an example set of files should also
> >>> contain "more complicated" ones to show what Solr can do. If some of
> >>> them don't work, it's not really a problem. Maybe we should simply
> >>> add a "tag" to the filename to mark them as not working in every
> >>> configuration.
> >>
> >> Positive to more example docs!
> >>
> >> My concern was that since indexing exampledocs/*.xml is perhaps THE
> >> most common action any new Solr user will do, it should just work,
> >> and it's a benefit if the results revolve around the same theme, a
> >> set of products with category and prices. We definitely want to show
> >> off more advanced features, and we should add more example documents
> >> for that. Plain test docs could be placed in a subfolder
> >> "exampledocs/extras" or something.
> >>
> >> Regarding the Windows XP VMware image I was using, it had a Sun JRE
> >> (not JDK) which was auto-updated from 1.5 to 1.6.
> >> After completely uninstalling Java and re-installing
> >> jdk-6u24-windows-i586.exe the GB18030 encoding is supported.
> >>
> >> --
> >> Jan Høydahl, search solution architect
> >> Cominvent AS - www.cominvent.com
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For
> >> additional commands, e-mail: dev-help@lucene.apache.org




