lucene-pylucene-dev mailing list archives

From Andi Vajda <va...@apache.org>
Subject Re: [VOTE] Release PyLucene 3.5.0
Date Thu, 08 Dec 2011 22:23:39 GMT

On Thu, 8 Dec 2011, Robert Muir wrote:

> On Wed, Dec 7, 2011 at 8:22 PM, Andi Vajda <vajda@apache.org> wrote:
>
>>> JavaError: java.lang.UnsupportedOperationException: This JRE does not have support for Thai segmentation
>>>    Java stacktrace:
>>> java.lang.UnsupportedOperationException: This JRE does not have support for Thai segmentation
>>>        at org.apache.lucene.analysis.th.ThaiWordFilter.<init>(ThaiWordFilter.java:85)
>>>        at org.apache.lucene.analysis.th.ThaiAnalyzer.createComponents(ThaiAnalyzer.java:64)
>>>        at org.apache.lucene.analysis.ReusableAnalyzerBase.tokenStream(ReusableAnalyzerBase.java:92)
>>>
>>
>> That's a Java error. Your JVM doesn't do Thai. I didn't know this was possible.
>>
>> A patch to silence this could be written and is welcome. Not a new issue and not a release stopper, imho.
>>
>
> Hi Andi, I added this check (I think a few releases back) when I found
> out that some JVMs, such as IBM's, don't return a real Thai word breaker
> for the "th" Locale.
>
> It could also be that even a Sun/Oracle JRE doesn't have support for
> this (if it's not the "international" version).
> http://www.oracle.com/technetwork/java/javase/locales-137662.html
>
> There is a public boolean constant available if you want to inspect
> that its working: ThaiWordFilter.DBBI_AVAILABLE:
>
>  /**
>   * True if the JRE supports a working dictionary-based breakiterator for Thai.
>   * If this is false, this filter will not work at all!
>   */
>  public static final boolean DBBI_AVAILABLE;
>
> In our unit tests for Thai we don't fail the test if this is false:
>    assumeTrue("JRE does not support Thai dictionary-based BreakIterator", ThaiWordFilter.DBBI_AVAILABLE);
>
> (though now that you brought it up, I see I missed adding this assume
> to one of our tests... thanks)
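For readers who want to see what such a check involves, here is a minimal sketch using only the JDK. It assumes, without quoting the Lucene source, that a probe like DBBI_AVAILABLE can be approximated by asking the JRE's word BreakIterator for the "th" locale whether it places a boundary inside a known Thai phrase ("ภาษาไทย", i.e. "ภาษา" + "ไทย"). The class and method names (ThaiSupportProbe, thaiDictionaryBreakIteratorAvailable, segment) are hypothetical, and main() only imitates the skip-instead-of-fail intent of assumeTrue(...) without JUnit.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: not part of Lucene or PyLucene.
final class ThaiSupportProbe {

    private static final Locale THAI = Locale.forLanguageTag("th");

    // "ภาษาไทย" ("Thai language") = "ภาษา" (4 chars) + "ไทย" (3 chars).
    // Assumption: a dictionary-based word iterator puts a boundary at index 4.
    static final String SAMPLE = "\u0E20\u0E32\u0E29\u0E32\u0E44\u0E17\u0E22";

    /** True if the JRE's "th" word BreakIterator splits SAMPLE after its first word. */
    static boolean thaiDictionaryBreakIteratorAvailable() {
        BreakIterator it = BreakIterator.getWordInstance(THAI);
        it.setText(SAMPLE);
        return it.isBoundary(4);
    }

    /** Splits text at every boundary reported by the JRE word iterator for the locale. */
    static List<String> segment(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            out.add(text.substring(start, end));
        }
        return out;
    }

    public static void main(String[] args) {
        if (!thaiDictionaryBreakIteratorAvailable()) {
            // Skip instead of failing, in the spirit of assumeTrue(..., DBBI_AVAILABLE).
            System.out.println("Skipping: JRE does not support Thai dictionary-based BreakIterator");
            return;
        }
        System.out.println(segment(SAMPLE, THAI));
    }
}
```

On a JRE without Thai dictionary data, the probe returns false and the caller can skip, which is the behavior the assumeTrue line above gives the Lucene unit tests.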

Thank you, Robert. This check is easy enough to add to the test.
Done in rev 1212171!

Andi..