hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11907) Use the joni byte[] regex engine in place of j.u.regex
Date Tue, 23 Sep 2014 22:33:33 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145552#comment-14145552 ]

Andrew Purtell commented on HBASE-11907:

I've started working on this by modifying RegexStringComparator to use joni instead of j.u.regex.
There are a couple of differences that may or may not be a problem:
- Charset support: Fewer encodings are supported, though I think the ones we care about are, and
it's possible to use Java charset names to refer to jcodings encodings.
- Pattern flags: I've only implemented translations for CASE_INSENSITIVE and DOTALL. Some
of the pattern flags have no equivalents. It might be possible to emulate some of them, but I
haven't looked into it in detail. Otherwise we could document the limitations.
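
To illustrate the pattern-flag point, here is a minimal sketch of how a translation limited to CASE_INSENSITIVE and DOTALL might look. It is not the patch itself: the class name FlagTranslation, the method toEngineFlags, and the two ENGINE_* constants are hypothetical stand-ins, and a real port would use the option constants that joni defines. Only java.util.regex.Pattern is a real API here.

```java
import java.util.regex.Pattern;

final class FlagTranslation {
    // Hypothetical stand-ins for the target engine's option bits (assumption:
    // a real port would take these from the regex engine's own constants).
    static final int ENGINE_IGNORECASE = 1;
    static final int ENGINE_DOT_MATCHES_NEWLINE = 1 << 1;

    // The only java.util.regex flags this sketch knows how to translate.
    static final int SUPPORTED = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;

    // Translate java.util.regex flags to the engine's flags, rejecting any
    // flag that has no known equivalent instead of silently dropping it.
    static int toEngineFlags(int javaFlags) {
        int unsupported = javaFlags & ~SUPPORTED;
        if (unsupported != 0) {
            throw new IllegalArgumentException(
                "Unsupported java.util.regex flags: 0x" + Integer.toHexString(unsupported));
        }
        int engineFlags = 0;
        if ((javaFlags & Pattern.CASE_INSENSITIVE) != 0) {
            engineFlags |= ENGINE_IGNORECASE;
        }
        if ((javaFlags & Pattern.DOTALL) != 0) {
            engineFlags |= ENGINE_DOT_MATCHES_NEWLINE;
        }
        return engineFlags;
    }

    public static void main(String[] args) {
        // Both supported flags translate cleanly.
        System.out.println(toEngineFlags(Pattern.CASE_INSENSITIVE | Pattern.DOTALL));
        // A flag without a translation is reported rather than ignored.
        try {
            toEngineFlags(Pattern.COMMENTS);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Rejecting untranslated flags is only one option. Silently ignoring them and documenting the limitation would keep existing callers working, at the cost of patterns that behave differently than they did under j.u.regex.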

Moving from j.u.regex to joni given these differences could be considered a breaking change,
which would be a shame, since some users might want this in 0.98.

> Use the joni byte[] regex engine in place of j.u.regex
> ------------------------------------------------------
>                 Key: HBASE-11907
>                 URL: https://issues.apache.org/jira/browse/HBASE-11907
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>             Fix For: 2.0.0, 0.98.7, 0.99.1
> The joni regex engine (https://github.com/jruby/joni), a Java port of the Oniguruma regexp library done by the JRuby project, is:
> - MIT licensed
> - Designed to work with byte[] arguments instead of String
> - Capable of handling UTF8 encoding
> - Regex syntax compatible
> - Interruptible
> - *About twice as fast as j.u.regex*
> - Has JRuby's jcodings library as a dependency, also MIT licensed
