hadoop-pig-dev mailing list archives

From "Thejas M Nair (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
Date Wed, 02 Dec 2009 14:34:20 GMT

    [ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784849#action_12784849 ]

Thejas M Nair commented on PIG-965:

In the performance numbers above, I assume optimization 2 (custom string comparison) is used
only for the regex ".*ABCD.*", while optimization 1 (re-using the compiled pattern) is also
used with dk.brics.automaton. Can you please confirm?

From the performance numbers, it looks like we don't need optimization 2. We can
just use dk.brics.automaton for the common regexes as well, which keeps the Pig code simpler.
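
For reference, optimization 1 (re-using the compiled pattern) could look roughly like the
sketch below. This is only an illustration, not the actual PORegex code: the class name
CachedRegexMatcher and its methods are hypothetical, and java.util.regex stands in for
dk.brics.automaton so that the example has no external dependency. The same caching idea
would apply to whichever regex engine is chosen.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not taken from the Pig code base.
final class CachedRegexMatcher {
    private String lastPatternString; // pattern text seen on the previous call
    private Pattern compiled;         // compiled form of lastPatternString
    private int compileCount;         // number of compilations, kept only for demonstration

    // Recompiles only when the pattern string differs from the previous call.
    boolean matches(String input, String patternString) {
        if (compiled == null || !patternString.equals(lastPatternString)) {
            compiled = Pattern.compile(patternString);
            lastPatternString = patternString;
            compileCount++;
        }
        return compiled.matcher(input).matches();
    }

    int compileCount() {
        return compileCount;
    }
}
```

With a constant rhs, every call after the first skips compilation, so the per-record cost
is the string equality check plus the match itself.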

> PERFORMANCE: optimize common case in matches (PORegex)
> ------------------------------------------------------
>                 Key: PIG-965
>                 URL: https://issues.apache.org/jira/browse/PIG-965
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: Thejas M Nair
>            Assignee: Ankit Modi
> Some frequently seen use cases of the 'matches' comparison operator have the following properties:
> 1. The rhs is a constant string, e.g. "c1 matches 'abc%'".
> 2. Regexes that look for a matching prefix, suffix, etc. are very common, e.g. 'abc%',
> '%abc', '%abc%'.
> To optimize for these common cases, PORegex.java can be changed to:
> 1. Compile the pattern (rhs of matches) and re-use it if the pattern string has not changed.
> 2. Use string comparisons for simple common regexes (as in 2 above).
> The implementation of the Hive LIKE clause uses similar optimizations.
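
As a rough illustration of the second proposed change (string comparisons for simple
regexes), the sketch below recognizes patterns of the form "abc.*", ".*abc" and ".*abc.*"
and answers them with startsWith, endsWith and contains. It is an assumption-laden sketch,
not the PIG-965 patch: the class name SimpleRegexMatcher and its helpers are hypothetical,
the patterns are written in Java regex syntax (".*") rather than the '%' wildcard used in
the description, and only letters and digits are treated as safe literal text. Inputs with
line terminators go to the regex fallback, because "." does not match them by default in
java.util.regex.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not taken from the Pig code base.
final class SimpleRegexMatcher {
    private enum Kind { PREFIX, SUFFIX, CONTAINS, REGEX }

    private final Kind kind;
    private final String literal;   // literal text for the fast paths, null for REGEX
    private final Pattern fallback; // compiled once, used whenever the fast path does not apply

    SimpleRegexMatcher(String regex) {
        fallback = Pattern.compile(regex);
        boolean lead = regex.startsWith(".*");
        boolean trail = regex.endsWith(".*");
        String core = regex;
        Kind k = Kind.REGEX;
        if (lead && trail && regex.length() >= 4) {
            core = regex.substring(2, regex.length() - 2);
            k = Kind.CONTAINS;
        } else if (trail) {
            core = regex.substring(0, regex.length() - 2);
            k = Kind.PREFIX;
        } else if (lead) {
            core = regex.substring(2);
            k = Kind.SUFFIX;
        }
        if (k != Kind.REGEX && isPlainLiteral(core)) {
            kind = k;
            literal = core;
        } else {
            kind = Kind.REGEX;
            literal = null;
        }
    }

    boolean usesFastPath() {
        return kind != Kind.REGEX;
    }

    boolean matches(String input) {
        if (kind == Kind.REGEX || hasLineTerminator(input)) {
            return fallback.matcher(input).matches();
        }
        switch (kind) {
            case PREFIX: return input.startsWith(literal);
            case SUFFIX: return input.endsWith(literal);
            default:     return input.contains(literal);
        }
    }

    // Conservative check: non-empty and made up of letters and digits only,
    // so no regex metacharacter can hide in the literal part.
    private static boolean isPlainLiteral(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isLetterOrDigit(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Characters that "." does not match by default in java.util.regex.
    private static boolean hasLineTerminator(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\n' || c == '\r' || c == '\u0085' || c == '\u2028' || c == '\u2029') {
                return true;
            }
        }
        return false;
    }
}
```

The pattern is classified once in the constructor, so with a constant rhs the per-record
work on the fast path is a single String method call.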

