hive-dev mailing list archives

From "John Sichi (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-1040) use sed rather than diff for masking out noise in diff-based tests
Date Wed, 07 Dec 2011 00:34:40 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164002#comment-13164002 ]

John Sichi commented on HIVE-1040:
----------------------------------

Regarding the masking implementation: the while loop is worst-case O(n^2) and requires the
entire file to be loaded into memory. Since we never do multi-line replacements, it would be
reasonable to stream the file line by line instead.
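
To illustrate the streaming idea, here is a minimal Python sketch rather than the code in the patch. The noise patterns and the names mask_lines and mask_file are hypothetical, chosen only for this example; the mask string is the one proposed in the issue description. Each line is masked and written out before the next is read, so memory use stays bounded by the longest line and the work is linear in the file size.

```python
import re
from typing import Iterable, Iterator

# Hypothetical noise patterns, for illustration only; the real list
# belongs to the Hive test harness and is not reproduced here.
NOISE_PATTERNS = [
    re.compile(r"file:/\S+"),
    re.compile(r"transient_lastDdlTime=\d+"),
]
MASK = "ZYZZYZVA"  # placeholder string proposed in the issue description


def mask_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each input line with every noise match replaced by MASK."""
    for line in lines:
        for pattern in NOISE_PATTERNS:
            line = pattern.sub(MASK, line)
        yield line


def mask_file(src_path: str, dst_path: str) -> None:
    """Stream src_path to dst_path one line at a time, masking as we go."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        # A file object iterates lazily, so only one line is held at a time.
        dst.writelines(mask_lines(src))
```

This only works because no pattern spans a line boundary; a multi-line pattern would need buffering.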
                
> use sed rather than diff for masking out noise in diff-based tests
> ------------------------------------------------------------------
>
>                 Key: HIVE-1040
>                 URL: https://issues.apache.org/jira/browse/HIVE-1040
>             Project: Hive
>          Issue Type: Improvement
>          Components: Testing Infrastructure
>    Affects Versions: 0.4.1
>            Reporter: John Sichi
>            Assignee: Marek Sapota
>            Priority: Minor
>         Attachments: HIVE-1040-code-patch.patch, HIVE-1040.1.patch, HIVE-1040.2.patch,
> HIVE-1040.D597.1.patch, HIVE-1040.D597.2.patch
>
>
> The current diff -I approach has two problems:  (1) it does not allow resolution finer
> than line-level, so it's impossible to mask out pattern occurrences within a line, and (2)
> it produces unmasked files, so if you run diff on the command line to compare the result .q.out
> with the checked-in file, you see the noise.
> My suggestion is to first run sed to replace noise patterns with an unlikely-to-occur
> string like ZYZZYZVA, and then diff the pre-masked files without using any -I.
> This would require a one-time hit to update all existing .q.out files so that they would
> contain the pre-masked results.
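
The mask-then-diff proposal quoted above can be sketched in Python as follows. This is an illustration under assumptions, not the sed invocation or the test harness code: the timestamp pattern and the names mask and masked_diff are hypothetical. The point is that masking happens inside the line, so the rest of the line is still compared, and the diff step needs no ignore option.

```python
import difflib
import re
from typing import List

MASK = "ZYZZYZVA"  # placeholder string proposed in the issue description
# Hypothetical noise pattern for this example: a wall-clock timestamp.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def mask(text: str) -> str:
    """Replace every timestamp with MASK, leaving the rest of each line intact."""
    return TIMESTAMP.sub(MASK, text)


def masked_diff(expected: str, actual: str) -> List[str]:
    """Diff two outputs after masking both; an empty list means they match."""
    a = mask(expected).splitlines(keepends=True)
    b = mask(actual).splitlines(keepends=True)
    return list(difflib.unified_diff(a, b, "expected", "actual"))
```

With this approach, two outputs that differ only in their timestamps produce an empty diff, while a change elsewhere on the same line still shows up, which a line-level ignore cannot do.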

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
