hadoop-mapreduce-issues mailing list archives

From "Amar Kamat (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-2000) Rumen is not able to extract counters for Job history logs from Hadoop 0.20
Date Mon, 09 Aug 2010 11:39:16 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896510#action_12896510 ]

Amar Kamat commented on MAPREDUCE-2000:

Using the regex makes it much cleaner. +1 for using regex. 

A few comments:
# Can you please add comments explaining what the regex is supposed to do? A comment for each
capturing group, stating what it is meant to match, would be good enough.
# Can we reuse the regex declared in o.a.h.mapred.JobHistory? It seems similar to me.
# In the testcase, you could define your values in unescaped format and use {{StringUtils}}
to escape them. This is how the framework does it. Here is how the testcase might look:

line-type=// something
value=val1 // special char content in unescaped format
line=line-type + space + key + equals + quotes + StringUtils.escape(value) + quotes + line-delim
ParsedLine pl = new ParsedLine(line, version)

// assert
newValue = pl.get(key)
unEscapedValue = StringUtils.unescape(newValue)
assertEquals(value, unEscapedValue)

See a sample testcase [here|http://pastebin.com/2Y19v29S].
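
The round trip outlined above can be sketched as self-contained Java. This is only an illustration under stated assumptions: {{escape}}, {{unescape}} and {{parse}} are hypothetical stand-ins for the framework's {{StringUtils}} helpers and for {{ParsedLine}}, and the regex is not the one from the patch. It is meant to show the kind of per-group comments requested in the first point and the shape of the assertion in the third.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapedValueRoundTrip {
    // Group 1: the key, a run of word characters immediately before '='.
    // Group 2: the quoted value, any mix of escape pairs (a backslash plus
    //          one character) and ordinary characters other than '"' and '\'.
    // Illustrative pattern only; not the regex used in Rumen or JobHistory.
    private static final Pattern KEY_VALUE =
        Pattern.compile("(\\w+)=\"((?:\\\\.|[^\"\\\\])*)\"");

    // Hypothetical stand-in for the framework's escaping:
    // prefix every '\' and '"' with a backslash (backslashes first).
    public static String escape(String raw) {
        return raw.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Inverse of escape(): drop each escaping backslash, keep the next char.
    public static String unescape(String escaped) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < escaped.length(); i++) {
            char c = escaped.charAt(i);
            if (c == '\\' && i + 1 < escaped.length()) {
                c = escaped.charAt(++i);
            }
            out.append(c);
        }
        return out.toString();
    }

    // Hypothetical stand-in for ParsedLine: collect key -> still-escaped value.
    public static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = KEY_VALUE.matcher(line);
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        String value = "say \"hi\" \\ bye";  // special chars, unescaped form
        String line = "Job KEY=\"" + escape(value) + "\" NEXT=\"plain\" .";

        Map<String, String> fields = parse(line);
        String unEscapedValue = unescape(fields.get("KEY"));

        if (!value.equals(unEscapedValue)) {
            throw new AssertionError("round trip failed: " + unEscapedValue);
        }
        if (!"plain".equals(fields.get("NEXT"))) {
            throw new AssertionError("following property was mis-parsed");
        }
        System.out.println("round trip ok");
    }
}
```

The two alternatives inside group 2 cannot match the same character, so a backslash is always consumed together with the character it escapes and an escaped quote never terminates the value.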

> Rumen is not able to extract counters for Job history logs from Hadoop 0.20
> ---------------------------------------------------------------------------
>                 Key: MAPREDUCE-2000
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2000
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tools/rumen
>            Reporter: Hong Tang
>            Assignee: Hong Tang
>         Attachments: mr-2000-20100806.patch
> Rumen tries to match the end of a value string through indexOf("\""). It does not take
into account the case where the value string contains an escaped '"'. This leads to incorrect
parsing of the remaining key=value properties in the same line.
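
The failure mode described above can be reproduced with a small Java sketch. The class and method names here are hypothetical and the code is not taken from Rumen; it assumes the key is present in the line and that a backslash escapes the character that follows it.

```java
public class IndexOfQuotePitfall {
    // Naive approach from the issue description: the value is assumed to end
    // at the first '"' after the opening quote. Assumes the key is present.
    public static String naiveValue(String line, String key) {
        int start = line.indexOf(key + "=\"") + key.length() + 2;
        int end = line.indexOf("\"", start);
        return line.substring(start, end);
    }

    // Escape-aware scan: a backslash consumes the character after it,
    // so an escaped '"' cannot end the value. Assumes the key is present.
    public static String escapeAwareValue(String line, String key) {
        int start = line.indexOf(key + "=\"") + key.length() + 2;
        int i = start;
        while (i < line.length() && line.charAt(i) != '"') {
            i += (line.charAt(i) == '\\') ? 2 : 1;
        }
        return line.substring(start, Math.min(i, line.length()));
    }

    public static void main(String[] args) {
        // The line as it would appear in a log: NAME="a\"b" STATUS="OK"
        String line = "NAME=\"a\\\"b\" STATUS=\"OK\"";
        System.out.println(naiveValue(line, "NAME"));        // prints a\
        System.out.println(escapeAwareValue(line, "NAME"));  // prints a\"b
    }
}
```

With the naive version the value is cut short at the escaped quote, so a parser that continues from that position reads the rest of the line starting in the middle of the value.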

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
