hadoop-mapreduce-issues mailing list archives

From "Hong Tang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-778) Need a standalone JobHistory log anonymizer
Date Fri, 02 Apr 2010 09:10:27 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852756#action_12852756 ]

Hong Tang commented on MAPREDUCE-778:
-------------------------------------

The implementation on top of Rumen seems pretty straightforward. At the highest level, we
need an interface called Anonymizable that all the LoggedXXX classes implement.

{code}
interface Anonymizable {
 void anonymize(TranslationTable table);
 void deanonymize(TranslationTable table);
}
{code}

and the rough definition of TranslationTable:

{code}
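class TranslationTable {
 // Kinds of identifiers to anonymize.
 enum Type { HOST, RACK, JOB, USER, GROUP, PATH, QUEUE }
 // Presumably the prefix used when generating anonymized names of each type.
 EnumMap<Type, String> prefixes;

 // Per-type translation state.
 static class Tablet {
   int seq;                     // presumably the next sequence number to assign
   Map<String, String> fwdTbl;  // forward table: original -> anonymized
   Map<String, String> revTbl;  // reverse table: anonymized -> original
 }

 EnumMap<Type, Tablet> tablets;

 String fwdTranslate(Type type, String val);
 String revTranslate(Type type, String val);
}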
{code}
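
To make the rough definitions above concrete, here is a minimal sketch of how the pieces might fit together. Several details are assumptions not stated in the comment: the alias format (lower-cased type name plus a sequence number, e.g. "user0"), the choice to pass unknown aliases through unchanged in revTranslate, and the LoggedJobSketch class, which is a hypothetical stand-in for a real Rumen LoggedXXX class.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

interface Anonymizable {
  void anonymize(TranslationTable table);
  void deanonymize(TranslationTable table);
}

class TranslationTable {
  enum Type { HOST, RACK, JOB, USER, GROUP, PATH, QUEUE }

  // Per-type state: a counter for fresh aliases plus forward/reverse maps.
  static class Tablet {
    int seq;
    final Map<String, String> fwdTbl = new HashMap<>();
    final Map<String, String> revTbl = new HashMap<>();
  }

  private final EnumMap<Type, String> prefixes = new EnumMap<>(Type.class);
  private final EnumMap<Type, Tablet> tablets = new EnumMap<>(Type.class);

  TranslationTable() {
    for (Type t : Type.values()) {
      // Assumed alias prefix: the lower-cased type name, e.g. "host".
      prefixes.put(t, t.name().toLowerCase(Locale.ROOT));
      tablets.put(t, new Tablet());
    }
  }

  // Returns the alias for val, allocating a new one on first sight.
  String fwdTranslate(Type type, String val) {
    if (val == null) {
      return null;
    }
    Tablet tablet = tablets.get(type);
    String alias = tablet.fwdTbl.get(val);
    if (alias == null) {
      alias = prefixes.get(type) + tablet.seq++;
      tablet.fwdTbl.put(val, alias);
      tablet.revTbl.put(alias, val);
    }
    return alias;
  }

  // Returns the original value for an alias; unknown aliases are
  // returned unchanged (an assumption of this sketch).
  String revTranslate(Type type, String val) {
    if (val == null) {
      return null;
    }
    String original = tablets.get(type).revTbl.get(val);
    return original != null ? original : val;
  }
}

// Hypothetical stand-in for a Rumen LoggedXXX class.
class LoggedJobSketch implements Anonymizable {
  String user;
  String queue;

  LoggedJobSketch(String user, String queue) {
    this.user = user;
    this.queue = queue;
  }

  @Override
  public void anonymize(TranslationTable table) {
    user = table.fwdTranslate(TranslationTable.Type.USER, user);
    queue = table.fwdTranslate(TranslationTable.Type.QUEUE, queue);
  }

  @Override
  public void deanonymize(TranslationTable table) {
    user = table.revTranslate(TranslationTable.Type.USER, user);
    queue = table.revTranslate(TranslationTable.Type.QUEUE, queue);
  }
}
```

In this sketch, the same original value always maps to the same alias within one table, so cross-references between records (for example, two jobs submitted by the same user) would survive anonymization. Keeping the reverse table alongside the forward one is what makes deanonymize possible without a second pass over the original logs.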

> Need a standalone JobHistory log anonymizer
> -------------------------------------------
>
>                 Key: MAPREDUCE-778
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-778
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Hong Tang
>         Attachments: anonymizer.py, same.py
>
>
> Job history logs contain a rich set of information that can help understand and characterize
> cluster workload and individual job execution. Examples of work that parses or utilizes job
> history include HADOOP-3585, MAPREDUCE-534, HDFS-459, MAPREDUCE-728, and MAPREDUCE-776. Some
> of the parsing tools developed in previous work already contain a component to anonymize
> the logs. It would be nice to combine these efforts and have a common standalone tool that
> anonymizes job history logs and preserves much of the structure of the files, so that existing
> tools on top of job history logs continue to work with no modification.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

