hadoop-mapreduce-issues mailing list archives

From "Gelesh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
Date Mon, 04 Feb 2013 19:42:17 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570567#comment-13570567 ]

Gelesh commented on MAPREDUCE-4974:
-----------------------------------

[~tlipcon]
nextKeyValue() is called once for every delimiter (or newline) that occurs
within a given split.
Each time, it executes the code below:

-    if (key == null) {
-      key = new LongWritable();
-    }
-    key.set(pos);
-    if (value == null) {
-      value = new Text();
-    }

The conditions hold true only on the first iteration, when the key and value objects are
created.
The same result can be achieved by creating the key and value objects in the initialize phase,
which lets us skip the null checks.
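A minimal sketch of the idea, for illustration only. It does not use Hadoop classes:
EagerInitReaderSketch, OffsetKey and LineValue are hypothetical stand-ins for
LineRecordReader, LongWritable and Text, and the one-character delimiter is an assumption.
It only shows key and value being allocated once in initialize() so that nextKeyValue()
needs no null checks.

```java
// Hypothetical stand-in for a line-oriented record reader.
// This is NOT Hadoop's LineRecordReader; all names here are illustrative.
class EagerInitReaderSketch {
    /** Hypothetical mutable holder standing in for LongWritable. */
    static final class OffsetKey {
        private long offset;
        void set(long offset) { this.offset = offset; }
        long get() { return offset; }
    }

    /** Hypothetical mutable holder standing in for Text. */
    static final class LineValue {
        private String line = "";
        void set(String line) { this.line = line; }
        @Override public String toString() { return line; }
    }

    private final String[] lines;
    private int index;
    private long pos;
    private OffsetKey key;
    private LineValue value;

    EagerInitReaderSketch(String[] lines) { this.lines = lines; }

    // Allocate key and value once, up front.
    void initialize() {
        key = new OffsetKey();
        value = new LineValue();
        index = 0;
        pos = 0L;
    }

    // No null checks here: initialize() has already created both objects.
    boolean nextKeyValue() {
        if (index >= lines.length) {
            return false;
        }
        String line = lines[index++];
        key.set(pos);
        value.set(line);
        pos += line.length() + 1; // assumes a one-character delimiter
        return true;
    }

    OffsetKey getCurrentKey() { return key; }
    LineValue getCurrentValue() { return value; }

    public static void main(String[] args) {
        EagerInitReaderSketch reader =
            new EagerInitReaderSketch(new String[] {"ab", "cde"});
        reader.initialize();
        while (reader.nextKeyValue()) {
            System.out.println(reader.getCurrentKey().get() + "\t" + reader.getCurrentValue());
        }
    }
}
```

The trade-off in this sketch is that calling nextKeyValue() before initialize() fails with a
NullPointerException instead of allocating lazily, so it relies on initialize() always being
called first.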

Also,
-    compressionCodecs = new CompressionCodecFactory(job);
-    codec = compressionCodecs.getCodec(file);
need to be done only when the input file is compressed. This change is also included.
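One possible shape for this, as a rough sketch rather than the patch itself. It assumes
compression can be recognised from the file name suffix before any factory is built, which
is an assumption and not something stated above. LazyCodecSketch, CodecFactory, the suffix
list and the factoriesBuilt counter are all hypothetical and are not part of Hadoop's API.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of building a codec factory only on demand.
// None of these names come from Hadoop.
class LazyCodecSketch {
    // Assumed list of suffixes that mark a compressed input file.
    static final String[] COMPRESSED_SUFFIXES = {".gz", ".bz2", ".deflate"};

    // Counts factory constructions so the saving can be observed.
    static final AtomicInteger factoriesBuilt = new AtomicInteger();

    /** Hypothetical stand-in for an expensive-to-build codec factory. */
    static final class CodecFactory {
        CodecFactory() {
            factoriesBuilt.incrementAndGet();
        }

        // Returns a codec name derived from the suffix, or null if none matches.
        String getCodec(String fileName) {
            for (String suffix : COMPRESSED_SUFFIXES) {
                if (fileName.endsWith(suffix)) {
                    return suffix.substring(1);
                }
            }
            return null;
        }
    }

    // Cheap check that does not need a factory instance.
    static boolean looksCompressed(String fileName) {
        for (String suffix : COMPRESSED_SUFFIXES) {
            if (fileName.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }

    // Builds the factory only when the cheap check says the input is compressed.
    static Optional<String> codecFor(String fileName) {
        if (!looksCompressed(fileName)) {
            return Optional.empty();
        }
        return Optional.ofNullable(new CodecFactory().getCodec(fileName));
    }

    public static void main(String[] args) {
        System.out.println(codecFor("part-00000.txt").isPresent());
        System.out.println(codecFor("part-00000.gz").isPresent());
        System.out.println("factories built: " + factoriesBuilt.get());
    }
}
```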

                
> Optimising the LineRecordReader initialize() method
> ---------------------------------------------------
>
>                 Key: MAPREDUCE-4974
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1, mrv2, performance
>    Affects Versions: 2.0.2-alpha, 0.23.5
>         Environment: Hadoop Linux
>            Reporter: Arun A K
>            Assignee: Gelesh
>              Labels: patch, performance
>             Fix For: 0.20.204.0, 0.24.0
>
>         Attachments: MAPREDUCE-4974.1.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I found there is scope for optimizing the code in initialize() if compressionCodecs
> and codec are instantiated only when the input is compressed.
> Meanwhile, Gelesh George Omathil suggested that we could also avoid the null check of key
> and value. This would save time, since the null check is done for every next key/value
> generation. The intention is to instantiate only once and to avoid an NPE as well. Hopefully
> both can be achieved by initializing key and value in the initialize() method. We have both
> worked on it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
