hadoop-pig-dev mailing list archives

From "Benjamin Reed (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-201) BufferedPositionedInputStream is not buffered
Date Thu, 10 Apr 2008 14:46:05 GMT

    [ https://issues.apache.org/jira/browse/PIG-201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587658#action_12587658 ]

Benjamin Reed commented on PIG-201:
-----------------------------------

The InputStream we get from Hadoop DFS should be buffered, so we don't do extra buffering
in BufferedPositionedInputStream again. This is important because the buffering needs to be
done before the compression codecs so that the positioning works out properly. Doing it after,
like this patch does, will cause premature detection of end of split.
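
To illustrate the ordering concern, here is a minimal sketch in plain java.io. It is not Pig code: CountingInputStream is a hypothetical stand-in for a position-tracking stream, and the data and buffer sizes are arbitrary. It shows that a buffer layered above the counter reads ahead, so the counted position jumps by a whole buffer after one byte is consumed, while a buffer layered below the counter leaves the position exact.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadAheadDemo {
    /** Hypothetical position tracker: counts the bytes pulled through it. */
    static class CountingInputStream extends FilterInputStream {
        long position = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b >= 0) {
                position++;
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) {
                position += n;
            }
            return n;
        }
    }

    /**
     * Consumes a single byte and returns the position the counter reports.
     * bufferAboveCounter = true puts the buffer between the consumer and the
     * counter; false puts the buffer between the counter and the raw source.
     */
    static long positionAfterOneByte(int dataSize, int bufferSize,
                                     boolean bufferAboveCounter) throws IOException {
        InputStream source = new ByteArrayInputStream(new byte[dataSize]);
        CountingInputStream counter;
        InputStream consumerView;
        if (bufferAboveCounter) {
            counter = new CountingInputStream(source);
            consumerView = new BufferedInputStream(counter, bufferSize);
        } else {
            counter = new CountingInputStream(new BufferedInputStream(source, bufferSize));
            consumerView = counter;
        }
        consumerView.read(); // the consumer asks for exactly one byte
        return counter.position;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("buffer above counter: " + positionAfterOneByte(1000, 256, true));
        System.out.println("buffer below counter: " + positionAfterOneByte(1000, 256, false));
    }
}
```

With the buffer above the counter, a split-end check based on the counted position would fire up to one buffer too early, which matches the symptom described above.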

Having said all that, there obviously is a performance gain to be had. Perhaps we need to
figure out why the buffering done by Hadoop DFS InputStream isn't helping us. If we do need
to buffer, it should go into PigSlice.init() to buffer fsis.

> BufferedPositionedInputStream is not buffered
> ---------------------------------------------
>
>                 Key: PIG-201
>                 URL: https://issues.apache.org/jira/browse/PIG-201
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Mathieu Poumeyrol
>         Attachments: BufferedPositionedInputStream.patch
>
>
> BufferedPositionedInputStream is actually not buffered, leading (I guess) to constant
> round trips to DFS as bytes are read one by one. I just wrapped the provided input stream in
> the constructor in a good old BufferedInputStream.
> I measured a 40% performance boost on a script that reads and writes 3.7GB in DFS through
> PigStorage on one node. I guess the impact may be greater on a real HDFS cluster with actual
> network round trips.
> FYI, the issue was found while profiling with the YourKit Java profiler. Useful toy...
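
The effect the reporter describes can be sketched with plain java.io, independent of Pig and DFS. CallCountingStream is a hypothetical wrapper that counts how many read calls reach the underlying source; the 10000-byte input and 8192-byte buffer are arbitrary choices. The sketch only counts calls and says nothing about how costly each call is on a real DFS stream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadCallCount {
    /** Hypothetical wrapper that counts read calls reaching the source. */
    static class CallCountingStream extends FilterInputStream {
        int calls = 0;

        CallCountingStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            calls++;
            return super.read(buf, off, len);
        }
    }

    /** Drains the stream one byte at a time and returns the number of source calls. */
    static int underlyingCalls(int dataSize, boolean buffered) throws IOException {
        CallCountingStream source =
                new CallCountingStream(new ByteArrayInputStream(new byte[dataSize]));
        InputStream in = buffered ? new BufferedInputStream(source, 8192) : source;
        while (in.read() != -1) {
            // consume byte by byte, as a record parser might
        }
        return source.calls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("unbuffered calls: " + underlyingCalls(10000, false));
        System.out.println("buffered calls:   " + underlyingCalls(10000, true));
    }
}
```

Unbuffered, every byte costs one call to the source; buffered, the source sees only a handful of bulk reads.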

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

