hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2608) Reading sequence file consumes 100% cpu with maximum throughput being about 5MB/sec per process
Date Thu, 17 Jan 2008 19:07:34 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560033#action_12560033 ]

Doug Cutting commented on HADOOP-2608:
--------------------------------------

We might also look to see whether org.apache.hadoop.record.Utils.fromBinaryString could be
made any faster.  What happens if this just does 'new String(bytes, "UTF-8")'?  Is the problem
our homegrown UTF-8 decoder, or UTF-8 decoding in general?  It'd be nice to return
org.apache.hadoop.io.Text instead, since that permits many string operations without decoding
UTF-8, but that'd be a bigger change.
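
As a concrete starting point, here is a minimal, self-contained timing sketch (not from the
Hadoop source or any patch) comparing the JDK path the comment asks about, new String(bytes,
"UTF-8"), against a reusable java.nio CharsetDecoder. The payload and iteration count are
arbitrary assumptions; answering the question directly would add a third loop over the same
bytes that calls org.apache.hadoop.record.Utils.fromBinaryString.

import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;

public class Utf8DecodeBench {
  public static void main(String[] args)
      throws UnsupportedEncodingException, CharacterCodingException {
    // Illustrative payload: a short field value like a record reader would see.
    byte[] bytes = "a fairly typical short record field value".getBytes("UTF-8");
    int iters = 5000000;

    // 1) The "just use the JDK" path suggested in the comment above.
    long t0 = System.nanoTime();
    for (int i = 0; i < iters; i++) {
      String s = new String(bytes, "UTF-8");
      if (s.length() == 0) throw new AssertionError();  // keep the JIT from eliding the work
    }
    long jdkNanos = System.nanoTime() - t0;

    // 2) A reusable CharsetDecoder, roughly what a hand-rolled decoder competes with.
    CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();
    long t1 = System.nanoTime();
    for (int i = 0; i < iters; i++) {
      decoder.reset();
      String s = decoder.decode(ByteBuffer.wrap(bytes)).toString();
      if (s.length() == 0) throw new AssertionError();
    }
    long nioNanos = System.nanoTime() - t1;

    System.out.printf("new String(bytes, \"UTF-8\"): %d ms%n", jdkNanos / 1000000);
    System.out.printf("CharsetDecoder.decode():     %d ms%n", nioNanos / 1000000);
    // A third loop calling Utils.fromBinaryString on the same bytes would answer
    // whether the homegrown decoder or UTF-8 decoding in general is the bottleneck.
  }
}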


> Reading sequence file consumes 100% cpu with maximum throughput being about 5MB/sec per process
> -----------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2608
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2608
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: io
>            Reporter: Runping Qi
>
> I did some tests on the throughput of scanning block-compressed sequence files.
> The sustained throughput was bounded at 5MB/sec per process, with the cpu of each process maxed at 100%.
> It seems to me that the cpu consumption is too high and the throughput is too low for just scanning files.
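
For reference, a rough scan harness of the kind such a test implies, using the
SequenceFile.Reader API of this era. The class name, the command-line path argument, and the
MB/sec accounting (on-disk bytes rather than decompressed bytes) are assumptions for
illustration, not the actual test code behind the numbers above.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class SeqFileScanBench {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    Path path = new Path(args[0]);                    // path to a block-compressed sequence file
    FileSystem fs = path.getFileSystem(conf);
    long fileBytes = fs.getFileStatus(path).getLen(); // on-disk (compressed) size

    SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
    Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
    Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);

    long records = 0;
    long start = System.currentTimeMillis();
    while (reader.next(key, value)) {                 // decompress and deserialize every record
      records++;
    }
    long millis = System.currentTimeMillis() - start;
    reader.close();

    double mbPerSec = (fileBytes / (1024.0 * 1024.0)) / (millis / 1000.0);
    System.out.println(records + " records, " + millis + " ms, "
        + String.format("%.1f", mbPerSec) + " MB/sec (on-disk bytes)");
  }
}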

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

