hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1381) The distance between sync blocks in SequenceFiles should be configurable rather than hard coded to 2000 bytes
Date Wed, 16 May 2007 22:56:16 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12496429 ]

Doug Cutting commented on HADOOP-1381:
--------------------------------------

Why would this be better?  The current design is to add sync blocks as frequently as possible
without significantly impacting file size.  This minimizes the amount of data that must be
scanned when sync'ing.  How would making the interval larger help?
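
To put rough numbers on this trade-off, here is a minimal sketch. It assumes a 20-byte sync marker (a 4-byte escape plus a 16-byte hash), which is not stated in this message, and it ignores the fact that markers can only fall on record boundaries. The function names are hypothetical and are not part of the SequenceFile API.

```python
# Assumption (not from the message): each sync marker occupies 20 bytes,
# i.e. a 4-byte escape followed by a 16-byte hash.
SYNC_SIZE = 20


def overhead_fraction(sync_interval: int, sync_size: int = SYNC_SIZE) -> float:
    """Fraction of the file taken up by sync markers, assuming one
    marker is written after every `sync_interval` bytes of data."""
    return sync_size / (sync_interval + sync_size)


def expected_scan_bytes(sync_interval: int) -> float:
    """Average number of data bytes scanned to reach the next marker
    when starting from a uniformly random offset (simplified model)."""
    return sync_interval / 2


if __name__ == "__main__":
    # Compare the current 2000-byte interval with the proposed ~1 MB one.
    for interval in (2000, 1_000_000):
        print(
            f"interval={interval:>9} bytes  "
            f"overhead={overhead_fraction(interval):.5%}  "
            f"avg scan={expected_scan_bytes(interval):,.0f} bytes"
        )
```

Under these assumptions the 2000-byte interval costs about 1% in file size, so a larger interval saves little space while increasing the average scan distance in proportion to the interval.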

> The distance between sync blocks in SequenceFiles should be configurable rather than
> hard coded to 2000 bytes
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1381
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1381
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: io
>            Reporter: Owen O'Malley
>             Fix For: 0.14.0
>
>
> Currently SequenceFiles put in sync blocks every 2000 bytes. It would be much better
> if it were configurable with a much higher default (1 MB or so?).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

