hadoop-common-dev mailing list archives

From "eric baldeschwieler (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-106) Data blocks should be record-oriented.
Date Sat, 25 Mar 2006 22:39:19 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-106?page=comments#action_12371878 ] 

eric baldeschwieler commented on HADOOP-106:
--------------------------------------------

My intuition is that it makes more sense to do this the other way around and have records aligned
to blocks.  This keeps the FS implementation trivial: just pad near the end of a block.
This way you keep a good separation of APIs too.  It is fairly straightforward to change the record
model to do that.  The only issue is with huge records.  You have a couple of options there.
The simplest is to disallow them...
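
A minimal sketch of the padding idea, in Python for brevity. This is not Hadoop code: the 4-byte length prefix, the use of a zero length as the padding marker, the tiny block size, and the names pack_records and read_block are all assumptions made for illustration. Oversized records are simply rejected, which is the "disallow them" option.

```python
import struct

BLOCK_SIZE = 64             # hypothetical block size, tiny for illustration
LEN = struct.Struct(">I")   # assumed 4-byte big-endian length prefix; 0 marks padding


def pack_records(records, block_size=BLOCK_SIZE):
    """Lay out length-prefixed records so that none crosses a block boundary."""
    out = bytearray()
    for rec in records:
        if len(rec) == 0:
            raise ValueError("empty record: length 0 is reserved for padding")
        need = LEN.size + len(rec)
        if need > block_size:
            # Simplest policy for huge records: disallow them.
            raise ValueError("record does not fit in one block")
        room = block_size - len(out) % block_size
        if need > room:
            out.extend(b"\x00" * room)  # pad out to the next block boundary
        out.extend(LEN.pack(len(rec)))
        out.extend(rec)
    return bytes(out)


def read_block(block):
    """Return the records stored in one block, ignoring trailing padding."""
    records, pos = [], 0
    while pos + LEN.size <= len(block):
        (n,) = LEN.unpack_from(block, pos)
        if n == 0:  # reached the padding
            break
        pos += LEN.size
        records.append(block[pos:pos + n])
        pos += n
    return records
```

With this layout the file system still sees plain fixed-size blocks; only the record writer and reader know about the padding, and any block can be decoded without looking at its neighbours.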

> Data blocks should be record-oriented.
> --------------------------------------
>
>          Key: HADOOP-106
>          URL: http://issues.apache.org/jira/browse/HADOOP-106
>      Project: Hadoop
>         Type: Wish
>   Components: dfs
>     Versions: 0.2
>     Reporter: Andrzej Bialecki
>
> If data blocks were starting and ending on data record boundaries, and not in random
> places within a file, it would give some important advantages:
> * it would be possible to avoid "fishing" for the beginning of first record in a split
>   (see SequenceFile.Reader.sync()).
> * it would make recovering from DFS errors much more successful and easier - in most
>   cases missing blocks could be just skipped and the remaining parts combined together.
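
The recovery advantage described in the quoted issue can be sketched as follows, again in Python and not as Hadoop code. The sketch assumes every block holds only whole, length-prefixed records followed by zero padding; that record format, the use of None to stand for a lost block, and the names records_in_block and recover are hypothetical.

```python
import struct

LEN = struct.Struct(">I")  # assumed 4-byte length prefix; 0 marks padding


def records_in_block(block):
    """Decode the whole records held in a single self-contained block."""
    records, pos = [], 0
    while pos + LEN.size <= len(block):
        (n,) = LEN.unpack_from(block, pos)
        if n == 0:  # trailing padding
            break
        pos += LEN.size
        records.append(block[pos:pos + n])
        pos += n
    return records


def recover(blocks):
    """Combine records from surviving blocks; None stands for a lost block.

    Returns the recovered records and the number of blocks skipped.
    """
    recovered, skipped = [], 0
    for block in blocks:
        if block is None:
            skipped += 1  # a missing block loses only its own records
            continue
        recovered.extend(records_in_block(block))
    return recovered, skipped
```

Because no record spans two blocks, a reader can start at any block boundary without searching for a sync point, and a lost block costs only the records it contained.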

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

