hadoop-common-issues mailing list archives

From "Harsh J (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-7404) Data Blocks Spliting should be record oriented or provided option for give the spliting locations (offsets) as input file
Date Sun, 19 Jun 2011 14:09:47 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-7404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051686#comment-13051686 ]

Harsh J commented on HADOOP-7404:
---------------------------------

Sunil,

Interesting points here I think :)

Some Qs:
How often do you face #1 in practice?
I do not follow your points about "other tools". Could you explain them with an example?

> Data Blocks Spliting should be record oriented or provided option for give the spliting locations (offsets) as input file
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-7404
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7404
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Sunil Goyal
>
> Old Bug :  https://issues.apache.org/jira/browse/HADOOP-106
> It is difficult to pad the existing records, for the following reasons:
> 1. Records in the same file have different sizes (some may be a few bytes, some may be GBs).
> 2. Padding causes compatibility issues with other standard tools.
> 3. Padding increases the file size, which the other tools (those not running on Hadoop) have no need for.
> I think there should be an option for the splitting process, like this:
> 1. A file contains the offsets at which splitting should be done (like 10, 100, 120, etc.).
> 2. Hadoop should do the splitting according to it (10-0 = 10, 100-10 = 90, etc.).
> 3. This file can be generated easily by the other tools.
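
For illustration only, the offset-driven splitting proposed in the quoted description could be sketched as below. This is a minimal sketch under stated assumptions, not Hadoop code and not an existing Hadoop option: the function names `parse_offsets` and `splits_from_offsets`, the comma-separated offsets format, and the `(start, length)` representation of a split are all hypothetical.

```python
def parse_offsets(text):
    """Parse a hypothetical offsets file: integers separated by commas or newlines."""
    tokens = text.replace("\n", ",").split(",")
    return [int(tok) for tok in tokens if tok.strip()]


def splits_from_offsets(offsets, file_length):
    """Turn split offsets into (start, length) byte ranges covering the file.

    Offsets outside the open interval (0, file_length) are ignored (an
    assumption made for this sketch), and duplicates are collapsed.
    """
    if file_length <= 0:
        return []
    boundaries = [0]
    for off in sorted(set(offsets)):
        if 0 < off < file_length:
            boundaries.append(off)
    boundaries.append(file_length)
    # Each split runs from one boundary up to (not including) the next.
    return [(start, end - start) for start, end in zip(boundaries, boundaries[1:])]


if __name__ == "__main__":
    # Offsets from the example in the description, for an assumed 150-byte file.
    for start, length in splits_from_offsets(parse_offsets("10,100,120"), 150):
        print(start, length)
```

With the offsets 10, 100, 120 from the description, the first two splits have lengths 10 and 90, matching the arithmetic given in point 2 (10-0 = 10, 100-10 = 90). The 150-byte file length is an assumed value used only to close the last split.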

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
