Date: Sun, 19 Jun 2011 08:20:47 +0000 (UTC)
From: "Sunil Goyal (JIRA)"
To: common-dev@hadoop.apache.org
Message-ID: <56053413.19018.1308471647379.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Created] (HADOOP-7404) Data block splitting should be record-oriented, or provide an option to give the splitting locations (offsets) as an input file

Data block splitting should be record-oriented, or provide an option to give the splitting locations (offsets)
as an input file
-------------------------------------------------------------------------------------------------------------------------

                 Key: HADOOP-7404
                 URL: https://issues.apache.org/jira/browse/HADOOP-7404
             Project: Hadoop Common
          Issue Type: Improvement
            Reporter: Sunil Goyal

Old bug: https://issues.apache.org/jira/browse/HADOOP-106

It is difficult to pad the existing records, for the following reasons:
1. Records in the same file have very different sizes (some are a few bytes, some are gigabytes).
2. Padding causes compatibility issues with other standard tools.
3. Padding increases the file size, which the other tools (those not running on Hadoop) have no need for.

I think the splitting process should offer an option like this:
1. A file lists the offsets at which the splits should be made (for example 10, 100, 120, and so on).
2. Hadoop should split the data according to that file, so the split lengths are the differences between consecutive offsets (10 - 0 = 10, 100 - 10 = 90, etc.).
3. This file can easily be generated by other tools.
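
To make the offset arithmetic in the proposal concrete, here is a minimal sketch in Python. It is not Hadoop code and does not use any Hadoop API. The function names `parse_offsets` and `splits_from_offsets`, the comma- or whitespace-separated layout of the offsets file, and the rule that the last split runs to the end of the data file are all assumptions made for illustration.

```python
import re
from typing import List, Tuple


def parse_offsets(text: str) -> List[int]:
    """Parse offsets from text such as "10,100,120".

    Assumption: offsets are separated by commas and/or whitespace.
    """
    tokens = [t for t in re.split(r"[,\s]+", text.strip()) if t]
    return [int(t) for t in tokens]


def splits_from_offsets(offsets: List[int], file_length: int) -> List[Tuple[int, int]]:
    """Turn split offsets into (start, length) pairs covering the whole file.

    Assumption: offsets outside the open range (0, file_length) are ignored,
    and the final split extends to the end of the file.
    """
    if file_length <= 0:
        return []
    # Keep only usable boundaries, sorted and without duplicates.
    boundaries = sorted({o for o in offsets if 0 < o < file_length})
    boundaries.append(file_length)

    splits = []
    start = 0
    for end in boundaries:
        # The length of each split is the difference between neighbouring offsets.
        splits.append((start, end - start))
        start = end
    return splits


if __name__ == "__main__":
    offsets = parse_offsets("10,100,120")
    for start, length in splits_from_offsets(offsets, file_length=150):
        print(f"split at {start}, length {length}")
```

With the offsets 10, 100, 120 and a hypothetical file length of 150 bytes, this yields splits of length 10, 90, 20 and 30, which matches the 10 - 0 = 10 and 100 - 10 = 90 arithmetic in the description.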