From: Harsh J
Date: Sat, 21 Aug 2010 02:32:14 +0530
Subject: Re: HDFS efficiently concat&split files
To: common-user@hadoop.apache.org

Appending one file to the other on HDFS itself would be the optimal way
to do this, but append support is not available in Apache Hadoop HDFS
0.20.x and earlier releases; it is only available in trunk / 0.21. A
similar question asked earlier got some replies as well, so you might
want to check it out:
http://www.mentby.com/Group/hadoop-core-user/concatenating-files-on-hdfs.html
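Once you are on a release with append, something along these lines is
what I have in mind -- an untested sketch, with placeholder paths and a
made-up class name, assuming the target file already exists on HDFS:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsConcat {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Placeholder paths: append the bytes of 'source' onto 'target'.
    Path target = new Path("/user/teodor/part-1");
    Path source = new Path("/user/teodor/part-2");

    FSDataOutputStream out = fs.append(target); // throws if append is unsupported
    FSDataInputStream in = fs.open(source);
    try {
      // Stream-copy the source into the open tail of the target.
      IOUtils.copyBytes(in, out, conf, false);
    } finally {
      IOUtils.closeStream(in);
      IOUtils.closeStream(out);
    }

    // Optionally drop the source once it has been appended:
    // fs.delete(source, false);
  }
}

Note that this still reads the source file once, but it avoids
rewriting the target, so it should cut the I/O roughly in half compared
with copying both files into a new one.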
2010/8/21 Teodor Macicas:
> Hi again,
>
> Please, can anyone suggest how to stick (concatenate) 2 HDFS files into
> one bigger file with less I/O time?
>
> Thank you.
> Regards,
> Teodor
>
> On 08/20/2010 12:34 PM, Teodor Macicas wrote:
>> Hi,
>>
>> Basically, you are right. But in my case the second input is a
>> combination of previous outputs.
>> As I mentioned, I want only a certain amount of bytes to be the
>> second input. Hence, I need to split some files.
>>
>> Also, I need to concatenate the final reducers' outputs. Does anyone
>> know how to concatenate 2 files faster in HDFS?
>>
>> Thank you.
>> Best,
>> Teodor
>>
>> On 08/20/2010 10:44 AM, xiujin yang wrote:
>>>
>>> Hi
>>>
>>> For MapReduce it is easy to make the first job's output the second
>>> job's input. You just need to point to the path and it will be OK.
>>>
>>> Xiujinyang
>>>
>>>> Date: Thu, 19 Aug 2010 19:11:53 +0200
>>>> From: teodor.macicas@epfl.ch
>>>> To: common-user@hadoop.apache.org
>>>> Subject: Re: HDFS efficiently concat&split files
>>>>
>>>> Hello,
>>>>
>>>> I was expecting this question.
>>>> The reason is that I want to run 2 MR jobs on the same data in the
>>>> following manner: the output of the 1st job is collected, and then I
>>>> want to create bins of a certain amount of bytes which will be the
>>>> input for the next jobs. At the end I want to isolate each processed
>>>> bin's results [the 2nd reducers' outputs].
>>>>
>>>> Anyway, I do have a reason for wanting to do this.
>>>> Any ideas?
>>>>
>>>> Thank you.
>>>> -Tedy
>>>>
>>>> On 08/19/2010 06:57 PM, Harsh J wrote:
>>>>> Hello,
>>>>>
>>>>> Why are you looking to concatenate or split files on the HDFS? I am
>>>>> just curious, because using directories as inputs and outputs works
>>>>> fine with Hadoop MR and HDFS, as the latter uses a block storage
>>>>> concept at its core.
>>>>>
>>>>> On Thu, Aug 19, 2010 at 4:28 PM, Teodor Macicas wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Does anyone know how to efficiently concatenate 2 different files
>>>>>> in HDFS, as well as split a file into 2 different ones?
>>>>>> I did this by reading from one file and writing to another. Of
>>>>>> course, this is very slow; a lot of I/O time was spent. Since it
>>>>>> is only a splitting or putting-together job, I am wondering if I
>>>>>> can do this faster.
>>>>>>
>>>>>> Also, what can I do in order to control a reducer's output file
>>>>>> size? This could be a solution to the previous question. If I were
>>>>>> able to do this, further concats & splits would not be necessary.
>>>>>>
>>>>>> Thank you for your help.
>>>>>> Best,
>>>>>> Teodor

--
Harsh J
www.harshj.com