hadoop-common-user mailing list archives

From Peyman Mohajerian <mohaj...@gmail.com>
Subject Re: Multiple Part files
Date Thu, 17 Jul 2014 13:42:48 GMT
Hadoop has a getmerge command (
http://hadoop.apache.org/docs/r0.19.1/hdfs_shell.html#getmerge).
I'm not certain whether it works with RC files, but I think it should. So maybe
you don't have to copy the files to the local file system first.
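
To make clear what a getmerge-style merge does, here is a minimal sketch in plain Java of byte-wise concatenation of part files into one file. It is not Hadoop's implementation: the class name PartFileConcat, the method concatParts, and the "part-*" name pattern are assumptions made for illustration, and it works on local paths rather than HDFS. Because the merge is a plain concatenation of bytes, whether the result is still a readable RC file is not confirmed here and would need to be checked.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper (not part of Hadoop): byte-wise concatenation of part files. */
final class PartFileConcat {

    private PartFileConcat() {
    }

    /**
     * Appends every regular file in dir whose name matches "part-*" to target,
     * in sorted name order, and returns the number of files that were merged.
     * Other files in the directory (for example _SUCCESS) are skipped.
     */
    static int concatParts(Path dir, Path target) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "part-*")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    parts.add(p);
                }
            }
        }
        // Sort so that part-r-00000 comes before part-r-00001, and so on.
        Collections.sort(parts);

        try (OutputStream out = Files.newOutputStream(target,
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE)) {
            for (Path p : parts) {
                // Copies the raw bytes of each part; no format-aware merging happens.
                Files.copy(p, out);
            }
        }
        return parts.size();
    }
}
```

Calling PartFileConcat.concatParts(outputDir, mergedFile) on a local copy of the job output directory would produce one file containing the bytes of all part files back to back.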


On Thu, Jul 17, 2014 at 6:18 AM, Naganarasimha G R (Naga) <
garlanaganarasimha@huawei.com> wrote:

>  Hi Prabakaran,
>
>
>
>      There are multiple small part files in the output directory because
> each reducer task writes its output as one part file.
>
>    1. Will each part file take up a 64 MB block?
>
> *                One part file is created per reducer, sized according to
> that reducer's output. The file can be smaller than the HDFS block size,
> i.e. it does not have to be 64 MB.*
>
>
>
>         2. How to merge these multiple RC format part files into one RC
> file?
>
>                 *One way (maybe the longer way) is to copy the part files
> to a local directory and write a tool to merge all the RC files.*
>
> *                But in any case, I feel we should first ensure that the job
> uses a single reducer, so that there is no need for merging.*
>
>
>
>         3.  What are the pros and cons of having multiple part files?
>
> *                It depends on what you want to do in the next operation.*
>
> *                For example, if you are planning to load the output into
> Hive, it is better to configure the MR job to partition its output to match
> the Hive partitions, so that loading would be easier.*
>
>
>
>         4.  Will merging the part files improve performance?
>
>                     Performance of the MapReduce job or of a later
> operation? I think if the overall scenario is known, we will be able to
> help better.
>
>
>
>  Regards,
>
> Naga
>
>
>
> Huawei Technologies Co., Ltd.
> Mobile:  +91 9980040283
> Email: naganarasimhagr@huawei.com
> Bantian, Longgang District,Shenzhen 518129, P.R.China
> http://www.huawei.com
>
>   ------------------------------
> *From:* Natarajan, Prabakaran 1. (NSN - IN/Bangalore) [
> prabakaran.1.natarajan@nsn.com]
> *Sent:* Thursday, July 17, 2014 15:52
> *To:* user@hadoop.apache.org
> *Subject:* Multiple Part files
>
>   Hi
>
> After a MapReduce job, we are seeing multiple small part files in the
> output directory. We are using the RC file format (Snappy codec).
>
>
>    1. Will each part file take up a 64 MB block?
>    2. How to merge these multiple RC format part files into one RC file?
>    3. What are the pros and cons of having multiple part files?
>    4. Will merging the part files improve performance?
>
>
> *Thanks and Regards*
> Prabakaran.N  aka NP
> nsn, Bangalore
> *When "I" is replaced by "We" - even Illness becomes "Wellness"*
>
>
>
>
>
