MIME-Version: 1.0
Received: by 10.50.91.169 with SMTP id cf9mr16889666igb.44.1356209588550; Sat, 22 Dec 2012 12:53:08 -0800 (PST)
Received: by 10.64.81.113 with HTTP; Sat, 22 Dec 2012 12:53:08 -0800 (PST)
In-Reply-To:
References:
Date: Sat, 22 Dec 2012 12:53:08 -0800
Message-ID:
Subject: Re: Merging files
From: Mohit Anchlia
To: user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=e89a8f3b9dadc43a4b04d1772609
X-Virus-Checked: Checked by ClamAV on apache.org

--e89a8f3b9dadc43a4b04d1772609
Content-Type: text/plain; charset=ISO-8859-1

Tried distcp but it fails. Is there a way to merge them? Or else I could
write a pig script to load from multiple paths

org.apache.hadoop.tools.DistCp$DuplicationException: Invalid input, there are duplicated files in the sources: maprfs:/user/apuser/web-analytics/flume-output/2012/12/20/22/output/appinfo, maprfs:/user/apuser/web-analytics/flume-output/2012/12/21/00/output/appinfo
    at org.apache.hadoop.tools.DistCp.checkDuplication(DistCp.java:1419)
    at org.apache.hadoop.tools.DistCp.setup(DistCp.java:1222)
    at org.apache.hadoop.tools.DistCp.copy(DistCp.java:675)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:910)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:937)

On Sat, Dec 22, 2012 at 11:24 AM, Ted Dunning wrote:

> The technical term for this is "copying". You may have heard of it.
>
> It is a subject of such long technical standing that many do not consider
> it worthy of detailed documentation.
>
> Distcp effects a similar process and can be modified to combine the input
> files into a single file.
>
> http://hadoop.apache.org/docs/r1.0.4/distcp.html
>
>
> On Sat, Dec 22, 2012 at 10:54 AM, Barak Yaish wrote:
>
>> Can you please attach HOW-TO links for the alternatives you mentioned?
>>
>>
>> On Sat, Dec 22, 2012 at 10:46 AM, Harsh J wrote:
>>
>>> Yes, via the simple act of opening a target stream and writing all
>>> source streams into it. Or to save code time, an identity job with a
>>> single reducer (you may not get control over ordering this way).
>>>
>>> On Sat, Dec 22, 2012 at 12:10 PM, Mohit Anchlia wrote:
>>> > Is it possible to merge files from different locations from HDFS location
>>> > into one file into HDFS location?
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>

--e89a8f3b9dadc43a4b04d1772609
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
--e89a8f3b9dadc43a4b04d1772609--
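
For anyone finding this thread in the archive: below is a minimal sketch of the approach Harsh describes (open one target stream and write every source stream into it). It is only an illustration, not DistCp and not a Hadoop API. It uses plain Python on local files as a stand-in, and the function name `merge_files` and all paths are hypothetical. Against HDFS or MapR-FS the same loop would have to go through whatever filesystem client you use. Note that the two inputs share the basename `appinfo`, as in the trace above; that does not matter here because the sources are only read, never copied under their own names.

```python
import os
import shutil
import tempfile
from pathlib import Path


def merge_files(sources, target, chunk_size=1024 * 1024):
    """Concatenate `sources` into `target`, in the order given.

    Hypothetical helper for illustration: opens one target stream and
    copies each source stream into it chunk by chunk, so no input has
    to fit in memory. Returns the size of the merged file in bytes.
    """
    target = Path(target)
    with target.open("wb") as out:
        for src in sources:
            with Path(src).open("rb") as inp:
                # Stream the source into the already-open target.
                shutil.copyfileobj(inp, out, chunk_size)
    return target.stat().st_size


if __name__ == "__main__":
    # Local stand-in for two hourly output directories (made-up layout).
    root = tempfile.mkdtemp()
    hour_a = os.path.join(root, "2012", "12", "20", "22", "output")
    hour_b = os.path.join(root, "2012", "12", "21", "00", "output")
    os.makedirs(hour_a)
    os.makedirs(hour_b)
    Path(hour_a, "appinfo").write_bytes(b"records from hour 22\n")
    Path(hour_b, "appinfo").write_bytes(b"records from hour 00\n")

    merged = os.path.join(root, "appinfo-merged")
    size = merge_files(
        [os.path.join(hour_a, "appinfo"), os.path.join(hour_b, "appinfo")],
        merged,
    )
    print(size, "bytes written to", merged)
```

Because the caller passes the sources as an ordered list, the order of records in the output is under your control, which is the trade-off Harsh mentions against the single-reducer identity job.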