Date: Wed, 31 Jul 2013 13:48:34 -0500
Subject: ExecSource copy does not match original. Thoughts please?
From: Chris Neal
To: user@flume.apache.org

Hi all.

I have an ExecSource doing a tail -F on a log4j log file for an app, copying it into HDFS. I get no errors, warnings, or exceptions from the Flume nodes, but when I went to verify that the contents of the files matched, I found that they did not. :( I tested several days' worth of files, and none matched. I'm not sure where to even start looking at this discrepancy. Does anyone have any thoughts?

If I had come across errors somewhere, I would understand some differences, but for everything to appear to work fine and then not match up concerns me.

Thank you very much for any input.
Chris

In HDFS from Flume, file size in lines:
[root@hadoopnn01 ~]# time sudo -u hdfs hadoop fs -text /pegs/logs/udprodae01/d1c1_udprodae01/UD/UDTrans/2013-07-27/UDXMLTrans.log.* | wc -l
2812850

Actual source file size in lines:
cneal@pegslog14[504]:/pegs/logcabin01/udprodae01/pegs/logs/udprodae01/d1c1_udprodae01/UD> time wc -l UDXMLTrans.log.2013-07-27
 2812843 UDXMLTrans.log.2013-07-27

The source file:
cneal@pegslog14[505]:/pegs/logcabin01/udprodae01/pegs/logs/udprodae01/d1c1_udprodae01/UD> ls -l UDXMLTrans.log.2013-07-27
-rw-r--r-- 1 logger other 19228787343 Jul 28 00:00 UDXMLTrans.log.2013-07-27

The files in HDFS:
[root@hadoopnn01 ~]# time sudo -u hdfs hadoop fs -ls /pegs/logs/udprodae01/d1c1_udprodae01/UD/UDTrans/2013-07-27/UDXMLTrans.log.*
Found 1 items
-rw-r--r-- 3 flume supergroup 200021549 2013-07-28 00:00 /pegs/logs/udprodae01/d1c1_udprodae01/UD/UDTrans/2013-07-27/UDXMLTrans.log.2013-07-27_1.1374883211499.gz
Found 1 items
-rw-r--r-- 3 flume supergroup 195398211 2013-07-28 00:00 /pegs/logs/udprodae01/d1c1_udprodae01/UD/UDTrans/2013-07-27/UDXMLTrans.log.2013-07-27_10.1374883210982.gz
Found 1 items
-rw-r--r-- 3 root supergroup 193557330 2013-07-28 00:00 /pegs/logs/udprodae01/d1c1_udprodae01/UD/UDTrans/2013-07-27/UDXMLTrans.log.2013-07-27_13.1374883212709.gz
Found 1 items
-rw-r--r-- 3 root supergroup 194163091 2013-07-28 00:00 /pegs/logs/udprodae01/d1c1_udprodae01/UD/UDTrans/2013-07-27/UDXMLTrans.log.2013-07-27_14.1374883212712.gz
Found 1 items
-rw-r--r-- 3 flume supergroup 192546288 2013-07-28 00:00 /pegs/logs/udprodae01/d1c1_udprodae01/UD/UDTrans/2013-07-27/UDXMLTrans.log.2013-07-27_2.1374883211446.gz
Found 1 items
-rw-r--r-- 3 root supergroup 191863735 2013-07-28 00:00 /pegs/logs/udprodae01/d1c1_udprodae01/UD/UDTrans/2013-07-27/UDXMLTrans.log.2013-07-27_5.1374883208056.gz
Found 1 items
-rw-r--r-- 3 root supergroup 196733297 2013-07-28 00:00 /pegs/logs/udprodae01/d1c1_udprodae01/UD/UDTrans/2013-07-27/UDXMLTrans.log.2013-07-27_6.1374883208056.gz
Found 1 items
-rw-r--r-- 3 flume supergroup 193451845 2013-07-28 00:00 /pegs/logs/udprodae01/d1c1_udprodae01/UD/UDTrans/2013-07-27/UDXMLTrans.log.2013-07-27_9.1374883210989.gz
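
The line counts in the message differ by 7 (2812850 in HDFS versus 2812843 in the source), and a bare `wc -l` comparison cannot say whether those are duplicated lines, missing lines, or both. One possible way to narrow it down is sketched below in Python. It is only an illustration, not something from the original message: it assumes the .gz parts have first been copied to local disk, and the function names `line_digests`, `read_plain`, `read_gzip_parts`, and `compare` are made up for this example. It keeps one 16-byte digest per distinct line in memory, which may be too much for very large files.

```python
import gzip
import hashlib
from collections import Counter


def line_digests(lines):
    """Count how often each distinct line occurs, keyed by a 16-byte digest."""
    counts = Counter()
    for line in lines:
        # Strip the line terminator so "\n" vs "\r\n" does not cause a mismatch.
        key = hashlib.blake2b(line.rstrip(b"\r\n"), digest_size=16).digest()
        counts[key] += 1
    return counts


def read_plain(path):
    """Stream the lines of an uncompressed file as bytes."""
    with open(path, "rb") as fh:
        yield from fh


def read_gzip_parts(paths):
    """Stream the lines of several gzip files, one file after another."""
    for path in paths:
        with gzip.open(path, "rb") as fh:
            yield from fh


def compare(source_path, part_paths):
    """Summarise how the copied parts differ from the source, ignoring order."""
    source = line_digests(read_plain(source_path))
    copied = line_digests(read_gzip_parts(part_paths))
    extra = copied - source    # occurrences present only in the copy
    missing = source - copied  # occurrences present only in the source
    return {
        "source_lines": sum(source.values()),
        "copied_lines": sum(copied.values()),
        "extra_in_copy": sum(extra.values()),
        "missing_from_copy": sum(missing.values()),
    }
```

Calling `compare("UDXMLTrans.log.2013-07-27", list_of_local_gz_parts)` would return the four counts as a dictionary. A non-zero `extra_in_copy` with a zero `missing_from_copy` would point toward lines being delivered more than once, while a non-zero `missing_from_copy` would point toward lines being dropped. Because the comparison ignores ordering, it says nothing about whether lines arrived in sequence.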