From: Matt Wise
To: user@flume.apache.org
Date: Wed, 8 May 2013 10:42:28 -0700
Subject: Flume 1.3.0 + HDFS Sink + S3N + avro_event + Hive…?

We're still working on getting our Flume POC up and running. Right now we have log events that pass through our Flume nodes via a Syslog source and are happily sent off to ElasticSearch for indexing. We're also sending these events to S3, but the files written there seem to be unreadable with avro-tools.

> # S3 Output Sink
> agent.sinks.s3.type = hdfs
> agent.sinks.s3.channel = fc1
> agent.sinks.s3.hdfs.path = s3n://XXX:XXX@our_bucket/flume/events/%y-%m-%d/%H
> agent.sinks.s3.hdfs.rollInterval = 600
> agent.sinks.s3.hdfs.rollSize = 0
> agent.sinks.s3.hdfs.rollCount = 10000
> agent.sinks.s3.hdfs.batchSize = 10000
> agent.sinks.s3.hdfs.serializer = avro_event
> agent.sinks.s3.hdfs.fileType = SequenceFile
> agent.sinks.s3.hdfs.timeZone = UTC

When we try to look at the Avro-serialized files, we get this error:

> [localhost avro]$ java -jar avro-tools-1.7.4.jar getschema FlumeData.1367857371493
> Exception in thread "main" java.io.IOException: Not a data file.
>         at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
>         at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
>         at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:89)
>         at org.apache.avro.tool.DataFileGetSchemaTool.run(DataFileGetSchemaTool.java:48)
>         at org.apache.avro.tool.Main.run(Main.java:80)
>         at org.apache.avro.tool.Main.main(Main.java:69)

At this point we're a bit unclear: how are we supposed to use these FlumeData files with the normal Avro tools?

--Matt
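The "Not a data file." error comes from Avro's DataFileStream, which rejects any file that does not start with the four-byte Avro object container header ("Obj" followed by the byte 0x01). One quick check is to look at the leading bytes of a FlumeData file: a Hadoop SequenceFile starts with "SEQ" plus a one-byte version number instead. The sketch below is a hypothetical helper; classify_header, sniff_file and the script name are illustrative, not part of Flume or Avro. It assumes the file has first been copied from S3 to local disk, since open() cannot read an s3n:// path.

```python
import sys

# Avro object container files begin with these 4 bytes; avro-tools raises
# "Not a data file." when they are missing.
AVRO_MAGIC = b"Obj\x01"
# Hadoop SequenceFiles begin with "SEQ" followed by a one-byte version number.
SEQFILE_MAGIC = b"SEQ"


def classify_header(head: bytes) -> str:
    """Classify a file by its leading bytes: 'avro', 'sequencefile' or 'unknown'."""
    if head.startswith(AVRO_MAGIC):
        return "avro"
    if head.startswith(SEQFILE_MAGIC):
        return "sequencefile"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 4 bytes of a locally downloaded file and classify them."""
    with open(path, "rb") as f:
        return classify_header(f.read(4))


if __name__ == "__main__":
    # Hypothetical usage: python sniff_flume_file.py FlumeData.1367857371493
    for p in sys.argv[1:]:
        print(f"{p}: {sniff_file(p)}")
```

If a file reports "sequencefile", that would be consistent with the hdfs.fileType = SequenceFile line in the config above: in Flume's HDFS sink, a serializer such as avro_event is used by the DataStream and CompressedStream file types, so that setting is worth checking against the Flume 1.3.0 HDFS sink documentation.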