Date: Mon, 6 Feb 2012 09:06:00 -0500
From: David Sinclair
To: common-user@hadoop.apache.org
Cc: bejoy.hadoop@gmail.com
Subject: Re: Can I write to a compressed file which is located in HDFS?

Hi,

You may want to have a look at the Flume project from Cloudera. I use it
for writing data into HDFS.

https://ccp.cloudera.com/display/SUPPORT/Downloads
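In case it helps, here is a rough sketch of a Flume agent that tails an
application log and writes gzip-compressed files into HDFS. This assumes
the Flume NG (1.x) properties format, which differs from the older 0.9.x
releases, and the agent/source/sink names and paths are made up:

# hypothetical agent named agent1
agent1.sources = tail1
agent1.channels = mem1
agent1.sinks = hdfs1

# exec source tailing an application log (path is an example)
agent1.sources.tail1.type = exec
agent1.sources.tail1.command = tail -F /var/log/myapp/app.log
agent1.sources.tail1.channels = mem1

# simple in-memory channel between source and sink
agent1.channels.mem1.type = memory

# HDFS sink writing a gzip-compressed stream, rolled hourly
agent1.sinks.hdfs1.type = hdfs
agent1.sinks.hdfs1.channel = mem1
agent1.sinks.hdfs1.hdfs.path = /logs/myapp
agent1.sinks.hdfs1.hdfs.fileType = CompressedStream
agent1.sinks.hdfs1.hdfs.codeC = gzip
agent1.sinks.hdfs1.hdfs.rollInterval = 3600

Exact property names and supported codecs vary by release, so check the
HDFS sink documentation for the version you download.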
dave

2012/2/6 Xiaobin She

> hi Bejoy,
>
> thank you for your reply.
>
> actually I have set up a test cluster which has one namenode/jobtracker
> and two datanodes/tasktrackers, and I have run a test on this cluster.
>
> I fetch the log file of one of our modules from the log collector
> machines by rsync, and then I use the hive command line tool to load
> this log file into the hive warehouse, which simply copies the file
> from the local filesystem to hdfs.
>
> And I have run some analysis on this data with hive; all of this works
> well.
>
> But now I want to avoid the fetch step which uses rsync, and write the
> logs into hdfs files directly from the servers which generate these
> logs.
>
> And it seems easy to do this job if the file located in hdfs is not
> compressed.
>
> But how do I write or append logs to a file that is compressed and
> located in hdfs?
>
> Is this possible?
>
> Or is this bad practice?
>
> Thanks!
>
>
>
> 2012/2/6
>
> > Hi
> >
> > If you have enough log files to fill at least one block size in an
> > hour, you can go ahead as follows:
> > - run a scheduled job every hour that compresses the log files for
> > that hour and stores them in hdfs (you can use LZO or even Snappy to
> > compress)
> > - if your hive jobs do more frequent analysis on this data, store it
> > as PARTITIONED BY (date, hour). While loading into hdfs, also follow
> > a matching directory/sub-directory structure. Once the data is in
> > hdfs, issue an ALTER TABLE ... ADD PARTITION statement on the
> > corresponding hive table.
> > - in the Hive DDL use the appropriate input format (Hive already
> > ships with some Apache log input formats)
> >
> >
> > Regards
> > Bejoy K S
> >
> > From handheld, please excuse typos.
> >
> > -----Original Message-----
> > From: Xiaobin She
> > Date: Mon, 6 Feb 2012 16:41:50
> > To: ; Xiaobin She
> > Reply-To: common-user@hadoop.apache.org
> > Subject: Re: Can I write to a compressed file which is located in
> > hdfs?
> >
> > sorry, this sentence is wrong,
> >
> > I can't compress these logs every hour and then put them into hdfs.
> >
> > it should be
> >
> > I can compress these logs every hour and then put them into hdfs.
> >
> >
> >
> >
> > 2012/2/6 Xiaobin She
> >
> > >
> > > hi all,
> > >
> > > I'm testing hadoop and hive, and I want to use them in log
> > > analysis.
> > >
> > > Here I have a question: can I write/append logs to a compressed
> > > file which is located in hdfs?
> > >
> > > Our system generates lots of log files every day; I can't compress
> > > these logs every hour and then put them into hdfs.
> > >
> > > But what if I want to write logs into files that are already in
> > > the hdfs and are compressed?
> > >
> > > If these files were not compressed, then this job would seem easy,
> > > but how do I write or append logs into a compressed log?
> > >
> > > Can I do that?
> > >
> > > Can anyone give me some advice or give me some examples?
> > >
> > > Thank you very much!
> > >
> > > xiaobin
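To make the write path concrete: with the plain Hadoop Java API,
producing a compressed file in HDFS only requires wrapping the output
stream in a codec. Below is a minimal sketch (the class name and paths
are invented); note that HDFS of this vintage has no dependable append,
so the usual pattern is to roll a brand-new compressed file per hour, as
Bejoy describes, rather than appending to an existing one:

import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class CompressedHdfsWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // GzipCodec ships with Hadoop; LZO or Snappy codecs plug in
        // through the same CompressionCodec interface.
        CompressionCodec codec =
                ReflectionUtils.newInstance(GzipCodec.class, conf);

        // One file per (date, hour), matching the partition layout
        // suggested above. The path here is hypothetical.
        Path path = new Path("/logs/mymodule/2012-02-06/09/part-0000"
                + codec.getDefaultExtension()); // ".gz" for gzip

        // create() always starts a fresh file; everything written
        // through the codec stream lands in HDFS compressed.
        OutputStream out = codec.createOutputStream(fs.create(path));
        out.write("one log line\n".getBytes("UTF-8"));
        out.close(); // finishes the gzip stream and closes the file
    }
}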