hadoop-common-user mailing list archives

From Xiaobin She <xiaobin...@gmail.com>
Subject Re: Can I write to a compressed file which is located in hdfs?
Date Mon, 06 Feb 2012 12:11:18 GMT
Hi Bejoy,

thank you for your reply.

Actually, I have set up a test cluster with one namenode/jobtracker and two
datanode/tasktracker nodes, and I have run a test on this cluster.

I fetch the log file of one of our modules from the log collector machines
with rsync, and then I use the hive command line tool to load this log file
into the hive warehouse, which simply copies the file from the local
filesystem to hdfs.
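
As far as I understand, that load step essentially just does something like
the rough Java sketch below (the local file and the warehouse directory are
only placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LoadLogToWarehouse {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Copy the rsync'ed log file into the table's warehouse directory;
        // for a plain-text, non-partitioned table this is roughly what the
        // hive "LOAD DATA LOCAL INPATH ..." statement ends up doing.
        fs.copyFromLocalFile(
                new Path("/data/logs/module_a.log"),              // placeholder local file
                new Path("/user/hive/warehouse/module_a_logs/")); // placeholder table dir
    }
}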

I have run some analysis on this data with hive, and all of it works well.

But now I want to avoid the fetch step which uses rsync, and instead write
the logs into hdfs files directly from the servers that generate them.

It seems easy to do this if the file located in hdfs is not compressed.
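
For the uncompressed case, what I have in mind is roughly the sketch below
(the namenode address and file path are made up, and as far as I know
append() only works if the cluster has dfs.support.append enabled):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DirectHdfsLogWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://namenode:9000");      // made-up namenode address
        FileSystem fs = FileSystem.get(conf);
        Path logFile = new Path("/logs/module_a/2012-02-06.log"); // made-up path

        // Create the file on first use, append to it afterwards.
        FSDataOutputStream out =
                fs.exists(logFile) ? fs.append(logFile) : fs.create(logFile);
        out.write("a raw, uncompressed log line\n".getBytes("UTF-8"));
        out.close();
    }
}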

But how can I write or append logs to a file that is compressed and located
in hdfs?

Is this possible?

Or is this a bad practice?

Thanks!



2012/2/6 <bejoy.hadoop@gmail.com>

> Hi
>     If you have enough log files to fill at least one block size in an
> hour, you can go ahead as follows:
> - run a scheduled job every hour that compresses the log files for that
> hour and stores them onto hdfs (you can use LZO or even Snappy to compress)
> - if your hive queries do more frequent analysis on this data, store it as
> PARTITIONED BY (Date, Hour). While loading into hdfs, also follow a
> matching directory / sub-directory structure. Once the data is in hdfs,
> issue an ALTER TABLE ADD PARTITION statement on the corresponding hive table.
> - in the Hive DDL use the appropriate input format (Hive already has some
> Apache log input formats)
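
To make sure I understand the suggestion above, here is a rough sketch of
what such an hourly job might do. GzipCodec is used only for illustration
(LZO or Snappy would be wired in the same way through their codec classes),
and the paths, table and partition names are all made up:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class HourlyLogCompactor {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        CompressionCodec codec = ReflectionUtils.newInstance(GzipCodec.class, conf);

        // Write the hour's log into a date/hour partition directory, compressed.
        Path target = new Path(
                "/user/hive/warehouse/module_a_logs/dt=2012-02-06/hr=11/log.gz"); // made-up layout
        OutputStream out = codec.createOutputStream(fs.create(target));

        BufferedReader in = new BufferedReader(
                new FileReader("/data/logs/module_a-2012-02-06-11.log"));          // made-up local file
        String line;
        while ((line = in.readLine()) != null) {
            out.write((line + "\n").getBytes("UTF-8"));
        }
        in.close();
        out.close();

        // Then register the new partition with hive, e.g.:
        // ALTER TABLE module_a_logs ADD PARTITION (dt='2012-02-06', hr='11')
        //   LOCATION '/user/hive/warehouse/module_a_logs/dt=2012-02-06/hr=11';
    }
}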
>
>
> Regards
> Bejoy K S
>
> From handheld, Please excuse typos.
>
> -----Original Message-----
> From: Xiaobin She <xiaobinshe@gmail.com>
> Date: Mon, 6 Feb 2012 16:41:50
> To: <common-user@hadoop.apache.org>; 佘晓彬<xiaobinshe@gmail.com>
> Reply-To: common-user@hadoop.apache.org
> Subject: Re: Can I write to a compressed file which is located in hdfs?
>
> Sorry, this sentence is wrong:
>
> I can't compress these logs every hour and then put them into hdfs.
>
> It should be:
>
> I can compress these logs every hour and then put them into hdfs.
>
>
>
>
> 2012/2/6 Xiaobin She <xiaobinshe@gmail.com>
>
> >
> > hi all,
> >
> > I'm testing hadoop and hive, and I want to use them in log analysis.
> >
> > Here I have a question: can I write/append logs to a compressed file
> > which is located in hdfs?
> >
> > Our system generates lots of log files every day; I can't compress these
> > logs every hour and then put them into hdfs.
> >
> > But what if I want to write logs into files that are already in hdfs
> > and are compressed?
> >
> > If these files were not compressed, then this job would seem easy, but
> > how can I write or append logs into a compressed log?
> >
> > Can I do that?
> >
> > Can anyone give me some advice or some examples?
> >
> > Thank you very much!
> >
> > xiaobin
> >
>
>
