hive-user mailing list archives

From Роман Павленко <pavlenko.roman....@gmail.com>
Subject Re: Can I merge files after I loaded them into hive?
Date Thu, 15 Nov 2012 10:20:05 GMT
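Example:
insert overwrite table my_table PARTITION (year=2012, month=9, day=4)
select `data`, `timestamp`, `hour`, `minute`, `second`
from my_table
WHERE year=2012 AND month=9 AND day=4;

The select list names only the data columns. The partition columns (year,
month, day) come from the PARTITION clause, so SELECT * would not match the
target's column count.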
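Some background on the hive.merge.mapfiles tip quoted below. Roughly speaking, after a map-only job Hive adds a merge step when the average output file size is below hive.merge.smallfiles.avgsize (default about 16 MB), and it aims for merged files of about hive.merge.size.per.task (default about 256 MB). The Python sketch below models that rule under those assumptions. The function name merge_plan and the ceiling-based file-count estimate are hypothetical simplifications, not Hive's code.

```python
import math

# Assumed default values of the Hive properties, in bytes (check hive-site.xml).
SMALLFILES_AVGSIZE = 16_000_000   # hive.merge.smallfiles.avgsize
SIZE_PER_TASK = 256_000_000       # hive.merge.size.per.task


def merge_plan(file_sizes, avgsize=SMALLFILES_AVGSIZE, size_per_task=SIZE_PER_TASK):
    """Return (merge_triggered, approx_output_file_count) for one job's output.

    Simplified model: a merge step runs when the average output file size is
    below `avgsize`; merged files are then roughly `size_per_task` bytes each.
    """
    if not file_sizes:
        return False, 0
    total = sum(file_sizes)
    average = total / len(file_sizes)
    if average >= avgsize:
        # Files are already large enough on average: no merge step.
        return False, len(file_sizes)
    return True, max(1, math.ceil(total / size_per_task))


# One day's partition: 24 hourly files of about 10 MB each.
hourly_files = [10_000_000] * 24
print(merge_plan(hourly_files))  # (True, 1)
```

For the case in this thread, 24 files averaging 10 MB fall below the threshold, and their 240 MB total fits in one merged file under these defaults.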




2012/11/15 Bejoy KS <bejoy_ks@yahoo.com>

> Hi Cheng
>
> You can do it in Hive as well. Enable Hive's merge option and INSERT OVERWRITE
> the partition once again, selecting its non-partition columns.
>
> SET hive.merge.mapfiles=true;
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
>
> -----Original Message-----
> From: "Bejoy KS" <bejoy_ks@yahoo.com>
> Date: Thu, 15 Nov 2012 08:10:12
> To: <user@hive.apache.org>
> Reply-To: user@hive.apache.org
> Subject: Re: Can I merge files after I loaded them into hive?
>
> Hi Cheng
>
> You can use Flume for ingestion into HDFS. Flume takes care of the file
> sizes: it combines the incoming data and stores it as one large file. This
> is a better approach.
>
> You can also write a custom MapReduce job to merge these files in HDFS. Use
> CombineFileInputFormat and run a map-only job with the identity mapper, with
> the maximum split size set to the desired output file size.
>
>
> Regards
> Bejoy KS
>
> -----Original Message-----
> From: Cheng Su <scarcer.cn@gmail.com>
> Date: Thu, 15 Nov 2012 16:03:44
> To: <user@hive.apache.org>
> Reply-To: user@hive.apache.org
> Subject: Can I merge files after I loaded them into hive?
>
> Hi, all.
>
> Can I merge files after I loaded them into hive?
> This is my situation:
>
> There is a log table, partitioned by date, that stores the nginx access
> logs.
> The raw log files are loaded into Hive every hour.
> At the moment each log file is small, say 10 MB or even smaller,
> so there are 24 small files in each partition.
> This is inefficient in my opinion, and will consume more Hadoop
> (NameNode) heap. That's why I want to merge the small files.
>
> Can Hive merge those files automatically?
> Or does Hive provide some tool to merge files?
> Or can I just use hadoop dfs -cat to do that?
>
> --
>
> Regards,
> Cheng Su
>
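Bejoy's MapReduce suggestion works because CombineFileInputFormat packs many small files into each input split, so each mapper of a map-only job writes one large file. The Python sketch below shows only that packing idea, using a simple greedy rule. The real CombineFileInputFormat also groups blocks by node and rack, so its splits will differ. The function name pack_into_splits and the 128 MB split size are hypothetical.

```python
def pack_into_splits(file_sizes, max_split_size):
    """Greedily group file sizes into splits whose total stays <= max_split_size.

    A file larger than max_split_size gets a split of its own. This ignores
    HDFS block boundaries and data locality, which Hadoop's real
    CombineFileInputFormat takes into account.
    """
    splits = []
    current, current_size = [], 0
    for size in file_sizes:
        # Start a new split if adding this file would exceed the limit.
        if current and current_size + size > max_split_size:
            splits.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        splits.append(current)
    return splits


MB = 1024 * 1024
hourly_files = [10 * MB] * 24  # one day of ~10 MB nginx log files
splits = pack_into_splits(hourly_files, max_split_size=128 * MB)
# With an identity mapper and zero reducers, each split becomes one output file.
print(len(splits), [sum(s) // MB for s in splits])  # 2 [120, 120]
```

Raising the split size to the target file size trades mapper parallelism for fewer, larger output files.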
