hadoop-common-user mailing list archives

From Fei Pan <cnwe...@gmail.com>
Subject Re: hadoop/hive data loading
Date Thu, 12 May 2011 09:12:21 GMT
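Hi hadoopman,

You can put the large data set into HDFS using "hadoop fs -put src dest",
and then attach it to the table with "alter table xxx add partition(xxxxx) location 'dest'".
Adding a partition this way only updates the table metadata, so Hive reads the
files where they already are.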
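
For example, the two steps could be sketched roughly like this. It only prints
the commands so you can check them before running anything. The table name
(access_logs), the partition column (dt) and all the paths are made up for
illustration, and it assumes the table was already created as a partitioned
(ideally external) table.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for loading one day of logs as a
# Hive partition. All names and paths below are hypothetical placeholders.

# Print the HDFS upload command: $1 = local file, $2 = HDFS destination.
# The destination directory may need to be created first with
# "hadoop fs -mkdir", depending on the Hadoop version.
put_cmd() {
    printf 'hadoop fs -put %s %s\n' "$1" "$2"
}

# Print the Hive statement that registers an HDFS directory as a partition:
# $1 = table, $2 = partition spec, $3 = HDFS location.
add_partition_cmd() {
    printf "hive -e \"ALTER TABLE %s ADD PARTITION (%s) LOCATION '%s'\"\n" "$1" "$2" "$3"
}

put_cmd access.log /data/access_logs/dt=2011-05-10
add_partition_cmd access_logs "dt='2011-05-10'" /data/access_logs/dt=2011-05-10
```

Running one put and one add partition per day (or per month) keeps each step
small, which may help you avoid the big union all over two years of data.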



2011/5/11 amit jaiswal <amit_jus@yahoo.com>

> Hi,
>
> What does 'union' mean here? Is there a Hadoop job with one (or a few)
> reducers that combines all the data together? Have you tried external
> (dynamic) partitions for combining data?
>
> -amit
>
>
> ----- Original Message -----
> From: hadoopman <hadoopman@gmail.com>
> To: common-user@hadoop.apache.org
> Cc:
> Sent: Tuesday, 10 May 2011 11:26 PM
> Subject: hadoop/hive data loading
>
> When we load data into hive sometimes we've run into situations where the
> load fails and the logs show a heap out of memory error.  If I load just a
> few days (or months) of data then no problem.  But then if I try to load two
> years (for example) of data then I've seen it fail.  Not with every feed but
> certain ones.
>
> Sometimes I've been able to split the data and get it to load.  An example
> of one type of feed I'm working on is the apache web server access logs.
> Generally it works.  But there are times when I need to load more than a few
> months of data and get the memory heap errors in the task logs.
>
> Generally how do people load their data into Hive?  We have a process where
> we first copy it to hdfs then from there we run a staging process to get it
> into hive.  Once that completes we perform a union all then overwrite table
> partition.  Usually it's during the union all stage that we see these errors
> appear.
>
> Also is there a log which tells you which file it fails on?  I can see which
> task/job failed but I'm not finding which file it's complaining about.  I figure
> that might help a bit.
>
> Thanks!
>
>


-- 
Stay Hungry. Stay Foolish.
