hive-user mailing list archives

From Sarah Sproehnle <sa...@cloudera.com>
Subject Re: 1 big file or multiple smaller files for loading data from a database?
Date Thu, 08 Jul 2010 01:06:56 GMT
Hi Todd,

Are you planning to use Sqoop to do this import?  If not, you should.
:)  It will do a parallel import, using MapReduce, to load the table
into Hadoop.  With the --hive-import option, it will also create the
Hive table definition.
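
A Sqoop invocation along these lines would do what I'm describing. This is just a sketch — the JDBC URL, credentials, and table name are placeholders, not values from your setup:

```shell
# Hypothetical example: replace dbhost, SID, username, and table name
# with your own Oracle connection details.
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott -P \
  --table EMPLOYEES \
  --num-mappers 10 \
  --hive-import
```

--num-mappers controls the import parallelism, so a value of 10 would produce roughly ten output files in HDFS rather than one large file — which also answers your file-splitting question.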

Cheers,
Sarah

On Wed, Jul 7, 2010 at 5:51 PM, Todd Lee <ronnietoddlee@gmail.com> wrote:
> Hi,
> I am new to Hive and Hadoop in general. I have a table in Oracle that has
> millions of rows and I'd like to export it into HDFS so that I can run some
> Hive queries. My first question is, is it recommended to export the entire
> table as a single file (possibly 5GB), or more files with smaller sizes (10
> files each 500mb)? also, does it matter if I put the files under different
> sub-directories before I do the data load in Hive? or everything has to be
> under the same folder?
> Thanks,
> T
> p.s. I am sorry if this post is submitted twice.



-- 
Sarah Sproehnle
Educational Services
Cloudera, Inc
http://www.cloudera.com/training
