Subject: Re: 1 big file or multiple smaller files for loading data from a database?
From: Sarah Sproehnle
To: hive-user@hadoop.apache.org
Date: Wed, 7 Jul 2010 18:06:56 -0700

Hi Todd,

Are you planning to use Sqoop to do this import? If not, you should. :) It will do a parallel import, using MapReduce, to load the table into Hadoop.
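A minimal sketch of such a parallel import, assuming an Oracle source; the connection string, username, and table name below are placeholders, not anything from this thread:

```shell
# Hypothetical Sqoop import of an Oracle table into HDFS.
# --num-mappers 10 runs 10 parallel map tasks, producing ~10 output files;
# --hive-import also creates the matching Hive table definition.
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott -P \
  --table MY_TABLE \
  --num-mappers 10 \
  --hive-import
```

Because each map task writes its own output file, the one-big-file vs. many-smaller-files question largely takes care of itself with this approach.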
With the --hive-import option, it will also create the Hive table definition.

Cheers,
Sarah

On Wed, Jul 7, 2010 at 5:51 PM, Todd Lee wrote:
> Hi,
> I am new to Hive and Hadoop in general. I have a table in Oracle that has
> millions of rows, and I'd like to export it into HDFS so that I can run some
> Hive queries. My first question is: is it recommended to export the entire
> table as a single file (possibly 5 GB), or as multiple smaller files (say,
> 10 files of 500 MB each)? Also, does it matter if I put the files under
> different sub-directories before I do the data load in Hive, or does
> everything have to be under the same folder?
> Thanks,
> T
> p.s. I am sorry if this post is submitted twice.

--
Sarah Sproehnle
Educational Services
Cloudera, Inc
http://www.cloudera.com/training