hive-user mailing list archives

From John Sichi <jsi...@facebook.com>
Subject RE: Query HDFS files without using LOAD (move)
Date Wed, 26 May 2010 18:14:26 GMT
Use a Hadoop version which includes this:

https://issues.apache.org/jira/browse/MAPREDUCE-1501

and

set mapred.input.dir.recursive=true; 

We are currently using this in production.  However, it does not handle the file pattern case.
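
A minimal sketch of how this might look in a Hive session, assuming an external table that points
at a parent directory with the files in category subdirectories beneath it. The table name
(app_logs), its single column, and the path /data/logs are hypothetical and only for illustration;
the setting is the one quoted above and has an effect only on a Hadoop build that includes
MAPREDUCE-1501.

```sql
-- Hypothetical layout: /data/logs/<category>/<files>
-- The external table points at the parent directory; the files are not moved.
CREATE EXTERNAL TABLE app_logs (line STRING)
LOCATION '/data/logs';

-- Ask the input format to descend into subdirectories of the table location.
SET mapred.input.dir.recursive=true;

-- Reads files from every category subdirectory under /data/logs.
SELECT COUNT(1) FROM app_logs;
```

This selects whole directory trees, not individual files, so it does not by itself restrict a query
to a list or pattern of files.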

JVS

________________________________________
From: Karthik [karthik_swa@yahoo.com]
Sent: Wednesday, May 26, 2010 11:08 AM
To: hive-user@hadoop.apache.org
Subject: Re: Query HDFS files without using LOAD (move)

Thanks a lot for the quick reply Ashish.

The files are currently spread across multiple folders in HDFS: there are a large number of them,
so they are arranged by category (functionally).  Is there any workaround to support multiple
folders?

-KK.



----- Original Message ----
From: Ashish Thusoo <athusoo@facebook.com>
To: "hive-user@hadoop.apache.org" <hive-user@hadoop.apache.org>
Sent: Wed, May 26, 2010 11:03:43 AM
Subject: RE: Query HDFS files without using LOAD (move)

You could probably use external tables. CREATE EXTERNAL TABLE allows you to create tables
on HDFS files, but I do not think that it takes file patterns / regex. If all the files are
created within a single directory, then you could point the external table to that directory
location, and querying that table would automatically query all the files in that directory.
Are your files in a single directory or are they spread out?

http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL#Create_Table
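
A minimal sketch of the external table approach described above, assuming tab-delimited text files
in a single directory. The table name, columns, delimiter, and path are all hypothetical and would
need to match the actual files.

```sql
-- Hypothetical directory shared with other applications: /data/shared/page_events
-- EXTERNAL means Hive reads the files in place and does not move them
-- into the warehouse; dropping the table leaves the files untouched.
CREATE EXTERNAL TABLE page_events (
  event_time STRING,
  user_id    STRING,
  url        STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/shared/page_events';

-- Every file currently in the directory is scanned by the query.
SELECT url, COUNT(1) FROM page_events GROUP BY url;
```

New files added to the directory later are picked up by subsequent queries without any LOAD step.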

Ashish

-----Original Message-----
From: Karthik [mailto:karthik_swa@yahoo.com]
Sent: Wednesday, May 26, 2010 10:45 AM
To: hive-user@hadoop.apache.org
Subject: Query HDFS files without using LOAD (move)

Is there a way to specify a list of files (or a file pattern / regex) from an HDFS location
other than the Hive warehouse as a parameter to a Hive query?  I have a bunch of files that
are also used by other applications, and I need to query them with Hive as well, so I do not
want to use LOAD and move those files into the Hive warehouse from their original
location.

My query is on incremental data (new files) added on a daily basis, so it need not read
the full list of files in a folder. I need to specify a list of files or a pattern, something
like a file filter for the query.

Please suggest.

- KK.
