From: John Sichi
To: "hive-user@hadoop.apache.org"
Date: Wed, 26 May 2010 11:14:26 -0700
Subject: RE: Query HDFS files without using LOAD (move)

Use a Hadoop version which includes this:

https://issues.apache.org/jira/browse/MAPREDUCE-1501

and set mapred.input.dir.recursive=true;

We are currently using this in production. However, it does not deal with the pattern case.

JVS

________________________________________
From: Karthik [karthik_swa@yahoo.com]
Sent: Wednesday, May 26, 2010 11:08 AM
To: hive-user@hadoop.apache.org
Subject: Re: Query HDFS files without using LOAD (move)

Thanks a lot for the quick reply, Ashish.

The files are currently spread across multiple folders: they are high in number, so they are arranged by category (functionally) in HDFS. Is there any workaround to support multiple folders?

-KK.
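
[A minimal sketch of the recursive-directory approach JVS describes above. The property name comes from MAPREDUCE-1501 and the tip in his reply; the table name, columns, and HDFS paths are hypothetical examples, not anything from this thread.]

    -- Enable recursive traversal of input directories (MAPREDUCE-1501).
    set mapred.input.dir.recursive=true;

    -- Hypothetical external table whose LOCATION holds per-category
    -- subdirectories, e.g. /data/events/search/..., /data/events/ads/...
    CREATE EXTERNAL TABLE events (
      event_time STRING,
      category   STRING,
      payload    STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE
    LOCATION '/data/events';

    -- With the flag set, this query scans files in all subdirectories of
    -- the table location; without it, only files directly under
    -- /data/events are read.
    SELECT category, COUNT(*) FROM events GROUP BY category;

As JVS notes, this handles nested folders but does not cover the file-pattern / regex case.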
----- Original Message ----
From: Ashish Thusoo
To: "hive-user@hadoop.apache.org"
Sent: Wed, May 26, 2010 11:03:43 AM
Subject: RE: Query HDFS files without using LOAD (move)

You could probably use external tables? CREATE EXTERNAL TABLE lets you create a table over existing HDFS files, but I do not think it takes file patterns / regexes. If all the files are created within a single directory, you can point the external table at that directory location, and querying the table will automatically query all the files in that directory. Are your files in a single directory or are they spread out?

http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL#Create_Table

Ashish

-----Original Message-----
From: Karthik [mailto:karthik_swa@yahoo.com]
Sent: Wednesday, May 26, 2010 10:45 AM
To: hive-user@hadoop.apache.org
Subject: Query HDFS files without using LOAD (move)

Is there a way to specify a list of files (or a file pattern / regex) from an HDFS location other than the Hive warehouse as a parameter to a Hive query? I have a bunch of files that are also used by other applications, and I need to query them with Hive as well, so I do not want to use LOAD, which would move those files from their original location into the Hive warehouse.

My queries are on incremental data (new files) added on a daily basis and do not need the full list of files in a folder, so I need to specify a list of files or a pattern, something like a file filter for the query.

Please suggest.

- KK.
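
[A minimal sketch of the external-table approach Ashish describes above: the table is created over files already on HDFS and nothing is moved into the warehouse. The table name, columns, and path are hypothetical.]

    -- Hypothetical layout: the shared files live directly under
    -- /user/data/shared_logs on HDFS and are also read by other applications.
    CREATE EXTERNAL TABLE shared_logs (
      log_date STRING,
      message  STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE
    LOCATION '/user/data/shared_logs';

    -- Querying the table scans every file currently in that directory;
    -- no LOAD (move) is needed, and DROP TABLE on an external table
    -- removes only the metadata, not the underlying files.
    SELECT log_date, COUNT(*) FROM shared_logs GROUP BY log_date;

As Ashish says, this works per directory; it does not accept a file pattern or regex as a filter.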