From: Vinod Singh
Date: Tue, 31 Jul 2012 11:04:13 +0530
Subject: Re: Find the files which contains a particular String
To: user@hive.apache.org

I believe Hive does not have any feature that can provide this information. You could write a custom MapReduce program instead and get the name of the file being processed, as shown below:

((FileSplit) context.getInputSplit()).getPath()

and then emit the file name when an occurrence of the word is found.
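A minimal sketch of such a program follows, assuming the `org.apache.hadoop.mapreduce` API and the default text input format (so each input split is a `FileSplit`). The class name `FileSearch`, the configuration key `filesearch.target`, and the argument order are hypothetical, chosen only for illustration. Each mapper emits its file's path at most once, when a line contains the target string (plain, case-sensitive substring match), and the reducer collapses duplicates from files that were split across several map tasks.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Sketch: list the files under an input directory that contain a target string.
public class FileSearch {

    // Hypothetical configuration key used to pass the search string to mappers.
    static final String TARGET_KEY = "filesearch.target";

    // Matching rule kept separate so it can be checked without a cluster:
    // a plain, case-sensitive substring test on one line of text.
    static boolean containsTarget(String line, String target) {
        return line.contains(target);
    }

    public static class SearchMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
        private String target;
        private final Text filePath = new Text();
        private boolean emitted = false;

        @Override
        protected void setup(Context context) {
            target = context.getConfiguration().get(TARGET_KEY);
            // The input split tells this map task which file it is reading.
            filePath.set(((FileSplit) context.getInputSplit()).getPath().toString());
        }

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Emit the path at most once per split; later matches are skipped.
            if (!emitted && containsTarget(line.toString(), target)) {
                context.write(filePath, NullWritable.get());
                emitted = true;
            }
        }
    }

    public static class DedupReducer extends Reducer<Text, NullWritable, Text, NullWritable> {
        @Override
        protected void reduce(Text path, Iterable<NullWritable> hits, Context context)
                throws IOException, InterruptedException {
            // One output line per matching file, however many splits matched.
            context.write(path, NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical argument order: <input dir> <output dir> <search string>
        Configuration conf = new Configuration();
        conf.set(TARGET_KEY, args[2]);
        // Some older Hadoop releases only offer the deprecated new Job(conf, name).
        Job job = Job.getInstance(conf, "file search");
        job.setJarByClass(FileSearch.class);
        job.setMapperClass(SearchMapper.class);
        job.setReducerClass(DedupReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

It could be launched with something like `hadoop jar filesearch.jar FileSearch /technology/dps/real /tmp/hello-files hello` (the jar name and output directory are placeholders). Because the value is `NullWritable`, the text output contains just one matching file path per line. Since the files are stored as text, the substring test works on whole lines regardless of the field delimiters.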

Thanks,
Vinod

On Tue, Jul 31, 2012 at 9:41 AM, Techy Teck <comptechgeeky@gmail.com> wrote:
> I have around 100 files, each about 1 GB in size. I need to find a String
> in all of these files and also determine which files contain that
> particular String. I am working with the Hadoop File System, and all 100
> files are stored in HDFS.
>
> All 100 files are under the real folder, so if I run the command below, I
> get all 100 files. I need to find which files under the real folder
> contain the particular String *hello*.
>
> bash-3.00$ hadoop fs -ls /technology/dps/real
>
> And this is the structure of my data in HDFS:
>
> row format delimited
> fields terminated by '\29'
> collection items terminated by ','
> map keys terminated by ':'
> stored as textfile
>
> How can I write a MapReduce job for this problem so that I can find which
> files contain a particular string? Any simple example would be of great
> help to me.
