hive-user mailing list archives

From Vinod Singh <vi...@vinodsingh.com>
Subject Re: Find the files which contains a particular String
Date Tue, 31 Jul 2012 05:34:13 GMT
I believe Hive does not have any feature that can provide this
information. You may want to write a custom MapReduce program and get
the name of the file being processed, as shown below:

((FileSplit) context.getInputSplit()).getPath()

and then emit the file name when an occurrence of the word is found.
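A minimal sketch of such a job follows. It assumes the newer `org.apache.hadoop.mapreduce` API (Hadoop 2 or later, for `Job.getInstance`), the default `TextInputFormat`, and a plain substring match on each line. The class name `FindFilesContaining` and the configuration key `findfiles.needle` are hypothetical. Each mapper emits its file's path at most once, and a reducer (also used as a combiner) collapses duplicates, because a 1 GB file is normally read as several splits.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FindFilesContaining {

    // Hypothetical configuration key used to pass the search string to the mappers.
    static final String NEEDLE_KEY = "findfiles.needle";

    public static class MatchMapper extends Mapper<Object, Text, Text, NullWritable> {
        private String needle;
        private Text fileName;
        private boolean emitted = false;

        @Override
        protected void setup(Context context) {
            needle = context.getConfiguration().get(NEEDLE_KEY);
            // Path of the file this split belongs to (assumes a FileSplit-based input format).
            Path path = ((FileSplit) context.getInputSplit()).getPath();
            fileName = new Text(path.toString());
        }

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit the file name at most once per split; later matches add nothing.
            if (!emitted && value.toString().contains(needle)) {
                context.write(fileName, NullWritable.get());
                emitted = true;
            }
        }
    }

    public static class DedupReducer extends Reducer<Text, NullWritable, Text, NullWritable> {
        @Override
        protected void reduce(Text key, Iterable<NullWritable> values, Context context)
                throws IOException, InterruptedException {
            // Several splits of the same file may report it; write each name once.
            context.write(key, NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        // Expected arguments: <input dir> <output dir> <search string>
        Configuration conf = new Configuration();
        conf.set(NEEDLE_KEY, args[2]);

        Job job = Job.getInstance(conf, "find files containing string");
        job.setJarByClass(FindFilesContaining.class);
        job.setMapperClass(MatchMapper.class);
        job.setCombinerClass(DedupReducer.class);
        job.setReducerClass(DedupReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Packaged into a jar (the name `findfiles.jar` is hypothetical), it could be run as `hadoop jar findfiles.jar FindFilesContaining /technology/dps/real /tmp/findfiles-out hello`; the output directory must not exist beforehand, and its part files would list one path per matching file. Matching whole lines works regardless of the field delimiter, since the search is a plain substring test on the raw text.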

Thanks,
Vinod

On Tue, Jul 31, 2012 at 9:41 AM, Techy Teck <comptechgeeky@gmail.com> wrote:

> I have around 100 files, and each file is about 1 GB in size. I need to
> find a string in all of these 100 files and also find out which files
> contain that string. I am working with the Hadoop File System (HDFS), and
> all 100 files are stored in HDFS.
>
> All 100 files are under the real folder, so if I run the command below, I
> get all 100 files listed. I need to find which files under the real folder
> contain a particular string, *hello*.
>
> bash-3.00$ hadoop fs -ls /technology/dps/real
>
> This is the row format of my data in HDFS:
>
> row format delimited
> fields terminated by '\29'
> collection items terminated by ','
> map keys terminated by ':'
> stored as textfile
>
> How can I write a MapReduce job for this problem so that I can find
> which files contain a particular string? Any simple example would be
> of great help to me.
