hadoop-common-user mailing list archives

From "Ted Dunning" <tdunn...@veoh.com>
Subject RE: Query against different data types within HDFS using Map/Reduce
Date Mon, 05 May 2008 16:00:35 GMT

You just have to write a custom input format that can read more than one kind of input.

It can key off either the contents of the file or its name to decide how to read it.
Depending on names is fragile, but the practice has a long lineage, so people tend to deal
with it reasonably well.

It isn't very hard to write.
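
To make the "key off the contents or the name" idea concrete, here is a minimal sketch of
just the detection step, written in plain Java with no Hadoop dependency. The class name
FormatSniffer, the Kind enum, and the file extensions are all made up for illustration.
The content check relies on SequenceFiles starting with the bytes "SEQ" followed by a
version byte; everything else is a rough heuristic, not a robust detector.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical helper: guesses a file's format so an input format can pick a reader. */
final class FormatSniffer {

    enum Kind { SEQUENCE_FILE, XML, TEXT, BINARY }

    /** Guess the format from the first few bytes of the file. */
    static Kind fromContents(byte[] head) {
        // SequenceFiles start with 'S','E','Q' and then a small version byte.
        // Requiring a control-range fourth byte avoids matching text like "SEQUENCE".
        if (head.length >= 4 && head[0] == 'S' && head[1] == 'E' && head[2] == 'Q'
                && head[3] >= 0 && head[3] < 0x20) {
            return Kind.SEQUENCE_FILE;
        }
        // Heuristic: a NUL byte almost never appears in plain text or UTF-8 XML.
        for (byte b : head) {
            if (b == 0) {
                return Kind.BINARY;
            }
        }
        // Heuristic: XML documents start with '<' once leading whitespace is skipped.
        String text = new String(head, StandardCharsets.UTF_8).trim();
        if (text.startsWith("<")) {
            return Kind.XML;
        }
        return Kind.TEXT;
    }

    /** Guess the format from the file name (assumed extensions, easy to fool). */
    static Kind fromName(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".xml")) return Kind.XML;
        if (lower.endsWith(".seq")) return Kind.SEQUENCE_FILE;
        if (lower.endsWith(".txt")) return Kind.TEXT;
        return Kind.BINARY;
    }
}
```

In a real job, the custom input format would presumably run a check like this when it
opens each split and hand the work to the matching reader, so the mapper sees the same
key/value types whatever the file held. That wiring is not shown here.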

-----Original Message-----
From: Kayla Jay [mailto:kaylais30@yahoo.com]
Sent: Mon 5/5/2008 6:18 AM
To: core-user@hadoop.apache.org
Subject: Query against different data types within HDFS using Map/Reduce

Has anyone come across this scenario? If not, does anyone have any suggestions?

Suppose you store different types of data within HDFS: XML, text, binary, sequence files,
etc.  You now want to run a query against ALL of the data stored within HDFS via a
map/reduce job.  How do you do this if the input data is of different types?

For example (the simplest case), you want to find all the terms/words matching a pattern,
count them, and return where they occur within each data source.  Even word count would
serve as an example, except that not all of the data is textual and line-by-line.  The
terms/words could be contained within XML, a sequence file, or some other format stored in
your HDFS.  What if you want to find those terms/words across ALL data sets, which may not
be stored in the same format within HDFS?

I understand that a Map/Reduce job specifies a single input format up front.  However, if
you have different data formats within HDFS and you want to run the exact same query against
all formats within one map/reduce job, how is this even possible?

Can you even run a single query in a single map/reduce job against all the data across HDFS
that is in different formats?
If not, any suggestions on how to handle this?  

