hadoop-general mailing list archives

From Xueling Shu <x...@systemsbiology.org>
Subject Which Hadoop product is more appropriate for a quick query on a large data set?
Date Sat, 12 Dec 2009 03:19:14 GMT
 Hi there:

I am researching Hadoop to see which of its products best suits our need for
quick queries against large data sets (billions of records per set).

The queries will be performed against chip sequencing data. Each record is
one line in a file. To be clear, a sample record from the data set is shown
below.

one line (record) looks like: 1-1-174-418 TGTGTCCCTTTGTAATGAATCACTATC U2 0 0
1 4 *103570835* F .. 23G 24C

The highlighted field is called "position of match", and the query we are
interested in is the number of sequences whose "position of match" falls in
a certain range. For instance, the range can be "position of match" > 200 and
"position of match" + 36 < 200,000.
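
To make the query concrete, here is a minimal single-machine sketch in Python of the count described above. It is only an illustration of the logic, not a Hadoop job. It assumes fields are whitespace-separated, that "position of match" is the eighth field (index 7) as in the sample record, and that the asterisks around it are only highlighting. The names position_of_match and count_in_range are made up for this example.

```python
from typing import Iterable

# Assumption: "position of match" is the 8th whitespace-separated field
# (0-based index 7), as it appears in the sample record.
POSITION_FIELD = 7
# The "+ 36" in the example range condition.
READ_LENGTH = 36


def position_of_match(record: str) -> int:
    """Extract the "position of match" from one record line."""
    fields = record.split()
    # Drop the asterisks that only mark the highlighted field in the sample.
    return int(fields[POSITION_FIELD].strip("*"))


def count_in_range(records: Iterable[str], low: int, high: int) -> int:
    """Count records with position > low and position + READ_LENGTH < high."""
    count = 0
    for record in records:
        if not record.strip():
            continue  # skip blank lines
        position = position_of_match(record)
        if position > low and position + READ_LENGTH < high:
            count += 1
    return count


if __name__ == "__main__":
    sample = ("1-1-174-418 TGTGTCCCTTTGTAATGAATCACTATC U2 0 0 "
              "1 4 103570835 F .. 23G 24C")
    print(count_in_range([sample], 200, 200_000))
```

A linear scan like this touches every record on each query, which is the cost that needs to be avoided at billions of records per set.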

Any suggestions on which Hadoop product I should start with to accomplish
this task? HBase, Pig, Hive, or ...?


