hadoop-general mailing list archives

From William Kang <weliam.cl...@gmail.com>
Subject Anyway to load certain Key/Value pair fast?
Date Wed, 13 Feb 2013 05:24:27 GMT
Hi All,
I am trying to figure out a good solution for the following scenario.

1. I have a 2T file (let's call it A) filled with key/value pairs,
stored in HDFS with the default 64M block size. In A, each key is
less than 1K and each value is about 20M.

2. Occasionally, I run an analysis using a different type of data
(usually less than 10G; let's call it B) and do lookup-table-like
operations against the values in A. B resides in HDFS as well.

3. This analysis requires loading only a small number of values
from A (usually fewer than 1000) into memory for fast lookup
against the data in B. B finds the few values it needs by looking
up their keys in A.
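To make the load-only-what-you-need idea concrete, here is a minimal, Hadoop-free sketch (plain Python, with a hypothetical record layout I made up for illustration) of a side index mapping each key to its value's byte offset, so only the ~1000 needed values are read instead of scanning the whole 2T file. This is essentially what a Hadoop MapFile (a sorted SequenceFile plus an index) gives you.

```python
import io
import struct

# Hypothetical flat layout, for illustration only: each record is
#   [4-byte key length][key bytes][8-byte value length][value bytes]
def build_index(f):
    """Scan the file once, recording where each value starts (and its size)."""
    index = {}
    while True:
        header = f.read(4)
        if not header:
            break
        (klen,) = struct.unpack(">I", header)
        key = f.read(klen)
        (vlen,) = struct.unpack(">Q", f.read(8))
        index[key] = (f.tell(), vlen)  # offset of the value, and its length
        f.seek(vlen, io.SEEK_CUR)      # skip over the (large) value itself
    return index

def lookup(f, index, key):
    """Random-access read of a single value via the index."""
    offset, vlen = index[key]
    f.seek(offset)
    return f.read(vlen)

# Demo with an in-memory "file" standing in for the 2T HDFS file A.
buf = io.BytesIO()
for k, v in [(b"k1", b"value-one"), (b"k2", b"value-two")]:
    buf.write(struct.pack(">I", len(k)) + k)
    buf.write(struct.pack(">Q", len(v)) + v)
buf.seek(0)

idx = build_index(buf)
print(lookup(buf, idx, b"k2").decode())  # prints "value-two"
```

The index itself is tiny (keys under 1K, one entry per record), so it fits comfortably in memory even for a 2T file of 20M values.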

Is there an efficient way to do this?

I was thinking that if I could identify the locality of the blocks
that contain the few needed values, I might be able to push B to the
few nodes that hold those values in A. Since I only need to do this
occasionally, maintaining a distributed database such as HBase can't
be justified.
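The locality idea above can be sketched as follows: given a key-to-offset index, the block holding a value is simply offset // block_size, so grouping the requested keys by block number tells you which (few) blocks matter; Hadoop's FileSystem.getFileBlockLocations can then map those blocks to datanodes. A toy, Hadoop-free Python sketch of the grouping step, assuming the post's 64M block size and a hypothetical key-to-(offset, length) index:

```python
from collections import defaultdict

BLOCK_SIZE = 64 * 1024 * 1024  # HDFS default block size, per the post

def blocks_for_keys(index, keys, block_size=BLOCK_SIZE):
    """Group requested keys by the HDFS block their value starts in.

    index maps key -> (byte offset of value, value length).
    Returns {block_number: [keys whose values start in that block]}.
    """
    by_block = defaultdict(list)
    for key in keys:
        offset, _vlen = index[key]
        by_block[offset // block_size].append(key)
    return dict(by_block)

# Toy index with three ~20M values: the third starts past the 64M boundary.
MB = 1024 * 1024
toy_index = {
    b"k1": (0 * MB, 20 * MB),
    b"k2": (20 * MB, 20 * MB),
    b"k3": (80 * MB, 20 * MB),  # 80M // 64M == block 1
}
print(blocks_for_keys(toy_index, [b"k1", b"k2", b"k3"]))
# {0: [b'k1', b'k2'], 1: [b'k3']}
```

Note that a ~20M value can straddle a block boundary, so a real implementation would ask for locations over the range [offset, offset + length), not just the starting block.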

Many thanks.


Cao
