hadoop-hdfs-user mailing list archives

From William Kang <weliam.cl...@gmail.com>
Subject Re: Anyway to load certain Key/Value pair fast?
Date Wed, 13 Feb 2013 05:52:49 GMT
Hi Harsh,
Thanks for moving the post to the correct list.


On Wed, Feb 13, 2013 at 12:29 AM, Harsh J <harsh@cloudera.com> wrote:
> Please do not use the general@ lists for any user-oriented questions.
> Please redirect them to user@hadoop.apache.org lists, which is where
> the user community and questions lie.
> I've moved your post there and have added you on CC in case you
> haven't subscribed there. Please reply back only to the user@
> addresses. The general@ list is for Apache Hadoop project-level
> management and release oriented discussions alone.
> On Wed, Feb 13, 2013 at 10:54 AM, William Kang <weliam.cloud@gmail.com> wrote:
>> Hi All,
>> I am trying to find a good solution for the following scenario.
>> 1. I have a 2 TB file (let's call it A), filled with key/value pairs,
>> which is stored in HDFS with the default 64 MB block size. In A,
>> each key is less than 1 KB and each value is about 20 MB.
>> 2. Occasionally, I run an analysis on a different data set (usually
>> less than 10 GB; let's call it B) that performs lookup-table-like
>> operations using the values in A. B resides in HDFS as well.
>> 3. This analysis requires loading only a small number of values
>> from A (usually fewer than 1000) into memory for fast look-up
>> against the data in B. B finds those few values by looking up
>> their keys in A.
>> Is there an efficient way to do this?
>> I was thinking that if I could identify the locality of the blocks
>> that contain the few values, I might be able to push B to the few
>> nodes that hold those values in A. Since I only need to do this
>> occasionally, maintaining a distributed database such as HBase can't
>> be justified.
>> Many thanks.
>> Cao
> --
> Harsh J
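
To make the block-locality idea in the original question concrete, here is a minimal sketch of the arithmetic involved. It assumes the byte offset and length of each needed value in A are already known, for example from a separate key-to-offset index; the post does not say such an index exists, so that part is hypothetical. The class and method names (BlockSpan, blocksFor) are made up for illustration. With 64 MB blocks and values of about 20 MB, a value can straddle two blocks, so locality may have to be checked for more than one block per value.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockSpan {
    // Default HDFS block size mentioned in the post: 64 MB.
    static final long BLOCK_SIZE = 64L * 1024 * 1024;

    /**
     * Returns the zero-based indexes of every block touched by a record
     * that starts at byte `offset` and is `length` bytes long.
     */
    static List<Long> blocksFor(long offset, long length, long blockSize) {
        if (offset < 0 || length <= 0 || blockSize <= 0) {
            throw new IllegalArgumentException("offset must be >= 0; length and blockSize must be > 0");
        }
        long first = offset / blockSize;                // block holding the first byte
        long last = (offset + length - 1) / blockSize;  // block holding the last byte
        List<Long> blocks = new ArrayList<>();
        for (long b = first; b <= last; b++) {
            blocks.add(b);
        }
        return blocks;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        // Hypothetical example: a 20 MB value starting 50 MB into the file
        // begins in block 0 and ends in block 1.
        System.out.println(blocksFor(50 * mb, 20 * mb, BLOCK_SIZE)); // prints [0, 1]
    }
}
```

On a real cluster, the hosts holding a byte range can be queried with Hadoop's FileSystem.getFileBlockLocations(FileStatus, long, long), which returns the BlockLocation entries covering that range. The sketch above only shows which blocks a given value would fall into.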
