hadoop-common-user mailing list archives

From Chandraprakash Bhagtani <cpbhagt...@gmail.com>
Subject Re: Storing contents of a file in a java object
Date Tue, 06 Oct 2009 05:52:46 GMT
Hi Andrzej,

Let me describe my scenario. I had 160GB of input data to process according
to some business logic that used 1.2GB of metadata held in in-memory hash
maps. My problem was that Hadoop doesn't provide any shared storage among
parallel tasks, so I had to keep this 1.2GB of metadata in memory in every
running task. This forced me to run fewer parallel tasks.
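
To make the trade-off concrete, here is a rough back-of-the-envelope sketch.
Only the 1.2GB metadata size comes from the scenario above; the node memory
(8192MB) and the per-task working memory (512MB) are made-up figures for
illustration, and the class and method names are hypothetical.

```java
public class TaskSlots {
    // Tasks that fit on one node when every task holds its own metadata copy.
    static int perTaskCopies(long nodeMb, long metadataMb, long workingMb) {
        return (int) (nodeMb / (metadataMb + workingMb));
    }

    // Tasks that fit when a single copy of the metadata is shared by all tasks.
    static int sharedCopy(long nodeMb, long metadataMb, long workingMb) {
        return (int) ((nodeMb - metadataMb) / workingMb);
    }

    public static void main(String[] args) {
        long nodeMb = 8192;     // assumption: memory available to tasks on a node
        long metadataMb = 1229; // about 1.2GB, from the scenario above
        long workingMb = 512;   // assumption: per-task working memory

        System.out.println("own copy per task: "
                + perTaskCopies(nodeMb, metadataMb, workingMb));
        System.out.println("one shared copy:   "
                + sharedCopy(nodeMb, metadataMb, workingMb));
    }
}
```

With a file-based store the shared copy may sit largely on disk or in the OS
page cache rather than in task heap, so the second figure is only a loose
estimate.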

As a solution to this problem I first used Memcached and then Tokyo Cabinet,
to hold the metadata only. The input and output data were stored in HDFS.
While processing, any task (map or reduce) could fetch metadata from Tokyo
Cabinet. I was not running Tokyo Tyrant; I used the TC Java API directly.
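
The lookup pattern can be sketched roughly as follows. This is not my actual
job code: MetadataStore, MapStore and Task are hypothetical names, and the
in-memory MapStore only stands in for the real store so that the sketch runs
without Hadoop or the Tokyo Cabinet native library. In a real job a
TC-backed implementation of the same interface would be opened once per task
and closed when the task finishes.

```java
import java.util.HashMap;
import java.util.Map;

public class MetadataLookupSketch {
    /** Hypothetical abstraction over wherever the metadata lives. */
    public interface MetadataStore extends AutoCloseable {
        String get(String key);

        @Override
        void close();
    }

    /** In-memory stand-in; a file-backed store would implement the same interface. */
    public static final class MapStore implements MetadataStore {
        private final Map<String, String> data;

        public MapStore(Map<String, String> data) {
            this.data = new HashMap<>(data);
        }

        @Override
        public String get(String key) {
            return data.get(key);
        }

        @Override
        public void close() {
            // Nothing to release for the in-memory stand-in.
        }
    }

    /** Stand-in for a map or reduce task that needs metadata per record. */
    public static final class Task {
        private final MetadataStore store;

        // The store is handed in once, when the task starts.
        public Task(MetadataStore store) {
            this.store = store;
        }

        // Assumes records look like "key,payload"; appends the metadata for the key.
        public String process(String record) {
            String key = record.split(",", 2)[0];
            String meta = store.get(key);
            return record + "," + (meta == null ? "UNKNOWN" : meta);
        }
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put("u1", "gold");

        try (MetadataStore store = new MapStore(metadata)) {
            Task task = new Task(store);
            System.out.println(task.process("u1,42"));
            System.out.println(task.process("u2,7"));
        }
    }
}
```

Because each task fetches only the keys it needs, no task has to hold the
whole 1.2GB in its own heap.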

On Mon, Oct 5, 2009 at 7:44 PM, Andrzej Jan Taramina <andrzej@chaeron.com> wrote:

> Chandraprakash:
>
> Thanks for the info!  Great stuff, but it's led to a few more questions...
>
> > I had run a big mapred job (160GB data) on a small cluster of 7 nodes. I
> > had started 15 Memcached server instances on 7 nodes and I noticed that
> > a single memcached server was processing 1 million requests per second
> > (in my case), however it was definitely 3-4 times slower than in-memory
> > approach. I had to increase the limit of open file descriptors for that.
>
> Was the input for the mapred job coming from Tokyo Cabinet, or were you
> just writing the results of the mapred to TC?
>
> If you were using TC for input to mapred, how did you do the Input Splits?
> Did you write a custom splitter for Tokyo
> Cabinet?
>
> > Since memcached was not performing up to my expectations I used
> > Tokyo Cabinet (a file-based database) and its performance was close to
> > the in-memory approach.
>
> Were you using Tokyo Cabinet over the network, that is, using Tokyo Tyrant?
> Or were you running and accessing a local TC
> process?
>
> Thanks for shedding some light on these additional questions...
>
> --
> Andrzej Taramina
> Chaeron Corporation: Enterprise System Solutions
> http://www.chaeron.com
>



-- 
Thanks & Regards,
Chandra Prakash Bhagtani,
