hbase-user mailing list archives

From Rohit Kelkar <rohitkel...@gmail.com>
Subject Re: advice needed on storing large objects on hdfs
Date Mon, 30 Jan 2012 04:36:05 GMT
Hi Ioan, this seems interesting, but I have a follow-up question about
your approach: would I be able to take advantage of data locality
while running map-reduce tasks? My understanding is that the locality
would apply to the references to those objects, not to the actual
objects themselves.

- Rohit Kelkar

On Fri, Jan 27, 2012 at 4:21 PM, Ioan Eugen Stan <stan.ieugen@gmail.com> wrote:
> Hello Rohit,
>
> I would try to write most objects to a Hadoop SequenceFile or a MapFile and
> store the index/byte offset in HBase.
>
> When reading: open the file, seek() to the position, and start reading the
> key:value pair. I don't think using toByteArray() is a good idea because, I
> think, it creates a copy of the object in memory. If the object is big, you
> will end up with two instances of it. Try to stream the object directly to disk.
>
> I don't know whether 5 MB is a good threshold; I hope someone can shed some light.
>
> If the objects are changing: append to the SequenceFile and update the
> reference in HBase. From time to time, run an MR job that cleans up the file.
>
> You can use ZooKeeper to coordinate writing to many SequenceFiles.
>
> If you go this way, please post your results.
>
> Cheers,
>
> On 27.01.2012 10:42, Rohit Kelkar wrote:
>
>> Hi,
>> I am using hbase to store java objects. The objects implement the
>> Writable interface. The size of objects to be stored in each row
>> ranges from a few KB to ~50 MB. The strategy that I am planning to use
>> is
>> if object size < 5 MB
>>     store it in hbase
>> else
>>     store it on hdfs and insert its hdfs location in hbase
>>
>> While storing the objects I am using
>> WritableUtils.toByteArray(myObject) method. Can I use the
>> WritableUtils.toByteArray(myObject).length to determine if the object
>> should go in hbase or hdfs? Is this an acceptable strategy? Is the 5
>> MB limit a safe enough threshold?
>>
>> - Rohit Kelkar
>
>
>
> --
> Ioan Eugen Stan
> http://ieugen.blogspot.com
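
A minimal sketch of the size check discussed in this thread, written so that it
measures the serialized size without building the byte array that
WritableUtils.toByteArray() would return. Everything here is an assumption made
for illustration: WritableLike is a hypothetical stand-in for the write side of
Hadoop's Writable interface (so the example compiles without Hadoop on the
classpath), the class and method names are invented, and the 5 MB constant is
just the value proposed in the thread, not a validated limit.

```java
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class SizeThresholdSketch {

    // Hypothetical stand-in for the write side of org.apache.hadoop.io.Writable.
    interface WritableLike {
        void write(DataOutput out) throws IOException;
    }

    // Sink that discards every byte written to it.
    static final class DiscardingOutputStream extends OutputStream {
        @Override public void write(int b) { }
        @Override public void write(byte[] b, int off, int len) { }
    }

    // Cut-off proposed in the thread (assumption, not a tested value).
    static final long THRESHOLD_BYTES = 5L * 1024 * 1024;

    // Serializes the object into a discarding sink and returns the byte count.
    // DataOutputStream.size() is an int that saturates at Integer.MAX_VALUE,
    // which is enough for the ~50 MB objects mentioned in the thread.
    static long serializedSize(WritableLike obj) {
        DataOutputStream out = new DataOutputStream(new DiscardingOutputStream());
        try {
            obj.write(out);
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size();
    }

    // True if the object is below the threshold and would be stored inline.
    static boolean belongsInHBase(WritableLike obj) {
        return serializedSize(obj) < THRESHOLD_BYTES;
    }

    public static void main(String[] args) {
        // Two dummy payloads: 1 KB and 6 MB of zero bytes.
        WritableLike small = out -> out.write(new byte[1024]);
        WritableLike large = out -> out.write(new byte[6 * 1024 * 1024]);
        System.out.println("small -> " + (belongsInHBase(small) ? "HBase" : "HDFS"));
        System.out.println("large -> " + (belongsInHBase(large) ? "HBase" : "HDFS"));
    }
}
```

The trade-off is that the object is serialized twice (once to count, once to
store), in exchange for never holding a full second copy of a large object in
memory. Whether that is worthwhile depends on how expensive write() is for the
objects in question.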
