hadoop-user mailing list archives

From Wilm Schumacher <wilm.schumac...@cawoom.com>
Subject Re: Newbie Question about 37TB binary storage on HDFS
Date Thu, 27 Nov 2014 20:05:14 GMT


On 27.11.2014 at 20:44, Aleks Laz wrote:
> After a quick look at
> 
> https://hbase.apache.org/book/architecture.html#arch.overview
> 
> this sounds like a real option.
> Is this in the current version?
Nope. You have to compile it into your HBase version. I haven't done
any performance tests myself, though. You should ask the experts.

> Please can you try to answer the questions below for hbase or should I
> subscribe to
> 
> https://hbase.apache.org/mail-lists.html => User List
> 
> and ask there?
You should ask there, so we don't bother the people here who are
not interested ;).

> Is this too much for hadoop or hbase?
Well, it depends. Not for HBase, but possibly for Hadoop. Hadoop is
built for streaming LARGE data, but from only a "few" files. Because of
the design of HDFS, more specifically the namenode, which keeps the
metadata for every file in memory, there is the so-called "small files
problem".

http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
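
To get a feeling for the numbers, here is a rough back-of-the-envelope
sketch in Python. It assumes the rule of thumb from the blog post above
(about 150 bytes of namenode memory per file, directory, and block) and,
as a further assumption, one block per file. The function name and
defaults are made up for illustration; real heap usage depends on the
Hadoop version and configuration.

```python
# Rule of thumb from the linked blog post: every file, directory and block
# costs roughly 150 bytes of namenode memory. This is an estimate, not a
# measured value.
BYTES_PER_OBJECT = 150


def namenode_heap_bytes(num_files, blocks_per_file=1, num_dirs=0):
    """Estimate namenode memory for the given number of files.

    Assumes each file occupies `blocks_per_file` blocks (1 for small files).
    """
    objects = num_files + num_files * blocks_per_file + num_dirs
    return objects * BYTES_PER_OBJECT


for n in (10_000_000, 100_000_000):
    print(f"{n:>11,} files -> {namenode_heap_bytes(n) / 1e9:.1f} GB")
```

Under these assumptions 10 million small files already need about 3 GB
of namenode memory, and the cost grows linearly with the file count.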

So you are limited to roughly 20-100 million files, a number you will
reach. However, Hadoop can use "container files", e.g. sequence files
or, better, map files. So if your data isn't changing very often, you
could put a day or a month of images into one of these container files
and end up with hundreds of files, which will work quite well. But I
think you could run into latency issues if you need fast fetches of a
single image.
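
The container file idea can be sketched in plain Python. This is only a
conceptual illustration of "many images in one file plus an index, keys
written in sorted order", which is roughly what a map file gives you. It
is NOT the Hadoop SequenceFile/MapFile API or on-disk format; the record
layout, the function names `pack` and `fetch`, and the sample keys are
all invented for the example.

```python
import io
import struct


def pack(images, out):
    """Write all (key, payload) pairs into one container, sorted by key.

    Hypothetical record layout: 4-byte key length, 4-byte payload length,
    key bytes, payload bytes. Returns an index mapping key -> byte offset.
    """
    index = {}
    for key in sorted(images):
        payload = images[key]
        key_bytes = key.encode("utf-8")
        index[key] = out.tell()
        out.write(struct.pack(">II", len(key_bytes), len(payload)))
        out.write(key_bytes)
        out.write(payload)
    return index


def fetch(container, index, key):
    """Seek to the record for `key` and return only its payload."""
    container.seek(index[key])
    key_len, payload_len = struct.unpack(">II", container.read(8))
    container.seek(key_len, io.SEEK_CUR)  # skip over the stored key
    return container.read(payload_len)


# One "day" of images ends up in a single container instead of many files.
images = {
    "2014-11-27/img_0002.jpg": b"second image bytes",
    "2014-11-27/img_0001.jpg": b"first image bytes",
}
container = io.BytesIO()  # stands in for one large file on HDFS
index = pack(images, container)
assert fetch(container, index, "2014-11-27/img_0001.jpg") == b"first image bytes"
```

The namenode then only has to track one file per day or month, while the
index lookup plus seek is the extra step behind the latency concern
mentioned above.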

> I haven't thought about this aspect.
> 
>> Furthermore another question: are the images volatile? Means: Are the
>> images often changed by the application?
> 
> The pictures are more or less volatile.
> That means: once saved to disk, the images are seldom or never
> changed.
Okay, so the map file plan could work if for some reason you want to
use HDFS directly rather than HBase.

As your data is written only once but read often, HBase seems to be the
perfect fit for your needs (as would Cassandra or something similar).
See you at the hbase-list ;)

Best,

Wilm
