hbase-user mailing list archives

From Aleks Laz <al-userhb...@none.at>
Subject Newbie Question about 37TB binary storage on HBase
Date Thu, 27 Nov 2014 21:27:47 GMT
Dear All.

Hi Wilm ;-)

I started this question on the hadoop-user list.


I hope you can help me.

Since ~2012 we have collected a lot of binary data (JPEGs).
The size per file is currently ~1-5 MB, but this could change.

There are well over 41 055 670 files (the count is still running)
in ~680 <ID> directories.

The storage hierarchy is like this.


The binary data is stored in the directories below <DAY>, with ~1000 files
per directory, on an XFS mount.

The pictures are more or less volatile.
This means: once saved to disk, there are seldom or never changes on the

Because the platform is now growing, we need to create a more
scalable setup.

I haven't read too much about HBase, because I hadn't seen it
as an option.

Please accept my apologies for this; I will now start to dig deeper into

Since there are more experienced Hadoop, HDFS and 
users on this list than I, I hope you can answer some basic questions of mine.

Our application is an nginx/php-fpm/PostgreSQL setup.
The target design is nginx + proxy features / php-fpm / $DB / $Storage.

.) Can I mix HDFS/HBase for binary data storage and data analysis?

.) What is the preferred way to use HBase with PHP?
.) How difficult is it to use HBase with PHP?

.) What's a good solution for the 37 TB or the upcoming ~120 TB to
    [ ] N servers with one 37 TB mountpoint per server?
    [ ] N servers with x TB mountpoints per server?
    [ ] other:
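
To make the sizing question more concrete, here is a rough sketch of the arithmetic behind it. It is only an estimate under stated assumptions: TB is taken as a decimal terabyte, the replication factor of 3 is the HDFS default and may be configured differently, and the 24 TB of usable disk per server is a purely hypothetical figure, not our real hardware.

```python
# Back-of-the-envelope sizing sketch. Only the file count and the
# 37 TB / ~120 TB totals come from this mail; everything else is an
# assumption made for illustration.

TB = 10**12  # assumption: decimal terabyte, in bytes

FILE_COUNT = 41_055_670   # files counted so far (still growing)
CURRENT_BYTES = 37 * TB   # data stored today
PLANNED_BYTES = 120 * TB  # expected upcoming volume


def average_file_size(total_bytes, file_count):
    """Mean size of one file in bytes."""
    return total_bytes / file_count


def raw_capacity_needed(logical_bytes, replication=3):
    """Raw disk space needed when every block is stored `replication` times."""
    return logical_bytes * replication


def servers_needed(logical_bytes, usable_bytes_per_server, replication=3):
    """Smallest number of servers whose combined disks hold the raw data."""
    raw = raw_capacity_needed(logical_bytes, replication)
    return -(-raw // usable_bytes_per_server)  # integer ceiling division


if __name__ == "__main__":
    per_server = 24 * TB  # hypothetical usable disk space per server
    avg_mb = average_file_size(CURRENT_BYTES, FILE_COUNT) / 10**6
    print(f"average file size: {avg_mb:.2f} MB")
    for label, size in (("current", CURRENT_BYTES), ("planned", PLANNED_BYTES)):
        raw_tb = raw_capacity_needed(size) // TB
        print(f"{label}: {raw_tb} TB raw, {servers_needed(size, per_server)} servers")
```

The point of the sketch is only that the raw disk space to plan for is the logical size multiplied by the replication factor, so the per-server mountpoint size and the server count N cannot be chosen independently.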

.) Is HBase a good choice for $Storage?
.) Is HBase a good choice for $DB?
     The DB size is smaller than 1 GB; I would use HBase just for the HA
     features of Hadoop.

.) Since HBase is a file-system, I could use
       /cams , for binary data
       /DB   , for DB storage
       /logs , for log storage
     but is this wise? On the 'disk' they are on different RAIDs.

.) Should I plan a dedicated network + card for the 'cluster
    communication', as for most other cluster software?
    From what I have read it does not look necessary, but from a security
    point of view, yes.

.) Maybe the communication with the components (Hadoop, ZK, ...) could
    be set up with TLS?

Thank you very much for reading the mail up to this line ;-)

Thank you also for any feedback, which is very welcome and appreciated.

Best Regards
