hadoop-common-user mailing list archives

From Wilm Schumacher <wilm.schumac...@cawoom.com>
Subject Re: Newbie Question about 37TB binary storage on HDFS
Date Thu, 27 Nov 2014 17:16:05 GMT
Hi,

I would like to suggest another option: you could load the data into
HBase directly.

Together with
https://issues.apache.org/jira/browse/HBASE-11339
this would be a good fit.
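As a rough illustration of loading the existing files into HBase, the directory layout from the quoted message (<MOUNT_ROOT>/cams/<ID>/<YEAR>/<MONTH>/<DAY>/<file>) could be mapped onto a row key. This is a minimal sketch under my own assumptions: the key format `<ID>/<YYYYMMDD>/<file>`, the function name `row_key_for` and the mount root `/srv/data` are hypothetical, and the actual write through an HBase client is left out.

```python
from pathlib import PurePosixPath


def row_key_for(path: str, mount_root: str) -> bytes:
    """Map <mount_root>/cams/<ID>/<YEAR>/<MONTH>/<DAY>/<file> to a
    hypothetical HBase row key of the form <ID>/<YYYYMMDD>/<file>."""
    rel = PurePosixPath(path).relative_to(mount_root)
    parts = rel.parts
    # Expected shape: ('cams', ID, YEAR, MONTH, DAY, FILE)
    if len(parts) != 6 or parts[0] != "cams":
        raise ValueError(f"unexpected layout: {path}")
    _, cam_id, year, month, day, filename = parts
    # Zero-pad month/day so keys sort chronologically within one camera ID.
    date = f"{int(year):04d}{int(month):02d}{int(day):02d}"
    return f"{cam_id}/{date}/{filename}".encode("utf-8")


if __name__ == "__main__":
    # Hypothetical mount root and file name.
    print(row_key_for("/srv/data/cams/42/2014/11/19/img_0001.jpg", "/srv/data"))
```

Putting the camera ID first keeps each camera's images contiguous and time-ordered, which suits "all images of camera X on day Y" lookups; whether that key spreads load evenly enough across region servers depends on how the IDs are distributed.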

I would also like to ask about the mean size of the images. If it is
~10 MB (a large but normal-sized image) and you plan to store 120 TB,
that would be around 12 million images. Is that correct?
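The estimate above is simple arithmetic. A quick check, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and the ~10 MB mean size guessed above:

```python
TB = 10**12  # decimal terabyte, in bytes
MB = 10**6   # decimal megabyte, in bytes


def image_count(total_bytes: int, mean_image_bytes: int) -> int:
    """Approximate number of images of the given mean size in total_bytes."""
    return total_bytes // mean_image_bytes


print(image_count(120 * TB, 10 * MB))  # 12000000
print(image_count(37 * TB, 10 * MB))   # 3700000
```

With binary units (120 TiB / 10 MiB) the result is about 12.6 million, the same order of magnitude. A smaller mean size raises the file count proportionally.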

Another question: are the images volatile, i.e. are they frequently
changed by the application?

Best,

Wilm

On 27.11.2014 at 17:49, Aleks Laz wrote:
> Dear All.
> 
> Since ~2012 we have collected a lot of binary data (JPEGs).
> 
> The storage hierarchy looks like this:
> 
>                     <YEAR>/<MONTH>/<DAY>
> <MOUNT_ROOT>/cams/<ID>/2014/11/19/
> 
> The binary files are stored in the directory below <DAY>, ~1000 files
> per directory, on an XFS-formatted mount.
> 
> Because the platform is now growing, we need to create a more scalable
> setup.
> 
> I have read
> 
> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
> 
> http://wiki.apache.org/hadoop/FAQ#HDFS
> ...
> and I hope that I have understood the main concepts behind HDFS.
> 
> Since the users on this list are more experienced with Hadoop and HDFS
> than I am, I hope you can answer some basic questions.
> 
> Our application is an nginx/php-fpm/postgresql setup.
> The target design is nginx + proxy features / php-fpm / $DB / $Storage.
> 
> .) Can I use HDFS both for binary data storage and for data analysis?
> 
> .) What is the preferred way to use HDFS with PHP?
> .) How difficult is it to use HDFS with PHP?
>    Google turns up a lot of answers to this question (WebHDFS, NFS,
> Thrift, ...), but which one is currently 'the' solution and still
> 'supported' by the Hadoop community?
>    Btw.: the PHP link on http://wiki.apache.org/hadoop/HDFS-APIs returns
> a 404.
> 
> 
> .) What's a good way to distribute the 37 TB, or the upcoming ~120 TB?
>   [ ] N servers with one 37 TB mount point per server?
>   [ ] N servers with x TB mount points per server?
>   [ ] other:
> 
> .) Is HDFS a good choice for $Storage?
> .) Is HBase a good choice for $DB?
>    The DB size is smaller than 1 GB; I would use HBase just for the HA
> features of Hadoop.
> 
> .) Since HDFS is a file system, I could use
>      /cams , for binary data
>      /DB   , for DB storage
>      /logs , for log storage
>    but is this wise? On disk, these currently live on separate RAIDs.
> 
> .) Should I plan a dedicated network (and NIC) for the 'cluster
> communication', as with most other cluster software?
>    From what I have read it does not seem necessary, but from a security
> point of view, yes.
> 
> .) Could the communication between the components (Hadoop, ZK, ...)
> be set up with TLS?
> 
> Thank you very much for reading this mail up to this line ;-)
> 
> Thank you also for any feedback, which is very welcome and appreciated.
> 
> Best Regards
> Aleks
> 
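Regarding the PHP question in the quoted message: WebHDFS is an HTTP REST API (paths under /webhdfs/v1/ plus an `op` query parameter such as OPEN, CREATE or LISTSTATUS), so it can be used from PHP or any other language with an HTTP client. A minimal Python sketch of building such URLs; the host name `namenode.example.com`, port 50070 (the NameNode HTTP default in Hadoop 2.x) and the user name `webapp` are assumptions to adapt to your cluster, and `webhdfs_url` is a hypothetical helper, not part of any Hadoop client library:

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host: str, port: int, hdfs_path: str, op: str, **params: str) -> str:
    """Build a WebHDFS REST URL: http://<host>:<port>/webhdfs/v1/<path>?op=<OP>&..."""
    if not hdfs_path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    # Extra query parameters (e.g. user.name for simple auth) are appended after op.
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(hdfs_path)}?{query}"


if __name__ == "__main__":
    # Hypothetical NameNode address, file path and user; adjust to your cluster.
    print(webhdfs_url("namenode.example.com", 50070,
                      "/cams/42/2014/11/19/img_0001.jpg", "OPEN",
                      **{"user.name": "webapp"}))
```

Note that the NameNode answers OPEN and CREATE with a redirect to a DataNode, so the HTTP client has to follow it (CREATE is a two-step PUT).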

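On distributing the 37 TB or ~120 TB: HDFS spreads blocks over the local data directories of all DataNodes (a comma-separated list in dfs.datanode.data.dir, so several disks per server rather than one large mount point) and stores each block dfs.replication times, 3 by default. A back-of-the-envelope sketch of raw capacity and server count; the 12 x 4 TB disks per server and the 25% free-space headroom are assumptions for illustration, not recommendations:

```python
import math

TB = 10**12  # decimal terabyte, in bytes


def raw_capacity(data_bytes: int, replication: int = 3, headroom: float = 0.25) -> int:
    """Raw disk bytes needed: data times replication, plus free-space headroom."""
    return math.ceil(data_bytes * replication / (1 - headroom))


def servers_needed(data_bytes: int, disk_bytes_per_server: int, **kwargs) -> int:
    """Number of DataNodes needed for the raw capacity, rounded up."""
    return math.ceil(raw_capacity(data_bytes, **kwargs) / disk_bytes_per_server)


if __name__ == "__main__":
    per_server = 12 * 4 * TB  # hypothetical: 12 disks x 4 TB each
    for data_tb in (37, 120):
        print(data_tb, "TB ->", servers_needed(data_tb * TB, per_server), "servers")
```

With these assumed numbers, 37 TB needs 148 TB of raw disk (4 such servers) and 120 TB needs 480 TB (10 servers); the replication factor dominates the result.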