hadoop-mapreduce-user mailing list archives

From Aleks Laz <al-userhad...@none.at>
Subject Re: Newbie Question about 37TB binary storage on HDFS
Date Thu, 27 Nov 2014 19:44:11 GMT
Hi.

On 27-11-2014 at 18:16, Wilm Schumacher wrote:
> Hi,
> 
> I would like to open up another option for you. You could pump the data
> into hbase directly.
> 
> Together with
> https://issues.apache.org/jira/browse/HBASE-11339
> this would be a good fit.

Thank you.

After a quick look at

https://hbase.apache.org/book/architecture.html#arch.overview

this sounds like a real option.
Is this in the current version?
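For reference, this is roughly what a MOB-enabled table definition looks
like in the hbase shell once HBASE-11339 has landed in the version you
deploy (at the time of this thread the feature was still in progress; the
'cams' table, 'img' family, and 100 KB threshold below are made-up
illustration values, not from the thread):

```
# Values larger than MOB_THRESHOLD (bytes) are stored as MOB files
# instead of inline in the region; 102400 = 100 KB.
create 'cams', {NAME => 'img', IS_MOB => true, MOB_THRESHOLD => 102400}

# Row key could encode the existing path, e.g. <ID>/<YEAR>/<MONTH>/<DAY>/<file>
put 'cams', '680/2014/11/19/img0001.jpg', 'img:data', '<binary jpeg bytes>'
```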

Could you try to answer the questions below for HBase, or should I 
subscribe to

https://hbase.apache.org/mail-lists.html => User List

and ask there?

> And I would like to ask about the mean size of the images. If it
> is ~10MB (a large but normal-sized image) and you plan to save 120TB,
> this would be around 12 million images. Is that correct?

The sizes are currently ~1-5 MB, but this could change.

To be honest there are many more than 22 488 987 files (the count is 
still running), in ~680 <ID> dirs with this hierarchy

>>                     <YEAR>/<MONTH>/<DAY>
>> <MOUNT_ROOT>/cams/<ID>/2014/11/19/

Is this too much for Hadoop or HBase?
I hadn't thought about this aspect.
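A back-of-the-envelope sketch of why the file count matters more than the
total bytes (my own numbers, not from the thread): each file, directory,
and block in HDFS is held in NameNode heap, and a commonly cited rule of
thumb is ~150 bytes per namespace object. A minimal Python sketch,
assuming one block per file (the images are far below the default
128 MB block size) and a guessed directory count:

```python
# Rough NameNode heap estimate for a namespace full of small files.
# Assumptions: ~150 bytes of heap per file/dir/block object (rule of
# thumb, not an exact figure), one HDFS block per file.
BYTES_PER_OBJECT = 150

def namenode_heap_bytes(files, dirs=0, blocks_per_file=1):
    """Estimate NameNode heap needed for a namespace of this size."""
    objects = files + dirs + files * blocks_per_file
    return objects * BYTES_PER_OBJECT

files = 22_488_987      # file count from this thread (still growing)
dirs = 680 * 3 * 365    # ~680 <ID> dirs x guessed day-dirs since ~2012

heap = namenode_heap_bytes(files, dirs)
print(f"~{heap / 1024**3:.1f} GiB of NameNode heap")  # → ~6.4 GiB
```

So the current tree already implies several GiB of NameNode heap, and
that grows with the file count, not the data volume; the raw disk side is
simpler, e.g. 120 TB at the default 3x replication is ~360 TB across the
datanodes.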

> Furthermore another question: are the images volatile? Means: Are the
> images often changed by the application?

The pictures are more or less static.
That is: after being saved to disk, the images are seldom, if ever, 
changed.

BR Aleks

> Best,
> 
> Wilm
> 
> On 27.11.2014 at 17:49, Aleks Laz wrote:
>> Dear All.
>> 
>> Since ~2012 we have collected a lot of binary data (JPEGs).
>> 
>> The storage hierarchy is like this:
>> 
>>                     <YEAR>/<MONTH>/<DAY>
>> <MOUNT_ROOT>/cams/<ID>/2014/11/19/
>> 
>> The binary data are in the directories below <DAY>, ~1000 files per
>> directory, on an xfs mount.
>> 
>> Because the platform is now growing, we need to create a more
>> scalable setup.
>> 
>> I have read
>> 
>> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
>> 
>> http://wiki.apache.org/hadoop/FAQ#HDFS
>> ...
>> and hope that I have understood the main concepts behind HDFS.
>> 
>> Since there are more experienced Hadoop and HDFS users on this list
>> than me, I hope you can answer some basic questions.
>> 
>> Our application is a nginx/php-fpm/postgresql Setup.
>> The target design is nginx + proxy features / php-fpm / $DB /
>> $Storage.
>> 
>> .) Can I mix HDFS for binary data storage and data analysis?
>> 
>> .) What is the preferred way to use HDFS with PHP?
>> .) How difficult is it to use HDFS with PHP?
>>    Google has a lot of answers to this question (WebHDFS, NFS, Thrift,
>> ...) but which one is now 'the' solution and still 'supported' by the
>> Hadoop community?
>>    Btw.: The link on http://wiki.apache.org/hadoop/HDFS-APIs for PHP
>> is a 404.
>> 
>> 
>> .) What's a good solution for distributing the 37 TB, or the upcoming
>> ~120 TB?
>>   [ ] N servers with one 37 TB mountpoint per server?
>>   [ ] N servers with x TB mountpoints per server?
>>   [ ] other:
>> 
>> .) Is HDFS a good value for $Storage?
>> .) Is HBase a good value for $DB?
>>    The DB size is smaller than 1 GB; I would use HBase just for the
>> HA features of Hadoop.
>> 
>> .) Due to the fact that HDFS is a file-system, I could use
>>      /cams , for binary data
>>      /DB   , for DB storage
>>      /logs , for log storage
>>    but is this wise? On 'disk' they are different RAIDs.
>> 
>> .) Should I plan a dedicated network+NIC for the 'cluster
>> communication', as for most other cluster software?
>>    From what I have read it does not look necessary, but from a
>> security point of view, yes.
>> 
>> .) Maybe the communication with the components (hadoop, zk, ...)
>> could be set up with TLS?
>> 
>> Thank you very much for reading this mail all the way to this line ;-)
>> 
>> Thank you also for your feedback, which is very welcome and appreciated.
>> 
>> Best Regards
>> Aleks
>> 
