hadoop-hdfs-dev mailing list archives

From Ted Dunning <tdunn...@maprtech.com>
Subject Re: Hadoop for unstructured data storage
Date Thu, 06 Oct 2011 22:50:01 GMT
HDFS does not really meet your needs.  I think that MapR's solution would.
I will contact you off-line to give details.

On Thu, Oct 6, 2011 at 3:35 PM, Hemant kulkarni <kulkarnihemant@gmail.com> wrote:

> Hi all,
> We are a small software development firm working on data backup
> software. We have a backup product which copies data from client
> machines to a data store. Currently we provide specialized hardware to
> store data (1-3 TB disks and servers). We want to provide a solution to
> some customers (a mining company) with the following requirements:
> 1] Huge data storage capacity (initially 100 TB, but it should be
> easy to increase)
> 2] Initially this facility will be used for data storage, but in future
> the company plans to add data processing software (some MapReduce jobs)
> 3] Most of the data is unstructured (mostly images, text files and videos)
> 4] Often the data is a duplicate of some original, so we need deduplication
> 5] Data is mostly added (daily backup) and only occasionally read
> (new data is written every day and read weekly)
> 6] Data is copied as files (every backup is 100,000 files; each
> file is some MB, and some files are a few KB)
> 7] This is data storage, so latency requirements are not very strict
> 8] Some of the data has very high HA requirements and should be copied
> to data centers outside the country on a timely basis (weekly, but the
> data size is small, a few TB)
> 9] Currently we provide some sort of HSM (Hierarchical Storage
> Management). The company needs something similar in the new solution
> 10] A single namespace and versioning of files are also required
> As I understand it, HDFS is not directly suited to such storage because
> of the following design considerations:
> 1] Large number of small files
> 2] Duplicate data
> 3] Write-many, read-once requirement
> Here are my questions:
> 1] Does HDFS support our client's requirements? Or at least, can it be
> configured to suit these needs?
> 2] Is there any customization of HDFS (if possible) which will serve the
> purpose?
> 3] Is there any other solution which will work?
> All thoughts/suggestions are welcome.
> Regards,
> Hemant.
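
Requirement 4] in the quoted message (deduplication) can be illustrated
with a minimal sketch of whole-file deduplication by content hash. This is
one possible approach under stated assumptions, not something HDFS provides
and not a description of the poster's backup product. The names
file_digest and plan_backup are hypothetical, and the index of known
digests is assumed to fit in an in-memory dictionary.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def plan_backup(paths, known_digests):
    """Split paths into files to store and files that duplicate stored content.

    known_digests (hypothetical index) maps digest -> path of the first file
    seen with that content. It is updated in place.
    """
    to_store, duplicates = [], []
    for path in paths:
        digest = file_digest(Path(path))
        if digest in known_digests:
            # Same content already stored: record a reference instead of a copy.
            duplicates.append((path, known_digests[digest]))
        else:
            known_digests[digest] = path
            to_store.append(path)
    return to_store, duplicates
```

Only the files in to_store would need to be copied to the data store; each
entry in duplicates pairs a file with the earlier file holding the same
content. Block-level or chunk-level deduplication is a different technique
and is not shown here.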
