hadoop-common-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: is hadoop suitable for us?
Date Thu, 17 May 2012 23:17:22 GMT
The short answer is yes. 
The longer answer is that you will have to account for the latencies.

There is more, but you get the idea.

Sent from my iPhone

On May 17, 2012, at 5:33 PM, "Pierre Antoine Du Bois De Naurois" <padbdn@gmail.com> wrote:

> We have a large number of text files that we want to process and index
> (plus apply other algorithms to).
> The problem is that our configuration is shared-everything, while Hadoop
> assumes a shared-nothing configuration.
> We have 50 VMs rather than physical servers, and they all share one huge
> central storage system. So HDFS might not be really useful: replication
> will not help, and distributing files has no meaning because all files
> will still be located on the same HDD. I am afraid that I/O will be very
> slow with or without HDFS. So I am wondering whether it will really help
> us to use Hadoop/HBase/Pig etc. to distribute the work and run several
> tasks in parallel, or whether it is "better" to install something
> different (I am not sure what). We heard that myHadoop is better for this
> kind of configuration; does anyone know anything about it?
> For example, we now have a central MySQL database that we use to check
> whether we have already processed a document and to keep several pieces
> of metadata. Soon we will have to distribute it, as there is not enough
> space in one VM. But will Hadoop/HBase be useful for that? We don't want
> to do any complex joins/sorts of the data; we just want to run queries to
> check whether a document has already been processed and, if not, to add
> it along with its metadata.
> We heard that Sun Grid, for example, is another way to go, but it's
> commercial. We are somewhat lost, so any help/ideas/suggestions are
> appreciated.
> Best,
> PA
> 2012/5/17 Abhishek Pratap Singh <manu.infy@gmail.com>
>> Hi,
>> To your question of whether Hadoop can be used without HDFS: the answer
>> is yes. Hadoop can be used with any kind of distributed file system.
>> But I am not able to understand the problem statement clearly enough to
>> offer my point of view.
>> Are you processing text files and saving the results in a distributed
>> database?
>> Regards,
>> Abhishek
>> On Thu, May 17, 2012 at 1:46 PM, Pierre Antoine Du Bois De Naurois <
>> padbdn@gmail.com> wrote:
>>> We want to distribute the processing of text files and of large
>>> machine learning tasks, and to have a distributed database, as we have
>>> a large amount of data.
>>> The problem is that each VM can hold up to 2TB of data (a limitation of
>>> the VM), and we have 20TB of data. So we have to distribute the
>>> processing, the database etc. But all of that data will be in one huge
>>> shared central file system.
>>> We heard about myHadoop, but we are not sure how it is any different
>>> from Hadoop.
>>> Could we run Hadoop/MapReduce without using HDFS? Is that an option?
>>> best,
>>> PA
>>> 2012/5/17 Mathias Herberts <mathias.herberts@gmail.com>
>>>> Hadoop does not perform well with shared storage and VMs.
>>>> The question should be asked first about what you're trying to
>>>> achieve, not about your infra.
>>>> On May 17, 2012 10:39 PM, "Pierre Antoine Du Bois De Naurois" <
>>>> padbdn@gmail.com> wrote:
>>>>> Hello,
>>>>> We have about 50 VMs and we want to distribute processing across
>>>>> them. However, these VMs share a huge data storage system, and thus
>>>>> their "virtual" HDDs are all located on the same computer. Would
>>>>> Hadoop be useful for such a configuration? Could we use Hadoop
>>>>> without HDFS, so that we can retrieve and store everything in the
>>>>> same storage?
>>>>> Thanks,
>>>>> PA
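
The "check whether a document was already processed, and if not add it with its metadata" requirement described in the thread is a single-key lookup-and-insert, which is why a key-value store is a plausible fit. The following is only a minimal sketch of that pattern, not anything proposed by the posters: it uses Python's built-in sqlite3 module as a stand-in for the central database, and the table name `processed`, the columns, and the helpers `make_store`, `doc_key` and `add_if_new` are all hypothetical names chosen for illustration.

```python
import hashlib
import sqlite3


def make_store() -> sqlite3.Connection:
    """Create an in-memory table keyed by document hash.

    Assumption: sqlite3 stands in here for whatever central or
    distributed store would actually hold the metadata.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE processed ("
        " doc_key TEXT PRIMARY KEY,"
        " source TEXT,"
        " n_chars INTEGER)"
    )
    return conn


def doc_key(text: str) -> str:
    """Derive a fixed-size key from the document content (hypothetical choice)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def add_if_new(conn: sqlite3.Connection, text: str, source: str) -> bool:
    """Record the document unless its key already exists.

    Returns True if a new row was inserted, False if the document
    had been seen before. The primary key makes the check and the
    insert a single statement.
    """
    cur = conn.execute(
        "INSERT OR IGNORE INTO processed (doc_key, source, n_chars)"
        " VALUES (?, ?, ?)",
        (doc_key(text), source, len(text)),
    )
    conn.commit()
    return cur.rowcount == 1
```

Because every operation touches exactly one key and no joins or sorts are involved, the same access pattern could in principle be expressed as a get followed by a put on a row key in a store such as HBase; whether that performs acceptably on shared storage is a separate question that this sketch does not address.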
