hadoop-general mailing list archives

From Abhishek Mehta <abhis...@tresata.com>
Subject Re: Hadoop Java Versions
Date Fri, 01 Jul 2011 18:53:44 GMT
i definitely agree with scott.  as a user of the hadoop open source stack for building our
banking focused big data analytics applications, i speak on behalf of our clients and the
emerging hadoop eco-system that open and honest conversations on this thread/group, irrespective
of whether one represents a company or apache, should be encouraged.

for instance, since cloudera, mapR and soon hortonworks will all be offering competing
hadoop distros for enterprises, it is important for all of us (and prospective users) to
understand what they are doing to address critical gaps on the platform, and how the hadoop
ecosystem benefits from it.

From our perspective, it doesn't matter if one is better than the other (which is not the
point i saw ted or mc making), but that companies, startups, apache and everybody else are:

1.  thinking of the right issues
2.  willing to solve them (and ideally contributing the solutions back) and
3.  informing the exploding hadoop userbase of what not to do

I see it benefitting all of us, especially as Hadoop rapidly jumps the transom and becomes
the platform of choice for data management in industries like banking, retail and healthcare...just
as it has for social media and the web...

isn't that what we are launching our business plans around anyway...

And in that sense we all owe ASF and the hadoop community (and not any one company) an equal
amount of gratitude, humility and respect.  


On Jul 1, 2011, at 1:22 PM, Scott Carey wrote:

> Although this thread is wandering a bit, I disagree strongly that it is
> inappropriate to discuss other vendor specific features (or competing
> compute platform features) on general@.  The topic has become the factors
> that influence hardware purchase choices, and one of those is how the
> system deals with disk failure.  Compare/contrast with other platforms is
> healthy for the Hadoop project! +1
> 
> On 6/30/11 9:47 PM, "Ian Holsman" <hadoop@holsman.net> wrote:
> 
>> 
>> On Jul 1, 2011, at 2:08 PM, M. C. Srivas wrote:
>> 
>>> On Thu, Jun 30, 2011 at 5:24 PM, Todd Lipcon <todd@cloudera.com> wrote:
>>> 
>>>> 
>>>> I'd advise you to look at "stock hadoop" again. This used to be true, but
>>>> was fixed a long while back by HDFS-457 and several followup JIRAs.
>>>> 
>>>> If MapR does something fancier, I'm sure we'd be interested to hear about
>>>> it so we can compare the approaches.
>>>> 
>>>> -Todd
>>>> 
>>>> 
>>> MapR tracks disk responsiveness. In other words, a moving histogram of
>>> IO-completion times is maintained internally, and if a disk starts getting
>>> really slow, it is pre-emptively taken offline so it does not create long
>>> tails for running jobs (and the data on the disk is re-replicated using
>>> whatever re-replication policy is in place).  One of the benefits of
>>> managing the disks directly instead of through ext3 / xfs / or other ...
>>> 
>>> All these stats can be fed into Ganglia (or pushed out centrally via a text
>>> file that can be pulled out using NFS)  if historical info about disk
>>> behavior (and failures) needs to be preserved.
>>> 
>>> - Srivas.
>> 
>> While I am intrigued about how MapR performs internally, I don't think
>> this is the forum for it.
>> please keep MapR (and other vendor specific discussions) on their
>> respective support forums.
>> 
>> Thanks!
>> 
>> Ian.
>> 
> 
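
For readers following the disk-failure discussion quoted above, here is a rough sketch of the general idea Srivas describes: keep a moving record of IO-completion times per disk and flag the disk once its tail latency gets really slow. This is not MapR's implementation, which the thread does not show. The class name, the window size, the 500 ms threshold, the 95th-percentile rule and the minimum sample count are all assumptions made up for illustration, and a sliding window of raw samples stands in for the histogram to keep the example short.

```python
from collections import deque


class DiskLatencyTracker:
    """Hypothetical sliding window of recent IO-completion times for one disk."""

    def __init__(self, window=1000, slow_ms=500.0, quantile=0.95, min_samples=100):
        # deque(maxlen=...) drops the oldest sample automatically, which gives
        # the "moving" behavior without any explicit expiry logic.
        self.samples = deque(maxlen=window)
        self.slow_ms = slow_ms          # assumed threshold, not from the thread
        self.quantile = quantile        # assumed tail quantile
        self.min_samples = min_samples  # avoid judging a disk on too little data

    def record(self, completion_ms):
        """Record the completion time (in milliseconds) of one IO request."""
        self.samples.append(completion_ms)

    def tail_latency(self):
        """Return the configured quantile of the samples currently in the window."""
        if not self.samples:
            raise ValueError("no samples recorded yet")
        ordered = sorted(self.samples)
        index = min(len(ordered) - 1, int(self.quantile * len(ordered)))
        return ordered[index]

    def should_take_offline(self):
        """True once enough samples exist and the tail latency exceeds the threshold."""
        if len(self.samples) < self.min_samples:
            return False
        return self.tail_latency() > self.slow_ms
```

A caller would invoke `record()` on every completed IO and poll `should_take_offline()` periodically; what happens next (taking the disk offline, re-replicating its data) is outside the scope of this sketch.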

