hadoop-common-user mailing list archives

From Mohan Radhakrishnan <radhakrishnan.mo...@gmail.com>
Subject Re: Hadoop and Open Data (CKAN.org).
Date Thu, 04 Sep 2014 13:44:13 GMT
I understand that writing MR jobs in a programming language is sometimes
required, but if we are just processing large amounts of data (machine
learning, for example) we could use Pig. I recently processed 0.25 TB on AWS
clusters in a reasonably short time. In this case the development effort is
very low.


Thanks,
Mohan


On Thu, Sep 4, 2014 at 6:41 PM, Alec Ten Harmsel <alec@alectenharmsel.com>
wrote:

> I would recommend using Hadoop only if you are ingesting a lot of data
> and you need reasonable performance at scale. I would recommend starting
> with <insert language/tool of choice> to ingest and transform data
> until that process starts taking too long.
>
> For example, one of our researchers at the University of Michigan had to
> process ~150GB of data. Using Python, processing that data took about 45
> minutes - it was not worth it to spend extra development time to run it on
> Hadoop. This time will change depending on what you need to do and the
> hardware available, naturally.
>
> So until you need to frequently process large amounts of data, I'd stick
> with something you're already familiar with.
>
> Alec Ten Harmsel
>
> On 09/04/2014 03:30 AM, Henrik Aagaard Jørgensen wrote:
>
>  Dear all,
>
>
>
> I’m very new to Hadoop as I’m still trying to grasp its value and
> purpose. I do hope my question on this mailing list is OK.
>
>
>
> I manage our open data platform at our municipality, using CKAN.org. It
> works very well for its purpose of showing data and adding APIs to data.
>
>
>
> However, I’m very interested in knowing more about Hadoop and whether it
> would fit into an (open) data platform, as we are getting more and more data
> to show and to work with internally at our municipality.
>
>
>
> However, I cannot figure out whether Hadoop is the right tool for this
> purpose, or if it is “overkill” or…
>
>
>
> Could someone elaborate on this topic?
>
>
>
> I’ve Googled around a lot and looked at various videos online, and Hadoop
> seems to have its place, also in an open data platform environment.
>
>
>
> Best regards,
>
> Henrik
>
>
>
