hadoop-hdfs-user mailing list archives

From Alec Ten Harmsel <a...@alectenharmsel.com>
Subject Re: Hadoop and Open Data (CKAN.org).
Date Thu, 04 Sep 2014 13:11:51 GMT
I would recommend using Hadoop only if you are ingesting a lot of data
and need reasonable performance at scale. I'd start by using <insert
language/tool of choice> to ingest and transform the data, and only
revisit that choice once the process starts taking too long.

For example, one of our researchers at the University of Michigan had to
process ~150GB of data. Using Python, processing that data took about 45
minutes, so it was not worth spending extra development time to run it
on Hadoop. That time will naturally vary depending on what you need to do
and the hardware available.
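To put those numbers in perspective: ~150GB in ~45 minutes works out to
roughly 55 MB/s (decimal units) on a single machine. Below is a minimal
Python sketch of the kind of single-machine streaming job I mean. It is
not the researcher's actual code; the CSV layout and the `district`
column are hypothetical, chosen only for illustration. Because it reads
one row at a time, memory use depends on the number of distinct values
rather than on the size of the file.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def throughput_mb_per_s(total_gb: float, minutes: float) -> float:
    """Average throughput in MB/s, using decimal units (1 GB = 1000 MB)."""
    return total_gb * 1000 / (minutes * 60)


def count_by_column(stream: TextIO, column: str) -> Counter:
    """Stream a CSV one row at a time and count the values in `column`.

    Only the counts are kept in memory, so the same code works on a
    few MB or on many GB without changes.
    """
    counts: Counter = Counter()
    for row in csv.DictReader(stream):
        counts[row[column]] += 1
    return counts


if __name__ == "__main__":
    # ~150GB in ~45 minutes, as in the example above.
    print(f"{throughput_mb_per_s(150, 45):.1f} MB/s")  # prints "55.6 MB/s"

    # Hypothetical CSV layout; for a real file use open(path, newline="").
    sample = io.StringIO(
        "district,category\nNorth,parking\nSouth,waste\nNorth,waste\n"
    )
    print(count_by_column(sample, "district"))
```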

So until you need to frequently process large amounts of data, I'd stick
with something you're already familiar with.

Alec Ten Harmsel

On 09/04/2014 03:30 AM, Henrik Aagaard Jørgensen wrote:
>
> Dear all,
>
> I’m very new to Hadoop as I’m still trying to grasp its value and 
> purpose. I do hope my question on this mailing list is OK.
>
> I manage our open data platform at our municipality, using CKAN.org.
> It works very well for its purpose of showing data and adding APIs to
> data.
>
> However, I’m very interested in knowing more about Hadoop and whether it
> would fit into an (open) data platform, as we are getting more and more
> data to show and to work with internally at our municipality.
>
> However, I cannot figure out whether this is the right purpose to use
> Hadoop for, or whether it is “overkill” or…
>
> Could someone elaborate on this topic?
>
> I’ve Googled around a lot and looked at various videos online, and
> Hadoop seems to have its place, also in an open data platform environment.
>
> Best regards,
>
> Henrik
>

