hadoop-common-user mailing list archives

From Michel Segel <michael_se...@hotmail.com>
Subject Re: working with SAS
Date Mon, 06 Feb 2012 12:50:01 GMT
Both responses assume replacing SAS with a Hadoop cluster.
I would agree that going to EC2 might make sense as a PoC before investing in a physical
cluster, but we need to know more about the underlying problem.

First, can the problem be broken down into something that can be accomplished in parallel
sub-tasks? Second... how much data? It could be a good use case for Whirr...
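
If it does split, Whirr makes the EC2 PoC cheap to stand up. Roughly (a
sketch only; the recipe file name and the values below are placeholders,
not a tested config):

  whirr launch-cluster --config hadoop-ec2.properties

where hadoop-ec2.properties names the provider and the cluster layout, e.g.:

  whirr.cluster-name=sas-poc
  whirr.provider=aws-ec2
  whirr.identity=<AWS access key>
  whirr.credential=<AWS secret key>
  whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,5 hadoop-datanode+hadoop-tasktracker

and whirr destroy-cluster --config hadoop-ec2.properties tears it all
down again once the PoC is done.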

Sent from a remote device. Please excuse any typos...

Mike Segel

On Feb 6, 2012, at 2:32 AM, Prashant Sharma <prashant.s@imaginea.com> wrote:

> + You will not necessarily need vertical systems to speed things up
> (it totally depends on your query). Give a thought to commodity hardware
> (much cheaper), which Hadoop is well suited to; *I hope* your
> infrastructure can then be cheaper in terms of price-to-performance
> ratio. Having said that, I do not mean you have to throw away your
> existing infrastructure, because it is ideal for certain requirements.
> 
> Your solution could be to write a MapReduce job that does what the query
> is supposed to do (rough sketch below) and to run it on a cluster whose
> size depends on how fast you want things done and how far you need to
> scale. In case your query is ad hoc and has to be run frequently, you
> might want to consider HBase and Hive as solutions, with a lot of
> expensive vertical nodes ;).
> 
> BTW, is your query iterative? A little more detail on your type of query
> can attract people with more wisdom to help.
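> 
> For illustration, a skeleton of such a job (a minimal sketch only; the
> "groupKey,value" CSV layout and the per-group mean stand in for your
> real statistical calculation):
> 
>   import java.io.IOException;
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.fs.Path;
>   import org.apache.hadoop.io.DoubleWritable;
>   import org.apache.hadoop.io.LongWritable;
>   import org.apache.hadoop.io.Text;
>   import org.apache.hadoop.mapreduce.Job;
>   import org.apache.hadoop.mapreduce.Mapper;
>   import org.apache.hadoop.mapreduce.Reducer;
>   import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
>   import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
> 
>   public class GroupMean {
>     // Map: parse "groupKey,value" rows, emit (groupKey, value).
>     public static class MeanMapper
>         extends Mapper<LongWritable, Text, Text, DoubleWritable> {
>       protected void map(LongWritable off, Text line, Context ctx)
>           throws IOException, InterruptedException {
>         String[] f = line.toString().split(",");
>         ctx.write(new Text(f[0]),
>                   new DoubleWritable(Double.parseDouble(f[1])));
>       }
>     }
>     // Reduce: average every value seen for one group key.
>     public static class MeanReducer
>         extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
>       protected void reduce(Text key, Iterable<DoubleWritable> vals,
>                             Context ctx)
>           throws IOException, InterruptedException {
>         double sum = 0; long n = 0;
>         for (DoubleWritable v : vals) { sum += v.get(); n++; }
>         ctx.write(key, new DoubleWritable(sum / n));
>       }
>     }
>     public static void main(String[] args) throws Exception {
>       Job job = new Job(new Configuration(), "group-mean");
>       job.setJarByClass(GroupMean.class);
>       job.setMapperClass(MeanMapper.class);
>       job.setReducerClass(MeanReducer.class);
>       job.setOutputKeyClass(Text.class);
>       job.setOutputValueClass(DoubleWritable.class);
>       FileInputFormat.addInputPath(job, new Path(args[0]));
>       FileOutputFormat.setOutputPath(job, new Path(args[1]));
>       System.exit(job.waitForCompletion(true) ? 0 : 1);
>     }
>   }
> 
> The point is just that the map tasks scan input splits in parallel and
> the reduce side aggregates; whether that removes your bottleneck depends
> on whether the statistic decomposes this way.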
> 
> HTH
> 
> 
> On Mon, Feb 6, 2012 at 1:46 PM, alo alt <wget.null@googlemail.com> wrote:
> 
>> Hi,
>> 
>> Hadoop runs on Linux boxes (mostly) and can also run as a standalone
>> installation, for testing only. If you decide to use Hadoop with Hive or
>> HBase you have to face a lot more tasks:
>> 
>> - installation (Whirr and Amazon EC2, for example)
>> - writing your own MapReduce job, or using Hive / HBase
>> - setting up Sqoop with the Teradata driver (rough sketch below)
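>> 
>> The Sqoop import for the last point would look roughly like this (a
>> sketch only; host, database, credentials and table names are
>> placeholders):
>> 
>>   sqoop import \
>>     --connect jdbc:teradata://tdhost/DATABASE=mydb \
>>     --driver com.teradata.jdbc.TeraDriver \
>>     --username etl_user -P \
>>     --table SOURCE_TABLE \
>>     --target-dir /user/ali/source_table
>> 
>> A matching sqoop export can push the result table back to Teradata
>> afterwards.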
>> 
>> You can easily cover points 1 and 2 with Amazon's EC2; I think you can
>> also book Windows Server instances there. For a single query that is, I
>> think, the best option before you install a Hadoop cluster of your own.
>> 
>> best,
>> Alex
>> 
>> 
>> --
>> Alexander Lorenz
>> http://mapredit.blogspot.com
>> 
>> On Feb 6, 2012, at 8:11 AM, Ali Jooan Rizvi wrote:
>> 
>>> Hi,
>>> 
>>> I would like to know whether Hadoop would be of help to me. Let me
>>> explain my scenario:
>>> 
>>> I have a single Windows Server machine with 16 cores and 48 GB of
>>> physical memory. In addition, I have 120 GB of virtual memory.
>>> 
>>> I am running a query with a statistical calculation on a large data
>>> set of over 1 billion rows, on SAS. In this case, SAS is acting like a
>>> database on which both the source and target tables reside. For
>>> storage I could keep the source and target data on Teradata as well,
>>> but the query, which contains a patented calculation, can only be run
>>> through the SAS interface.
>>> 
>>> The problem is that SAS is taking many days (25 days) to run it (a
>>> single query with a statistical function). Not all cores were in use
>>> all the time; merely 5% CPU was utilized on average. Memory
>>> utilization, however, was very high, which is why the large virtual
>>> memory was used.
>>> 
>>> Could I put a Hadoop interface in place to do it all, so that I end up
>>> running the query in less time, say 1 or 2 days? Anything squeezing my
>>> run time will be very helpful.
>>> 
>>> Thanks,
>>> 
>>> Ali Jooan Rizvi
>>> 
>> 
>> 
