From: Prashant Sharma
Date: Mon, 6 Feb 2012 14:02:53 +0530
Subject: Re: working with SAS
To: common-user@hadoop.apache.org

Also, you will not necessarily need vertical (scale-up) systems to speed
things up; that depends entirely on your query. Give a thought to commodity
hardware instead: it is much cheaper, and hadoop is well suited to it, so I
hope your infrastructure can come out cheaper in terms of price-to-performance
ratio. Having said that, I do not mean you have to throw away your existing
infrastructure; it is ideal for certain requirements.

Your solution could be to write a mapreduce job that does what the query is
supposed to do, and run it on a cluster. Of what size? That depends on how
fast you want things done, and at what scale.
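For illustration, a bare-bones skeleton of such a job could look like the
following. I am assuming comma-separated text rows of the form
"groupKey,measure" and computing a per-group mean; the class names, the input
layout, and the statistic itself are placeholders, since you have not
described your query:

// Hypothetical skeleton: per-group mean over large text input.
// Input layout, field positions, and class names are assumptions.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class GroupMean {

  public static class MeanMapper
      extends Mapper<LongWritable, Text, Text, DoubleWritable> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      // Assumed row format: groupKey,measure
      String[] fields = line.toString().split(",");
      ctx.write(new Text(fields[0]),
                new DoubleWritable(Double.parseDouble(fields[1])));
    }
  }

  public static class MeanReducer
      extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    @Override
    protected void reduce(Text key, Iterable<DoubleWritable> values,
                          Context ctx)
        throws IOException, InterruptedException {
      // Your actual statistical calculation would go here.
      double sum = 0;
      long n = 0;
      for (DoubleWritable v : values) { sum += v.get(); n++; }
      ctx.write(key, new DoubleWritable(sum / n));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "group-mean");
    job.setJarByClass(GroupMean.class);
    job.setMapperClass(MeanMapper.class);
    job.setReducerClass(MeanReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(DoubleWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

You would package that into a jar and launch it with the hadoop jar command,
e.g. "hadoop jar groupmean.jar GroupMean <input dir> <output dir>" (the paths
are placeholders). The map and reduce methods are where your real logic lands;
everything else is boilerplate hadoop wiring.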
In case your query is ad hoc and has to be run frequently, you might want to
consider HBase and Hive as solutions, with a lot of expensive vertical
nodes ;). By the way, is your query iterative? A few more details on the type
of query would attract folks with more wisdom to help.

HTH

On Mon, Feb 6, 2012 at 1:46 PM, alo alt wrote:

> Hi,
>
> hadoop runs on linux boxes (mostly) and can run in a standalone
> installation for testing only. If you decide to use hadoop with hive or
> hbase, you have to take on a lot more tasks:
>
> - installation (whirr and Amazon EC2, for example)
> - writing your own mapreduce job, or using hive / hbase
> - setting up sqoop with the Teradata driver
>
> You can easily set up parts 1 and 2 with Amazon's EC2, and I think you can
> also book Windows servers there. For a single query that is, I think, the
> best option before you install a hadoop cluster of your own.
>
> best,
> Alex
>
>
> --
> Alexander Lorenz
> http://mapredit.blogspot.com
>
> On Feb 6, 2012, at 8:11 AM, Ali Jooan Rizvi wrote:
>
> > Hi,
> >
> > I would like to know if hadoop will be of help to me. Let me explain my
> > scenario:
> >
> > I have a single Windows-based server with 16 cores and 48 GB of physical
> > memory, plus 120 GB of virtual memory.
> >
> > I am running a query with statistical calculations on a large data set
> > of over 1 billion rows, on SAS. In this case SAS is acting like a
> > database on which both the source and target tables reside. For storage
> > I can keep the source and target data on Teradata as well, but the
> > query, which contains a patent, can only be run on the SAS interface.
> >
> > The problem is that SAS is taking many days (25 days) to run it (a
> > single query with a statistical function), and not all cores were used
> > all the time; merely 5% of CPU was utilized on average. Memory
> > utilization, however, was very high, which is why so much virtual
> > memory was used.
> >
> > Can I have a hadoop interface in place to do it all, so that I end up
> > running the query in less time, say 1 or 2 days? Anything that squeezes
> > my run time will be very helpful.
> >
> > Thanks
> >
> > Ali Jooan Rizvi