From: Prashant Sharma
Date: Mon, 6 Feb 2012 14:02:53 +0530
Subject: Re: working with SAS
To: common-user@hadoop.apache.org

Also, you will not necessarily need vertical (scale-up) systems to speed
things up; that depends entirely on your query. Give a thought to commodity
hardware instead: it is much cheaper, and hadoop is well suited to it, so I
hope your infrastructure can come out cheaper in terms of price-to-performance
ratio. Having said that, I do not mean you have to throw away your existing
infrastructure; it is ideal for certain requirements.

Your solution could be to write a mapreduce job that does what the query is
supposed to do, and run it on a cluster. Of what size? That depends on how
fast you want things done, and at what scale.
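For illustration, a bare-bones skeleton of such a job could look like the
following. I am assuming comma-separated text rows of the form
"groupKey,measure" and computing a per-group mean; the class names, the input
layout, and the statistic itself are placeholders, since you have not
described your query:

// Hypothetical skeleton: per-group mean over large text input.
// Input layout, field positions, and class names are assumptions.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class GroupMean {

  public static class MeanMapper
      extends Mapper<LongWritable, Text, Text, DoubleWritable> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      // Assumed row format: groupKey,measure
      String[] fields = line.toString().split(",");
      ctx.write(new Text(fields[0]),
                new DoubleWritable(Double.parseDouble(fields[1])));
    }
  }

  public static class MeanReducer
      extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    @Override
    protected void reduce(Text key, Iterable<DoubleWritable> values,
                          Context ctx)
        throws IOException, InterruptedException {
      // Your actual statistical calculation would go here.
      double sum = 0;
      long n = 0;
      for (DoubleWritable v : values) { sum += v.get(); n++; }
      ctx.write(key, new DoubleWritable(sum / n));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "group-mean");
    job.setJarByClass(GroupMean.class);
    job.setMapperClass(MeanMapper.class);
    job.setReducerClass(MeanReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(DoubleWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

You would package that into a jar and launch it with the hadoop jar command,
e.g. "hadoop jar groupmean.jar GroupMean <input dir> <output dir>" (the paths
are placeholders). The map and reduce methods are where your real logic lands;
everything else is boilerplate hadoop wiring.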
In case your query is ad hoc and has to be run frequently, you might want to
consider HBase and Hive as solutions, with a lot of expensive vertical
nodes ;). By the way, is your query iterative? A few more details on the type
of query would attract folks with more wisdom to help.

HTH

On Mon, Feb 6, 2012 at 1:46 PM, alo alt wrote:

> Hi,
>
> hadoop runs on linux boxes (mostly) and can run in a standalone
> installation for testing only. If you decide to use hadoop with hive or
> hbase, you have to take on a lot more tasks:
>
> - installation (whirr and Amazon EC2, for example)
> - writing your own mapreduce job, or using hive / hbase
> - setting up sqoop with the Teradata driver
>
> You can easily set up parts 1 and 2 with Amazon's EC2, and I think you can
> also book Windows servers there. For a single query that is, I think, the
> best option before you install a hadoop cluster of your own.
>
> best,
> Alex
>
>
> --
> Alexander Lorenz
> http://mapredit.blogspot.com
>
> On Feb 6, 2012, at 8:11 AM, Ali Jooan Rizvi wrote:
>
> > Hi,
> >
> > I would like to know if hadoop will be of help to me. Let me explain my
> > scenario:
> >
> > I have a single Windows-based server with 16 cores and 48 GB of physical
> > memory, plus 120 GB of virtual memory.
> >
> > I am running a query with statistical calculations on a large data set
> > of over 1 billion rows, on SAS. In this case SAS is acting like a
> > database on which both the source and target tables reside. For storage
> > I can keep the source and target data on Teradata as well, but the
> > query, which contains a patent, can only be run on the SAS interface.
> >
> > The problem is that SAS is taking many days (25 days) to run it (a
> > single query with a statistical function), and not all cores were used
> > all the time; merely 5% of CPU was utilized on average. Memory
> > utilization, however, was very high, which is why so much virtual
> > memory was used.
> >
> > Can I have a hadoop interface in place to do it all, so that I end up
> > running the query in less time, say 1 or 2 days? Anything that squeezes
> > my run time will be very helpful.
> >
> > Thanks
> >
> > Ali Jooan Rizvi