From: Swagatika Tripathy
To: user@hive.apache.org
Date: Mon, 28 Apr 2014 02:28:39 +0530
Subject: Re: Executing Hive Queries in Parallel

Hi,
You can also use the fork feature of Oozie, a workflow scheduler, to run jobs in parallel. You just need to define all your HQL scripts inside the workflow.xml to make them run in parallel.

On Apr 22, 2014 3:14 AM, "Subramanian, Sanjay (HQP)" <sanjay.subramanian@roberthalf.com> wrote:
Hey

Instead of going into the Hive CLI, I would propose 2 ways:

NOHUP
nohup hive -f path/to/query/file/hive1.hql >> ./hive1.hql_`date +%Y-%m-%d-%H-%M-%S`.log 2>&1 &
nohup hive -f path/to/query/file/hive2.hql >> ./hive2.hql_`date +%Y-%m-%d-%H-%M-%S`.log 2>&1 &
nohup hive -f path/to/query/file/hive3.hql >> ./hive3.hql_`date +%Y-%m-%d-%H-%M-%S`.log 2>&1 &
nohup hive -f path/to/query/file/hive4.hql >> ./hive4.hql_`date +%Y-%m-%d-%H-%M-%S`.log 2>&1 &
nohup hive -f path/to/query/file/hive5.hql >> ./hive5.hql_`date +%Y-%m-%d-%H-%M-%S`.log 2>&1 &

Each statement above will launch MR jobs on your cluster, and depending on the cluster configs the jobs will run in parallel.
Scheduling jobs on the MR cluster is independent of Hive.
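
The nohup approach above can be wrapped in a small script. What follows is only a minimal sketch, not something from the original thread: the function name run_queries_in_parallel and the variables HIVE_CMD and LOG_DIR are made up for illustration. It backgrounds each hive invocation with a trailing &, then waits for all of them and prints how many exited with a non-zero status. Whether the resulting MR jobs really overlap still depends on the cluster's scheduler.

```shell
#!/bin/sh
# Sketch: run several Hive scripts concurrently and report how many failed.
# HIVE_CMD and LOG_DIR are hypothetical knobs for this example, not Hive settings.
HIVE_CMD="${HIVE_CMD:-hive}"
LOG_DIR="${LOG_DIR:-.}"

run_queries_in_parallel() {
    pids=""
    failed=0
    for query in "$@"; do
        log="$LOG_DIR/$(basename "$query")_$(date +%Y-%m-%d-%H-%M-%S).log"
        # The trailing & backgrounds the job so the loop does not block on it.
        nohup $HIVE_CMD -f "$query" >> "$log" 2>&1 &
        pids="$pids $!"
    done
    for pid in $pids; do
        # wait returns the exit status of that particular background job.
        wait "$pid" || failed=$((failed + 1))
    done
    echo "$failed"
}

# Example call (not executed here):
# run_queries_in_parallel path/to/query/file/hive1.hql path/to/query/file/hive2.hql
```

Collecting the PIDs and waiting on each one is a design choice that lets a calling script know when every query has finished and whether any of them failed, which plain nohup lines do not report.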

SCREEN sessions
  • Create a screen session
    • screen -S hive_query1
    • You are now inside the screen session hive_query1
      • hive -f path/to/query/file/hive1.hql
    • Ctrl-A D
      • This detaches you from the screen session
  • Repeat for each Hive query you want to run
    • E.g. 5 screen sessions, each running a Hive query
  • To display the active screen sessions
    • screen -ls
  • To attach to a screen session
    • screen -x hive_query1
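
The screen steps above can also be scripted so that each session starts already detached. This is a rough sketch under a few assumptions: the function name start_screen_sessions and the SCREEN_CMD variable are invented here, and the session names simply follow the hive_query1, hive_query2, ... pattern used in the list. It relies on screen's -d -m options (start detached) together with -S (name the session).

```shell
#!/bin/sh
# Sketch: start one detached screen session per Hive script.
# SCREEN_CMD is a hypothetical override, e.g. "echo screen" for a dry run.
SCREEN_CMD="${SCREEN_CMD:-screen}"

start_screen_sessions() {
    n=1
    for query in "$@"; do
        # -d -m starts the session detached; -S gives it a name to attach to later.
        $SCREEN_CMD -dmS "hive_query$n" hive -f "$query"
        n=$((n + 1))
    done
}

# Example call (not executed here):
# start_screen_sessions path/to/query/file/hive1.hql path/to/query/file/hive2.hql
```

Setting SCREEN_CMD="echo screen" prints the commands instead of running them, which is a cheap way to check the session names before touching the cluster.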

Thanks

Warm Regards


Sanjay

From: saurabh <mpp.databases@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Monday, April 21, 2014 at 1:53 PM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Executing Hive Queries in Parallel


Hi,
I need some input on executing Hive queries in parallel. I tried doing this using the CLI (by opening multiple SSH connections) and executed 4 HQL scripts; the queries were executed sequentially. All FOUR queries were submitted; however, while the first one was executing, the others were in a pending state. I was performing this activity on EMR running in batch mode, hence I wasn't able to dig into the logs.

The Hive CLI uses a native Hive connection, which by default uses the FIFO scheduler. This might be one of the reasons for the queries getting executed in sequence.

I also observed that when multiple queries are executed using multiple HUE sessions, they run in parallel. Can you please suggest how this HUE functionality can be replicated using the CLI?

I am aware of the Beeswax client; however, I am not sure how it can be used during EMR batch mode processing.

Thanks in advance for going through this. Kindly let me know your thoughts on the same.
