From: Manish Malhotra <manish.hadoop.work@gmail.com>
To: Hive <user@hive.apache.org>
Date: Sun, 27 Apr 2014 15:00:32 -0700
Subject: Re: Executing Hive Queries in Parallel

Sanjay's and Swagatika's replies are spot on.

Fundamentally, whether you run the Hive query from the CLI or through an internal API such as HiveDriver, the flow is the same:

>> Compile the query
>> Get the info from the Hive Metastore using Thrift or JDBC, and optimize the query (if required and possible)
>> Generate the MR jobs for the query
>> Push the jobs (more than one may need to run in sequence) to the JobTracker
Whether these MR jobs then run in parallel is settled at that final step, based on the queue and the availability of MR slots on the cluster.

So it does not matter whether you run the queries with nohup hive -f, from multiple machines, through Oozie, or from your own custom code.
It boils down to two things: your system/code must not submit the queries in sequence (or wait for one to finish before submitting the next), and your cluster must have enough resources to run the MR jobs in parallel.
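
A minimal sketch of the "do not submit in sequence, do not wait" part, assuming a POSIX shell. Each query is started in the background and the script only blocks once everything has been submitted. The function name run_parallel and the log naming are made up for illustration and are not part of Hive; the command is passed in as an argument so the same function works with hive or with a wrapper script.

```shell
#!/bin/sh
# run_parallel CMD FILE...
# Starts "CMD -f FILE" for every FILE in the background, then waits for all
# of them. Returns the number of commands that exited non-zero.
# (Hypothetical helper for illustration; not part of Hive.)
run_parallel() {
  cmd=$1
  shift
  pids=""
  for f in "$@"; do
    # One log per query file, written to the current directory.
    "$cmd" -f "$f" > "$(basename "$f").log" 2>&1 &
    pids="$pids $!"
  done
  failed=0
  for pid in $pids; do
    # wait returns the exit status of that background command.
    wait "$pid" || failed=$((failed + 1))
  done
  return "$failed"
}
```

It could be called as: run_parallel hive path/to/query/file/hive1.hql path/to/query/file/hive2.hql. This only takes care of the submission side; whether the resulting MR jobs overlap still depends on the queue and the free slots on the cluster.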

Regards,
Manish



On Sun, Apr 27, 2014 at 1:58 PM, Swagatika Tripathy <swagatikat856@gmail.com> wrote:

Hi,
You can also use Oozie, a workflow scheduler, whose fork feature runs jobs in parallel. You just need to define all your HQLs inside the workflow.xml to make them run in parallel.
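
As a rough illustration of the fork approach, the sketch below writes out a workflow.xml with one Hive action per query file between a fork and a join. Everything specific in it is an assumption to adapt to your Oozie install: the function name write_fork_workflow, the node names, the schema versions (uri:oozie:workflow:0.4 and uri:oozie:hive-action:0.2), and the ${jobTracker} and ${nameNode} properties, which would come from job.properties. A real Hive action usually also needs your hive-site.xml passed in, which is left out here.

```shell
#!/bin/sh
# write_fork_workflow OUT FILE...
# Writes an Oozie workflow to OUT that forks one Hive action per query FILE
# and joins them again. Hypothetical helper; names and schema versions are
# assumptions, not taken from the thread.
write_fork_workflow() {
  out=$1
  shift
  {
    echo '<workflow-app name="parallel-hive" xmlns="uri:oozie:workflow:0.4">'
    echo '  <start to="fork-queries"/>'
    echo '  <fork name="fork-queries">'
    n=0
    for f in "$@"; do
      n=$((n + 1))
      printf '    <path start="hive-%s"/>\n' "$n"
    done
    echo '  </fork>'
    n=0
    for f in "$@"; do
      n=$((n + 1))
      printf '  <action name="hive-%s">\n' "$n"
      echo '    <hive xmlns="uri:oozie:hive-action:0.2">'
      # Single quotes keep ${...} literal so Oozie resolves it, not the shell.
      echo '      <job-tracker>${jobTracker}</job-tracker>'
      echo '      <name-node>${nameNode}</name-node>'
      printf '      <script>%s</script>\n' "$f"
      echo '    </hive>'
      echo '    <ok to="join-queries"/>'
      echo '    <error to="fail"/>'
      echo '  </action>'
    done
    echo '  <join name="join-queries" to="end"/>'
    echo '  <kill name="fail">'
    echo '    <message>A Hive action failed</message>'
    echo '  </kill>'
    echo '  <end name="end"/>'
    echo '</workflow-app>'
  } > "$out"
}
```

For example: write_fork_workflow workflow.xml hive1.hql hive2.hql hive3.hql. Pass at least two query files, since a fork with a single path gains nothing.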

On Apr 22, 2014 3:14 AM, "Subramanian, Sanjay (HQP)" <sanjay.subramanian@roberthalf.com> wrote:
Hey

Instead of going into the Hive CLI,
I would propose 2 ways:

NOHUP
nohup hive -f path/to/query/file/hive1.hql >> ./hive1.hql_`date +%Y-%m-%d-%H-%M-%S`.log 2>&1 &
nohup hive -f path/to/query/file/hive2.hql >> ./hive2.hql_`date +%Y-%m-%d-%H-%M-%S`.log 2>&1 &
nohup hive -f path/to/query/file/hive3.hql >> ./hive3.hql_`date +%Y-%m-%d-%H-%M-%S`.log 2>&1 &
nohup hive -f path/to/query/file/hive4.hql >> ./hive4.hql_`date +%Y-%m-%d-%H-%M-%S`.log 2>&1 &
nohup hive -f path/to/query/file/hive5.hql >> ./hive5.hql_`date +%Y-%m-%d-%H-%M-%S`.log 2>&1 &

Each statement above launches MR jobs on your cluster, and depending on the cluster configuration the jobs will run in parallel.
Scheduling of jobs on the MR cluster is independent of Hive.

SCREEN sessions
  • Create a screen session
    • screen -S hive_query1
    • You are now inside the screen session hive_query1
      • hive -f path/to/query/file/hive1.hql
    • Ctrl-A D
      • This detaches you from the screen session
  • Repeat for each Hive query you want to run
    • E.g. 5 screen sessions, each running a Hive query
  • To display the active screen sessions
    • screen -ls
  • To attach to a screen session
    • screen -x hive_query1
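
The manual create/detach steps above can also be scripted. This is a sketch under assumptions: the function name start_in_screen is made up, sessions are named after the query file, and it relies on screen's -d -m options to start each session already detached.

```shell
#!/bin/sh
# start_in_screen CMD FILE...
# Starts one detached screen session per query FILE, named after the file
# (hive1.hql -> session "hive1"), each running "CMD -f FILE".
# (Hypothetical helper for illustration.)
start_in_screen() {
  cmd=$1
  shift
  for f in "$@"; do
    name=$(basename "$f" .hql)
    # -d -m: start the session detached; -S: give it a name.
    screen -dmS "$name" "$cmd" -f "$f"
  done
}
```

For example: start_in_screen hive path/to/query/file/hive1.hql path/to/query/file/hive2.hql. A session started this way ends when its command exits, so attach while the query is still running if you want to watch the output.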

Thanks

Warm Regards


Sanjay

From: saurabh <mpp.databases@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Monday, April 21, 2014 at 1:53 PM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Executing Hive Queries in Parallel


Hi,
I need some input on executing Hive queries in parallel. I tried doing this from the CLI (by opening multiple SSH connections) and ran 4 HQLs; the queries were executed sequentially. All four queries were submitted, but while the first one was executing the others stayed in the pending state. I was doing this on EMR running in batch mode, so I was not able to dig into the logs.

The Hive CLI uses a native Hive connection, which by default uses the FIFO scheduler. This might be one of the reasons the queries are executed in sequence.

I also observed that when multiple queries are executed from multiple HUE sessions, they do run in parallel. Can you please suggest how this HUE behavior can be replicated using the CLI?

I am aware of the Beeswax client; however, I am not sure how it can be used during EMR batch-mode processing.

Thanks in advance for going through this. Kindly let me know your thoughts.

