Mailing-List: contact user-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hive.apache.org
Received-SPF: pass (athena.apache.org: domain of
 bharathvissapragada1990@gmail.com designates 209.85.210.48 as permitted
 sender)
MIME-Version: 1.0
From: bharath vissapragada <bharathvissapragada1990@gmail.com>
Date: Fri, 14 Sep 2012 15:48:22 +0530
Message-ID: 
 <CAK3hZ7R=qv58ne4=xNGdodA+xMMagotXSck8UrROOOga=dWemQ@mail.gmail.com>
Subject: Running TPCH workload on Hive
To: hive-user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=047d7b2ed9e998760704c9a6bff6

--047d7b2ed9e998760704c9a6bff6
Content-Type: text/plain; charset=ISO-8859-1

Hi folks,

I am trying to run the TPC-H workload on Hive (HIVE-600). However, I am
having problems with the configuration: the queries take an extremely long
time.

I ran Q21 on a TPC-H dataset at SF 100 (the same dataset on which the
experiments in that doc were run) on a cluster of 8 DataNode+TaskTracker
nodes and 1 NameNode. My datanode config is as follows:

2 dual-core CPUs (4 threads in parallel in total)
3.8 GB main memory per node

I configured 4 map slots and 4 reduce slots per node. I've set Hive's
maximum number of reducers to 32 (the total number of reduce slots in the
Hadoop cluster) instead of letting Hive decide it.
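
As a quick sanity check of the numbers above, here is a rough sketch of the
slot arithmetic. The per-slot memory figure is an assumption on my part: it
splits a node's memory evenly across all 8 task slots and ignores the OS and
the Hadoop daemons, so the real figure is lower. The constant and function
names are just illustrative.

```python
# Cluster figures taken from the description above.
NODES = 8
MAP_SLOTS_PER_NODE = 4
REDUCE_SLOTS_PER_NODE = 4
HW_THREADS_PER_NODE = 4      # 2 dual-core CPUs
MEMORY_GB_PER_NODE = 3.8


def cluster_summary():
    """Return cluster-wide slot counts and rough per-slot resources."""
    slots_per_node = MAP_SLOTS_PER_NODE + REDUCE_SLOTS_PER_NODE
    return {
        "total_map_slots": NODES * MAP_SLOTS_PER_NODE,
        "total_reduce_slots": NODES * REDUCE_SLOTS_PER_NODE,
        # Task slots competing for each hardware thread when all are busy.
        "slots_per_hw_thread": slots_per_node / HW_THREADS_PER_NODE,
        # Assumption: memory is shared evenly by all slots, nothing else.
        "memory_mb_per_slot": MEMORY_GB_PER_NODE * 1024 / slots_per_node,
    }


if __name__ == "__main__":
    for name, value in cluster_summary().items():
        print(f"{name}: {value}")
```

So the 32 reducers match the 32 reduce slots, but with every slot busy each
node runs two tasks per hardware thread with well under 500 MB per task.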

My Q21 has now been running for 12 hours, compared to the 2500 seconds
mentioned in the results. I wonder what is so terribly wrong with my
config. Some of my reducers take extremely long (sometimes 6 hours), while
others take 2 hours (even that is more than the overall run time of
2500 seconds for the same query in the results).
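
To put those times side by side, a small sketch that converts them into
slowdown factors relative to the 2500-second reference. The 12-hour figure is
only a lower bound, since the query has not finished yet; the function names
are just illustrative.

```python
REFERENCE_RUNTIME_S = 2500  # overall Q21 run time reported in the results


def hours_to_seconds(hours):
    """Convert a duration in hours to seconds."""
    return hours * 3600


def slowdown(observed_hours):
    """How many times longer than the whole reference query a duration is."""
    return hours_to_seconds(observed_hours) / REFERENCE_RUNTIME_S


if __name__ == "__main__":
    observed = [
        ("whole query so far", 12),
        ("slowest reducers", 6),
        ("other reducers", 2),
    ]
    for label, hours in observed:
        print(f"{label}: {slowdown(hours):.2f}x the reference run time")
```

The gap between the 6-hour and 2-hour reducers is what makes me suspect that
the work is not evenly spread across the reducers.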

Can someone help me with this? Was the data in those experiments
partitioned, or prepared in some other way?
-- 
Regards,
Bharath .V
w:http://researchweb.iiit.ac.in/~bharath.v

--047d7b2ed9e998760704c9a6bff6--