hive-user mailing list archives

From Sachin Bochare <sachin_boch...@persistent.co.in>
Subject RE: Select not working with Index patch
Date Mon, 03 May 2010 04:28:28 GMT
Thanks, Edward.

I was exploring the indexing patch and wanted to see what the index table looks like.

A few points that were not mentioned in my earlier posting:

1. I haven't created any index on the test table, so the index logic is not involved here. The
query fails on an ordinary table.
2. I checked the job in the JobTracker page of the Hadoop web interface and found that Hadoop read
277 bytes and wrote 23 bytes. The counters are HDFS_BYTES_READ=277 and HDFS_BYTES_WRITTEN=23.
The input file is 277 bytes and the final result is 23 bytes, so Hadoop produced the
correct output. However, for some reason Hive did not receive or display those results.
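The two counters are consistent with the query. As a quick check, assume ourtest_data.csv holds exactly the eleven rows that `select *` prints further down, comma-separated with a Unix newline after every row (the file itself was not posted, so this layout is a reconstruction). Under that assumption the input is 277 bytes, and the `empid` column written one value per line is 23 bytes:

```python
# Hypothetical reconstruction of /root/data/ourtest_data.csv, inferred from the
# "select * from ourtest" output in this thread; the real file was not posted.
def csv_row(i: int) -> str:
    suffix = "" if i == 0 else str(i)  # row 0 is "firstname"/"lastname", no suffix
    return f"{i},firstname{suffix},lastname{suffix},{i}\n"

input_text = "".join(csv_row(i) for i in range(11))

# "select empid" is a map-only projection: one empid per output line.
output_text = "".join(f"{i}\n" for i in range(11))

input_bytes = len(input_text.encode("utf-8"))
output_bytes = len(output_text.encode("utf-8"))
print(input_bytes, output_bytes)  # 277 23
```

Both numbers line up with HDFS_BYTES_READ and HDFS_BYTES_WRITTEN, which supports the reading that the map job itself produced the right data.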

I suspect it is a minor code change. I would like to identify the code and fix it temporarily
in my sandbox. Could someone please point me to the module where I should look for this issue?

Regards,
Sachin

________________________________
From: Edward Capriolo [mailto:edlinuxguru@gmail.com]
Sent: Sunday, May 02, 2010 8:08 PM
To: hive-user@hadoop.apache.org
Subject: Re: Select not working with Index patch


On Sun, May 2, 2010 at 2:29 AM, Sachin Bochare <sachin_bochare@persistent.co.in> wrote:
Hi,

I applied the index patch available at: https://issues.apache.org/jira/browse/HIVE-678

However, after applying the indexing patch, simple select statements do not return any results.
"select *" works, but selecting a specific column does not. I have pasted an
example below that illustrates the problem.

The same select works without the patch on the same metastore_db. The only difference
between the working and the non-working code is the patch.

I used version 796926 of the code; the patch attached to HIVE-678 was created against this version.

The following example illustrates the problem:

Example with the patched code:
-----------------------------

=====================================
hive> create table ourtest (empid int, firstname string, lastname string, hoursworked int)
partitioned by(dt string, place string) clustered by (empid) sorted by(hoursworked) into 4
buckets row format delimited fields terminated by ',' stored as textfile;
OK
Time taken: 0.307 seconds
hive> LOAD DATA LOCAL INPATH '/root/data/ourtest_data.csv' INTO TABLE ourtest PARTITION(dt='2010-02-27',
place='Pune');
Copying data from file:/root/data/ourtest_data.csv
Loading data to table ourtest partition {dt=2010-02-27, place=Pune}
OK
Time taken: 0.753 seconds
hive> select * from ourtest; ---> Select * is working fine.
OK
0       firstname       lastname        0       2010-02-27      Pune
1       firstname1      lastname1       1       2010-02-27      Pune
2       firstname2      lastname2       2       2010-02-27      Pune
3       firstname3      lastname3       3       2010-02-27      Pune
4       firstname4      lastname4       4       2010-02-27      Pune
5       firstname5      lastname5       5       2010-02-27      Pune
6       firstname6      lastname6       6       2010-02-27      Pune
7       firstname7      lastname7       7       2010-02-27      Pune
8       firstname8      lastname8       8       2010-02-27      Pune
9       firstname9      lastname9       9       2010-02-27      Pune
10      firstname10     lastname10      10      2010-02-27      Pune
Time taken: 0.106 seconds
hive> select empid from ourtest; ---> Selecting a specific column is not working.
Total MapReduce jobs = 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201002091652_0170, Tracking URL = http://v-hadoop3.persistent.co.in:60030/jobdetails.jsp?jobid=job_201002091652_0170
Kill Command = /root/hadoop-0.20.1/bin/../bin/hadoop job  -Dmapred.job.tracker=v-hadoop3.persistent.co.in:30001 -kill job_201002091652_0170
2010-05-02 08:40:48,951 map = 0%,  reduce =0%
2010-05-02 08:40:58,044 map = 50%,  reduce =0%
2010-05-02 08:40:59,057 map = 100%,  reduce =0%
2010-05-02 08:41:02,067 map = 100%,  reduce =100%
Ended Job = job_201002091652_0170
OK
Time taken: 15.494 seconds
=====================================

Example without the patch:
--------------------------------
The same query works with the unpatched code on the same metastore_db.

=====================================
root@v-hadoop3:~/sachin/Hive-796926-Patch# ../Hive-796926/build/dist/bin/hive
Hive history file=/tmp/root/hive_job_log_root_201005020928_924651644.txt
hive> select empid from ourtest;
Total MapReduce jobs = 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201002091652_0190, Tracking URL = http://v-hadoop3.persistent.co.in:60030/jobdetails.jsp?jobid=job_201002091652_0190
Kill Command = /root/hadoop-0.20.1/bin/../bin/hadoop job  -Dmapred.job.tracker=v-hadoop3.persistent.co.in:30001 -kill job_201002091652_0190
2010-05-02 09:29:04,733 map = 0%,  reduce =0%
2010-05-02 09:29:18,799 map = 100%,  reduce =0%
2010-05-02 09:29:21,823 map = 100%,  reduce =100%
Ended Job = job_201002091652_0190
OK
0
1
2
3
4
5
6
7
8
9
10
Time taken: 22.268 seconds
=====================================

Can anyone point out what the problem might be? Which module is the likely suspect?

Regards,
Sachin

DISCLAIMER ========== This e-mail may contain privileged and confidential information which
is the property of Persistent Systems Ltd. It is intended only for the use of the individual
or entity to which it is addressed. If you are not the intended recipient, you are not authorized
to read, retain, copy, print, distribute or use this message. If you have received this communication
in error, please notify the sender and delete all copies of this message. Persistent Systems
Ltd. does not accept any liability for virus infected mails.

The comments on that issue suggest the patch is not complete yet. For reference, 'select
*' queries simply read the block data from HDFS, so they do not use MapReduce (and thus probably
do not use any indexes either).

Edward
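Edward's point about the two query paths can be pictured with a toy model. This is not Hive code: the function names are hypothetical, and the CSV layout is reconstructed from the `select *` output in the thread. `select *` reads the file directly and appends the partition values (which come from the partition directory, not from the file), while `select empid` runs a map-only job that writes its result to an output file the client must then read back. Given Sachin's counters, the job output looks correct, so under this model the step worth inspecting is the final read-back; that is a hypothesis, not a confirmed diagnosis.

```python
# Toy model (NOT Hive source code) of the two execution paths discussed above.
# All names are hypothetical; the CSV layout is reconstructed from "select *".
from typing import List

PARTITION_VALUES = ["2010-02-27", "Pune"]  # dt and place from the LOAD DATA statement

def csv_row(i: int) -> str:
    suffix = "" if i == 0 else str(i)
    return f"{i},firstname{suffix},lastname{suffix},{i}\n"

DATA_FILE = "".join(csv_row(i) for i in range(11))

def fetch_select_star(raw: str) -> List[str]:
    """select *: rows are read straight from the file, no MapReduce job.
    Partition values are not stored in the file, so they are appended."""
    return ["\t".join(line.split(",") + PARTITION_VALUES) for line in raw.splitlines()]

def map_only_projection(raw: str) -> str:
    """select empid: a map-only job writes the projected column to an output file."""
    return "".join(line.split(",")[0] + "\n" for line in raw.splitlines())

def fetch_job_output(output_file: str) -> List[str]:
    """Final step: the client reads the job output back and prints it.
    In this model, if this step is skipped or reads the wrong location,
    the job still succeeds but the console prints no rows."""
    return output_file.splitlines()

star_rows = fetch_select_star(DATA_FILE)
job_output = map_only_projection(DATA_FILE)
empids = fetch_job_output(job_output)

print(star_rows[1])              # 1\tfirstname1\tlastname1\t1\t2010-02-27\tPune
print(len(job_output.encode()))  # 23
print(empids[:3])                # ['0', '1', '2']
```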
