hive-dev mailing list archives

From "Alan Gates (JIRA)" <>
Subject [jira] [Commented] (HIVE-8367) delete writes records in wrong order in some cases
Date Wed, 08 Oct 2014 22:53:36 GMT


Alan Gates commented on HIVE-8367:

bq. What was the original query where the issue showed up?
create table concur_orc_tab(name varchar(50), age int, gpa decimal(3, 2)) clustered by (age)
into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true');
insert into table concur_orc_tab select * from texttab; -- loads 10k records into the table
delete from concur_orc_tab where age >= 20 and age < 30;
This resulted in only some rows being deleted (~300 of the 1700 that should have been deleted).

bq. What precisely was the problem and how does the RS deduplication change help?
The problem was that the code turned off RS deduplication, which produced a plan with two
MR jobs.  The sort by ROW__ID was done in job 1 and the bucketing in job 2, so the bucketing
in job 2 partially undid the sorting of job 1.  As a result only some of the records showed up
as deleted, since the records have to be written to the delta file in the proper order.  With
RS deduplication left on, the sort and the bucketing happen in a single job.  The minimum
number of reducers at which RS deduplication applies is also set to 1, so that this
optimization is used even for small queries.
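
To illustrate the ordering problem, here is a minimal Python sketch. It is not Hive code. It assumes a toy model in which the second job's shuffle orders records by the bucketing column (age) rather than by ROW__ID; the function names, the sample rows, and the `age % 2` bucketing rule are all made up for illustration.

```python
from collections import defaultdict

NUM_BUCKETS = 2  # matches "into 2 buckets" in the example table

# Hypothetical (row_id, age) pairs standing in for the rows to delete.
ROWS = [(0, 24), (1, 21), (2, 28), (3, 22), (4, 25), (5, 20)]


def bucket_of(age):
    """Toy bucketing rule (assumption): bucket number is age modulo bucket count."""
    return age % NUM_BUCKETS


def two_job_plan(rows):
    """Job 1 sorts by row id; job 2 re-shuffles by the bucketing column."""
    job1_output = sorted(rows, key=lambda r: r[0])
    # Assumption: job 2's shuffle sorts by its own key (age), so the
    # row-id order produced by job 1 is not preserved inside a bucket.
    job2_input = sorted(job1_output, key=lambda r: r[1])
    buckets = defaultdict(list)
    for row_id, age in job2_input:
        buckets[bucket_of(age)].append(row_id)
    return dict(buckets)


def one_job_plan(rows):
    """A single shuffle: partition by bucket, ordered by row id within it."""
    buckets = defaultdict(list)
    for row_id, age in sorted(rows, key=lambda r: r[0]):
        buckets[bucket_of(age)].append(row_id)
    return dict(buckets)
```

In this model `two_job_plan(ROWS)` leaves bucket 0 with row ids out of ascending order, while `one_job_plan(ROWS)` writes every bucket in ascending row-id order.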

bq. How are the changes to the sort order of ROW__ID related?
That should never have been set to descending in the first place.  ROW__ID needs to be stored
ascending to work properly.  I suspect it was a fluke that most of the qfile tests passed
with it set to descending.  (Thejas actually asked at the time why this was necessary, and
rather than fixing it, which I should have done, I just said I didn't know.  Oops.)
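
As a rough illustration of why the order matters, the sketch below uses a simplified two-pointer merge that assumes both the base rows and the delete events arrive in ascending row-id order. This is a hypothetical stand-in, not Hive's actual reader; `apply_deletes` and the sample ids are invented for the example.

```python
def apply_deletes(base_ids, delete_ids):
    """Return the base row ids that survive, assuming both inputs are ascending.

    Hypothetical simplified merge: it walks the two lists once and never
    looks back, so delete events that are out of order are silently skipped.
    """
    survivors = []
    j = 0
    for row_id in base_ids:
        # Skip delete events that sort before the current base row.
        while j < len(delete_ids) and delete_ids[j] < row_id:
            j += 1
        if j < len(delete_ids) and delete_ids[j] == row_id:
            j += 1  # matched: this row is deleted
        else:
            survivors.append(row_id)
    return survivors


BASE = list(range(10))
ASCENDING_DELETES = [2, 3, 5, 7]
DESCENDING_DELETES = [7, 5, 3, 2]
```

With `ASCENDING_DELETES` all four rows are removed. With `DESCENDING_DELETES` only row 7 is removed, which mirrors the "only some rows being deleted" symptom under this toy model.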

bq. change is not needed
What change?  I don't see any changes to that file in the patch.

> delete writes records in wrong order in some cases
> --------------------------------------------------
>                 Key: HIVE-8367
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.14.0
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>            Priority: Blocker
>             Fix For: 0.14.0
>         Attachments: HIVE-8367.patch
> I have found one query with 10k records where you do:
> create table
> insert into table -- 10k records
> delete from table -- just some records
> The records in the delete delta are not ordered properly by rowid.
> I assume this applies to updates as well, but I haven't tested it yet.

