hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-12353) When Compactor fails it calls CompactionTxnHandler.markedCleaned(). it should not.
Date Thu, 21 Jan 2016 01:51:39 GMT

    [ https://issues.apache.org/jira/browse/HIVE-12353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15109886#comment-15109886
] 

Alan Gates commented on HIVE-12353:
-----------------------------------

CompactionTxnHandler, around line 295, I'm curious why you removed the line 
{code}
info.highestTxnId = rs.wasNull() ? null : highestTxnId;
{code}
Throughout you seem to assume this will be 0 and never null, but I'm not clear why.

In markCleaned, why do the insert into COMPLETED_COMPACTIONS as a prepared statement?  You're
only executing this once per call, what's the value of preparing the statement?  This isn't
a big deal.

A concern on the history stuff.  By keeping not just the last x records, but the last number
of success, fails, and attempts it may be hard for users to see that there were actually 50
successes since the 2 fails shown in the compaction history.  How will we make that clear
to users?  Perhaps you have enough info in the sequence numbers to communicate that.

In checkFailedCompactions:
{code}
812	        while(rs.next() && ++numTotal <= failedThreshold) {
813	          if(rs.getString(1).charAt(0) == FAILED_STATE) {
814	            numFailed++;
815	          }
816	          else {
817	            numFailed--;
818	          }
819	        }
{code}
This logic doesn't look right to me.  This will allow an accumulation of failures over time
to stop compaction, even if there have been intervening successes.  It seems it should stop
counting as soon as it sees a success in the list of compactions, on the assumption that whatever
was previously blocking compaction is fixed at that point.

It also seems users could unwittingly mess things up by setting config values poorly.  If,
for example, they set the number of retained failures to be 2 but the number of consecutive
failures that prevents further compactions to 5, they'd most likely never block compaction,
as the new house keeping thread would remove the history too quickly.

> When Compactor fails it calls CompactionTxnHandler.markedCleaned().  it should not.
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-12353
>                 URL: https://issues.apache.org/jira/browse/HIVE-12353
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 1.0.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>            Priority: Blocker
>         Attachments: HIVE-12353.2.patch, HIVE-12353.3.patch, HIVE-12353.4.patch, HIVE-12353.6.patch,
HIVE-12353.7.patch, HIVE-12353.patch
>
>
> One of the things that this method does is delete entries from TXN_COMPONENTS for partition
that it was trying to compact.
> This causes Aborted transactions in TXNS to become empty according to
> CompactionTxnHandler.cleanEmptyAbortedTxns() which means they can now be deleted.  
> Once they are deleted, data that belongs to these txns is deemed committed...
> We should extend COMPACTION_QUEUE state with 'f' and 's' (failed, success) states.  We
should also not delete then entry from markedCleaned()
> We'll have separate process that cleans 'f' and 's' records after X minutes (or after
> N records for a given partition exist).
> This allows SHOW COMPACTIONS to show some history info and how many times compaction
failed on a given partition (subject to retention interval) so that we don't have to call
markCleaned() on Compactor failures at the same time preventing Compactor to constantly getting
stuck on the same bad partition/table.
> Ideally we'd want to include END_TIME field.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message