hive-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-21676) use a system table as an alternative proc store
Date Thu, 02 May 2019 17:40:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-21676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831806#comment-16831806 ]

Sergey Shelukhin commented on HIVE-21676:
-----------------------------------------

Lol, no, it's supposed to be an HBase ticket

> use a system table as an alternative proc store
> -----------------------------------------------
>
>                 Key: HIVE-21676
>                 URL: https://issues.apache.org/jira/browse/HIVE-21676
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Priority: Major
>
> We keep hitting these issues:
> {noformat}
> 2019-04-30 23:41:52,164 INFO  [master/master:17000:becomeActiveMaster] procedure2.ProcedureExecutor: Starting 16 core workers (bigger of cpus/4 or 16) with max (burst) worker count=160
> 2019-04-30 23:41:52,171 INFO  [master/master:17000:becomeActiveMaster] util.FSHDFSUtils: Recover lease on dfs file .../MasterProcWALs/pv2-00000000000000000481.log
> 2019-04-30 23:41:52,176 INFO  [master/master:17000:becomeActiveMaster] util.FSHDFSUtils: Recovered lease, attempt=0 on file=.../MasterProcWALs/pv2-00000000000000000481.log after 5ms
> 2019-04-30 23:41:52,288 INFO  [master/master:17000:becomeActiveMaster] util.FSHDFSUtils: Recover lease on dfs file .../MasterProcWALs/pv2-00000000000000000482.log
> 2019-04-30 23:41:52,289 INFO  [master/master:17000:becomeActiveMaster] util.FSHDFSUtils: Recovered lease, attempt=0 on file=.../MasterProcWALs/pv2-00000000000000000482.log after 1ms
> 2019-04-30 23:41:52,373 INFO  [master/master:17000:becomeActiveMaster] wal.WALProcedureStore: Rolled new Procedure Store WAL, id=483
> 2019-04-30 23:41:52,375 INFO  [master/master:17000:becomeActiveMaster] procedure2.ProcedureExecutor: Recovered WALProcedureStore lease in 206msec
> 2019-04-30 23:41:52,782 INFO  [master/master:17000:becomeActiveMaster] wal.ProcedureWALFormatReader: Read 1556 entries in .../MasterProcWALs/pv2-00000000000000000482.log
> 2019-04-30 23:41:55,370 INFO  [master/master:17000:becomeActiveMaster] wal.ProcedureWALFormatReader: Read 28113 entries in .../MasterProcWALs/pv2-00000000000000000481.log
> 2019-04-30 23:41:55,384 ERROR [master/master:17000:becomeActiveMaster] wal.WALProcedureTree: Missing stack id 166, max stack id is 181, root procedure is Procedure(pid=289380, ppid=-1, class=org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure)
> 2019-04-30 23:41:55,384 ERROR [master/master:17000:becomeActiveMaster] wal.WALProcedureTree: Missing stack id 178, max stack id is 181, root procedure is Procedure(pid=289380, ppid=-1, class=org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure)
> 2019-04-30 23:41:55,389 ERROR [master/master:17000:becomeActiveMaster] wal.WALProcedureTree: Missing stack id 359, max stack id is 360, root procedure is Procedure(pid=285640, ppid=-1, class=org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure)
> {noformat}
> After that, the affected procedures are lost and the cluster is stuck permanently.
> The log shows no errors writing these files and no issues reading them from HDFS,
> so it's purely a data loss issue in the structure of the store itself.
> I was thinking about debugging it, but on second thought, what we are trying to store
> is just some PB blob, by key.
> Coincidentally, we have an "HBase" facility that we already deploy, which does just that...
> and it even has a WAL implementation. I don't know why we cannot use it for procedure state
> and instead have to invent another complex implementation of a KV store inside a KV store.
> In all/most cases, we don't even support rollback and use the latest state, but if we
> need multiple versions, this HBase product even supports that!
> I think we should add an hbase:proc table that would be maintained similarly to meta.
> The maintenance part, especially given the existing code for meta, should be much simpler
> than a separate store implementation.
> This should be pluggable and optional via a ProcStore interface (made more abstract as
> relevant: update state, scan state, get).
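
For illustration only, the three operations named above (update state, scan state, get) could be sketched roughly as follows. Every name here (ProcStoreSketch, ProcStore, InMemoryProcStore, delete) is hypothetical and does not mirror any existing HBase interface. The in-memory sorted map merely stands in for a table such as the proposed hbase:proc, keyed by procedure id, with the serialized PB blob as the value and only the latest state kept.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.function.BiConsumer;

class ProcStoreSketch {

    /** Hypothetical pluggable store: procedure id -> serialized procedure state. */
    interface ProcStore {
        /** Insert the state for procId, or overwrite the previous one (latest state wins). */
        void update(long procId, byte[] state);

        /** Return the latest state for procId, if any. */
        Optional<byte[]> get(long procId);

        /** Visit every stored procedure in ascending procId order, e.g. on master startup. */
        void scan(BiConsumer<Long, byte[]> visitor);

        /** Drop a finished procedure (an assumption; the ticket does not list this operation). */
        void delete(long procId);
    }

    /** In-memory stand-in for a table-backed implementation; not durable. */
    static final class InMemoryProcStore implements ProcStore {
        private final ConcurrentSkipListMap<Long, byte[]> rows = new ConcurrentSkipListMap<>();

        @Override
        public void update(long procId, byte[] state) {
            rows.put(procId, state.clone()); // defensive copy of the caller's buffer
        }

        @Override
        public Optional<byte[]> get(long procId) {
            byte[] state = rows.get(procId);
            return state == null ? Optional.empty() : Optional.of(state.clone());
        }

        @Override
        public void scan(BiConsumer<Long, byte[]> visitor) {
            for (Map.Entry<Long, byte[]> row : rows.entrySet()) {
                visitor.accept(row.getKey(), row.getValue().clone());
            }
        }

        @Override
        public void delete(long procId) {
            rows.remove(procId);
        }
    }

    public static void main(String[] args) {
        ProcStore store = new InMemoryProcStore();
        store.update(289380L, new byte[] {1});
        store.update(285640L, new byte[] {2});
        store.update(289380L, new byte[] {3}); // overwrites the earlier state for pid 289380
        store.scan((pid, state) -> System.out.println(pid + " -> " + Arrays.toString(state)));
    }
}
```

A table-backed implementation would presumably map update to a put, get to a point read, and scan to a full table scan, leaving durability to the existing WAL; that mapping is an assumption, not something the ticket specifies.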



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
