hive-user mailing list archives

From Alan Gates <alanfga...@gmail.com>
Subject Re: Hive compaction didn't launch
Date Thu, 28 Jul 2016 22:59:35 GMT
But until those transactions are closed you don’t know that they won’t write to partition
B.  After they write to A they may choose to write to B and then commit.  The compactor
cannot make any assumptions about what sessions with open transactions will do in the future.

Alan.
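Alan's point can be reduced to a single rule: an open transaction caps compaction everywhere, because until it commits it may still write to any partition. A hypothetical sketch of that rule (illustrative only, not Hive source; the names are made up):

```java
// Illustrative model: the oldest open transaction is a global floor.
// A delta is compactable only if every txn in it committed before that floor,
// regardless of which partition the open transaction has written to so far.
public class CompactionFloor {
    static boolean deltaCompactable(long minOpenTxn, long deltaMaxTxnId) {
        return deltaMaxTxnId < minOpenTxn;
    }

    public static void main(String[] args) {
        long minOpenTxn = 100; // open transaction currently writing to partition A
        // Partition B's delta containing txns 101..110 is blocked, even though
        // the open transaction has not touched partition B (yet):
        System.out.println(CompactionFloor.deltaCompactable(minOpenTxn, 110)); // false
        // An older delta in partition B (txns up to 95) is still compactable:
        System.out.println(CompactionFloor.deltaCompactable(minOpenTxn, 95));  // true
    }
}
```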

> On Jul 28, 2016, at 09:19, Igor Kuzmenko <f1sherox@gmail.com> wrote:
> 
> But this minOpenTxn value isn't from the delta I want to compact. minOpenTxn can point
> to a transaction in partition A while in partition B there are deltas ready for compaction.
> If minOpenTxn is less than the txnIds in partition B's deltas, compaction won't happen. So an
> open transaction in partition A blocks compaction in partition B. That seems wrong to me.
> 
> On Thu, Jul 28, 2016 at 7:06 PM, Alan Gates <alanfgates@gmail.com> wrote:
> Hive is doing the right thing there, as it cannot compact the deltas into a base file
> while there are still open transactions in the delta.  Storm should be committing on some
> frequency even if it doesn’t have enough data to commit.
> 
> Alan.
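A streaming writer can follow that advice by committing on a timer as well as on batch size, so an idle source still releases its transaction. A minimal sketch of such a policy (illustrative; this is not the actual Storm HiveBolt code):

```java
// Illustrative commit policy: commit when the batch is full OR when a maximum
// interval has elapsed, so a quiet stream never holds a transaction open forever.
public class CommitPolicy {
    private final int maxBatchSize;
    private final long maxIntervalMillis;
    private int pending;       // records written but not yet committed
    private long lastCommitAt; // timestamp of the last commit

    public CommitPolicy(int maxBatchSize, long maxIntervalMillis, long now) {
        this.maxBatchSize = maxBatchSize;
        this.maxIntervalMillis = maxIntervalMillis;
        this.lastCommitAt = now;
    }

    public void record(long now) { pending++; maybeCommit(now); }

    // Called periodically (e.g. from a Storm tick tuple) even when no data arrives.
    public void tick(long now) { maybeCommit(now); }

    private void maybeCommit(long now) {
        if (pending >= maxBatchSize || now - lastCommitAt >= maxIntervalMillis) {
            // Here the real writer would commit its transaction batch,
            // even if pending == 0, so minOpenTxn can advance.
            pending = 0;
            lastCommitAt = now;
        }
    }

    public int pending() { return pending; }
}
```

With this shape, a transaction is held open for at most maxIntervalMillis, which bounds how long it can pin minOpenTxn and block the compactor.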
> 
> > On Jul 28, 2016, at 05:36, Igor Kuzmenko <f1sherox@gmail.com> wrote:
> >
> > I made some research on that issue.
> > The problem is in ValidCompactorTxnList::isTxnRangeValid method.
> >
> > Here's code:
> > @Override
> > public RangeResponse isTxnRangeValid(long minTxnId, long maxTxnId) {
> >   if (highWatermark < minTxnId) {
> >     return RangeResponse.NONE;
> >   } else if (minOpenTxn < 0) {
> >     return highWatermark >= maxTxnId ? RangeResponse.ALL : RangeResponse.NONE;
> >   } else {
> >     return minOpenTxn > maxTxnId ? RangeResponse.ALL : RangeResponse.NONE;
> >   }
> > }
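The quoted method can be exercised in isolation. A standalone reproduction (the logic mirrors the snippet above; the wrapper class, RangeResponse enum, and the sample watermark values are illustrative, with txn ids taken from the delta directories in the listing further down):

```java
// Standalone model of the quoted isTxnRangeValid logic (illustrative only).
public class TxnRangeCheck {
    enum RangeResponse { NONE, SOME, ALL }

    final long highWatermark;
    final long minOpenTxn; // < 0 means there are no open transactions

    TxnRangeCheck(long highWatermark, long minOpenTxn) {
        this.highWatermark = highWatermark;
        this.minOpenTxn = minOpenTxn;
    }

    RangeResponse isTxnRangeValid(long minTxnId, long maxTxnId) {
        if (highWatermark < minTxnId) {
            return RangeResponse.NONE;
        } else if (minOpenTxn < 0) {
            return highWatermark >= maxTxnId ? RangeResponse.ALL : RangeResponse.NONE;
        } else {
            return minOpenTxn > maxTxnId ? RangeResponse.ALL : RangeResponse.NONE;
        }
    }

    public static void main(String[] args) {
        // An open transaction older than every delta (e.g. delta_71741256_71741355)
        // makes even the oldest delta invalid for compaction:
        TxnRangeCheck withOpenTxn = new TxnRangeCheck(72_100_000L, 71_700_000L);
        System.out.println(withOpenTxn.isTxnRangeValid(71_741_256L, 71_741_355L)); // NONE
        // With no open transactions the same delta is fully compactable:
        TxnRangeCheck noOpenTxn = new TxnRangeCheck(72_100_000L, -1L);
        System.out.println(noOpenTxn.isTxnRangeValid(71_741_256L, 71_741_355L));   // ALL
    }
}
```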
> >
> > In my case this method returned RangeResponse.NONE for most of the delta files. With
> > this value the delta file is not included in the compaction.
> >
> > The last 'else' block compares minOpenTxn to maxTxnId, and if maxTxnId is bigger it returns
> > RangeResponse.NONE. That's a problem for me because I'm using the Storm Hive Bolt. The Hive
> > Bolt gets a transaction and keeps it open with heartbeats until there's data to commit.
> >
> > So if I get a transaction and keep it open, all compactions will stop. Is this incorrect
> > Hive behavior, or should Storm close the transaction?
> >
> >
> >
> >
> > On Wed, Jul 27, 2016 at 8:46 PM, Igor Kuzmenko <f1sherox@gmail.com> wrote:
> > Thanks for the reply, Alan. My guess about Storm was wrong. Today I got the same behavior
> > with the Storm topology running.
> > Anyway, I'd like to know, how can I check that transaction batch was closed correctly?
> >
> > On Wed, Jul 27, 2016 at 8:09 PM, Alan Gates <alanfgates@gmail.com> wrote:
> > I don’t know the details of how the storm application that streams into Hive works,
> > but this sounds like the transaction batches weren’t getting closed.  Compaction can’t
> > happen until those batches are closed.  Do you know how you had storm configured?  Also,
> > you might ask separately on the storm list to see if people have seen this issue before.
> >
> > Alan.
> >
> > > On Jul 27, 2016, at 03:31, Igor Kuzmenko <f1sherox@gmail.com> wrote:
> > >
> > > One more thing. I'm using Apache Storm to stream data into Hive. And when I turned
> > > off the Storm topology, compactions started to work properly.
> > >
> > > On Tue, Jul 26, 2016 at 6:28 PM, Igor Kuzmenko <f1sherox@gmail.com> wrote:
> > > I'm using a Hive 1.2.1 transactional table, inserting data into it via the Hive
> > > Streaming API. After some time I expected compaction to start, but it didn't happen.
> > >
> > > Here's the part of the log which shows that the compactor initiator thread doesn't
> > > see any delta files:
> > > 2016-07-26 18:06:52,459 INFO  [Thread-8]: compactor.Initiator (Initiator.java:run(89)) - Checking to see if we should compact default.data_aaa.dt=20160726
> > > 2016-07-26 18:06:52,496 DEBUG [Thread-8]: io.AcidUtils (AcidUtils.java:getAcidState(432)) - in directory hdfs://sorm-master01.msk.mts.ru:8020/apps/hive/warehouse/data_aaa/dt=20160726 base = null deltas = 0
> > > 2016-07-26 18:06:52,496 DEBUG [Thread-8]: compactor.Initiator (Initiator.java:determineCompactionType(271)) - delta size: 0 base size: 0 threshold: 0.1 will major compact: false
> > >
> > > But in that directory there's actually 23 files:
> > >
> > > hadoop fs -ls /apps/hive/warehouse/data_aaa/dt=20160726
> > > Found 23 items
> > > -rw-r--r--   3 storm hdfs          4 2016-07-26 17:20 /apps/hive/warehouse/data_aaa/dt=20160726/_orc_acid_version
> > > drwxrwxrwx   - storm hdfs          0 2016-07-26 17:22 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71741256_71741355
> > > drwxrwxrwx   - storm hdfs          0 2016-07-26 17:23 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71762456_71762555
> > > drwxrwxrwx   - storm hdfs          0 2016-07-26 17:25 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71787756_71787855
> > > drwxrwxrwx   - storm hdfs          0 2016-07-26 17:26 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71795756_71795855
> > > drwxrwxrwx   - storm hdfs          0 2016-07-26 17:27 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71804656_71804755
> > > drwxrwxrwx   - storm hdfs          0 2016-07-26 17:29 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71828856_71828955
> > > drwxrwxrwx   - storm hdfs          0 2016-07-26 17:30 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71846656_71846755
> > > drwxrwxrwx   - storm hdfs          0 2016-07-26 17:32 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71850756_71850855
> > > drwxrwxrwx   - storm hdfs          0 2016-07-26 17:33 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71867356_71867455
> > > drwxrwxrwx   - storm hdfs          0 2016-07-26 17:34 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71891556_71891655
> > > drwxrwxrwx   - storm hdfs          0 2016-07-26 17:36 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71904856_71904955
> > > drwxrwxrwx   - storm hdfs          0 2016-07-26 17:37 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71907256_71907355
> > > drwxrwxrwx   - storm hdfs          0 2016-07-26 17:39 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71918756_71918855
> > > drwxrwxrwx   - storm hdfs          0 2016-07-26 17:40 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71947556_71947655
> > > drwxrwxrwx   - storm hdfs          0 2016-07-26 17:41 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71960656_71960755
> > > drwxrwxrwx   - storm hdfs          0 2016-07-26 17:43 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71963156_71963255
> > > drwxrwxrwx   - storm hdfs          0 2016-07-26 17:44 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71964556_71964655
> > > drwxrwxrwx   - storm hdfs          0 2016-07-26 17:46 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71987156_71987255
> > > drwxrwxrwx   - storm hdfs          0 2016-07-26 17:47 /apps/hive/warehouse/data_aaa/dt=20160726/delta_72015756_72015855
> > > drwxrwxrwx   - storm hdfs          0 2016-07-26 17:48 /apps/hive/warehouse/data_aaa/dt=20160726/delta_72021356_72021455
> > > drwxrwxrwx   - storm hdfs          0 2016-07-26 17:50 /apps/hive/warehouse/data_aaa/dt=20160726/delta_72048756_72048855
> > > drwxrwxrwx   - storm hdfs          0 2016-07-26 17:50 /apps/hive/warehouse/data_aaa/dt=20160726/delta_72070856_72070955
> > >
> > > Full log here.
> > >
> > > What could go wrong?
> > >
> >
> >
> >
> 
> 

