Mailing-List: contact dev-help@phoenix.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@phoenix.apache.org
Date: Thu, 17 Aug 2017 15:11:00 +0000 (UTC)
From: "chenglei (JIRA)" <jira@apache.org>
To: dev@phoenix.apache.org
Message-ID: <JIRA.13095352.1502978641000.55108.1502982660043@Atlassian.JIRA>
In-Reply-To: <JIRA.13095352.1502978641000@Atlassian.JIRA>
References: <JIRA.13095352.1502978641000@Atlassian.JIRA> <JIRA.13095352.1502978641183@jira-lw-us.apache.org>
Subject: [jira] [Updated] (PHOENIX-4094) ParallelWriterIndexCommitter
 incorrectly applys local updates to index tables for 4.x-HBase-0.98
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
archived-at: Thu, 17 Aug 2017 15:11:11 -0000


     [ https://issues.apache.org/jira/browse/PHOENIX-4094?page=3Dcom.atlass=
ian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chenglei updated PHOENIX-4094:
------------------------------
    Description:=20
I used phoenix-4.x-HBase-0.98 in my hbase cluster.When I When I restarted m=
y hbase cluster a certain time, I noticed some  RegionServers have a lot of=
 error logs as following:

{code:java}
     2017-08-01 11:53:10,669 WARN  [rsync.slave005.bizhbasetest.sjs.ted,600=
20,1501511894174-index-writer--pool2-t786] regionserver.HRegion: Failed get=
ting lock in batch put, row=3D\x10\x00\x00\x00913f0eed-6710-4de9-8bac-077a1=
06bb9ae_0
org.apache.hadoop.hbase.regionserver.WrongRegionException: Requested row ou=
t of range for row lock on HRegion BIZARCH_NS_PRODUCT.BIZTRACER_SPAN,90ffd7=
83-b0a3-4f8a-81ef-0a7535fea197_0,1490066612493.463220cd8fad7254481595911e62=
d74d., startKey=3D'90ffd783-b0a3-4f8a-81ef-0a7535fea197_0', getEndKey()=3D'=
917fc343-3331-47fa-907c-df83a6f302f7_0', row=3D'\x10\x00\x00\x00913f0eed-67=
10-4de9-8bac-077a106bb9ae_0'
        at org.apache.hadoop.hbase.regionserver.HRegion.checkRow(HRegion.ja=
va:3539)
        at org.apache.hadoop.hbase.regionserver.HRegion.getRowLock(HRegion.=
java:3557)
        at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation=
(HRegion.java:2394)
        at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion=
.java:2261)
        at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion=
.java:2213)
        at org.apache.phoenix.util.IndexUtil.writeLocalUpdates(IndexUtil.ja=
va:671)
        at org.apache.phoenix.hbase.index.write.ParallelWriterIndexCommitte=
r$1.call(ParallelWriterIndexCommitter.java:157)
        at org.apache.phoenix.hbase.index.write.ParallelWriterIndexCommitte=
r$1.call(ParallelWriterIndexCommitter.java:134)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExec=
utor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExe=
cutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
{code}

The problem is caused by the ParallelWriterIndexCommitter.write method, in =
following line 151,if allowLocalUpdates  is true, it would wiite index muta=
tions to local region unconditionlly=EF=BC=8Cwhich is obviously inappropria=
te.

{code:java}
 150            try {
 151                      if (allowLocalUpdates && env !=3D null) {
 152                           try {
 153                               throwFailureIfDone();
 154                               IndexUtil.writeLocalUpdates(env.getRegio=
n(), mutations, true);
 155                               return null;
 156                           } catch (IOException ignord) {
 157                               // when it's failed we fall back to the =
standard & slow way
 158                               if (LOG.isDebugEnabled()) {
 159                                   LOG.debug("indexRegion.batchMutate f=
ailed and fall back to HTable.batch(). Got error=3D"
 160                                           + ignord);
 161                               }
 162                           }
 163                       }
{code}

When a data table has a global index table , and when we replay the WALs to=
 index table in Indexer.postOpen method in following=20
line 691, which the {{allowLocalUpdates}} parameter is true, the  {{updates=
}} parameter for the index table would  incorrectly be written to the cuure=
nt data table region:

{code:java}
688        // do the usual writer stuff, killing the server again, if we ca=
n't manage to make the index
689        // writes succeed again
690        try {
691            writer.writeAndKillYourselfOnFailure(updates, true);
692        } catch (IOException e) {
693                LOG.error("During WAL replay of outstanding index update=
s, "
694                        + "Exception is thrown instead of killing server=
 during index writing", e);
695        }
696    } finally {
{code}

However , ParallelWriterIndexCommitter.write method in the master and other=
 4.x branches is correct,just as following line 150 and line 151 :
{code:java}
147                   try {
148                        if (allowLocalUpdates
149                                && env !=3D null
150                                && tableReference.getTableName().equals(
151                                    env.getRegion().getTableDesc().getNa=
meAsString())) {
152                            try {
153                                throwFailureIfDone();
154                                IndexUtil.writeLocalUpdates(env.getRegio=
n(), mutations, true);
155                                return null;
156                            } catch (IOException ignord) {
157                                // when it's failed we fall back to the =
standard & slow way
158                                if (LOG.isDebugEnabled()) {
159                                    LOG.debug("indexRegion.batchMutate f=
ailed and fall back to HTable.batch(). Got error=3D"
160                                            + ignord);
161                                }
162                            }
163                        }
{code}

This inconsistency is introduced by PHOENIX-1734 and PHOENIX-3018, because =
lack of unit tests or IT tests for Indexer.preWALRestore /postOpen,  the  i=
nconsistency is not detected.

BTW=EF=BC=8Cthe TrackingParallelWriterIndexCommitter  is right for master a=
nd all the 4.x branches.

  was:
I used phoenix-4.x-HBase-0.98 in my hbase cluster.When I When I restarted m=
y hbase cluster a certain time, I noticed some  RegionServers have a lot of=
 error logs as following:

{code:java}
     2017-08-01 11:53:10,669 WARN  [rsync.slave005.bizhbasetest.sjs.ted,600=
20,1501511894174-index-writer--pool2-t786] regionserver.HRegion: Failed get=
ting lock in batch put, row=3D\x10\x00\x00\x00913f0eed-6710-4de9-8bac-077a1=
06bb9ae_0
org.apache.hadoop.hbase.regionserver.WrongRegionException: Requested row ou=
t of range for row lock on HRegion BIZARCH_NS_PRODUCT.BIZTRACER_SPAN,90ffd7=
83-b0a3-4f8a-81ef-0a7535fea197_0,1490066612493.463220cd8fad7254481595911e62=
d74d., startKey=3D'90ffd783-b0a3-4f8a-81ef-0a7535fea197_0', getEndKey()=3D'=
917fc343-3331-47fa-907c-df83a6f302f7_0', row=3D'\x10\x00\x00\x00913f0eed-67=
10-4de9-8bac-077a106bb9ae_0'
        at org.apache.hadoop.hbase.regionserver.HRegion.checkRow(HRegion.ja=
va:3539)
        at org.apache.hadoop.hbase.regionserver.HRegion.getRowLock(HRegion.=
java:3557)
        at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation=
(HRegion.java:2394)
        at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion=
.java:2261)
        at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion=
.java:2213)
        at org.apache.phoenix.util.IndexUtil.writeLocalUpdates(IndexUtil.ja=
va:671)
        at org.apache.phoenix.hbase.index.write.ParallelWriterIndexCommitte=
r$1.call(ParallelWriterIndexCommitter.java:157)
        at org.apache.phoenix.hbase.index.write.ParallelWriterIndexCommitte=
r$1.call(ParallelWriterIndexCommitter.java:134)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExec=
utor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExe=
cutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
{code}

The problem is caused by the ParallelWriterIndexCommitter.write method, in =
following line 151,if allowLocalUpdates  is true, it would wiite index muta=
tions to local region unconditionlly=EF=BC=8Cwhich is obviously inappropria=
te.

{code:java}
 150            try {
 151                      if (allowLocalUpdates && env !=3D null) {
 152                           try {
 153                               throwFailureIfDone();
 154                               IndexUtil.writeLocalUpdates(env.getRegio=
n(), mutations, true);
 155                               return null;
 156                           } catch (IOException ignord) {
 157                               // when it's failed we fall back to the =
standard & slow way
 158                               if (LOG.isDebugEnabled()) {
 159                                   LOG.debug("indexRegion.batchMutate f=
ailed and fall back to HTable.batch(). Got error=3D"
 160                                           + ignord);
 161                               }
 162                           }
 163                       }
{code}

When a data table has a global index table , and when we replay the WALs to=
 index table in Indexer.postOpen method in following=20
line 691, which the {{allowLocalUpdates}} parameter is true, the  {{updates=
}} parameter for the index table would  incorrectly be written to the cuure=
nt data table region:

{code:java}
688        // do the usual writer stuff, killing the server again, if we ca=
n't manage to make the index
689        // writes succeed again
690        try {
691            writer.writeAndKillYourselfOnFailure(updates, true);
692        } catch (IOException e) {
693                LOG.error("During WAL replay of outstanding index update=
s, "
694                        + "Exception is thrown instead of killing server=
 during index writing", e);
695        }
696    } finally {
{code}

However , the master and other 4.x branches is correct:


> ParallelWriterIndexCommitter incorrectly applys local updates to index ta=
bles for 4.x-HBase-0.98
> -------------------------------------------------------------------------=
-----------------------
>
>                 Key: PHOENIX-4094
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4094
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.11.0
>            Reporter: chenglei
>
> I used phoenix-4.x-HBase-0.98 in my hbase cluster.When I When I restarted=
 my hbase cluster a certain time, I noticed some  RegionServers have a lot =
of error logs as following:
> {code:java}
>      2017-08-01 11:53:10,669 WARN  [rsync.slave005.bizhbasetest.sjs.ted,6=
0020,1501511894174-index-writer--pool2-t786] regionserver.HRegion: Failed g=
etting lock in batch put, row=3D\x10\x00\x00\x00913f0eed-6710-4de9-8bac-077=
a106bb9ae_0
> org.apache.hadoop.hbase.regionserver.WrongRegionException: Requested row =
out of range for row lock on HRegion BIZARCH_NS_PRODUCT.BIZTRACER_SPAN,90ff=
d783-b0a3-4f8a-81ef-0a7535fea197_0,1490066612493.463220cd8fad7254481595911e=
62d74d., startKey=3D'90ffd783-b0a3-4f8a-81ef-0a7535fea197_0', getEndKey()=
=3D'917fc343-3331-47fa-907c-df83a6f302f7_0', row=3D'\x10\x00\x00\x00913f0ee=
d-6710-4de9-8bac-077a106bb9ae_0'
>         at org.apache.hadoop.hbase.regionserver.HRegion.checkRow(HRegion.=
java:3539)
>         at org.apache.hadoop.hbase.regionserver.HRegion.getRowLock(HRegio=
n.java:3557)
>         at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutati=
on(HRegion.java:2394)
>         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegi=
on.java:2261)
>         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegi=
on.java:2213)
>         at org.apache.phoenix.util.IndexUtil.writeLocalUpdates(IndexUtil.=
java:671)
>         at org.apache.phoenix.hbase.index.write.ParallelWriterIndexCommit=
ter$1.call(ParallelWriterIndexCommitter.java:157)
>         at org.apache.phoenix.hbase.index.write.ParallelWriterIndexCommit=
ter$1.call(ParallelWriterIndexCommitter.java:134)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolEx=
ecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolE=
xecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> The problem is caused by the ParallelWriterIndexCommitter.write method, i=
n following line 151,if allowLocalUpdates  is true, it would wiite index mu=
tations to local region unconditionlly=EF=BC=8Cwhich is obviously inappropr=
iate.
> {code:java}
>  150            try {
>  151                      if (allowLocalUpdates && env !=3D null) {
>  152                           try {
>  153                               throwFailureIfDone();
>  154                               IndexUtil.writeLocalUpdates(env.getReg=
ion(), mutations, true);
>  155                               return null;
>  156                           } catch (IOException ignord) {
>  157                               // when it's failed we fall back to th=
e standard & slow way
>  158                               if (LOG.isDebugEnabled()) {
>  159                                   LOG.debug("indexRegion.batchMutate=
 failed and fall back to HTable.batch(). Got error=3D"
>  160                                           + ignord);
>  161                               }
>  162                           }
>  163                       }
> {code}
> When a data table has a global index table , and when we replay the WALs =
to index table in Indexer.postOpen method in following=20
> line 691, which the {{allowLocalUpdates}} parameter is true, the  {{updat=
es}} parameter for the index table would  incorrectly be written to the cuu=
rent data table region:
> {code:java}
> 688        // do the usual writer stuff, killing the server again, if we =
can't manage to make the index
> 689        // writes succeed again
> 690        try {
> 691            writer.writeAndKillYourselfOnFailure(updates, true);
> 692        } catch (IOException e) {
> 693                LOG.error("During WAL replay of outstanding index upda=
tes, "
> 694                        + "Exception is thrown instead of killing serv=
er during index writing", e);
> 695        }
> 696    } finally {
> {code}
> However , ParallelWriterIndexCommitter.write method in the master and oth=
er 4.x branches is correct,just as following line 150 and line 151 :
> {code:java}
> 147                   try {
> 148                        if (allowLocalUpdates
> 149                                && env !=3D null
> 150                                && tableReference.getTableName().equal=
s(
> 151                                    env.getRegion().getTableDesc().get=
NameAsString())) {
> 152                            try {
> 153                                throwFailureIfDone();
> 154                                IndexUtil.writeLocalUpdates(env.getReg=
ion(), mutations, true);
> 155                                return null;
> 156                            } catch (IOException ignord) {
> 157                                // when it's failed we fall back to th=
e standard & slow way
> 158                                if (LOG.isDebugEnabled()) {
> 159                                    LOG.debug("indexRegion.batchMutate=
 failed and fall back to HTable.batch(). Got error=3D"
> 160                                            + ignord);
> 161                                }
> 162                            }
> 163                        }
> {code}
> This inconsistency is introduced by PHOENIX-1734 and PHOENIX-3018, becaus=
e lack of unit tests or IT tests for Indexer.preWALRestore /postOpen,  the =
 inconsistency is not detected.
> BTW=EF=BC=8Cthe TrackingParallelWriterIndexCommitter  is right for master=
 and all the 4.x branches.


--
This message was sent by Atlassian JIRA
(v6.4.14#64029)