phoenix-dev mailing list archives

From "James Taylor (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (PHOENIX-1870) Fix NPE occurring during regex processing when joni library not used
Date Sat, 18 Apr 2015 21:59:58 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14501533#comment-14501533 ]

James Taylor edited comment on PHOENIX-1870 at 4/18/15 9:59 PM:
----------------------------------------------------------------

Good questions, [~shuxi0ng].
bq. 1. What is Tuple used for?
Tuple is used to store the state of the current row being evaluated. It's mainly used at evaluation
time when an expression is a reference to a column. If expression.isStateless() is true,
that means it doesn't need to contact the server for any information. If expression.getDeterminism()
== Determinism.ALWAYS, the expression gives the same output each time for the same input.
Counterexamples are RAND(), CURRENT_DATE(), and NEXT VALUE FOR my_sequence. So if expression.isStateless()
is true and expression.getDeterminism() is Determinism.ALWAYS, then we don't need a Tuple and
can just pass null instead. It'd actually be nice if we could remove Tuple completely from
the evaluate method, but that's a bigger change for another day (PHOENIX-1887).
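The null-Tuple shortcut described above can be sketched as follows. This is a minimal stand-in, assuming the names from the discussion (isStateless(), getDeterminism(), Determinism.ALWAYS); the Expression interface here is a stripped-down illustration, not Phoenix's actual API.

```java
// Hypothetical stand-ins for the Phoenix types discussed above.
enum Determinism { ALWAYS, PER_STATEMENT, PER_ROW }

interface Expression {
    boolean isStateless();
    Determinism getDeterminism();
}

class TupleCheck {
    // A Tuple (row state) is only needed when the expression is not a
    // stateless, always-deterministic computation.
    static boolean needsTuple(Expression e) {
        return !(e.isStateless() && e.getDeterminism() == Determinism.ALWAYS);
    }

    public static void main(String[] args) {
        Expression constant = new Expression() {
            public boolean isStateless() { return true; }
            public Determinism getDeterminism() { return Determinism.ALWAYS; }
        };
        // Safe to pass null instead of a Tuple for this expression.
        System.out.println(needsTuple(constant)); // prints "false"
    }
}
```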

bq. 2. In some functions, for example RegexpReplaceFunction, there are two arguments, source
strings and replace strings. How should the replace interface be designed?
Maybe the most consistent way would be to pass a (byte[] buf, int offset, int length) triple
for each argument. You can then use the ImmutableBytesWritable passed into evaluate() to evaluate
each one, copying the values into local variables for the above arguments before evaluating the next one.
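A sketch of the triple-per-argument shape suggested above. The method name and the use of java.util.regex are illustrative only; Phoenix's actual regex abstraction (joni vs. java.util.regex) differs.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

class ReplaceSketch {
    // Each argument arrives as its own (buf, offset, length) triple, so no
    // shared mutable state is needed between arguments.
    static String replaceAll(Pattern pattern,
                             byte[] src, int srcOffset, int srcLen,
                             byte[] rep, int repOffset, int repLen) {
        String source = new String(src, srcOffset, srcLen, StandardCharsets.UTF_8);
        String replacement = new String(rep, repOffset, repLen, StandardCharsets.UTF_8);
        return pattern.matcher(source).replaceAll(replacement);
    }

    public static void main(String[] args) {
        byte[] src = "abcabc".getBytes(StandardCharsets.UTF_8);
        byte[] rep = "x".getBytes(StandardCharsets.UTF_8);
        System.out.println(replaceAll(Pattern.compile("b"),
                src, 0, src.length, rep, 0, rep.length)); // prints "axcaxc"
    }
}
```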

bq. 2.1 replace(srcPtr, replacePtr): RegexpReplaceFunction has a class member replacePtr;
it is used in evaluation and is set to empty bytes after evaluation.
Oh, I missed that. That's problematic, as these expressions may be used
by multiple threads at the same time. I'd recommend changing that as I've described above.
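The hazard can be illustrated with two hypothetical classes: a field written during evaluation is shared by every thread using the same expression instance, while a local variable is confined to one thread. Neither class is Phoenix code.

```java
class UnsafeReplace {
    private String replacement; // shared across threads: writes can interleave

    String evaluate(String src, String rep) {
        this.replacement = rep;   // another thread may overwrite this field
        return src.replace("x", this.replacement);
    }
}

class SafeReplace {
    String evaluate(String src, String rep) {
        String replacement = rep; // thread-confined local variable
        return src.replace("x", replacement);
    }
}

class ThreadSafetyDemo {
    public static void main(String[] args) {
        System.out.println(new SafeReplace().evaluate("xa", "y")); // prints "ya"
    }
}
```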

bq. 2.2 replace(ptr, srcExpression, replaceExpression): Pass the expression to Regex Engine,
and srcStr and replaceStr use the same ptr. Which one is better?
See response for (2), as that's probably the most consistent (even though it adds extra args).

bq. 3. For a function REGEXP_REPLACE(PatternStr, SrcStr, ReplaceStr), if some of the three
arguments are Determinism.ALWAYS, they can be pre-computed in init(). Why not try to cache
the value?
Yes, correct. If childExpression.isStateless() is true and its determinism is Determinism.ALWAYS,
then it can be cached. FYI, we have a check above this in ExpressionCompiler that'll evaluate
a function at compile time if *all* of its arguments are constant.
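The init()-time caching pattern can be sketched as below. The Supplier stands in for a child expression, and the isConstant flag mirrors the isStateless()/Determinism.ALWAYS test above; none of these names are Phoenix's actual API.

```java
import java.util.function.Supplier;

class CachingParent {
    private final Supplier<String> child;
    private final boolean isConstant; // i.e. stateless && Determinism.ALWAYS
    private String cached;            // filled once in init() when constant

    CachingParent(Supplier<String> child, boolean isConstant) {
        this.child = child;
        this.isConstant = isConstant;
    }

    void init() {
        if (isConstant) {
            cached = child.get(); // evaluate once, e.g. pre-compile a pattern
        }
    }

    String evaluateChild() {
        // Per-row evaluation skips the child when its value was cached.
        return cached != null ? cached : child.get();
    }

    public static void main(String[] args) {
        CachingParent p = new CachingParent(() -> "abc.*", true);
        p.init();
        System.out.println(p.evaluateChild()); // prints "abc.*"
    }
}
```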

bq. For example, fun1(arg1, fun2(arg2)): if fun2(arg2) is Determinism.ALWAYS, then in every
evaluation of fun1 we don't have to compute fun2 again, because we can pre-compute it in fun1.init().
Yes, correct. In this case, fun2(arg2) will come in already pre-computed.


> Fix NPE occurring during regex processing when joni library not used
> --------------------------------------------------------------------
>
>                 Key: PHOENIX-1870
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1870
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: Shuxiong Ye
>             Fix For: 5.0.0, 4.4.0
>
>         Attachments: 0001-PHOENIX-1870-Fix-NPE-occurring-during-regex-processi.patch,
PHOENIX-1870-Regex-Expression-refinement.patch, PHOENIX-1870_v2.patch
>
>
> Tests in error:
> {code}
>   IndexExpressionIT.testImmutableCaseSensitiveFunctionIndex:1277->helpTestCaseSensitiveFunctionIndex:1321
» Commit
>   IndexExpressionIT.testImmutableLocalCaseSensitiveFunctionIndex:1282->helpTestCaseSensitiveFunctionIndex:1321
» Commit
> {code}
> Stack trace:
> {code}
> testImmutableCaseSensitiveFunctionIndex(org.apache.phoenix.end2end.index.IndexExpressionIT)
 Time elapsed: 0.723 sec  <<< ERROR!
> org.apache.phoenix.execute.CommitException: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
Failed 2 actions: org.apache.phoenix.hbase.index.builder.IndexBuildingFailureException: Failed
to build index for unexpected reason!
>         at org.apache.phoenix.hbase.index.util.IndexManagementUtil.rethrowIndexingException(IndexManagementUtil.java:180)
>         at org.apache.phoenix.hbase.index.Indexer.preBatchMutate(Indexer.java:206)
>         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$35.call(RegionCoprocessorHost.java:989)
>         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1671)
>         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1746)
>         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1703)
>         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preBatchMutate(RegionCoprocessorHost.java:985)
>         at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2697)
>         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2478)
>         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2432)
>         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2436)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:641)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:605)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1822)
>         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2031)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>         at java.lang.Thread.run(Thread.java:724)
> Caused by: java.lang.NullPointerException
>         at org.apache.phoenix.expression.util.regex.JavaPattern.substr(JavaPattern.java:83)
>         at org.apache.phoenix.expression.function.RegexpSubstrFunction.evaluate(RegexpSubstrFunction.java:118)
>         at org.apache.phoenix.index.IndexMaintainer.buildRowKey(IndexMaintainer.java:453)
>         at org.apache.phoenix.index.IndexMaintainer.buildDeleteMutation(IndexMaintainer.java:842)
>         at org.apache.phoenix.index.PhoenixIndexCodec.getIndexUpdates(PhoenixIndexCodec.java:173)
>         at org.apache.phoenix.index.PhoenixIndexCodec.getIndexDeletes(PhoenixIndexCodec.java:119)
>         at org.apache.phoenix.hbase.index.covered.CoveredColumnsIndexBuilder.addDeleteUpdatesToMap(CoveredColumnsIndexBuilder.java:403)
>         at org.apache.phoenix.hbase.index.covered.CoveredColumnsIndexBuilder.addCleanupForCurrentBatch(CoveredColumnsIndexBuilder.java:287)
>         at org.apache.phoenix.hbase.index.covered.CoveredColumnsIndexBuilder.addMutationsForBatch(CoveredColumnsIndexBuilder.java:239)
>         at org.apache.phoenix.hbase.index.covered.CoveredColumnsIndexBuilder.batchMutationAndAddUpdates(CoveredColumnsIndexBuilder.java:136)
>         at org.apache.phoenix.hbase.index.covered.CoveredColumnsIndexBuilder.getIndexUpdate(CoveredColumnsIndexBuilder.java:99)
>         at org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:133)
>         at org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:129)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         ... 1 more
> : 2 times,
>         at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:227)
>         at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:207)
>         at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.getErrors(AsyncProcess.java:1563)
>         at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:928)
>         at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:942)
>         at org.apache.phoenix.execute.MutationState.commit(MutationState.java:422)
>         at org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:439)
>         at org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:436)
>         at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>         at org.apache.phoenix.jdbc.PhoenixConnection.commit(PhoenixConnection.java:436)
>         at org.apache.phoenix.end2end.index.IndexExpressionIT.helpTestCaseSensitiveFunctionIndex(IndexExpressionIT.java:1321)
>         at org.apache.phoenix.end2end.index.IndexExpressionIT.testImmutableCaseSensitiveFunctionIndex(IndexExpressionIT.java:1277)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
