db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "msatoorPotter ." <msat...@gmail.com>
Subject Re: DERBY-6301 - In list predicate not getting moved down to store for a subset of in list queries
Date Wed, 12 Feb 2014 00:27:53 GMT
Thanks for conforming it, Dag. Continuing on this line, do you think
following is one possible way of getting the in list to the generated code
for BulkTableScanResultSet(although it is more of a special case code
rather than using the current mechanism of sending qualifiers to store)? We
can modify the code generation for BulkTableScanResultSet and send inlist
as an additional parameter during code generation(in
ProjectRestrictNode.generateMinion) in case of in list queries which do not
qualify for multi-probing and thus resulting in table scan.

Another solution would be to at compile time, for inlist queries that do
not qualify for multi-probing, convert the inlist into a sequence of =
operators for each of the element in the in list and add it to the
qualifiers list for BulkTableScanResultSet. But I am little hesitant about
suggesting changes on the compile side since I have seen especial case
handling for inlist in several places in code. One specific example would
be changes that went in as part of DERBY-3253(NullPointer Exception (NPE)
from query with IN predicate containing two values and joining a view with
a large table. ERROR 38000: The exception 'java.lang.NullPointerException'
was thrown while evaluating an expression.) The fix for DERBY-3253 has
following comment in BinaryRelationalOperatorNode.isQualifier(Optimizable,
boolean) line: 931
/* If this rel op is for an IN-list probe predicate then we never
* treat it as a qualifer. The reason is that if we treat it as
* a qualifier then we could end up generating it as a qualifier,
* which would lead to the generation of an equality qualifier
* of the form "col = <val>" (where <val> is the first value in
* the IN-list). That would lead to wrong results (missing rows)
* because that restriction is incorrect.
*/

So, is first solution too much a hack to send the inlist as a special
parameter during code generation to BulkTableScanResultSet or should this
problem of passing the qualifiers to store be approached with a different
idea? No matter how we decide to implement it, once we have a mechanism to
let store know of additional qualifiers for inlist, we will not need a
ProjectRestrictNode on top of BulkTableScanResultSet to rule out the rows
that don't qualify at language level because store would have already done
that based on the qualifiers.

thanks for your continued help,
Mamta


On Tue, Feb 11, 2014 at 2:02 PM, Dag H. Wanvik <dag.wanvik@oracle.com>wrote:

>  Yes, this looks about right to me. I believe your understanding is
> correct.
> Thanks for detailing it for others.
>
> Dag
>
>
> On 11. feb. 2014 08:57, msatoorPotter . wrote:
>
>  Dag, thanks for sharing the reverse scenario you had to deal with. Just
> want to write down my understanding of what is happening for my specific
> case.
>
> Spend some time in code generation and run time code to see what happens
> for the query below
> select * from t1 where c11  in
> (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30);
> when table has only 200 rows and there is a primary key defined on column
> c11.
> At code generation time, ProjectRestrictNode has in list predicate
> classified as start and stop predicate but since we found that bulk table
> scan is more appropriate for this query instead of index scan(because
> number of elements in the in list are 30 and table has only200 rows), we
> remove the predicate from the list of start/stop predicate and put it as
> restriction at ProjectRestrictNode level(the stack trace of that code
> sequence is as follows). This is where we loose the information about
> predicate between start/stop predicate which is right anyways because
> start/stop predicate only makes sense for index scan and not for table scan.
> Thread [main] (Suspended)
>  ProjectRestrictNode.generateMinion(ExpressionClassBuilder, MethodBuilder,
> boolean) line: 1365
>  ProjectRestrictNode.generate(ActivationClassBuilder, MethodBuilder) line:
> 1301
>  ProjectRestrictNode.generateMinion(ExpressionClassBuilder, MethodBuilder,
> boolean) line: 1348
>  ProjectRestrictNode.generate(ActivationClassBuilder, MethodBuilder) line:
> 1301
>  ScrollInsensitiveResultSetNode.generate(ActivationClassBuilder,
> MethodBuilder) line: 86
>  CursorNode.generate(ActivationClassBuilder, MethodBuilder) line: 628
>  CursorNode(StatementNode).generate(ByteArray) line: 317
>  GenericStatement.prepMinion(LanguageConnectionContext, boolean, Object[],
> SchemaDescriptor, boolean) line: 547
>  GenericStatement.prepare(LanguageConnectionContext, boolean) line: 99
>  GenericLanguageConnectionContext.prepareInternalStatement(SchemaDescriptor,
> String, boolean, boolean) line: 1116
>  EmbedStatement.execute(String, boolean, boolean, int, int[], String[])
> line: 682
>  EmbedStatement.execute(String) line: 631
>  ij.executeImmediate(String) line: 367
>  utilMain.doCatch(String) line: 527
>  utilMain.runScriptGuts() line: 372
>  utilMain.go(LocalizedInput[], LocalizedOutput) line: 245
>  Main.go(LocalizedInput, LocalizedOutput) line: 229
>  Main.mainCore(String[], Main) line: 184
>  Main.main(String[]) line: 75
>  ij.main(String[]) line: 59
> restriction at ProjectRestrictNode looks like following
> restriction BinaryRelationalOperatorNode  (id=1139)
>  beginOffset -1
>  betweenSelectivity false
>  btnVis BaseTableNumbersVisitor  (id=1708)
>  cm ContextManager  (id=1993)
>  constantActionFactory null
>  dataTypeServices DataTypeDescriptor  (id=1165)
>  endOffset -1
>  forQueryRewrite false
>  inListProbeSource InListOperatorNode  (id=1533)
>   beginOffset 22
>   cm ContextManager  (id=1993)
>   constantActionFactory null
>   dataTypeServices DataTypeDescriptor  (id=1861)
>   endOffset 113
>   isOrdered true
>   isPrivilegeCollectionRequired true
>   lcc GenericLanguageConnectionContext  (id=1194)
>   leftOperand ColumnReference  (id=1215)
>   methodName "in" (id=1122)
>   operator "IN" (id=1974)
>   rightOperandList ValueNodeList  (id=2443)
>   sortDescending false
>   transformed false
>   visitableTags null
>  isPrivilegeCollectionRequired true
>  kind 2
>  kind 0
>  lcc GenericLanguageConnectionContext  (id=1194)
>  leftInterfaceType "org.apache.derby.iapi.types.DataValueDescriptor"
> (id=1089)
>  leftOperand ColumnReference  (id=1215)
>  methodName "equals" (id=2102)
>  operator "=" (id=1349)
>  optBaseTables JBitSet  (id=1080)
>  receiver null
>  relOpType 1
>  resultInterfaceType null
>  rightInterfaceType "org.apache.derby.iapi.types.DataValueDescriptor"
> (id=1089)
>  rightOperand ParameterNode  (id=1783)
>  transformed false
>  valNodeBaseTables JBitSet  (id=1065)
>  visitableTags null
>  xmlQuery null
> We continue in ProjectRestrictNode.generateMinion and convert the
> restriction to a method call at runtime. This restriction method will be
> run on every row returned from the store since store does not know anything
> about restrictions since no start/stop/qualifier has been passed to bulk
> table scan calls to store. So at the end of code generation, we have
> generated two kinds of resultsets,
> 1)BulkTableScanResultSet with no qualifers/start/stop keys, so all the
> rows from the table will be returned by store and
> 2)ProjectRestrictResultSet which will use BulkTableScanResultSet generated
> earlier as the source. And for each row returned by BulkTableScanResultSet,
> it will apply the restriction method generated earlier which will go
> through the row returned and see if returns true for the inlist elements.
> So, by this point, we have lost all the information about inlist. It is
> only available indirectly as restriction method but there is no way to get
> to it at this point for the way the code is written. Following is the
> relevant code from ProjectRestrictResultSet.getNextRowCore
>     do
>     {
>  candidateRow = source.getNextRowCore(); //this is the call to
> BulkTableScanResultSet to get next row
>  if (candidateRow != null)
>  {
>   beginRT = getCurrentTimeMillis();
>   /* If restriction is null, then all rows qualify */
>   if (restriction == null)
>   {
>    restrict = true;
>   }
>   else
>   {
>    setCurrentRow(candidateRow);
>                 ////Following is the call to restriction method to see if
> row returned by
>                         //BulkTableScanResultSet qualifies or not
>           restrictBoolean = (DataValueDescriptor)
>                                    restriction.invoke(activation);
> The above blurb just explains what is happening at code generation and
> execution time for an in list which was identified by optimizer as not a
> good candidate for multi-probing and subsequently, table scan was found to
> be the most efficient plan for it.
>
>
> On Fri, Feb 7, 2014 at 3:04 AM, Dag H. Wanvik <dag.wanvik@oracle.com>wrote:
>
>> For what it's worth, I just had the opposite problem: I want the
>> ProjectRestrictNode to *not* push qualifiers down to store, and I found I
>> could avoid that by making sure this test in PRN was extended with my case,
>> ca line 707:
>>
>>         if ((restrictionList != null) && !alreadyPushed &&
>> !hashJoinWithThisPRN && !validatingDeferredCheckConstraints)
>>         {
>>             restrictionList.pushUsefulPredicates((Optimizable)
>> childResult);
>>         }
>>
>> so somehow you'd need to get the pushDown to happen in your case.
>>
>> Dag
>>
>>
>>
>> On 06. feb. 2014 07:51, Mamta Satoor wrote:
>>
>>> Hi,
>>> I have spent sometime looking through the code for DERBY-6301(SQL layer
>>> should push down IN list predicates to store when doing a scan). I have
>>> found that if the number of elements in the in list are fairly large,
>>> optimizer will find that table scan is better than using an available
>>> index. This is a good optimization since it is indeed better to use table
>>> scan for such an in list query. But the problem is when we talk to the
>>> store about doing table scan, we do not pass any qualifiers to the store
>>> based on the in list. This causes store to lock more rows than really
>>> necessary. As Mike noted down in DERBY-6301 "In addition to performance
>>> considerations this presents a locking problem with respect to the
>>> repeatable read isolation level. It is optimal in repeatable read to not
>>> maintain locks on those
>>> rows that do not qualify. Currently this locking optimization only takes
>>> place for those rows that are qualified in the store vs. those qualified in
>>> the upper SQL layer. So in the case of a non-multi-probe IN-LIST plan all
>>> non-qualified rows looked at as part of the execution will remain locked in
>>> repeatable read. "
>>> During my debugging, following is where I see that there are no
>>> qualifiers sent to BulkTableScanResultSet.
>>> BulkTableScanResultSet(TableScanResultSet).openCore() line: 246
>>>  BulkTableScanResultSet.openCore() line: 286
>>>  ProjectRestrictResultSet.openCore() line: 174
>>>  ProjectRestrictResultSet(BasicNoPutResultSetImpl).open() line: 266
>>>  GenericPreparedStatement.executeStmt(Activation, boolean, long) line:
>>> 439
>>>  GenericPreparedStatement.execute(Activation, long) line: 320
>>>  EmbedStatement.executeStatement(Activation, boolean, boolean) line: 1337
>>>  EmbedStatement.execute(String, boolean, boolean, int, int[], String[])
>>> line: 704
>>>  EmbedStatement.execute(String) line: 631
>>>  ij.executeImmediate(String) line: 367
>>>  utilMain.doCatch(String) line: 527
>>>  utilMain.runScriptGuts() line: 372
>>>  utilMain.go(LocalizedInput[], LocalizedOutput) line: 245
>>>  Main.go(LocalizedInput, LocalizedOutput) line: 229
>>>  Main.mainCore(String[], Main) line: 184
>>>  Main.main(String[]) line: 75
>>>  ij.main(String[]) line: 59
>>> I tried a simple query as shown below to verify that there indeed are
>>> other places in the code where we pass qualifiers when doing table scan and
>>> we need to mimic something similar for in list when table scan seems a
>>> better option for in list. An example query which does table scan and
>>> passes qualifier is as below
>>> select * from t2 where c21>=1 and c21<=30;
>>> Table t2 above has 4 columns with no index on any column. It has 200
>>> rows with values ranging from 1 to 200 in column c21. When i run this query
>>> in ij with log query plan, I see following query plan for it.
>>> Wed Feb 05 14:50:19 PST 2014 Thread[main,5,main] (XID = 1082),
>>> (SESSIONID = 1), select * from t2 where c21>=1 and c21<=30 ******* Table
>>> Scan ResultSet for T2 at read committed isolation level using instantaneous
>>> share row locking chosen by the optimizer
>>> Number of opens = 1
>>> Rows seen = 30
>>> Rows filtered = 0
>>> Fetch Size = 16
>>>  constructor time (milliseconds) = 0
>>>  open time (milliseconds) = 0
>>>  next time (milliseconds) = 0
>>>  close time (milliseconds) = 0
>>>  next time in milliseconds/row = 0
>>> scan information:
>>>  Bit set of columns fetched=All
>>>  Number of columns fetched=4
>>>  Number of pages visited=3
>>>  Number of rows qualified=30
>>>  Number of rows visited=200
>>>  Scan type=heap
>>>  start position:
>>>   null
>>>  stop position:
>>>   null
>>>  qualifiers:
>>>   Column[0][0] Id: 0
>>>   Operator: <
>>>   Ordered nulls: false
>>>   Unknown return value: true
>>>   Negate comparison result: true
>>>   Column[0][1] Id: 0
>>>   Operator: <=
>>>   Ordered nulls: false
>>>   Unknown return value: false
>>>   Negate comparison result: false
>>>  optimizer estimated row count: 25.16
>>>  optimizer estimated cost: 81.76
>>> I will start investigating how are the qualifiers sent for the above
>>> query using AND and see if I get the language layer to do the same for in
>>> list cases with table scan and an index on the column on which in list is
>>> being used. If anyone has any ideas on how to do this, I will highly
>>> appreciate that
>>> thanks,
>>> Mamta
>>>
>>
>>
>
>

Mime
View raw message