flink-issues mailing list archives

From "sunjincheng (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (FLINK-6097) Guaranteed the order of the extracted field references
Date Sat, 18 Mar 2017 01:01:46 GMT

     [ https://issues.apache.org/jira/browse/FLINK-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sunjincheng updated FLINK-6097:
-------------------------------
    Description: 
While implementing the `OVER window` Table API, our first prototype did not account for the
table fields possibly being out of order in the `translateToPlan` method, so we set the
`outputRow` fields from `inputRow` according to the initial order of the table field indexes.
At first, with fewer than 5 columns projected in the select statement, this worked well.
Unfortunately, once the number of projections reached 5 or more, we got seemingly random results.
Debugging showed that the `ProjectionTranslator#identifyFieldReferences` method uses a `Set`
to temporarily store the fields. When the Set has fewer than 5 elements, it is backed by the
Set1, Set2, Set3, or Set4 data structures. When it has 5 or more elements, it is backed by
HashSet#HashTrieSet, which does not keep the elements in insertion order.
e.g.:
Add the following elements in turn:
{code}
a, b, c, d, e
Set(a)
class scala.collection.immutable.Set$Set1
Set(a, b)
class scala.collection.immutable.Set$Set2
Set(a, b, c)
class scala.collection.immutable.Set$Set3
Set(a, b, c, d)
class scala.collection.immutable.Set$Set4
// we want (a, b, c, d, e)
Set(e, a, b, c, d)
class scala.collection.immutable.HashSet$HashTrieSet
{code}
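
The behavior above can be reproduced with a small standalone sketch. It is not Flink code: the
object and helper names (`SetOrderDemo`, `backingClasses`, `insertionOrdered`) are made up for
illustration. The exact class names and the iteration order of the 5-element Set depend on the
Scala version, so the sketch only prints them. `mutable.LinkedHashSet` is shown as one
collection that does iterate in insertion order.

```scala
import scala.collection.mutable

// Hypothetical demo object, not part of Flink.
object SetOrderDemo {

  /** Adds the elements one by one to an immutable Set and records
    * the name of the backing class after each addition. */
  def backingClasses(elems: Seq[String]): Seq[String] = {
    var s = Set.empty[String]
    elems.map { e => s += e; s.getClass.getName }
  }

  /** De-duplicates while keeping insertion order. */
  def insertionOrdered(elems: Seq[String]): List[String] = {
    val s = mutable.LinkedHashSet.empty[String]
    elems.foreach(s += _)
    s.toList
  }

  def main(args: Array[String]): Unit = {
    val fields = Seq("a", "b", "c", "d", "e")
    // Small sets use Set1..Set4; the fifth element switches to a hash-based set.
    backingClasses(fields).foreach(println)
    // Iteration order of the immutable Set is not guaranteed.
    println(Set(fields: _*).toList)
    // LinkedHashSet iterates in insertion order.
    println(insertionOrdered(fields))
  }
}
```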

We considered two approaches to solve this problem:

1. Make the `ProjectionTranslator#identifyFieldReferences` method guarantee that the extracted
field references are in the same order as the input.
2. Add a mapping between the input and output fields.
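
As a rough illustration of approach 1, the sketch below collects field references in
first-seen order by using a `mutable.LinkedHashSet` instead of an immutable `Set`. The
expression classes (`Expr`, `FieldRef`, `Literal`, `Call`) are a hypothetical stand-in and are
not Flink's actual expression classes, and the function only borrows the name
`identifyFieldReferences`; the real method's signature may differ.

```scala
import scala.collection.mutable

// Hypothetical mini expression tree; these are NOT Flink's Expression classes.
sealed trait Expr
final case class FieldRef(name: String) extends Expr
final case class Literal(value: Any) extends Expr
final case class Call(function: String, args: Seq[Expr]) extends Expr

object OrderedFieldExtraction {

  /** Collects the referenced field names, de-duplicated, in first-seen order. */
  def identifyFieldReferences(exprs: Seq[Expr]): Seq[String] = {
    // LinkedHashSet keeps insertion order while still removing duplicates.
    val seen = mutable.LinkedHashSet.empty[String]

    def visit(e: Expr): Unit = e match {
      case FieldRef(name) => seen += name; ()
      case Literal(_)     => ()
      case Call(_, args)  => args.foreach(visit)
    }

    exprs.foreach(visit)
    seen.toList
  }
}
```

With this shape, a caller that projects `a, sum(b, a), c, d, e` would get the references back
as `a, b, c, d, e` regardless of how many fields are involved.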

In the end we solved the problem with approach #2, so this change is not necessary for the
problem I faced. Still, I feel it is better for this method to return its output in the same
order as the input; it may be helpful in other cases, though I am currently not aware of any.
I am OK with not making this change, but then we should add a comment pointing out that the
output order of the current method is not guaranteed. Otherwise, some people may overlook
this and assume the output is in order, as I did.
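
For comparison, approach 2 might look roughly like the sketch below: rather than relying on
the order of the extracted references, each output position is resolved to its input index by
field name. All names here (`FieldMappingSketch`, `buildMapping`, `project`) are hypothetical,
and rows are modeled as plain arrays; this is not the code that was actually committed.

```scala
// Hypothetical helper, not part of Flink.
object FieldMappingSketch {

  /** For each output field, looks up the index of that field in the input row. */
  def buildMapping(inputFields: Seq[String], outputFields: Seq[String]): Array[Int] =
    outputFields.map { name =>
      val idx = inputFields.indexOf(name)
      require(idx >= 0, s"Unknown field: $name")
      idx
    }.toArray

  /** Copies values from the input row to the output row through the mapping. */
  def project(inputRow: Array[Any], mapping: Array[Int]): Array[Any] =
    mapping.map(i => inputRow(i))
}
```

Because the lookup is by name, the result stays correct even when the extracted references
arrive as `e, a, b, c, d`.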
What do you think? Any feedback is welcome.


  was:
The current `ProjectionTranslator#identifyFieldReferences` method uses a `Set` to temporarily
store the fields. When the Set has fewer than 5 elements, it is backed by the Set1, Set2,
Set3, or Set4 data structures. When it has 5 or more elements, it is backed by
HashSet#HashTrieSet, which causes the data to be out of order. Although things still work
with the fields out of order, I think keeping them in order is better. So I want to improve
this by extracting the fields in order, i.e. guaranteeing that the extracted field
references follow the input order.
e.g.:
Add the following elements in turn:
{code}
a, b, c, d, e
Set(a)
class scala.collection.immutable.Set$Set1
Set(a, b)
class scala.collection.immutable.Set$Set2
Set(a, b, c)
class scala.collection.immutable.Set$Set3
Set(a, b, c, d)
class scala.collection.immutable.Set$Set4

Set(e, a, b, c, d) -> I want (a, b, c, d, e)
class scala.collection.immutable.HashSet$HashTrieSet
{code}


> Guaranteed the order of the extracted field references
> ------------------------------------------------------
>
>                 Key: FLINK-6097
>                 URL: https://issues.apache.org/jira/browse/FLINK-6097
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table API & SQL
>            Reporter: sunjincheng
>            Assignee: sunjincheng
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
