flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-3971) Aggregates handle null values incorrectly.
Date Fri, 10 Jun 2016 16:00:22 GMT

    [ https://issues.apache.org/jira/browse/FLINK-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15324684#comment-15324684 ]

ASF GitHub Bot commented on FLINK-3971:
---------------------------------------

Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2049#discussion_r66638155
  
    --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/runtime/aggregate/AvgAggregate.scala ---
    @@ -66,51 +66,63 @@ abstract class IntegralAvgAggregate[T] extends AvgAggregate[T] {
       def doPrepare(value: Any, partial: Row): Unit
     }
     
    -class ByteAvgAggregate extends IntegralAvgAggregate[Byte] {
    +class ByteAvgAggregate[T] extends IntegralAvgAggregate[T] {
    --- End diff --
    
    I think it would be cleaner to keep the `Byte` type parameter.
    How about we add an abstract method `def doEvaluate(buffer: Row): Any` to `IntegralAvgAggregate`?

    The subclasses of `IntegralAvgAggregate` would then implement `doEvaluate`, for example in `ByteAvgAggregate`:
    
    ```
    override def doEvaluate(buffer: Row): Any = {
      // Read the partial sum and count accumulated in the buffer row.
      val bufferSum = buffer.productElement(partialSumIndex).asInstanceOf[Long]
      val bufferCount = buffer.productElement(partialCountIndex).asInstanceOf[Long]
      if (bufferCount == 0L) {
        // No non-null input was aggregated, so the average is null.
        null
      } else {
        (bufferSum / bufferCount).toByte
      }
    }
    ```
    
    This returns an `Any`, which is cast to `T` by `IntegralAvgAggregate.evaluate()` as follows:
    
    
    ```
    override def evaluate(buffer: Row): T = {
      doEvaluate(buffer).asInstanceOf[T]
    }
    ```
    
    The same applies to `FloatingAvgAggregate` and its subclasses.
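
    To show how these pieces could fit together, here is a minimal, self-contained sketch. It is not the actual Flink code: the `Row` class is a hypothetical stand-in so the example compiles on its own, and the buffer layout (sum at index 0, count at index 1) is an assumption made for illustration.

    ```scala
    // Hypothetical stand-in for Flink's Row, only so this sketch compiles on its own.
    final class Row(fields: Any*) {
      def productElement(i: Int): Any = fields(i)
    }

    abstract class IntegralAvgAggregate[T] {
      // Assumed buffer layout: field 0 holds the partial sum, field 1 the partial count.
      protected val partialSumIndex: Int = 0
      protected val partialCountIndex: Int = 1

      // Subclasses return the typed average, or null if no non-null value was seen.
      def doEvaluate(buffer: Row): Any

      // T is erased at runtime, so this cast is unchecked and a null passes through.
      def evaluate(buffer: Row): T = doEvaluate(buffer).asInstanceOf[T]
    }

    // The subclass keeps its concrete `Byte` type parameter.
    class ByteAvgAggregate extends IntegralAvgAggregate[Byte] {
      override def doEvaluate(buffer: Row): Any = {
        val bufferSum = buffer.productElement(partialSumIndex).asInstanceOf[Long]
        val bufferCount = buffer.productElement(partialCountIndex).asInstanceOf[Long]
        if (bufferCount == 0L) null else (bufferSum / bufferCount).toByte
      }
    }
    ```

    One caveat with this sketch: the `null` only survives if the caller holds the aggregate through a generic reference such as `IntegralAvgAggregate[_]` and treats the result as `Any`. If the result is assigned to a value whose static type is `Byte`, Scala unboxes it and `null` becomes `0`.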


> Aggregates handle null values incorrectly.
> ------------------------------------------
>
>                 Key: FLINK-3971
>                 URL: https://issues.apache.org/jira/browse/FLINK-3971
>             Project: Flink
>          Issue Type: Bug
>          Components: Table API
>    Affects Versions: 1.1.0
>            Reporter: Fabian Hueske
>            Assignee: GaoLun
>            Priority: Critical
>             Fix For: 1.1.0
>
>
> Table API and SQL aggregates are supposed to ignore null values, e.g., {{sum(1,2,null,4)}} is supposed to return {{7}}.
> The current implementation is correct if at least one valid value is present; however, it is incorrect if only null values are aggregated. {{sum(null, null, null)}} should return {{null}} instead of {{0}}.
> Currently, only the Count aggregate handles the null-values-only case correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
