spark-reviews mailing list archives

From kiszk <...@git.apache.org>
Subject [GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce additional implementation wi...
Date Thu, 23 Jun 2016 05:09:51 GMT
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/13680
  
    @cloud-fan thank you for the helpful comment. I also read the [previous proposal](https://github.com/apache/spark/pull/12640#discussion_r61539393).
    I would love to have only a single format (or implementation). I introduced a new dense format because I thought there were some reasons to keep the old format.
    
    IMHO, a new unified format should have three properties:
    1. No indirect offsets (for performance and memory footprint)
    2. The ability to carry null bits (for generality)
    3. A quick way to tell whether an array contains any null value (for performance, in particular for primitive arrays)
    
    Based on these properties, how about this single format?
    ```
    [numElements] [all zero in null bits?] [null bits] [values] [variable length portion]
    ``` 
    If we want to reduce the memory footprint for primitive arrays, we can drop the `[null bits]` part when `[all zero in null bits?]` holds a special value.
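
To make the proposed layout concrete, here is a minimal sketch for an `int` array. It is not Spark's `UnsafeArrayData`; the class name, the 4-byte field widths, the flag values (0 = no nulls, 1 = nulls present), and the 8-byte alignment of the null bits are all assumptions chosen for illustration. The variable-length portion is omitted because `int` values are fixed width.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch of: [numElements] [all zero in null bits?] [null bits] [values]
final class DenseIntArraySketch {
    // Assumed header: 4 bytes for numElements + 4 bytes for the null flag.
    static final int HEADER_BYTES = 8;

    // One bit per element, rounded up to whole 8-byte words (assumed alignment).
    static int nullBitsBytes(int numElements) {
        return ((numElements + 63) / 64) * 8;
    }

    static ByteBuffer encode(Integer[] values) {
        int n = values.length;
        boolean hasNulls = false;
        for (Integer v : values) {
            if (v == null) {
                hasNulls = true;
            }
        }
        // The null-bits region is dropped entirely when the array has no nulls.
        int nullBytes = hasNulls ? nullBitsBytes(n) : 0;
        ByteBuffer buf = ByteBuffer.allocate(HEADER_BYTES + nullBytes + 4 * n)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(0, n);
        buf.putInt(4, hasNulls ? 1 : 0);
        int valuesOffset = HEADER_BYTES + nullBytes;
        for (int i = 0; i < n; i++) {
            if (values[i] == null) {
                int bytePos = HEADER_BYTES + (i >> 3);
                buf.put(bytePos, (byte) (buf.get(bytePos) | (1 << (i & 7))));
            } else {
                // Values sit at a fixed stride, so no per-element offset is stored.
                buf.putInt(valuesOffset + 4 * i, values[i]);
            }
        }
        return buf;
    }

    static boolean isNullAt(ByteBuffer buf, int i) {
        if (buf.getInt(4) == 0) {
            return false; // fast path: the flag says no element is null
        }
        return (buf.get(HEADER_BYTES + (i >> 3)) & (1 << (i & 7))) != 0;
    }

    static int getInt(ByteBuffer buf, int i) {
        int n = buf.getInt(0);
        int nullBytes = buf.getInt(4) == 0 ? 0 : nullBitsBytes(n);
        return buf.getInt(HEADER_BYTES + nullBytes + 4 * i);
    }

    public static void main(String[] args) {
        ByteBuffer dense = encode(new Integer[] {1, 2, 3});
        ByteBuffer sparse = encode(new Integer[] {1, null, 3});
        System.out.println("no nulls:   " + dense.capacity() + " bytes");
        System.out.println("with nulls: " + sparse.capacity() + " bytes");
    }
}
```

In this sketch a reader of a primitive array checks the flag once and then reads values at a fixed stride, which covers properties 1 and 3, while the optional null-bits region covers property 2.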

