spark-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From JoshRosen <...@git.apache.org>
Subject [GitHub] spark pull request: [SPARK-7199][SQL] Add date and timestamp suppo...
Date Fri, 08 May 2015 00:29:25 GMT
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5984#discussion_r29909028
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
---
    @@ -223,6 +227,16 @@ public void setString(int ordinal, String value) {
       }
     
       @Override
    +  public void setDate(int ordinal, Date value) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  @Override
    +  public void setTimestamp(int ordinal, Timestamp value) {
    --- End diff --
    
    Although `Timestamp` stores data in the variable-length field, the width of that variable-length
data won't change once it's stored or updated, so I think we can actually support in-place
updates for timestamp fields.  One tricky case will be to handle cases where the timestamp
column is initially null then we choose to set a new value.  In order to be able to safely
hash rows by hashing their bytes, we need to ensure that identical rows are bitwise identical.
To achieve this, the current `setNull()` method zeros out the field in the fixed-width section.
 However, I don't think that we can safely do that for variable-length fields that support
updates (such as Timestamp) because then we'll no longer be able to determine where that field's
variable-length data can be found (since we'll be zeroing out the pointer).  If we could do
a type-specific `setNull` method, then we could change that to only update the null bitmap
and to zero out the variable-length data, but leave the fixed-
 length offset unchanged.
    
    @rxin, do you have any ideas for how we should handle this?  Refactoring the internal
row interfaces / codegen to call type-specific `setNull()` methods might be pretty hard. 
@marmbrus, is there any API contract constraining what types of columns `setNull()` will be
called for?  I'm wondering if we could do something like only call `setNull()` for primitives,
but call something `setTimestamp(null)` or `setString(null)` for the non-primitive types.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Mime
View raw message