arrow-dev mailing list archives

From Julien Le Dem <jul...@dremio.com>
Subject Re: Arrow sync in 15 min
Date Wed, 31 May 2017 18:06:55 GMT
Next sync: 6/21 at 9:30am PT on Google Hangout


On Wed, May 31, 2017 at 11:06 AM, Julien Le Dem <julien@dremio.com> wrote:

> Notes:
>
> Attendees/agenda building
> Wes (TwoSigma):
>  - REST API
>  - Roadmap
>  - communicate with community
> Uwe (Blue Yonder):
>  - git tag for versioning
> Julien (Dremio):
>  - Timestamp
>  - REST API
>  - Roadmap
>
> Discussion:
>  - git tag for versioning
>     - development package version names are based on the latest tag in
> the history of master plus the commit count since that tag.
>     - since the release tag is on a branch, the dev version is derived
> from an older tag, which is misleading
>     - options:
>        - add a tag {release version}.post on the first commit after the
> release to get a better dev version string
>        - rebase master on top of the last release (0.4)
>     - we decided to rebase master (the only change is adding the commit
> that updates the version number in pom files)
>  - Timestamp in Arrow and Parquet:
>     - Both support “Timezone naive” timestamps (aka “timestamp without
> timezone” in SQL)
>         - in Arrow when timezone field is missing in Timestamp type:
> https://github.com/apache/arrow/blob/5899800f53f3c3fffc0db95294c4f0eb0e556228/format/Schema.fbs#L117
>         - in Parquet (proposed PR) when isAdjustedToUTC is false:
> https://github.com/apache/parquet-format/pull/51/files#diff-0f9d1b5347959e15259da7ba8f4b6252R242
>     - They also both support a “Timezone aware” timestamp (aka “timestamp
> with timezone” in SQL)
>         - in Arrow when the timezone field is present with the original
> timezone.
>         - in Parquet when isAdjustedToUTC is true
>     - So Arrow carries more information (the original timezone), and it
> requires that extra information, since its absence means “timezone naive”
>     - conclusion:
>         - when writing to Parquet we should use isAdjustedToUTC = false
> only if there is no knowledge of the timezone
>         - when reading from Parquet we will populate timezone with UTC
> when isAdjustedToUTC == true (and leave it missing otherwise)
>  - REST API:
>    - review doc here:
> https://docs.google.com/document/d/1N4TP6zARRs2c4_h-4WqCqIFVPQwmxOmXel1V3AxpGok/edit#
>  - Roadmap:
>     - todo: blog post to describe the direction of Arrow
>     - directions to cover include:
>        - REST API and generalizing messaging
>        - C++ analytics library for interacting with Arrow memory. Tools
> for wrapping existing data structures (e.g. an array of doubles)
>        - Arrow for GPU
>        - Arrow ODBC interface: turbodbc
>        - Spark integration improvements: group UDFs etc.
>
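To illustrate the versioning item in the notes above, here is a minimal sketch of why a release tag that lives on a branch gives a misleading dev version, and why rebasing master on the release fixes it. The commit ids, the older tag name "0.3.0", the function names, and the ".devN" string format are all assumptions made for illustration; the notes only say that the name is built from the latest tag in master's history plus the commit count since then.

```python
from typing import Dict, List, Optional, Tuple


def nearest_tag(history: List[str], tags: Dict[str, str]) -> Optional[Tuple[str, int]]:
    """Find the closest tagged commit reachable from the branch tip.

    history: commit ids of the branch, newest first (hypothetical ids).
    tags: mapping of commit id -> tag name.
    Returns (tag name, number of commits since that tag), or None.
    """
    for distance, commit in enumerate(history):
        if commit in tags:
            return tags[commit], distance
    return None


def dev_version(history: List[str], tags: Dict[str, str]) -> str:
    """Build a dev version string; the '.devN' suffix is an assumed format."""
    found = nearest_tag(history, tags)
    if found is None:
        return "0.dev{}".format(len(history))
    tag, distance = found
    return tag if distance == 0 else "{}.dev{}".format(tag, distance)


# "r1" is the release commit (the pom version bump) and carries the 0.4.0 tag.
tags = {"m1": "0.3.0", "r1": "0.4.0"}

# Before the rebase, r1 sits on a release branch and is not in master's history.
master_before = ["m4", "m3", "m2", "m1"]

# After rebasing master on top of the release, r1 is part of master's history.
master_after = ["m4r", "m3r", "r1", "m2", "m1"]

print(dev_version(master_before, tags))  # derived from the older tag
print(dev_version(master_after, tags))   # derived from the release tag
```

With this toy history, the first call builds on the older tag and the second builds on the release tag, which is the effect the rebase decision is after.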
> On Wed, May 31, 2017 at 9:16 AM, Julien Le Dem <julien@dremio.com> wrote:
>
>> The arrow sync is at 9:30 am PT today on google hangout
>> https://hangouts.google.com/hangouts/_/dremio.com/arrow
>>
>> --
>> Julien
>>
>
>
>
> --
> Julien
>
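The timestamp conclusion in the notes above amounts to a small mapping between the two formats, which can be sketched as follows. This is plain Python and does not call any Arrow or Parquet library; the function names are hypothetical, and representing a missing Arrow timezone as None is an assumption of this sketch.

```python
from typing import Optional


def parquet_is_adjusted_to_utc(arrow_timezone: Optional[str]) -> bool:
    """Writing to Parquet: isAdjustedToUTC is false only when the
    timezone is unknown (the Arrow timezone field is missing)."""
    return arrow_timezone is not None


def arrow_timezone_from_parquet(is_adjusted_to_utc: bool) -> Optional[str]:
    """Reading from Parquet: populate the timezone with UTC when
    isAdjustedToUTC is true, and leave it missing otherwise."""
    return "UTC" if is_adjusted_to_utc else None


def round_trip(arrow_timezone: Optional[str]) -> Optional[str]:
    """Arrow -> Parquet -> Arrow, following the two rules above."""
    return arrow_timezone_from_parquet(parquet_is_adjusted_to_utc(arrow_timezone))


for tz in (None, "UTC", "America/Los_Angeles"):
    print(tz, "->", round_trip(tz))
```

Under these rules a timezone naive column stays naive across a round trip, while a timezone aware column comes back as UTC: the original timezone name is not preserved, which matches the observation that Arrow carries more information than the Parquet flag.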



-- 
Julien
