arrow-dev mailing list archives

From Alex Samuel <a...@alexsamuel.net>
Subject Re: Timestamps with different precision / Timedeltas
Date Thu, 14 Jul 2016 15:18:18 GMT
Hi all,

May I suggest that instead of fixed-point decimals, you consider a more
general fixed-denominator rational representation, for times and other
purposes? Powers of ten are convenient for humans, but powers of two are more
efficient. For some applications, the efficiency of bit operations over
divmod is more useful than an exact representation of integral nanoseconds.
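
To illustrate the trade-off above, here is a minimal Python sketch. The
names (split_decimal, split_binary, to_binary_ticks) and the choice of a
2**30 denominator are assumptions made for this example, not taken from
std::chrono or any other library. It compares splitting a tick count into
whole seconds and a sub-second remainder for a decimal denominator (divmod)
and a binary one (shift and mask).

```python
from fractions import Fraction

NANOS_PER_SEC = 10**9    # decimal denominator: 1 tick = 1 ns
BITS = 30                # assumed binary denominator: 1 tick = 2**-30 s
MASK = (1 << BITS) - 1   # selects the sub-second bits


def split_decimal(ticks):
    """Split nanosecond ticks into (whole seconds, remainder) with divmod."""
    return divmod(ticks, NANOS_PER_SEC)


def split_binary(ticks):
    """Split 2**-30 s ticks into (whole seconds, remainder) with shift and mask."""
    return ticks >> BITS, ticks & MASK


def to_binary_ticks(seconds):
    """Round an exact rational number of seconds to the nearest binary tick."""
    return round(Fraction(seconds) * (1 << BITS))
```

With this denominator, half a second is exact (2**29 ticks), but one
nanosecond rounds to a single tick of about 0.93 ns. That is the loss of
exactness for integral nanoseconds mentioned above.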

std::chrono takes this approach. I'll also humbly point you at my own
date/time library, https://github.com/alexhsamuel/cron (incomplete but
basically working), which may provide ideas or useful code. It was intended
for precisely this sort of application.

Regards,
Alex


On Thu, Jul 14, 2016 at 10:27 AM Uwe Korn <uwelk@xhochy.com> wrote:

> I agree that having a Decimal type for timestamps is a nice
> definition. Having your time encoded as seconds or nanoseconds should be
> the same as having a scale of the respective amount. But I would rather
> avoid having a separate decimal physical type. Therefore I'd prefer the
> parquet approach where decimal is only a logical type and backed by
> either a bytearray, int32 or int64.
>
> Thus a more general timestamp could look like:
>
> * Decimals are logical types, physical types are the same as defined in
> Parquet [1]
> * Base unit for timestamps is seconds, you can get milliseconds and
> nanoseconds by using a different scale. (Note that seconds and so on
> are all powers of ten, thus matching the specification of decimal scale
> really well).
> * Timestamp is just another logical type that is referring to Decimal
> (and optionally may have a timezone) and signalling that we have a Time
> and not just a "simple" decimal.
> * For a first iteration, I would assume no timezone or UTC but not
> include a metadata field. Once we're sure the implementation works, we
> can add metadata about it.
>
> Timedeltas could be addressed in a similar way, just without the need
> for a timezone.
>
> For my usages, I don't have the use-case for a larger than int64
> timestamp and would like to have it exactly as such in my computation,
> thus my preference for the Parquet way.
>
> Uwe
>
> [1]
>
> https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal
>
> On 13.07.16 03:06, Julian Hyde wrote:
> > I'm talking about a fixed decimal type, not floating decimal. (Oracle
> > numbers are floating decimal. They have a few nice properties, but
> > they are variable width and can get quite large. I've seen one or two
> > systems that started with binary floating point numbers, which are
> > much worse for business computing, and then change to Java BigDecimal,
> > which gives the right answer but is horribly inefficient.)
> >
> > A fixed decimal type has virtually zero computational overhead. It
> > just has a piece of metadata saying something like "every value in
> > this field is multiplied by 1 million" and leaves it to the client
> > program to do that multiplying.
> >
> > My advice is to create a good fixed decimal type and lean on it heavily.
> >
> > Julian
> >
>
>
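
The scheme in the quoted messages, a timestamp as a decimal logical type
over an integer physical value with the unit carried as a scale, might look
roughly like the Python sketch below. The class name DecimalTimestamp and
its rescale method are hypothetical and exist only for this illustration;
they are not part of Arrow or Parquet. Scale follows the Parquet decimal
convention, where the value is unscaled * 10**-scale.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecimalTimestamp:
    """Hypothetical logical type: instant = unscaled * 10**-scale seconds."""
    unscaled: int  # physical value, for example stored as an int64
    scale: int     # 0 = seconds, 3 = milliseconds, 9 = nanoseconds

    def rescale(self, new_scale):
        """Return the same instant at another scale, refusing to drop digits."""
        diff = new_scale - self.scale
        if diff >= 0:
            # Finer unit: multiply by a power of ten, always exact.
            return DecimalTimestamp(self.unscaled * 10**diff, new_scale)
        # Coarser unit: only allowed when no nonzero digits are lost.
        quotient, remainder = divmod(self.unscaled, 10 ** (-diff))
        if remainder:
            raise ValueError("rescaling would drop nonzero digits")
        return DecimalTimestamp(quotient, new_scale)
```

The stored integers are never touched until a client asks for a different
unit, which matches the point that the scale is only metadata and the
multiplying is left to the client program.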
