orc-dev mailing list archives

From wgtmac <...@git.apache.org>
Subject [GitHub] orc pull request #247: ORC-339. Reorganize the ORC file format specification...
Date Thu, 12 Apr 2018 22:23:03 GMT
Github user wgtmac commented on a diff in the pull request:

    https://github.com/apache/orc/pull/247#discussion_r181239251
  
    --- Diff: site/specification/ORCv2.md ---
    @@ -0,0 +1,1032 @@
    +---
    +layout: page
    +title: Evolving Draft for ORC Specification v2
    +---
    +
    +This specification is rapidly evolving and should only be used for
    +developers on the project.
    +
    +# TO DO items
    +
    +The list of things that we plan to change:
    +
    +* Create a decimal representation with fixed scale using rle.
    +* Create a better float/double encoding that splits mantissa and
    +  exponent.
    +* Create a dictionary encoding for float, double, and decimal.
    +* Create RLEv3:
    +   * 64 and 128 bit variants
    +   * Zero suppression
    +   * Evaluate the rle subformats
    +* Group stripe data into stripelets to enable Async IO for reads.
    +* Reorder stripe data into (stripe metadata, index, dictionary, data)
    +* Stop sorting dictionaries and record the sort order separately in the index.
    +* Remove use of RLEv1 and RLEv2.
    +* Remove non-utf8 bloom filter.
    +* Use numeric value for decimal bloom filter.
    --- End diff --
    
    We may also use numeric values for decimal column statistics.
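    Here is a minimal sketch, using made-up column values, of why a string form is a fragile representation for decimal statistics. A string-encoded min/max only has numeric meaning after it is parsed back to a number. Any consumer that compares the stored strings directly gets lexicographic order rather than numeric order. Numerically equal values with different scales (`1.0` vs `1.00`) are also different strings, which is the same concern behind the decimal bloom filter item in the diff. The function names are hypothetical and are not part of any ORC API.

    ```python
    from decimal import Decimal


    def string_min_max(values):
        """Min/max by lexicographic comparison of the decimal strings."""
        return min(values), max(values)


    def numeric_min_max(values):
        """Min/max by comparing the parsed numeric (Decimal) values."""
        nums = [Decimal(v) for v in values]
        return min(nums), max(nums)


    if __name__ == "__main__":
        # Hypothetical decimal column, held as strings.
        column = ["9.25", "10.5", "100.00"]
        print(string_min_max(column))   # ('10.5', '9.25'): not numeric order
        print(numeric_min_max(column))  # (Decimal('9.25'), Decimal('100.00'))
        # Equal values with different scales differ as strings.
        print("1.0" == "1.00", Decimal("1.0") == Decimal("1.00"))  # False True
    ```

    Storing the statistic as a number (for example, an unscaled integer plus the column's scale) would let readers compare values without parsing text and without caring how many trailing zeros the writer emitted.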

