hive-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: ORC file sort order ..
Date Sat, 09 Apr 2016 16:41:58 GMT
Have you tried bucketing by the column, plus setting orc.create.index and
orc.bloom.filter.columns?

CREATE TABLE dummy (
     ID INT
   , CLUSTERED INT
   , SCATTERED INT
   , RANDOMISED INT
   , RANDOM_STRING VARCHAR(50)
   , SMALL_VC VARCHAR(10)
   , PADDING  VARCHAR(10)
)

CLUSTERED BY (ID) INTO 256 BUCKETS
STORED AS ORC
TBLPROPERTIES (
"orc.create.index"="true",
"orc.bloom.filter.columns"="ID",
"orc.bloom.filter.fpp"="0.05",
"orc.compress"="SNAPPY",
"orc.stripe.size"="16777216",
"orc.row.index.stride"="10000" )
;
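
If the aim is for each bucket file to be written in ID order, the bucketing clause can also declare a sort column. The following is only a sketch under assumptions, not something tested against your data: dummy_sorted and dummy_staging are hypothetical table names, and the two SET lines apply to Hive releases before 2.0 (as far as I know, later releases enforce bucketing and sorting on insert without them).

```sql
-- Sketch only: dummy_sorted and dummy_staging are hypothetical names.
-- Before Hive 2.0 these settings make inserts honour the bucketing and
-- sorting declared on the target table.
SET hive.enforce.bucketing=true;
SET hive.enforce.sorting=true;

CREATE TABLE dummy_sorted (
     ID INT
   , RANDOM_STRING VARCHAR(50)
   , SMALL_VC VARCHAR(10)
)
CLUSTERED BY (ID) SORTED BY (ID ASC) INTO 256 BUCKETS
STORED AS ORC
TBLPROPERTIES ( "orc.create.index"="true" );

-- Populate through Hive so that the declared bucketing and sort order
-- are applied when the bucket files are written.
INSERT OVERWRITE TABLE dummy_sorted
SELECT ID, RANDOM_STRING, SMALL_VC
FROM dummy_staging;
```

Note that SORTED BY only describes how Hive writes the files; files loaded into the table by other means are not re-sorted.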


HTH


Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 9 April 2016 at 01:53, Gautam <gautamkowshik@gmail.com> wrote:

> Hey,
>
> This might be too obvious a question, but I haven't found a way
> to validate ordering in an ORC file. I need each file to be ordered by a
> column. Is there a sure-shot way of ensuring that the sort order in an ORC
> file is as I expect it?
>
> The closest I've come is using hive --orcfiledump --rowindex <col_id>,
> which prints that column's min/max values in the index. But that still
> doesn't say whether the data within the stripes is sorted.
>
> Cheers,
> -Gautam.
>
