hive-user mailing list archives

From "Ladda, Anand"
Subject Block Sampling Impact
Date Fri, 15 Jun 2012 21:17:32 GMT
I was trying block sampling on a 6 million row table (~400 MB) and can see that if I sample
about 1 percent of the data, queries respond about 3x faster (I can also see a difference
in the data returned). The input format, though, is 'org.apache.hadoop.mapred.TextInputFormat'
and not CombineHiveInputFormat as mentioned in the Block Sampling documentation. A question
for the experts: is block sampling expected to work with other input formats as well?
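
For reference, the comparison described above might look roughly like the sketch below. The aggregates are illustrative assumptions, since the actual queries are not shown in this message. Only the table name and the qty_sold column come from the table description. The last statement prints the session-level hive.input.format setting, which is configured separately from the InputFormat stored with the table and may be relevant to the question.

```sql
-- Hypothetical queries: the poster's actual queries are not shown in the message.
-- TABLESAMPLE(n PERCENT) is Hive's block sampling clause.
SELECT COUNT(*) AS sampled_rows, SUM(qty_sold) AS sampled_qty
FROM orderdetail2 TABLESAMPLE(1 PERCENT) s;

-- Same aggregates over the full table, to compare timing and results.
SELECT COUNT(*) AS total_rows, SUM(qty_sold) AS total_qty
FROM orderdetail2;

-- Print the session-level input format. This is a separate setting from
-- the table's stored InputFormat reported by DESC FORMATTED.
SET hive.input.format;
```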

hive> desc formatted orderdetail2;
# col_name              data_type               comment

order_id                int                     None
item_id                 int                     None
order_date              string                  None
emp_id                  int                     None
promotion_id            int                     None
qty_sold                float                   None
unit_price              float                   None
unit_cost               float                   None
discount                float                   None
customer_id             int                     None

# Detailed Table Information
Database:               default
Owner:                  hdfs
CreateTime:             Fri Jun 15 16:51:44 EDT 2012
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               --
Table Type:             MANAGED_TABLE
Table Parameters:
        transient_lastDdlTime   1339793622

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        serialization.format    1
Time taken: 0.124 seconds
