hive-dev mailing list archives

From "Namit Jain (JIRA)" <>
Subject [jira] [Commented] (HIVE-2121) Input Sampling By Splits
Date Wed, 27 Apr 2011 17:52:03 GMT


Namit Jain commented on HIVE-2121:

A few other comments:

   The percentage of data read is currently a function 
   of the split size - we should mark the last split 
   specially, and only read the required data at runtime.
   I mean, if we have only 1 split, and we need to sample
   10% of the data, there should be a way to do so. Currently,
   it seems impossible.
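
To make the suggestion above concrete, here is a minimal sketch of the idea in Python. It is not Hive code and not what the attached patches do; `plan_sample`, `split_sizes`, and `percent` are hypothetical names chosen for illustration. The sketch assumes split sizes are known in bytes and that a reader can stop partway through a split.

```python
from typing import List, Tuple


def plan_sample(split_sizes: List[int], percent: float) -> List[Tuple[int, int]]:
    """Return (split_index, bytes_to_read) pairs covering about `percent` of the input.

    Hypothetical helper: whole splits are taken in order, and the last
    selected split is truncated so only the required bytes are read.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")

    # Total number of bytes the sample should cover.
    remaining = int(sum(split_sizes) * percent / 100)

    plan = []
    for index, size in enumerate(split_sizes):
        if remaining <= 0:
            break
        # The final split in the plan may be read only partially.
        to_read = min(size, remaining)
        plan.append((index, to_read))
        remaining -= to_read
    return plan


# A single 1000-byte split sampled at 10% reads only the first 100 bytes.
print(plan_sample([1000], 10))

# Three splits sampled at 50%: one full split plus part of the second.
print(plan_sample([400, 400, 200], 50))
```

With a per-split byte limit like this, the amount of data read no longer depends on the split size, so the single-split case becomes possible. A real reader would also have to stop on a record boundary instead of an exact byte offset.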

> Input Sampling By Splits
> ------------------------
>                 Key: HIVE-2121
>                 URL:
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2121.1.patch, HIVE-2121.2.patch, HIVE-2121.3.patch, HIVE-2121.4.patch
> We need better input sampling to serve at least two purposes:
> 1. let users test their queries against a smaller data set
> 2. understand more about what the data look like without scanning the whole table.
> A simple function that returns a subset of splits will help in those cases. It doesn't have
> to be strict sampling.

This message is automatically generated by JIRA.