hbase-dev mailing list archives

From "Chris K Wensel (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1605) TableInputFormat should support 'limit'
Date Thu, 02 Jul 2009 18:31:47 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726607#action_12726607 ]

Chris K Wensel commented on HBASE-1605:

Good questions.

In SQL, LIMIT returns the first N rows of the result set and is typically used with OFFSET
to allow pagination.

In Cascading, the Limit Operation only allows each of M tasks to see N/M rows (accounting for
remainders). There is no notion of OFFSET, as Limit in this case is really used for
unit/integration testing or sampling.
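
To make the N/M arithmetic concrete, here is a minimal sketch of one way a total limit could be
divided across tasks. It is not Cascading's actual code; the class and method names are
hypothetical, and giving the leftover rows to the first N % M tasks is an assumption about what
"accounting for remainders" means.

```java
import java.util.Arrays;

public class LimitSplit {
    /**
     * Splits a total limit n across m tasks.
     * Assumption: the first (n % m) tasks each take one extra row,
     * so the per-task limits always sum to exactly n.
     */
    static long[] perTaskLimits(long n, int m) {
        if (n < 0 || m <= 0) {
            throw new IllegalArgumentException("need n >= 0 and m > 0");
        }
        long base = n / m;   // rows every task may see
        long extra = n % m;  // leftover rows to hand out one at a time
        long[] limits = new long[m];
        for (int i = 0; i < m; i++) {
            limits[i] = base + (i < extra ? 1 : 0);
        }
        return limits;
    }

    public static void main(String[] args) {
        // A limit of 10 rows over 3 tasks
        System.out.println(Arrays.toString(perTaskLimits(10, 3))); // prints [4, 3, 3]
    }
}
```

Each task only needs its own entry, so no coordination between tasks is required at run time.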

Re HBase, you guys should choose a model that makes the most sense for typical HBase consumer
applications. But allowing for an even load across many mappers while orthogonally limiting
the total number of rows processed is what I'm after.

Having this work with a Filter would also be very nice, i.e. give me the 1k rows that satisfy
this condition. But I guess if I want the first 1k rows that satisfy the filter, we might
be limited to a single region (and a single mapper, as I see the code now).

So maybe there are two modes: sample and result. Sample returns 'random' N rows (the top N/M
from each of M regions). Result returns ordered N rows (ordered by virtue of coming from a
single region).
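
The difference between the two modes can be sketched with plain in-memory lists standing in for
regions. This is only an illustration, not HBase or TableInputFormat code: the class and method
names are hypothetical, each inner list is assumed to hold one region's row keys in sorted order,
and 'result' is modeled as a serial walk over regions in key order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LimitModes {
    /** 'sample' mode: every region contributes the top of its share of the limit n. */
    static List<String> sample(List<List<String>> regions, int n) {
        List<String> out = new ArrayList<>();
        int m = regions.size();
        if (m == 0) {
            return out;
        }
        int base = n / m;
        int extra = n % m; // assumption: the first (n % m) regions take one extra row
        for (int i = 0; i < m; i++) {
            int share = base + (i < extra ? 1 : 0);
            List<String> region = regions.get(i);
            out.addAll(region.subList(0, Math.min(share, region.size())));
        }
        return out;
    }

    /** 'result' mode: read rows in key order and stop once n rows are collected. */
    static List<String> result(List<List<String>> regions, int n) {
        List<String> out = new ArrayList<>();
        for (List<String> region : regions) {
            for (String row : region) {
                if (out.size() >= n) {
                    return out;
                }
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> regions = Arrays.asList(
                Arrays.asList("a1", "a2", "a3"),
                Arrays.asList("b1", "b2", "b3"),
                Arrays.asList("c1", "c2", "c3"));
        System.out.println(sample(regions, 4)); // prints [a1, a2, b1, c1]
        System.out.println(result(regions, 4)); // prints [a1, a2, a3, b1]
    }
}
```

In the sketch, 'sample' touches every region and so spreads load evenly, while 'result' reads
serially and may never reach the later regions.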

Anyways, just throwing that out there. The current use case would be happy with either, though
'result' is probably the most useful when coupled with HBASE-1172.

> TableInputFormat should support 'limit'
> ---------------------------------------
>                 Key: HBASE-1605
>                 URL: https://issues.apache.org/jira/browse/HBASE-1605
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Chris K Wensel
> Would be useful if TableInputFormat could be passed a 'limit' property value that limited
> the total result set to the value of 'limit'.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
