hbase-issues mailing list archives

From "Nick Dimiduk (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5475) Allow importtsv and Import to work truly offline when using bulk import option
Date Wed, 23 Jan 2013 00:34:12 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560232#comment-13560232 ]

Nick Dimiduk commented on HBASE-5475:

I have an application that does this. It depends on what is currently an [external library|https://github.com/ndimiduk/reservoirsampler]
that runs a reservoir sampler over the input data to produce the splits file. The code
is actually from one of the examples in Alex Holmes's book. I'd like to roll the functionality
into ImportTsv, but my application works quite differently from the current tool.
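
For readers unfamiliar with the approach, here is a minimal sketch of the general idea: take a fixed-size uniform sample of row keys in one pass, then pick region boundaries at evenly spaced quantiles of the sorted sample. This is not the code from the linked library or from the book; the function names (reservoir_sample, split_points) and parameters are hypothetical, and serializing the boundaries into whatever splits file the job expects is left out.

```python
import random
from typing import Iterable, List, Optional


def reservoir_sample(keys: Iterable[str], k: int,
                     rng: Optional[random.Random] = None) -> List[str]:
    """Keep a uniform random sample of up to k keys from a stream of unknown length."""
    rng = rng or random.Random()
    reservoir: List[str] = []
    for i, key in enumerate(keys):
        if i < k:
            # Fill the reservoir with the first k keys.
            reservoir.append(key)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = key
    return reservoir


def split_points(sample: List[str], num_regions: int) -> List[str]:
    """Pick up to num_regions - 1 boundaries at evenly spaced quantiles of the sample."""
    ordered = sorted(sample)
    if num_regions < 2 or not ordered:
        return []
    points = [ordered[r * len(ordered) // num_regions]
              for r in range(1, num_regions)]
    # Duplicate keys in the sample can yield repeated boundaries; drop them.
    return sorted(set(points))


if __name__ == "__main__":
    # Hypothetical input: row keys read from the file to be imported.
    row_keys = (f"row{i:05d}" for i in range(10000))
    sample = reservoir_sample(row_keys, 100, random.Random(42))
    for point in split_points(sample, 4):
        print(point)
```

Because the sample is taken from the input data itself, the boundaries can be computed without contacting the target table. A larger sample size gives boundaries closer to the true quantiles at the cost of more memory.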
> Allow importtsv and Import to work truly offline when using bulk import option
> ------------------------------------------------------------------------------
>                 Key: HBASE-5475
>                 URL: https://issues.apache.org/jira/browse/HBASE-5475
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>            Reporter: Lars Hofhansl
>            Priority: Minor
> Currently importtsv (and now also Import with HBASE-5440) support using HFileOutputFormat
for later bulk loading.
> However, currently that cannot be done without having access to the table we're going to import
to, because both importtsv and Import need to look up the split points, and find the compression
> It would be nice if there would be an offline way to provide the split point and compression
