accumulo-user mailing list archives

From Keith Turner <>
Subject Re: Accumulo Direct Reader
Date Wed, 17 Oct 2012 15:13:54 GMT
On Wed, Oct 17, 2012 at 10:57 AM, Eric Newton <> wrote:
> See InputFormatBase#setScanOffline.

This uses o.a.a.c.client.impl.OfflineScanner.  OfflineScanner scans an
offline table by going directly to its files.  It does exactly what
the tablet server does when reading a tablet's files.  When I added
setScanOffline to the M/R code I considered making OfflineScanner
available through Connector somehow, but did not for some reason.  If
there is interest we could revisit this.
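
For anyone following along, the clone-then-offline sequence Eric
describes below can be sketched roughly as follows.  This is only an
illustration that runs without a cluster: TableOps, RecordingOps and
prepareOfflineInput are hypothetical names made up for the sketch, not
Accumulo classes.  In a real job the two steps would presumably go
through the connector's table operations (clone, then offline), with
setScanOffline enabled on the input format.

```java
import java.util.ArrayList;
import java.util.List;

public class OfflineCloneSketch {

    /** Hypothetical stand-in for the two table operations this workflow needs. */
    interface TableOps {
        void cloneTable(String source, String clone) throws Exception;
        void offline(String table) throws Exception;
    }

    /**
     * Clone the source table, take the clone offline, and return the clone's
     * name, which is the table an offline-scanning M/R job would read.
     * The source table stays online and is not touched after the clone.
     */
    static String prepareOfflineInput(TableOps ops, String source, String suffix)
            throws Exception {
        String clone = source + suffix;
        ops.cloneTable(source, clone); // clone gets a consistent view of the files
        ops.offline(clone);            // offline: no tablet server hosts its tablets
        return clone;
    }

    /** Records calls in order so the sequence can be checked without a cluster. */
    static class RecordingOps implements TableOps {
        final List<String> calls = new ArrayList<>();

        @Override
        public void cloneTable(String source, String clone) {
            calls.add("clone " + source + " -> " + clone);
        }

        @Override
        public void offline(String table) {
            calls.add("offline " + table);
        }
    }

    public static void main(String[] args) throws Exception {
        RecordingOps ops = new RecordingOps();
        String input = prepareOfflineInput(ops, "mytable", "_snapshot");
        System.out.println("M/R input table: " + input);
        System.out.println("operations: " + ops.calls);
    }
}
```

The ordering matters: the clone has to exist before it is taken
offline, and it is the clone, not the original table, that the job
reads.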

> Clone a table, take it offline and then use it as your map/reduce
> input format.  This will preserve a consistent view of the underlying
> files, without going through the tablet servers.
> -Eric
> On Wed, Oct 17, 2012 at 9:46 AM, Denis <> wrote:
>>     Hi.
>>     I am thinking about creating a Direct Reader for Accumulo:
>> a library whose API is compatible with the Accumulo client but which
>> reads .rf files directly from HDFS, bypassing the tservers.
>>     The motivation is:
>>     1. To be able to quickly read stale data when the tserver is
>> busy (re-balancing, reading logs, etc.) or has just gone down and
>> its tablets have not been redistributed yet.
>>     2. If the table is read-only or can tolerate eventual consistency,
>> many readers can work in parallel without the tserver becoming a
>> bottleneck. Also, the table's data becomes local to three servers
>> (the number of HDFS replicas) instead of one.
>>     3. Distribution of data: analysts can download .rf files (even to
>> a laptop) and run their software locally.
>>     Any suggestions?
>>     Thanks.
