accumulo-user mailing list archives

From Eric Newton <eric.new...@gmail.com>
Subject Re: Accumulo Direct Reader
Date Wed, 17 Oct 2012 14:57:05 GMT
See InputFormatBase#setScanOffline.

Clone a table, take the clone offline, and then use the clone as the
input table for your map/reduce job. This preserves a consistent view
of the underlying files, and the job reads them without going through
the tablet servers.

-Eric
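A minimal sketch of the clone-and-offline approach described above. It assumes the Accumulo 1.4-era client API (InputFormatBase.setScanOffline, setInputInfo and setZooKeeperInstance taking a Hadoop Configuration) and a Hadoop 1.x mapreduce Job. These signatures changed in later Accumulo releases, so check the javadoc for your version. The instance name, ZooKeeper hosts, credentials, table names, and the CountMapper class are hypothetical placeholders.

```java
import java.util.Collections;

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class OfflineSnapshotScan {

  // Hypothetical mapper: only counts the entries read from the offline clone.
  public static class CountMapper extends Mapper<Key, Value, NullWritable, NullWritable> {
    @Override
    protected void map(Key key, Value value, Context context) {
      context.getCounter("snapshot", "entries").increment(1);
    }
  }

  public static void main(String[] args) throws Exception {
    // Hypothetical connection details and table names; substitute your own.
    String instanceName = "myinstance";
    String zooKeepers = "zk1:2181";
    String user = "root";
    byte[] password = "secret".getBytes("UTF-8");
    String sourceTable = "mytable";
    String snapshotTable = "mytable_snapshot";

    Connector conn = new ZooKeeperInstance(instanceName, zooKeepers).getConnector(user, password);

    // 1. Clone the table. flush=true flushes the source table's memory first,
    //    so writes made before the clone end up in files the clone references.
    conn.tableOperations().clone(sourceTable, snapshotTable, true,
        Collections.<String, String>emptyMap(), Collections.<String>emptySet());

    // 2. Take the clone offline so its set of files stays fixed while the job runs.
    conn.tableOperations().offline(snapshotTable);

    // 3. Point the job at the clone and have it read the clone's files directly.
    Job job = new Job(new Configuration(), "offline-snapshot-scan");
    job.setJarByClass(OfflineSnapshotScan.class);
    job.setInputFormatClass(AccumuloInputFormat.class);
    Configuration conf = job.getConfiguration();
    AccumuloInputFormat.setZooKeeperInstance(conf, instanceName, zooKeepers);
    AccumuloInputFormat.setInputInfo(conf, user, password, snapshotTable, new Authorizations());
    AccumuloInputFormat.setScanOffline(conf, true);

    // Map-only job that discards output; results live in the job counters.
    job.setMapperClass(CountMapper.class);
    job.setNumReduceTasks(0);
    job.setOutputFormatClass(NullOutputFormat.class);

    boolean ok = job.waitForCompletion(true);

    // 4. Optionally drop the snapshot once the job is done.
    conn.tableOperations().delete(snapshotTable);
    System.exit(ok ? 0 : 1);
  }
}
```

Because the clone shares the source table's files rather than copying them, creating the snapshot is cheap, and the source table stays online for normal reads and writes while the job runs.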

On Wed, Oct 17, 2012 at 9:46 AM, Denis <denis@camfex.cz> wrote:
>     Hi.
>
>     I am thinking about creating a Direct Reader for Accumulo.
>
>     A library with an API compatible with the Accumulo client that
> reads .rf files directly from HDFS, bypassing the tservers.
>
>     The motivation is:
>
>     1. To be able to quickly read (possibly stale) data when the
> tserver is busy (rebalancing, reading logs, etc.) or has just gone
> down and its tablets have not been reassigned yet.
>
>     2. If the table is read-only or can tolerate eventual consistency,
> many readers can work in parallel without the tserver becoming a
> bottleneck. Also, the table's data is local to three servers (the
> HDFS replication factor) instead of one.
>
>     3. Distribution of data: analysts can download .rf files (even to
> a laptop) and run their own software locally.
>
>     Any suggestions?
>
>     Thanks.
