accumulo-user mailing list archives

From Denis <>
Subject Accumulo Direct Reader
Date Wed, 17 Oct 2012 13:46:48 GMT

    I am thinking about creating a Direct Reader for Accumulo.

    It would be a library with an API compatible with the Accumulo
client, but one that reads .rf files directly from HDFS, bypassing the
tservers.

    The motivation is:

    1. To be able to quickly read stale data when the tserver is busy
(re-balancing, reading logs, etc.) or has just gone down and its
tablets have not been redistributed yet.

    2. If the table is read-only or can tolerate eventual consistency,
many readers can work in parallel without the tserver becoming a
bottleneck. Also, the table's data is then local to three servers (the
number of HDFS replicas) instead of one.

    3. Distribution of data: analysts can download .rf files (even to
a laptop) and run their software locally.

    Any suggestions?

