accumulo-user mailing list archives

From Michael Orr <>
Subject Optimizing Accumulo for read performance
Date Wed, 06 Nov 2013 14:19:09 GMT

I’m working on an application that needs fast read performance. I’ve been
running experiments on a single-node (pseudo-distributed) cluster, with the
intent of scaling out later. Before doing so, I want to gauge how fast a
single tablet server can read.

The application processes and stores graph data with the following schema:

For nodes:
N|NodeID    ID:NodeID    EIN:EdgeID      EOUT:EdgeID      .. lots of other attributes

There can be multiple EIN and EOUT CFs for each node.

For edges:
E|EdgeID    ID:NodeID    VIN:VertexID    EOUT:VertexID    .. lots of other attributes
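
To make the layout concrete, here is a minimal plain-Java sketch of how these rows would sort and how a prefix scan over "N|" or "E|" picks out nodes or edges. It does not use the Accumulo API: a TreeMap stands in for the sorted table, and the class name GraphLayout, the sample IDs (n1, n2, e1), the empty values, and the NUL separator are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class GraphLayout {
    // Separator that sorts before any printable character, so every entry of
    // row "N|n1" sorts before any entry of row "N|n10".
    private static final char SEP = '\0';

    // Flatten (row, column family, column qualifier) into one sortable key.
    static String key(String row, String family, String qualifier) {
        return row + SEP + family + SEP + qualifier;
    }

    // Hypothetical sample: edge e1 links nodes n1 and n2. Values are left
    // empty because the IDs live in the column qualifiers (an assumption).
    static SortedMap<String, String> sampleTable() {
        SortedMap<String, String> t = new TreeMap<>();
        t.put(key("N|n1", "ID", "n1"), "");
        t.put(key("N|n1", "EOUT", "e1"), "");
        t.put(key("N|n2", "ID", "n2"), "");
        t.put(key("N|n2", "EIN", "e1"), "");
        t.put(key("E|e1", "VIN", "n1"), "");
        t.put(key("E|e1", "EOUT", "n2"), "");
        return t;
    }

    // Return every entry whose row starts with the prefix, in sorted order,
    // rendered as "row family:qualifier".
    static List<String> scanPrefix(SortedMap<String, String> table, String prefix) {
        List<String> out = new ArrayList<>();
        for (String k : table.tailMap(prefix).keySet()) {
            if (!k.startsWith(prefix)) {
                break; // keys are sorted, so nothing later can match
            }
            int a = k.indexOf(SEP);
            int b = k.indexOf(SEP, a + 1);
            out.add(k.substring(0, a) + " " + k.substring(a + 1, b) + ":" + k.substring(b + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        SortedMap<String, String> table = sampleTable();
        System.out.println(scanPrefix(table, "N|")); // node entries only
        System.out.println(scanPrefix(table, "E|")); // edge entries only
    }
}
```

Because all node rows share the "N|" prefix and all edge rows share "E|", each kind of record occupies one contiguous range of the sorted keyspace, which is what lets a scan fetch only nodes or only edges.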

Scans can cover the entire graph or a subset of nodes and edges. We
generally pull navigational information first, then other attributes later
if needed. I’ve spent some time looking into locality groups, but I’m
curious whether there are recommended backend properties that could be set
to improve read speed, particularly if memory and disk space are not a
concern.
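
As a rough illustration of the locality-group idea for this access pattern, the plain-Java sketch below puts the navigational column families in one group and reports which groups a scan would have to read for a given set of fetched families. It is a model only, not Accumulo code: the group name "nav", the attribute family "name", and the class LocalityGroupSketch are hypothetical. In Accumulo itself the grouping would be applied with TableOperations.setLocalityGroups, and families not assigned to any group land in the default group.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class LocalityGroupSketch {
    // Hypothetical grouping: all navigational families share one group.
    static Map<String, Set<String>> groups() {
        Map<String, Set<String>> g = new LinkedHashMap<>();
        g.put("nav", new TreeSet<>(Arrays.asList("ID", "EIN", "EOUT", "VIN")));
        return g;
    }

    // Families that are not listed in any group fall into the default group.
    static String groupOf(Map<String, Set<String>> groups, String family) {
        for (Map.Entry<String, Set<String>> e : groups.entrySet()) {
            if (e.getValue().contains(family)) {
                return e.getKey();
            }
        }
        return "default";
    }

    // Which groups must be read to satisfy a scan over these families?
    static Set<String> groupsTouched(Map<String, Set<String>> groups, Set<String> fetched) {
        Set<String> touched = new TreeSet<>();
        for (String family : fetched) {
            touched.add(groupOf(groups, family));
        }
        return touched;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> g = groups();
        // Navigation-only scan: stays inside the "nav" group.
        System.out.println(groupsTouched(g, new TreeSet<>(Arrays.asList("EIN", "EOUT"))));
        // Scan that also wants an ordinary attribute: touches the default group too.
        System.out.println(groupsTouched(g, new TreeSet<>(Arrays.asList("ID", "name"))));
    }
}
```

The point of the grouping is that the first, navigation-only pass never has to read past the "lots of other attributes" stored in the default group.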

Thanks for your help!

