accumulo-user mailing list archives

From: William Slacum
Subject: Re: Optimizing Accumulo for read performance
Date: Wed, 06 Nov 2013 14:58:32 GMT
When you say schema, do you mean key schema? If so, why are you repeating
the node id?

Locality groups would help if you have larger swaths of data you wanted to
group together and query discretely from other locality groups. For
instance, I've seen key schemas where "in" and "out" edges are grouped

At a system level, if you know something about the distribution of the
row values (in this case, it looks like node id and edge id), you can
pre-split the table by sampling that space and using the samples as split
points. This would distribute the tablets around the cluster, making
queries using the batch scanner faster by increasing the parallelism. It
would also increase the number of input splits generated by the input
format if you wanted to do batch processing on the entire graph.
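
As a rough sketch of the sampling idea, the snippet below picks split
points at evenly spaced quantiles of a sample of row IDs. The class name,
the method name, and the zero-padded "N|0042" row format are made up for
illustration; only the "N|" prefix comes from the schema in the quoted
message. The code uses plain Java strings and no Accumulo classes. In a
real client the resulting set would be converted to Text values and
passed to TableOperations.addSplits.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

final class SplitPointSketch {

    /**
     * Returns up to (numTablets - 1) split points taken at evenly spaced
     * quantiles of the sampled row IDs. Duplicate samples collapse, so
     * fewer splits may come back than requested.
     *
     * Note: String ordering matches Accumulo's byte ordering only for
     * ASCII row IDs, which is assumed here.
     */
    static SortedSet<String> chooseSplits(List<String> sampledRows, int numTablets) {
        SortedSet<String> splits = new TreeSet<>();
        if (sampledRows.isEmpty() || numTablets < 2) {
            return splits; // nothing to split on
        }
        List<String> sorted = new ArrayList<>(sampledRows);
        Collections.sort(sorted);
        for (int i = 1; i < numTablets; i++) {
            // Index of the i-th quantile boundary in the sorted sample.
            int idx = (int) ((long) i * sorted.size() / numTablets);
            splits.add(sorted.get(idx));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Hypothetical sample: 100 node rows with zero-padded IDs.
        List<String> sample = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            sample.add(String.format("N|%04d", i));
        }
        // Aim for 4 tablets, i.e. 3 split points.
        System.out.println(chooseSplits(sample, 4));
    }
}
```

The more representative the sample is of the real row distribution, the
more evenly the data should spread across the resulting tablets.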

On Wed, Nov 6, 2013 at 9:19 AM, Michael Orr wrote:

> Hello,
> I’m working on an application that needs fast read performance. I’ve been
> conducting some experiments starting with a single (pseudo-distributed)
> cluster with the intent of scaling out. However, prior to doing so, I
> wanted to get a good gauge for how fast a single tablet server can read.
> The application processes and stores graph data with the following schema:
> for nodes:
> N|NodeID    ID:NodeID    EIN:EdgeID    EOUT:EdgeID    .. lots of other attributes
> there can be multiple EIN and EOUT CFs for each node
> for edges
> E|EdgeID    ID:NodeID    VIN:VertexID    EOUT:VertexID    .. lots of other attributes
> Scans into the system can be for entire graph or a subset of nodes and
> edges. We generally pull navigational information first, then other
> attributes later if needed. I’ve spent some time looking into using
> locality groups but was curious if there are recommendations on backend
> properties that could be set to improve read speed, particularly if memory
> and space were not a concern.
> Thanks for your help!
> Mike
