accumulo-dev mailing list archives

From Suresh Prajapati <>
Subject Re: Accumulo Table Scanning Taking Time!!!
Date Mon, 01 May 2017 08:18:07 GMT
Hello Marc

Thanks for pointing out the area of problems. I tried changing
*but didn't find any changes in performance.
I am trying to fetch the count of records matching a specified query by using
the AccumuloDataStore (ds) stats. Here is my sample code:

public int getRideCount(Long rideId) throws Exception {
    if (rideId != null) {
        return ((Long) (ds.stats().getCount(sft, CQL.toFilter("r=" + rideId),
    }
    return 0;
}

I also tried using an iterator, but that is even worse. Below is the sample
code:

public int getRideCount(Long rideId) throws Exception {
    int count = 0;
    if (rideId != null) {
        Query q = new Query(tableName, CQL.toFilter("r=" + rideId));
        SimpleFeatureIterator it = sfs.getFeatures(q).features();

    }
    return count;
}

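For reference, counting through a feature iterator generally means draining it on the client, so every matching record is transferred just to be counted. Below is a minimal, self-contained sketch of that pattern. It assumes a hypothetical in-memory `CloseableIterator` in place of GeoTools' `SimpleFeatureIterator`; the names `IteratorCountSketch`, `ListBackedIterator`, and `countByIteration` are illustrative only and are not part of GeoMesa or Accumulo.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IteratorCountSketch {

    /** Hypothetical stand-in for a closeable feature iterator (assumption, not a GeoTools type). */
    interface CloseableIterator<T> extends Iterator<T>, AutoCloseable {
        @Override
        void close(); // narrowed so callers need not handle a checked exception
    }

    /** Wraps an in-memory list and records how many elements were handed to the caller. */
    static final class ListBackedIterator<T> implements CloseableIterator<T> {
        private final Iterator<T> delegate;
        int fetched = 0;
        boolean closed = false;

        ListBackedIterator(List<T> items) {
            this.delegate = items.iterator();
        }

        @Override
        public boolean hasNext() {
            return delegate.hasNext();
        }

        @Override
        public T next() {
            T item = delegate.next();
            fetched++;
            return item;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Counts by draining the iterator; the iterator is closed even if iteration fails. */
    static int countByIteration(CloseableIterator<?> it) {
        int count = 0;
        try (CloseableIterator<?> features = it) {
            while (features.hasNext()) {
                features.next(); // the record is fetched only to be counted
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        ListBackedIterator<String> it =
                new ListBackedIterator<>(Arrays.asList("a", "b", "c"));
        int count = countByIteration(it);
        System.out.println(count + " counted, " + it.fetched + " fetched, closed=" + it.closed);
    }
}
```

Since the work in this pattern grows with the number of matching records, it may be one reason the iterator approach is slower than the stats-based count.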
To highlight the *key structure*, here is my feature type description:


Please feel free to ask for any further clarifications.

Thank You

Suresh Prajapati

On Thu, Apr 27, 2017 at 7:05 PM, Marc P. <> wrote:

> Suresh,
>    There are a lot of configuration points that can have an impact. For
> example, there is a configuration option that dictates how much data is
> returned each "iteration," called table.scan.max.memory [0]. Increasing
> this will cause more work to be done in each RPC call to get data. Lowering
> this can have the illusion of improved response time since you get data
> faster. Playing with this might impact your use case. If your keys/values
> are large you might attempt to increase this configuration number.
> Further, scanning can be impacted by the size of the data and the way it is
> stored. Table block caching might yield an improvement [1], but I'm curious
> about how the data is stored. Do you have example keys? Are you returning
> all 1 million records from Accumulo through the scanner to perform some
> logic client side or is the logic server side in an iterator? Could you do
> more work in an iterator? Iterating over 1 M keys likely won't take 2-3
> seconds when executed at the tablet server, depending on the size of the
> key. Providing some insight into what the key structure is might give us
> more insight into how to better configure your tablet server properties.
>    Finally, is the 2-3 seconds just the time to get the data or does that
> include time to inspect keys?
> [0]
> [1]
> On Thu, Apr 27, 2017 at 7:09 AM, Suresh Prajapati <
> > wrote:
> > Hello Team
> >
> > I am developing a client in Accumulo to store geospatial information and
> > using GeoMesa for indexing on top of it. However, I found that scanning
> > *~1 million* records takes *2-3 sec*. I looked at the indexes and query
> > plan of GeoMesa but was not able to find the cause of the problem. I am
> > running Accumulo as a single tablet server (including the master). I want
> > to know: what factors can affect an Accumulo scanning operation, and how
> > can I optimise this time?
> >
> > Thank You
> > Suresh Prajapati
> >
