accumulo-dev mailing list archives

From "Marc P." <marc.par...@gmail.com>
Subject Re: Accumulo Table Scanning Taking Time!!!
Date Thu, 27 Apr 2017 13:35:45 GMT
Suresh,
   There are a lot of configuration points that can have an impact. For
example, table.scan.max.memory [0] dictates how much data the tablet server
buffers before returning each batch of results to the client. Increasing it
causes more work to be done in each RPC call. Lowering it can give the
impression of improved response time, since the first results arrive sooner,
even if the total scan time does not improve. It is worth experimenting with
this for your use case. If your keys/values are large, you might try
increasing this value.
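
To make the trade-off concrete, here is a rough back-of-the-envelope sketch
of how many batches a scan needs for a given buffer size. The 100-byte average
entry size is a made-up number, the 512K figure is what I believe the default
is in the 1.x line (check the manual for your version), and the estimate
ignores per-entry overhead and other limits such as the client-side scanner
batch size. The class and method names are just for illustration.

```java
public class ScanBatchEstimate {

    /**
     * Rough number of server-to-client batches for a scan:
     * ceil(entries * avgEntryBytes / scanMaxMemoryBytes).
     * This is a simplification, not how the tablet server accounts for memory.
     */
    public static long batches(long entries, long avgEntryBytes, long scanMaxMemoryBytes) {
        if (entries < 0 || avgEntryBytes <= 0 || scanMaxMemoryBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive, entries non-negative");
        }
        long totalBytes = Math.multiplyExact(entries, avgEntryBytes);
        // Integer ceiling division.
        return (totalBytes + scanMaxMemoryBytes - 1) / scanMaxMemoryBytes;
    }

    public static void main(String[] args) {
        long entries = 1_000_000L;   // record count from the original question
        long avgEntryBytes = 100L;   // hypothetical average key + value size

        long defaultBuffer = 512L * 1024;       // assumed default (512K)
        long largerBuffer = 4L * 1024 * 1024;   // example of a raised value (4M)

        System.out.println("512K buffer: " + batches(entries, avgEntryBytes, defaultBuffer) + " batches");
        System.out.println("4M buffer:   " + batches(entries, avgEntryBytes, largerBuffer) + " batches");
    }
}
```

Under those assumptions the scan needs about 191 batches at 512K and about 24
at 4M, so a larger buffer means fewer round trips but a longer wait for the
first batch.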

Further, scan performance can be affected by the size of the data and the way
it is stored. Enabling table block caching might help [1], but I'm curious
about how the data is stored. Do you have example keys? Are you returning
all 1 million records from Accumulo through the scanner to perform some
logic client side, or is the logic server side in an iterator? Could you do
more work in an iterator? Iterating over 1 M keys likely won't take 2-3
seconds when executed at the tablet server, depending on the size of the
key. Some detail on the key structure would help us suggest how to better
configure your tablet server properties.

   Finally, is the 2-3 seconds just the time to get the data or does that
include time to inspect keys?
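
One way to answer that question is to time the two phases separately. The
sketch below is only an illustration: it works on any java.util.Iterator, so
the in-memory list in main is a stand-in for whatever iterator your scanner
returns, and the class name, Result type, and inspect callback are all
hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class ScanTiming {

    /** Entry count plus nanoseconds spent fetching and inspecting. */
    public static final class Result {
        public final long entries;
        public final long fetchNanos;
        public final long inspectNanos;

        Result(long entries, long fetchNanos, long inspectNanos) {
            this.entries = entries;
            this.fetchNanos = fetchNanos;
            this.inspectNanos = inspectNanos;
        }
    }

    /**
     * Drains the iterator, attributing time in hasNext()/next() to "fetch"
     * and time in the callback to "inspect".
     */
    public static <T> Result measure(Iterator<T> source, Consumer<? super T> inspect) {
        long entries = 0;
        long fetch = 0;
        long inspectTime = 0;
        while (true) {
            long t0 = System.nanoTime();
            if (!source.hasNext()) {
                fetch += System.nanoTime() - t0;
                break;
            }
            T item = source.next();
            long t1 = System.nanoTime();
            inspect.accept(item);
            long t2 = System.nanoTime();
            fetch += t1 - t0;
            inspectTime += t2 - t1;
            entries++;
        }
        return new Result(entries, fetch, inspectTime);
    }

    public static void main(String[] args) {
        // Stand-in data; replace with the iterator from a real scan.
        List<String> fake = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            fake.add("row" + i);
        }
        long[] totalLength = {0};
        Result r = measure(fake.iterator(), s -> totalLength[0] += s.length());
        System.out.println("entries:    " + r.entries);
        System.out.println("fetch ms:   " + r.fetchNanos / 1_000_000);
        System.out.println("inspect ms: " + r.inspectNanos / 1_000_000);
    }
}
```

If most of the time lands in the fetch phase, the server-side settings and
key layout are the place to look; if it lands in the inspect phase, the
client logic is.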

[0] http://accumulo.apache.org/1.6/accumulo_user_manual#_table_scan_max_memory
[1] http://accumulo.apache.org/1.6/accumulo_user_manual#_block_cache

On Thu, Apr 27, 2017 at 7:09 AM, Suresh Prajapati <sureshpraja1234@gmail.com> wrote:

> Hello Team
>
> I am developing a client in Accumulo to store geo-spatial information and
> using GeoMesa for indexing on top of it. However, I found that scanning *~1
> million* records takes *2-3 sec*. I looked at the indexes and query plan of
> GeoMesa but was not able to find the cause of the problem. I am running
> Accumulo as a single tablet server (including the master). I want to know:
> what factors can affect the Accumulo scanning operation? How can I
> optimise this time?
>
> Thank You
> Suresh Prajapati
>
