accumulo-user mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Re: Accumulo Seek performance
Date Mon, 29 Aug 2016 20:37:32 GMT
On Wed, Aug 24, 2016 at 9:22 AM, Sven Hodapp
<sven.hodapp@scai.fraunhofer.de> wrote:
> Hi there,
>
> Currently we're experimenting with a two-node Accumulo cluster (two tablet servers) set up for document storage.
> These documents are decomposed down to the sentence level.
>
> Now I'm using a BatchScanner to assemble the full document like this:
>
>     // The ARTIFACTS table currently hosts ~30GB data, ~200M entries on ~45 tablets.
>     val bscan = instance.createBatchScanner(ARTIFACTS, auths, 10)
>     bscan.setRanges(ranges)  // there are around 3000 Range.exact's in the ranges list
>     for (entry <- bscan.asScala) yield {
>       val key = entry.getKey()
>       val value = entry.getValue()
>       // etc.
>     }
>
> For larger full documents (e.g. 3000 exact ranges), this operation will take about 12 seconds.
> But shorter documents are assembled blazing fast...
>
> Is that too much for a BatchScanner, or am I misusing the BatchScanner?
> Is that a normal time for such a (seek) operation?
> Can I do something to get a better seek performance?

How many threads did you configure the batch scanner with and did you
try varying this?
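
One rough way to try this is to time the same scan at several thread counts
and compare. The sketch below is only an illustration: ThreadSweep and sweep
are made-up names, and the usage comment assumes instance, ARTIFACTS, auths
and ranges are set up as in the snippet quoted above. Timings from a single
run will be noisy, so repeated runs are advisable.

```scala
object ThreadSweep {
  /** Runs `scan` once per thread count and records (threads, entries, millis).
    * `scan` receives the thread count and returns the number of entries read. */
  def sweep(threadCounts: Seq[Int])(scan: Int => Long): Seq[(Int, Long, Long)] =
    threadCounts.map { n =>
      val start = System.nanoTime()
      val entries = scan(n)
      val millis = (System.nanoTime() - start) / 1000000L
      (n, entries, millis)
    }
}

// Hypothetical usage against the table from the quoted snippet
// (instance, ARTIFACTS, auths and ranges are assumed to exist already,
// and the thread counts are arbitrary example values):
//
//   val results = ThreadSweep.sweep(Seq(1, 5, 10, 20, 40)) { n =>
//     val bscan = instance.createBatchScanner(ARTIFACTS, auths, n)
//     try {
//       bscan.setRanges(ranges)
//       bscan.asScala.size.toLong   // drain the scanner, counting entries
//     } finally bscan.close()
//   }
//   results.foreach { case (n, entries, ms) =>
//     println(s"threads=$n entries=$entries ms=$ms")
//   }
```

If the elapsed time barely changes as the thread count grows, the bottleneck
is probably somewhere other than client-side parallelism.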

>
> Note: I have already enabled bloom filtering on that table.
>
> Thank you for any advice!
>
> Regards,
> Sven
>
> --
> Sven Hodapp, M.Sc.,
> Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
> Department of Bioinformatics
> Schloss Birlinghoven, 53754 Sankt Augustin, Germany
> sven.hodapp@scai.fraunhofer.de
> www.scai.fraunhofer.de
