accumulo-user mailing list archives

From Sven Hodapp <>
Subject Accumulo Seek performance
Date Wed, 24 Aug 2016 13:22:19 GMT
Hi there,

currently we're experimenting with a two-node Accumulo cluster (two tablet servers) set up
for document storage.
These documents are decomposed down to the sentence level.

Now I'm using a BatchScanner to assemble the full document like this:

    // ARTIFACTS table currently hosts ~30GB of data, ~200M entries on ~45 tablets
    val bscan = instance.createBatchScanner(ARTIFACTS, auths, 10)
    bscan.setRanges(ranges)  // there are about 3000 Range.exact's in the ranges list
    for (entry <- bscan.asScala) yield {
      val key = entry.getKey()
      val value = entry.getValue()
      // etc.
    }

For larger full documents (e.g. 3000 exact ranges), this operation takes about 12 seconds, i.e. roughly 4 ms per range.
But shorter documents are assembled blazing fast...
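To make the access pattern concrete: the loop above just collects the sentence-level entries returned by the BatchScanner and groups them by row to rebuild each document. A minimal self-contained sketch of that grouping step, written in plain Java (Accumulo's API language) with plain strings standing in for Accumulo's Key/Value so it runs without a cluster:

```java
import java.util.*;

public class AssembleDoc {
    // Group (rowId, sentence) pairs back into documents, keyed by row.
    // A BatchScanner returns entries in no particular order across ranges,
    // so grouping by row is needed; TreeMap keeps the result deterministic.
    static Map<String, List<String>> assemble(List<String[]> entries) {
        Map<String, List<String>> docs = new TreeMap<>();
        for (String[] e : entries) {
            docs.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
        }
        return docs;
    }

    public static void main(String[] args) {
        List<String[]> scanned = Arrays.asList(
            new String[]{"doc1", "First sentence."},
            new String[]{"doc1", "Second sentence."},
            new String[]{"doc2", "Another doc."});
        System.out.println(assemble(scanned));
        // prints {doc1=[First sentence., Second sentence.], doc2=[Another doc.]}
    }
}
```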

Is that too much for a BatchScanner, or am I misusing the BatchScanner?
Is that a normal time for such a (seek) operation?
Can I do something to get a better seek performance?

Note: I have already enabled bloom filtering on that table.
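For completeness, bloom filtering was enabled via the standard `table.bloom.enabled` table property (the table name below is ours). One thing I'm not sure about: as I understand it, bloom filters are only written into new RFiles, so existing data may need a compaction before they take effect:

```
# In the Accumulo shell:
config -t ARTIFACTS -s table.bloom.enabled=true
# Rewrite existing files so they get bloom filters too (and wait for it):
compact -t ARTIFACTS -w
```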

Thank you for any advice!


Sven Hodapp, M.Sc.,
Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
Department of Bioinformatics
Schloss Birlinghoven, 53754 Sankt Augustin, Germany
