accumulo-user mailing list archives

From Sven Hodapp <sven.hod...@scai.fraunhofer.de>
Subject Accumulo Seek performance
Date Wed, 24 Aug 2016 13:22:19 GMT
Hi there,

currently we're experimenting with a two-node Accumulo cluster (two tablet servers) set up
for document storage.
These documents are decomposed down to the sentence level.

Now I'm using a BatchScanner to assemble the full document like this:

    // ARTIFACTS table currently hosts ~30GB of data, ~200M entries on ~45 tablets
    val bscan = instance.createBatchScanner(ARTIFACTS, auths, 10)
    bscan.setRanges(ranges)  // there are ~3000 Range.exact's in the ranges list
    for (entry <- bscan.asScala) yield {
      val key = entry.getKey()
      val value = entry.getValue()
      // etc.
    }

For larger full documents (e.g. 3000 exact ranges), this operation takes about 12 seconds.
But shorter documents are assembled blazingly fast...

Is that too much for a BatchScanner / am I misusing the BatchScanner?
Is that a normal time for such a (seek) operation?
Can I do something to get a better seek performance?

Note: I have already enabled bloom filtering on that table.
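For reference, one way to enable bloom filtering programmatically is via the table
operations API (a sketch only; the `connector` handle is assumed, and
`table.bloom.enabled` is Accumulo's standard table property key):

    // Sketch: enable the bloom filter on the ARTIFACTS table.
    // `connector` is assumed to be an already-authenticated Connector.
    connector.tableOperations().setProperty(
      "ARTIFACTS",            // table name as used above
      "table.bloom.enabled",  // standard Accumulo table property
      "true"
    )
    // Bloom filters only apply to RFiles written after the property is set;
    // older files pick them up when they are rewritten by compaction.
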

Thank you for any advice!

Regards,
Sven

-- 
Sven Hodapp, M.Sc.,
Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
Department of Bioinformatics
Schloss Birlinghoven, 53754 Sankt Augustin, Germany
sven.hodapp@scai.fraunhofer.de
www.scai.fraunhofer.de
