accumulo-user mailing list archives

From Christopher <ctubb...@apache.org>
Subject Re: Fetch Taking Longer Than Expected
Date Fri, 14 Aug 2015 06:05:06 GMT
I would be surprised if anybody has tested more than a dozen or two
locality groups or placed more than a dozen or two column families in any
one locality group.

On Fri, Aug 14, 2015, 01:28 Daniel Ruiz <daruiz.work@gmail.com> wrote:

> Thanks. We ended up doing just that. Correct, having a bunch of random
> data does not fit well with locality groups. I did have another question,
> though: you mentioned a "small, discrete set". What would you consider
> small? Would you recommend, for example, against having several thousand
> locality groups in a table?
>
> V/r,
> -Daniel
> -----Original Message-----
> From: Christopher [mailto:ctubbsii@apache.org]
> Sent: Wednesday, August 12, 2015 3:08 PM
> To: Accumulo User List <user@accumulo.apache.org>
> Subject: Re: Fetch Taking Longer Than Expected
>
> The schema shown above doesn't quite look like it's well-suited for
> locality groups, though. The CF field looks like it's a composition of
> an attribute name and that attribute's value. To take advantage of
> locality groups with that schema, you'd have to have a locality group
> for every attribute name/value combination, which would probably not
> work well.
>
> If you want to take advantage of locality groups, you'll probably want
> to make your CFs a small, discrete set (like just attribute names).
> So, if you push the attribute value into the CQ, you could at the very
> least limit your search to the locality containing the particular
> attribute name you are searching for.
>
> If you really want efficient searches based on attribute name/value
> combinations, you're going to want to put this up the row (at the
> beginning of your row), so your data is ordered (indexed) by that. You
> could do this in a secondary index (which could be in a different
> table, a different segment of this table, or in a separate locality
> group in this table).
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>
>
> On Wed, Aug 12, 2015 at 11:20 AM, Josh Elser <josh.elser@gmail.com> wrote:
> > Yup, that would be expected.
> >
> > Remember that doing `scan -c ...` is an unbounded search over your entire
> > table. So, it takes approximately 3 minutes to read your GUIDIndexTable.
> > Because you have a single locality group, all of the columns in your table
> > are grouped together.
> >
> > One exercise that may be interesting for you is to create a locality
> > group that has your specific column family in it, compact your
> > GUIDIndexTable, and rerun your `scan -c` query. The speed should be similar
> > to your exact scan. Removing the locality group and re-compacting the table
> > should return the query time back to the slow 3 minutes.
> >
> > Does that make sense?
> >
> > Daniel Ruiz wrote:
> >>
> >> Hi All,
> >>
> >> I am having an issue where column fetches are taking over a minute on
> >> 1.6.3. I don’t believe this should be the case, and my experience in the
> >> past supports the idea that fetches should be very fast.
> >>
> >> For example, doing a scan on the table gives results instantly, but
> >> doing a scan -c vesselmmsitext=2706758566 takes 2 minutes and 44 seconds
> >> (plus or minus 1 second).
> >>
> >> Figure 1.1. Generated Test Data on GUIDIndexTable
> >>
> >> Here is the table config
> >>
> >> SCOPE   | NAME | VALUE
> >> --------+------+------
> >> default | table.balancer | org.apache.accumulo.server.master.balancer.DefaultLoadBalancer
> >> default | table.bloom.enabled | false
> >> default | table.bloom.error.rate | 0.5%
> >> default | table.bloom.hash.type | murmur
> >> default | table.bloom.key.functor | org.apache.accumulo.core.file.keyfunctor.RowFunctor
> >> default | table.bloom.load.threshold | 1
> >> default | table.bloom.size | 1048576
> >> default | table.cache.block.enable | false
> >> default | table.cache.index.enable | true
> >> default | table.classpath.context |
> >> default | table.compaction.major.everything.idle | 1h
> >> default | table.compaction.major.ratio | 3
> >> default | table.compaction.minor.idle | 5m
> >> default | table.compaction.minor.logs.threshold | 3
> >> table   | table.constraint.1 | org.apache.accumulo.core.constraints.DefaultKeySizeConstraint
> >> default | table.failures.ignore | false
> >> default | table.file.blocksize | 0B
> >> default | table.file.compress.blocksize | 100K
> >> default | table.file.compress.blocksize.index | 128K
> >> default | table.file.compress.type | gz
> >> default | table.file.max | 15
> >> default | table.file.replication | 0
> >> default | table.file.type | rf
> >> default | table.formatter | org.apache.accumulo.core.util.format.DefaultFormatter
> >> default | table.groups.enabled |
> >> default | table.interepreter | org.apache.accumulo.core.util.interpret.DefaultScanInterpreter
> >> table   | table.iterator.majc.AgeOffIterator##GUIDIndexTable | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
> >> table   | table.iterator.majc.AgeOffIterator##GUIDIndexTable.opt.ttl | 2592000000
> >> table   | table.iterator.majc.vers | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
> >> table   | table.iterator.majc.vers.opt.maxVersions | 1
> >> table   | table.iterator.minc.AgeOffIterator##GUIDIndexTable | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
> >> table   | table.iterator.minc.AgeOffIterator##GUIDIndexTable.opt.ttl | 2592000000
> >> table   | table.iterator.minc.vers | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
> >> table   | table.iterator.minc.vers.opt.maxVersions | 1
> >> table   | table.iterator.scan.AgeOffIterator##GUIDIndexTable | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
> >> table   | table.iterator.scan.AgeOffIterator##GUIDIndexTable.opt.ttl | 2592000000
> >> table   | table.iterator.scan.vers | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
> >> table   | table.iterator.scan.vers.opt.maxVersions | 1
> >> default | table.majc.compaction.strategy | org.apache.accumulo.tserver.compaction.DefaultCompactionStrategy
> >> default | table.scan.max.memory | 512K
> >> table   |    @override | 1M
> >> default | table.security.scan.visibility.default |
> >> default | table.split.threshold | 1G
> >> default | table.walog.enabled | true
> >> --------+------+------
> >>
> >> More Table Info:
> >>
> >> GUIDIndexTable <http://107.23.12.24:50095/tables?t=f> | ONLINE | 2 | 0 | 82.56M | 810.00K | 159
> >>
> >> Please let me know if I am doing something wrong or if there is more
> >> information you need.
> >>
> >> V/r,
> >>
> >> -Daniel
> >>
> >
>
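The locality-group exercise Josh describes in the thread can be illustrated with a small toy model. This is only a sketch in plain Python, not the Accumulo API: the entries, the group name `mmsi`, and the helper functions are all hypothetical, and the model simply assumes that a fetch of one column family only has to read the locality group that can contain it. In the real shell, the corresponding steps would involve the `setgroups` and `compact` commands.

```python
from collections import defaultdict

# Toy entries: (row, column_family, column_qualifier). All values are made up.
ENTRIES = [
    ("guid-001", "vesselmmsitext=2706758566", "a"),
    ("guid-002", "vesselname=EXAMPLE", "a"),
    ("guid-003", "vesselmmsitext=1111111111", "a"),
    ("guid-004", "vesselname=OTHER", "a"),
]


def build_groups(entries, locality_groups):
    """Partition entries roughly the way a compaction would: one bucket per
    configured locality group, plus a default bucket for everything else."""
    cf_to_group = {cf: name for name, cfs in locality_groups.items() for cf in cfs}
    buckets = defaultdict(list)
    for entry in entries:
        buckets[cf_to_group.get(entry[1], "default")].append(entry)
    return buckets, cf_to_group


def scan_column(buckets, cf_to_group, wanted_cf):
    """Return (matches, entries_read) for a fetch of one column family.
    Only the bucket that can contain wanted_cf is read."""
    bucket = buckets[cf_to_group.get(wanted_cf, "default")]
    matches = [e for e in bucket if e[1] == wanted_cf]
    return matches, len(bucket)


if __name__ == "__main__":
    wanted = "vesselmmsitext=2706758566"
    for groups in ({}, {"mmsi": {wanted}}):
        buckets, mapping = build_groups(ENTRIES, groups)
        matches, read = scan_column(buckets, mapping, wanted)
        print(f"groups={list(groups)} matches={len(matches)} entries_read={read}")
```

With no locality groups configured, the fetch reads every entry to find the single match; with the column family in its own group, it reads only that group. The model also hints at why a group per attribute name/value combination scales poorly: the number of groups grows with the number of distinct values.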
>
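Christopher's schema suggestions in the thread (attribute names as column families, values in the column qualifier, and a secondary index with the name/value pair at the start of the row) can be sketched the same way. Again this is a toy model in plain Python rather than Accumulo client code: the records, the `name=value` row format, and the function names are assumptions made for illustration, and a sorted list with binary search stands in for Accumulo's sorted keys.

```python
import bisect

# Hypothetical records: guid -> {attribute name: attribute value}.
RECORDS = {
    "guid-001": {"vesselmmsitext": "2706758566", "vesselname": "EXAMPLE"},
    "guid-002": {"vesselmmsitext": "1111111111", "vesselname": "OTHER"},
}


def data_keys(records):
    """Data layout: row=guid, CF=attribute name, CQ=attribute value.
    The set of CFs stays small and discrete (one per attribute name)."""
    return sorted(
        (guid, name, value)
        for guid, attrs in records.items()
        for name, value in attrs.items()
    )


def index_keys(records):
    """Index layout: row='name=value', CQ=guid, so entries sort by name/value."""
    return sorted(
        (f"{name}={value}", guid)
        for guid, attrs in records.items()
        for name, value in attrs.items()
    )


def lookup(index, name, value):
    """Range scan over one index row, found by binary search on the sorted
    keys instead of a pass over the whole table."""
    row = f"{name}={value}"
    lo = bisect.bisect_left(index, (row, ""))
    hi = bisect.bisect_right(index, (row, "\uffff"))
    return [guid for _, guid in index[lo:hi]]


if __name__ == "__main__":
    index = index_keys(RECORDS)
    print(lookup(index, "vesselmmsitext", "2706758566"))
    print(sorted({cf for _, cf, _ in data_keys(RECORDS)}))
```

The point of the two layouts is that the data table keeps a handful of column families that map cleanly onto locality groups, while the index keys are ordered by name/value so a lookup becomes a bounded range scan.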
