accumulo-user mailing list archives

From Christopher <ctubb...@apache.org>
Subject Re: Fetch Taking Longer Than Expected
Date Sat, 15 Aug 2015 01:30:29 GMT
Yes, Josh is right. Sorry if my wording led to any unnecessary confusion.

On Fri, Aug 14, 2015, 12:04 Josh Elser <josh.elser@gmail.com> wrote:

> "Small" might also be misleading. A locality group can be a good
> way to separate a large collection of data from an actually small number
> of other records. Discrete yes, but the data itself does not need to be
> small to put it into a locality group.
>
> Christopher wrote:
> > I would be surprised if anybody has tested more than a dozen or two
> > locality groups or placed more than a dozen or two column families in
> > any one locality group.
> >
> >
> > On Fri, Aug 14, 2015, 01:28 Daniel Ruiz <daruiz.work@gmail.com
> > <mailto:daruiz.work@gmail.com>> wrote:
> >
> >     Thanks... We ended up doing just that. Correct, having a bunch of
> >     random data does not fit well with locality groups. I did have
> >     another question, though: you mentioned a "small discrete set". What
> >     would you consider small? Would you recommend, for example, against
> >     having several thousand locality groups in a table?
> >
> >     V/r,
> >     -Daniel
> >     -----Original Message-----
> >     From: Christopher [mailto:ctubbsii@apache.org
> >     <mailto:ctubbsii@apache.org>]
> >     Sent: Wednesday, August 12, 2015 3:08 PM
> >     To: Accumulo User List <user@accumulo.apache.org
> >     <mailto:user@accumulo.apache.org>>
> >     Subject: Re: Fetch Taking Longer Than Expected
> >
> >     The schema shown above doesn't quite look like it's well-suited for
> >     locality groups, though. The CF field looks like it's a composition
> >     of an attribute name and that attribute's value. To take advantage of
> >     locality groups with that schema, you'd have to have a locality group
> >     for every attribute name/value combination, which would probably not
> >     work well.
> >
> >     If you want to take advantage of locality groups, you'll probably
> >     want to make your CFs a small, discrete set (like just attribute
> >     names). So, if you push the attribute value into the CQ, you could at
> >     the very least limit your search to the locality group containing the
> >     particular attribute name you are searching for.
> >
> >     If you really want efficient searches based on attribute name/value
> >     combinations, you're going to want to put this at the beginning of
> >     your row, so your data is ordered (indexed) by that. You could do
> >     this in a secondary index (which could be in a different table, a
> >     different segment of this table, or in a separate locality group in
> >     this table).
> >
> >     --
> >     Christopher L Tubbs II
> >     http://gravatar.com/ctubbsii
> >
> >
> >     On Wed, Aug 12, 2015 at 11:20 AM, Josh Elser <josh.elser@gmail.com
> >     <mailto:josh.elser@gmail.com>> wrote:
> >      > Yup, that would be expected.
> >      >
> >      > Remember that doing `scan -c ...` is an unbounded search over your
> >      > entire table. So, it takes approximately 3 minutes to read your
> >      > GUIDIndexTable. Because you have a single locality group, all of
> >      > the columns in your table are grouped together.
> >      >
> >      > One exercise that may be interesting for yourself is to create a
> >      > locality group that has your specific column family in it, compact
> >      > your GUIDIndexTable, and rerun your `scan -c` query. The speed
> >      > should be similar to your exact scan. Removing the locality group
> >      > and re-compacting the table should return the query time back to
> >      > the slow 3 minutes.
> >      >
> >      > Does that make sense?
> >      >
> >      > Daniel Ruiz wrote:
> >      >>
> >      >> Hi All,
> >      >>
> >      >> I am having an issue where column fetches are taking over a minute
> >      >> on 1.6.3. I don’t believe this should be the case, and my experience
> >      >> in the past supports the idea that fetches should be very fast.
> >      >>
> >      >> For example, doing a scan on the table gives results instantly, but
> >      >> doing a scan -c vesselmmsitext=2706758566 takes 2 minutes and 44
> >      >> seconds (plus or minus 1 second).
> >      >>
> >      >> Figure 1.1. Generated Test Data on GUIDIndexTable
> >      >>
> >      >> Here is the table config
> >      >>
> >      >> --------+------------------------------------------------------------+----------------------------------------------------------------
> >      >> SCOPE   | NAME | VALUE
> >      >> --------+------------------------------------------------------------+----------------------------------------------------------------
> >      >> default | table.balancer | org.apache.accumulo.server.master.balancer.DefaultLoadBalancer
> >      >> default | table.bloom.enabled | false
> >      >> default | table.bloom.error.rate | 0.5%
> >      >> default | table.bloom.hash.type | murmur
> >      >> default | table.bloom.key.functor | org.apache.accumulo.core.file.keyfunctor.RowFunctor
> >      >> default | table.bloom.load.threshold | 1
> >      >> default | table.bloom.size | 1048576
> >      >> default | table.cache.block.enable | false
> >      >> default | table.cache.index.enable | true
> >      >> default | table.classpath.context |
> >      >> default | table.compaction.major.everything.idle | 1h
> >      >> default | table.compaction.major.ratio | 3
> >      >> default | table.compaction.minor.idle | 5m
> >      >> default | table.compaction.minor.logs.threshold | 3
> >      >> table   | table.constraint.1 | org.apache.accumulo.core.constraints.DefaultKeySizeConstraint
> >      >> default | table.failures.ignore | false
> >      >> default | table.file.blocksize | 0B
> >      >> default | table.file.compress.blocksize | 100K
> >      >> default | table.file.compress.blocksize.index | 128K
> >      >> default | table.file.compress.type | gz
> >      >> default | table.file.max | 15
> >      >> default | table.file.replication | 0
> >      >> default | table.file.type | rf
> >      >> default | table.formatter | org.apache.accumulo.core.util.format.DefaultFormatter
> >      >> default | table.groups.enabled |
> >      >> default | table.interepreter | org.apache.accumulo.core.util.interpret.DefaultScanInterpreter
> >      >> table   | table.iterator.majc.AgeOffIterator##GUIDIndexTable | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
> >      >> table   | table.iterator.majc.AgeOffIterator##GUIDIndexTable.opt.ttl | 2592000000
> >      >> table   | table.iterator.majc.vers | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
> >      >> table   | table.iterator.majc.vers.opt.maxVersions | 1
> >      >> table   | table.iterator.minc.AgeOffIterator##GUIDIndexTable | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
> >      >> table   | table.iterator.minc.AgeOffIterator##GUIDIndexTable.opt.ttl | 2592000000
> >      >> table   | table.iterator.minc.vers | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
> >      >> table   | table.iterator.minc.vers.opt.maxVersions | 1
> >      >> table   | table.iterator.scan.AgeOffIterator##GUIDIndexTable | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
> >      >> table   | table.iterator.scan.AgeOffIterator##GUIDIndexTable.opt.ttl | 2592000000
> >      >> table   | table.iterator.scan.vers | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
> >      >> table   | table.iterator.scan.vers.opt.maxVersions | 1
> >      >> default | table.majc.compaction.strategy | org.apache.accumulo.tserver.compaction.DefaultCompactionStrategy
> >      >> default | table.scan.max.memory | 512K
> >      >> table   | @override | 1M
> >      >> default | table.security.scan.visibility.default |
> >      >> default | table.split.threshold | 1G
> >      >> default | table.walog.enabled | true
> >      >> --------+------------------------------------------------------------+----------------------------------------------------------------
> >      >>
> >      >> More Table Info:
> >      >>
> >      >> GUIDIndexTable <http://107.23.12.24:50095/tables?t=f> | ONLINE | 2 | 0 | 82.56M | 810.00K | 159
> >      >>
> >      >> Please let me know if I am doing something wrong or if there is
> >      >> more information you need.
> >      >>
> >      >> V/r,
> >      >>
> >      >> -Daniel
> >      >>
> >      >
> >
>
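
The three schema options discussed in this thread can be compared with a toy model. The sketch below is not Accumulo code and uses no Accumulo API: it treats a table as a sorted list of (row, CF, CQ) tuples and counts how many entries each lookup has to read. The generated records, the GUID-as-row layout, and the function names are all assumptions made for illustration.

```python
from bisect import bisect_left

# Toy model only: a "table" is a sorted list of (row, cf, cq) tuples.
# Hypothetical data: 1000 records with three attributes each.
records = [
    (f"guid{i:04d}",
     {"vesselmmsitext": str(2706758000 + i), "name": f"vessel{i}", "flag": "US"})
    for i in range(1000)
]

def full_scan_filter(table, cf):
    """Mimic `scan -c <cf>` with one locality group: every entry is read."""
    hits, entries_read = [], 0
    for key in table:
        entries_read += 1
        if key[1] == cf:
            hits.append(key)
    return hits, entries_read

def prefix_scan(index, row):
    """Mimic a bounded range scan over a table ordered by the searched row."""
    i = bisect_left(index, (row,))
    hits, entries_read = [], 0
    while i < len(index):
        entries_read += 1  # includes the first non-matching entry, which ends the scan
        if index[i][0] != row:
            break
        hits.append(index[i])
        i += 1
    return hits, entries_read

# Layout A (assumed original): row = GUID, CF = "attribute=value".
layout_a = sorted((g, f"{k}={v}", "") for g, attrs in records for k, v in attrs.items())

# Layout B (secondary index): row = "attribute=value", CF = attribute, CQ = GUID.
layout_b = sorted((f"{k}={v}", k, g) for g, attrs in records for k, v in attrs.items())

# Layout C: CF = attribute name, CQ = value, one locality group per attribute.
groups = {}
for g, attrs in records:
    for k, v in attrs.items():
        groups.setdefault(k, []).append((g, k, v))

target = "vesselmmsitext=2706758566"
hits_a, read_a = full_scan_filter(layout_a, target)
hits_b, read_b = prefix_scan(layout_b, target)
hits_c = [key for key in groups["vesselmmsitext"] if key[2] == "2706758566"]
read_c = len(groups["vesselmmsitext"])  # only this group's entries are read

print(read_a, read_c, read_b)
```

With this toy data the three layouts read 3000, 1000 and 2 entries respectively to find the same single match. Real scan times depend on compaction state, block caching, and tablet distribution, none of which this model captures.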
