accumulo-user mailing list archives

From "Daniel Ruiz" <daruiz.w...@gmail.com>
Subject RE: Fetch Taking Longer Than Expected
Date Fri, 14 Aug 2015 21:34:06 GMT
Okay, thanks for the information and your time; it has been very helpful.

V/r,
-Daniel

-----Original Message-----
From: Josh Elser [mailto:josh.elser@gmail.com] 
Sent: Friday, August 14, 2015 10:04 AM
To: user@accumulo.apache.org
Subject: Re: Fetch Taking Longer Than Expected

"Small" might also be misleading. A locality group can be a good 
way to separate a large collection of data from an actually small number 
of other records. Discrete yes, but the data itself does not need to be 
small to put it into a locality group.
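
A toy model of that point, in plain Python rather than the Accumulo API.
It assumes a locality group simply decides which entries are stored
together; the names build_groups and scan_family, and the "payload" and
"meta" families, are made up for illustration.

```python
from collections import defaultdict

def build_groups(entries, groups):
    """Partition (row, cf, value) entries by locality group.

    `groups` maps a group name to the set of column families it holds;
    families not listed fall into the implicit default group.
    """
    cf_to_group = {cf: name for name, cfs in groups.items() for cf in cfs}
    stored = defaultdict(list)
    for row, cf, value in entries:
        stored[cf_to_group.get(cf, "default")].append((row, cf, value))
    return stored, cf_to_group

def scan_family(stored, cf_to_group, cf):
    """Return (matches, entries_read) for a scan restricted to one family."""
    group = stored.get(cf_to_group.get(cf, "default"), [])
    matches = [e for e in group if e[1] == cf]
    return matches, len(group)

# A large body of "payload" data plus a handful of "meta" records.
entries = [(f"row{i:04d}", "payload", "x" * 10) for i in range(1000)]
entries += [(f"row{i:04d}", "meta", "m") for i in range(0, 1000, 100)]

# Single default group: finding the meta records reads every entry.
stored, idx = build_groups(entries, {})
_, read_single = scan_family(stored, idx, "meta")

# Large "payload" family in its own group: the meta scan skips it entirely.
stored, idx = build_groups(entries, {"big": {"payload"}})
meta_hits, read_split = scan_family(stored, idx, "meta")

print(read_single, read_split, len(meta_hits))
```

Here the group holds the large data, not the small data, and the scan for
the few remaining records still gets cheaper.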

Christopher wrote:
> I would be surprised if anybody has tested more than a dozen or two
> locality groups or placed more than a dozen or two column families in
> any one locality group.
>
>
> On Fri, Aug 14, 2015, 01:28 Daniel Ruiz <daruiz.work@gmail.com> wrote:
>
>     Thanks...We ended up doing just that.  Correct, having a bunch of
>     random data does not fit well with locality groups.  I did have
>     another question, though: you mentioned a "small discrete set".  What
>     would you consider small?  Would you recommend, for example, against
>     having several thousand locality groups in a table?
>
>     V/r,
>     -Daniel
>     -----Original Message-----
>     From: Christopher [mailto:ctubbsii@apache.org]
>     Sent: Wednesday, August 12, 2015 3:08 PM
>     To: Accumulo User List <user@accumulo.apache.org>
>     Subject: Re: Fetch Taking Longer Than Expected
>
>     The schema shown above doesn't quite look like it's well-suited for
>     locality groups, though. The CF field looks like it's a composition of
>     an attribute name and that attribute's value. To take advantage of
>     locality groups with that schema, you'd have to have a locality group
>     for every attribute name/value combination, which would probably not
>     work well.
>
>     If you want to take advantage of locality groups, you'll probably want
>     to make your CFs a small, discrete set (like just attribute names).
>     So, if you push the attribute value into the CQ, you could at the very
>     least limit your search to the locality group containing the particular
>     attribute name you are searching for.
>
>     If you really want efficient searches based on attribute name/value
>     combinations, you're going to want to put this up the row (at the
>     beginning of your row), so your data is ordered (indexed) by that. You
>     could do this in a secondary index (which could be in a different
>     table, a different segment of this table, or in a separate locality
>     group in this table).
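
A toy comparison of the two layouts described above, in plain Python
rather than the Accumulo API. It models a table as a sorted list of
(row, family, qualifier) tuples. The index layout (row = "name=value",
qualifier = record id), the guid-style ids, and the function names are
assumptions made for illustration, not something specified in the thread.

```python
import bisect

def scan_by_family(table, cf):
    """Unbounded pass over the sorted table, filtering on column family."""
    examined = 0
    hits = []
    for row, fam, qual in table:
        examined += 1
        if fam == cf:
            hits.append(row)
    return hits, examined

def scan_index_row(index, wanted_row):
    """Seek to `wanted_row` in the sorted index and read only its entries."""
    start = bisect.bisect_left(index, (wanted_row,))
    examined = 0
    hits = []
    for row, fam, qual in index[start:]:
        if row != wanted_row:
            break
        examined += 1
        hits.append(qual)
    return hits, examined

# Main table: row = record id, family = "name=value" (the schema discussed).
main = sorted(
    (f"guid{i:05d}", f"vesselmmsitext={i % 500}", "") for i in range(5000)
)

# Hypothetical index: row = "name=value", qualifier = record id.
index = sorted(
    (f"vesselmmsitext={i % 500}", "", f"guid{i:05d}") for i in range(5000)
)

full_hits, full_read = scan_by_family(main, "vesselmmsitext=42")
index_hits, index_read = scan_index_row(index, "vesselmmsitext=42")

print(full_read, index_read, full_hits == index_hits)
```

Both lookups return the same record ids; the difference is only in how
many entries have to be read to find them.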
>
>     --
>     Christopher L Tubbs II
>     http://gravatar.com/ctubbsii
>
>
>     On Wed, Aug 12, 2015 at 11:20 AM, Josh Elser <josh.elser@gmail.com> wrote:
>      > Yup, that would be expected.
>      >
>      > Remember that doing `scan -c ...` is an unbounded search over your
>      > entire table. So, it takes approximately 3 minutes to read your
>      > GUIDIndexTable. Because you have a single locality group, all of the
>      > columns in your table are grouped together.
>      >
>      > One exercise that may be interesting for yourself is to create a
>      > locality group that has your specific column family in it, compact
>      > your GUIDIndexTable, and rerun your `scan -c` query. The speed should
>      > be similar to your exact scan. Removing the locality group and
>      > re-compacting the table should return the query time back to the slow
>      > 3 minutes.
>      >
>      > Does that make sense?
>      >
>      > Daniel Ruiz wrote:
>      >>
>      >> Hi All,
>      >>
>      >> I am having an issue where column fetches are taking over a minute
>      >> on 1.6.3. I don’t believe this should be the case, and my experience
>      >> in the past supports the idea that fetches should be very fast.
>      >>
>      >> For example, doing a scan on the table gives results instantly, but
>      >> doing a scan -c vesselmmsitext=2706758566 takes 2 minutes and 44
>      >> seconds (plus or minus 1 second).
>      >>
>      >> Figure 1.1. Generated Test Data on GUIDIndexTable
>      >>
>      >> Here is the table config:
>      >>
>      >> --------+-------------------------------------------------------------+------------------------------------------------------------------
>      >> SCOPE   | NAME                                                        | VALUE
>      >> --------+-------------------------------------------------------------+------------------------------------------------------------------
>      >> default | table.balancer                                              | org.apache.accumulo.server.master.balancer.DefaultLoadBalancer
>      >> default | table.bloom.enabled                                         | false
>      >> default | table.bloom.error.rate                                      | 0.5%
>      >> default | table.bloom.hash.type                                       | murmur
>      >> default | table.bloom.key.functor                                     | org.apache.accumulo.core.file.keyfunctor.RowFunctor
>      >> default | table.bloom.load.threshold                                  | 1
>      >> default | table.bloom.size                                            | 1048576
>      >> default | table.cache.block.enable                                    | false
>      >> default | table.cache.index.enable                                    | true
>      >> default | table.classpath.context                                     |
>      >> default | table.compaction.major.everything.idle                      | 1h
>      >> default | table.compaction.major.ratio                                | 3
>      >> default | table.compaction.minor.idle                                 | 5m
>      >> default | table.compaction.minor.logs.threshold                       | 3
>      >> table   | table.constraint.1                                          | org.apache.accumulo.core.constraints.DefaultKeySizeConstraint
>      >> default | table.failures.ignore                                       | false
>      >> default | table.file.blocksize                                        | 0B
>      >> default | table.file.compress.blocksize                               | 100K
>      >> default | table.file.compress.blocksize.index                         | 128K
>      >> default | table.file.compress.type                                    | gz
>      >> default | table.file.max                                              | 15
>      >> default | table.file.replication                                      | 0
>      >> default | table.file.type                                             | rf
>      >> default | table.formatter                                             | org.apache.accumulo.core.util.format.DefaultFormatter
>      >> default | table.groups.enabled                                        |
>      >> default | table.interepreter                                          | org.apache.accumulo.core.util.interpret.DefaultScanInterpreter
>      >> table   | table.iterator.majc.AgeOffIterator##GUIDIndexTable          | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
>      >> table   | table.iterator.majc.AgeOffIterator##GUIDIndexTable.opt.ttl  | 2592000000
>      >> table   | table.iterator.majc.vers                                    | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
>      >> table   | table.iterator.majc.vers.opt.maxVersions                    | 1
>      >> table   | table.iterator.minc.AgeOffIterator##GUIDIndexTable          | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
>      >> table   | table.iterator.minc.AgeOffIterator##GUIDIndexTable.opt.ttl  | 2592000000
>      >> table   | table.iterator.minc.vers                                    | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
>      >> table   | table.iterator.minc.vers.opt.maxVersions                    | 1
>      >> table   | table.iterator.scan.AgeOffIterator##GUIDIndexTable          | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
>      >> table   | table.iterator.scan.AgeOffIterator##GUIDIndexTable.opt.ttl  | 2592000000
>      >> table   | table.iterator.scan.vers                                    | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
>      >> table   | table.iterator.scan.vers.opt.maxVersions                    | 1
>      >> default | table.majc.compaction.strategy                              | org.apache.accumulo.tserver.compaction.DefaultCompactionStrategy
>      >> default | table.scan.max.memory                                       | 512K
>      >> table   |    @override                                                | 1M
>      >> default | table.security.scan.visibility.default                      |
>      >> default | table.split.threshold                                       | 1G
>      >> default | table.walog.enabled                                         | true
>      >> --------+-------------------------------------------------------------+------------------------------------------------------------------
>      >>
>      >> More Table Info:
>      >>
>      >> GUIDIndexTable <http://107.23.12.24:50095/tables?t=f> | ONLINE | 2 | 0 | 82.56M | 810.00K | 159
>      >>
>      >> Please let me know if I am doing something wrong or if there is more
>      >> information you need.
>      >>
>      >> V/r,
>      >>
>      >> -Daniel
>      >>
>      >
>

