accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Fetch Taking Longer Than Expected
Date Wed, 12 Aug 2015 21:14:16 GMT
No, I was not recommending locality groups as a solution to the problem, 
but using them to illustrate why the query was taking a long time.

run the `scan -c` and observe that it is slow
add the locality group and compact
run the same `scan -c` and observe that it is fast

I was not completely clear that I was not recommending use of locality 
groups as a solution to slow scans. The solution is to not do an 
unbounded `scan -c` and expect it to be fast.
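
To make the slow/fast difference concrete without touching a cluster, here is a toy Python model. It is not Accumulo code: `build_store`, `scan_column`, and the generated entries are hypothetical names and data made up for illustration, and the only thing it models is that a column fetch has to read every entry in the locality group that holds the column family.

```python
from collections import defaultdict

def build_store(entries, groups=None):
    """Partition (row, cf, value) entries into locality groups.

    groups maps a group name to a set of column families; any family
    not named in a group lands in the 'default' group.
    """
    groups = groups or {}
    cf_to_group = {cf: name for name, cfs in groups.items() for cf in cfs}
    store = defaultdict(list)
    for row, cf, value in entries:
        store[cf_to_group.get(cf, "default")].append((row, cf, value))
    return store, cf_to_group

def scan_column(store, cf_to_group, cf):
    """Unbounded fetch of one column family.

    Returns (matching entries, number of entries that had to be read).
    """
    read = store[cf_to_group.get(cf, "default")]
    hits = [entry for entry in read if entry[1] == cf]
    return hits, len(read)

# Hypothetical test data: 1000 unrelated entries plus one matching entry.
target = "vesselmmsitext=2706758566"
entries = [(f"row{i:04d}", f"other={i}", "") for i in range(1000)]
entries.append(("row0042", target, ""))

# Single (default) locality group: the fetch reads the whole table.
store, cf_to_group = build_store(entries)
hits_slow, read_slow = scan_column(store, cf_to_group, target)

# Dedicated locality group for the target family: only that group is read.
store, cf_to_group = build_store(entries, {"target": {target}})
hits_fast, read_fast = scan_column(store, cf_to_group, target)

print(read_slow, read_fast)
```

Both scans return the same single entry; only the number of entries read changes. That is the whole point of the exercise, and it also shows why this is a demonstration rather than a fix: you would need a group per column family you ever want to fetch.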

Christopher wrote:
> The schema shown above doesn't quite look like it's well-suited for
> locality groups, though. The CF field looks like it's a composition of
> an attribute name and that attribute's value. To take advantage of
> locality groups with that schema, you'd have to have a locality group
> for every attribute name/value combination, which would probably not
> work well.
>
> If you want to take advantage of locality groups, you'll probably want
> to make your CFs a small, discrete set (like just attribute names).
> So, if you push the attribute value into the CQ, you could at the very
> least limit your search to the locality containing the particular
> attribute name you are searching for.
>
> If you really want efficient searches based on attribute name/value
> combinations, you're going to want to put this up the row (at the
> beginning of your row), so your data is ordered (indexed) by that. You
> could do this in a secondary index (which could be in a different
> table, a different segment of this table, or in a separate locality
> group in this table).
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>
>
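
A rough sketch of the "put it up the row" idea above, again as a toy Python model rather than Accumulo code. The sorted list stands in for a table ordered by row, and `index_key`, `build_index`, `lookup`, the GUIDs, and the `vesselname` attribute are all hypothetical. The assumption is an index whose row is `name=value`, with the GUID of the original record carried alongside it.

```python
import bisect

def index_key(attr, value, guid):
    # The attribute name/value pair leads the key, so equal pairs sort together.
    return (f"{attr}={value}", guid)

def build_index(records):
    """records: iterable of (guid, {attr: value}). Returns sorted index keys."""
    return sorted(
        index_key(attr, value, guid)
        for guid, attrs in records
        for attr, value in attrs.items()
    )

def lookup(index, attr, value):
    """Bounded lookup: seek to the first matching key, stop at the first miss."""
    row = f"{attr}={value}"
    i = bisect.bisect_left(index, (row, ""))
    guids = []
    while i < len(index) and index[i][0] == row:
        guids.append(index[i][1])
        i += 1
    return guids

# Hypothetical records keyed by GUID.
records = [
    ("guid-001", {"vesselmmsitext": "2706758566", "vesselname": "alpha"}),
    ("guid-002", {"vesselmmsitext": "1111111111", "vesselname": "bravo"}),
    ("guid-003", {"vesselmmsitext": "2706758566", "vesselname": "charlie"}),
]
index = build_index(records)
matches = lookup(index, "vesselmmsitext", "2706758566")
print(matches)
```

The lookup touches only the keys for the requested pair instead of walking the whole table, which is the difference between a bounded range scan and an unbounded `scan -c`.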
> On Wed, Aug 12, 2015 at 11:20 AM, Josh Elser <josh.elser@gmail.com> wrote:
>> Yup, that would be expected.
>>
>> Remember that doing `scan -c ...` is an unbounded search over your entire
>> table. So, it takes approximately 3 minutes to read your GUIDIndexTable.
>> Because you have a single locality group, all of the columns in your table
>> are grouped together.
>>
>> One exercise that may be interesting for yourself is to create a locality
>> group that has your specific column family in it, compact your
>> GUIDIndexTable, and rerun your `scan -c` query. The speed should be similar
>> to your exact scan. Removing the locality group and re-compacting the table
>> should return the query time back to the slow 3 minutes.
>>
>> Does that make sense?
>>
>> Daniel Ruiz wrote:
>>> Hi All,
>>>
>>> I am having an issue where column fetches are taking over a minute on
>>> 1.6.3. I don’t believe this should be the case, and my experience in the
>>> past supports the idea that fetches should be very fast.
>>>
>>> For example, doing a plain scan on the table gives results instantly, but
>>> doing a scan -c vesselmmsitext=2706758566 takes 2 minutes and 44 seconds
>>> (plus or minus 1 second).
>>>
>>> Figure 1.1. Generated Test Data on GUIDIndexTable
>>>
>>> Here is the table config
>>>
>>>
>>> --------+------+------
>>> SCOPE   | NAME | VALUE
>>> --------+------+------
>>> default | table.balancer | org.apache.accumulo.server.master.balancer.DefaultLoadBalancer
>>> default | table.bloom.enabled | false
>>> default | table.bloom.error.rate | 0.5%
>>> default | table.bloom.hash.type | murmur
>>> default | table.bloom.key.functor | org.apache.accumulo.core.file.keyfunctor.RowFunctor
>>> default | table.bloom.load.threshold | 1
>>> default | table.bloom.size | 1048576
>>> default | table.cache.block.enable | false
>>> default | table.cache.index.enable | true
>>> default | table.classpath.context |
>>> default | table.compaction.major.everything.idle | 1h
>>> default | table.compaction.major.ratio | 3
>>> default | table.compaction.minor.idle | 5m
>>> default | table.compaction.minor.logs.threshold | 3
>>> table   | table.constraint.1 | org.apache.accumulo.core.constraints.DefaultKeySizeConstraint
>>> default | table.failures.ignore | false
>>> default | table.file.blocksize | 0B
>>> default | table.file.compress.blocksize | 100K
>>> default | table.file.compress.blocksize.index | 128K
>>> default | table.file.compress.type | gz
>>> default | table.file.max | 15
>>> default | table.file.replication | 0
>>> default | table.file.type | rf
>>> default | table.formatter | org.apache.accumulo.core.util.format.DefaultFormatter
>>> default | table.groups.enabled |
>>> default | table.interepreter | org.apache.accumulo.core.util.interpret.DefaultScanInterpreter
>>> table   | table.iterator.majc.AgeOffIterator##GUIDIndexTable | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
>>> table   | table.iterator.majc.AgeOffIterator##GUIDIndexTable.opt.ttl | 2592000000
>>> table   | table.iterator.majc.vers | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
>>> table   | table.iterator.majc.vers.opt.maxVersions | 1
>>> table   | table.iterator.minc.AgeOffIterator##GUIDIndexTable | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
>>> table   | table.iterator.minc.AgeOffIterator##GUIDIndexTable.opt.ttl | 2592000000
>>> table   | table.iterator.minc.vers | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
>>> table   | table.iterator.minc.vers.opt.maxVersions | 1
>>> table   | table.iterator.scan.AgeOffIterator##GUIDIndexTable | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
>>> table   | table.iterator.scan.AgeOffIterator##GUIDIndexTable.opt.ttl | 2592000000
>>> table   | table.iterator.scan.vers | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
>>> table   | table.iterator.scan.vers.opt.maxVersions | 1
>>> default | table.majc.compaction.strategy | org.apache.accumulo.tserver.compaction.DefaultCompactionStrategy
>>> default | table.scan.max.memory | 512K
>>> table   |    @override | 1M
>>> default | table.security.scan.visibility.default |
>>> default | table.split.threshold | 1G
>>> default | table.walog.enabled | true
>>> --------+------+------
>>>
>>> More Table Info:
>>>
>>> GUIDIndexTable<http://107.23.12.24:50095/tables?t=f> | ONLINE | 2 | 0 | 82.56M | 810.00K | 159
>>>
>>> Please let me know if I am doing something wrong or if there is more
>>> information you need.
>>>
>>> V/r,
>>>
>>> -Daniel
>>>
