accumulo-user mailing list archives

From Christopher <ctubb...@apache.org>
Subject Re: Fetch Taking Longer Than Expected
Date Wed, 12 Aug 2015 21:08:14 GMT
The schema shown above doesn't look well suited to locality groups,
though. The CF field looks like a composition of an attribute name and
that attribute's value. To take advantage of locality groups with that
schema, you'd need a locality group for every attribute name/value
combination, which would probably not work well.

If you want to take advantage of locality groups, you'll probably want
to make your CFs a small, discrete set (like just attribute names).
So, if you push the attribute value into the CQ, you could at the very
least limit your search to the locality group containing the
particular attribute name you are searching for.
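
To make that concrete, here is a rough sketch (plain Python, not
Accumulo client code) of splitting a combined name=value CF into a CF
that holds only the attribute name and a CQ that holds the value. The
function name is made up for illustration; the sample key is the one
from Daniel's scan.

```python
def split_attribute_cf(combined_cf):
    """Split a 'name=value' column family into (cf, cq).

    Hypothetical helper: the attribute name becomes the column family
    and the attribute value becomes the column qualifier.
    """
    name, sep, value = combined_cf.partition("=")
    if not sep:
        raise ValueError("expected 'name=value', got %r" % combined_cf)
    return name, value


# The CF from the slow scan, remodelled as CF + CQ.
cf, cq = split_attribute_cf("vesselmmsitext=2706758566")
print(cf, cq)
```

With keys shaped like this, the set of CFs is just the set of attribute
names, so one locality group per attribute name (or per handful of
names) becomes feasible.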

If you really want efficient searches based on attribute name/value
combinations, you're going to want to put that combination at the
beginning of your row, so your data is ordered (indexed) by it. You
could do this in a secondary index (which could be in a different
table, a different segment of this table, or in a separate locality
group in this table).
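
As a toy model of such a secondary index (plain Python over a sorted
list, not the Accumulo API), the sketch below puts the attribute
name/value at the front of the index row so that a lookup can seek
straight to the matching entries instead of reading everything. The
class, the NUL separator, and the guid-N values are assumptions made
for illustration only.

```python
import bisect


def index_row(name, value):
    # Hypothetical row layout: attribute name, NUL separator, value.
    return name + "\x00" + value


class SortedIndex:
    """Toy stand-in for a sorted key/value table."""

    def __init__(self):
        self._entries = []  # sorted list of (index_row, guid)

    def add(self, name, value, guid):
        bisect.insort(self._entries, (index_row(name, value), guid))

    def lookup(self, name, value):
        # Seek to the first entry for this row, then read until the
        # row changes; no other part of the index is touched.
        key = index_row(name, value)
        start = bisect.bisect_left(self._entries, (key, ""))
        guids = []
        for row, guid in self._entries[start:]:
            if row != key:
                break
            guids.append(guid)
        return guids


idx = SortedIndex()
idx.add("vesselmmsitext", "2706758566", "guid-2")
idx.add("vesselname", "example", "guid-3")
idx.add("vesselmmsitext", "2706758566", "guid-1")
print(idx.lookup("vesselmmsitext", "2706758566"))
```

The point is only the ordering: because entries are sorted by the
name/value prefix, the lookup cost depends on the number of matches
rather than on the size of the table.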

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii


On Wed, Aug 12, 2015 at 11:20 AM, Josh Elser <josh.elser@gmail.com> wrote:
> Yup, that would be expected.
>
> Remember that doing `scan -c ...` is an unbounded search over your entire
> table. So, it takes approximately 3 minutes to read your GUIDIndexTable.
> Because you have a single locality group, all of the columns in your table
> are grouped together.
>
> One exercise that may be interesting for you is to create a locality
> group that has your specific column family in it, compact your
> GUIDIndexTable, and rerun your `scan -c` query. The speed should be similar
> to your exact scan. Removing the locality group and re-compacting the table
> should return the query time to the slow 3 minutes.
>
> Does that make sense?
>
> Daniel Ruiz wrote:
>>
>> Hi All,
>>
>> I am having an issue where column fetches are taking over a minute on
>> 1.6.3. I don’t believe this should be the case, and my experience in the
>> past supports the idea that fetches should be very fast.
>>
>> For example, doing a plain scan on the table gives results instantly, but
>> doing a scan -c vesselmmsitext=2706758566 takes 2 minutes and 44 seconds
>> (plus or minus 1 second).
>>
>> Figure 1.1. Generated Test Data on GUIDIndexTable
>>
>> Here is the table config
>>
>>
>> --------+------------------------------------------------------------+------------------------------------------------------------------
>> SCOPE   | NAME                                                       | VALUE
>> --------+------------------------------------------------------------+------------------------------------------------------------------
>> default | table.balancer                                             | org.apache.accumulo.server.master.balancer.DefaultLoadBalancer
>> default | table.bloom.enabled                                        | false
>> default | table.bloom.error.rate                                     | 0.5%
>> default | table.bloom.hash.type                                      | murmur
>> default | table.bloom.key.functor                                    | org.apache.accumulo.core.file.keyfunctor.RowFunctor
>> default | table.bloom.load.threshold                                 | 1
>> default | table.bloom.size                                           | 1048576
>> default | table.cache.block.enable                                   | false
>> default | table.cache.index.enable                                   | true
>> default | table.classpath.context                                    |
>> default | table.compaction.major.everything.idle                     | 1h
>> default | table.compaction.major.ratio                               | 3
>> default | table.compaction.minor.idle                                | 5m
>> default | table.compaction.minor.logs.threshold                      | 3
>> table   | table.constraint.1                                         | org.apache.accumulo.core.constraints.DefaultKeySizeConstraint
>> default | table.failures.ignore                                      | false
>> default | table.file.blocksize                                       | 0B
>> default | table.file.compress.blocksize                              | 100K
>> default | table.file.compress.blocksize.index                        | 128K
>> default | table.file.compress.type                                   | gz
>> default | table.file.max                                             | 15
>> default | table.file.replication                                     | 0
>> default | table.file.type                                            | rf
>> default | table.formatter                                            | org.apache.accumulo.core.util.format.DefaultFormatter
>> default | table.groups.enabled                                       |
>> default | table.interepreter                                         | org.apache.accumulo.core.util.interpret.DefaultScanInterpreter
>> table   | table.iterator.majc.AgeOffIterator##GUIDIndexTable         | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
>> table   | table.iterator.majc.AgeOffIterator##GUIDIndexTable.opt.ttl | 2592000000
>> table   | table.iterator.majc.vers                                   | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
>> table   | table.iterator.majc.vers.opt.maxVersions                   | 1
>> table   | table.iterator.minc.AgeOffIterator##GUIDIndexTable         | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
>> table   | table.iterator.minc.AgeOffIterator##GUIDIndexTable.opt.ttl | 2592000000
>> table   | table.iterator.minc.vers                                   | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
>> table   | table.iterator.minc.vers.opt.maxVersions                   | 1
>> table   | table.iterator.scan.AgeOffIterator##GUIDIndexTable         | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
>> table   | table.iterator.scan.AgeOffIterator##GUIDIndexTable.opt.ttl | 2592000000
>> table   | table.iterator.scan.vers                                   | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
>> table   | table.iterator.scan.vers.opt.maxVersions                   | 1
>> default | table.majc.compaction.strategy                             | org.apache.accumulo.tserver.compaction.DefaultCompactionStrategy
>> default | table.scan.max.memory                                      | 512K
>> table   |    @override                                               | 1M
>> default | table.security.scan.visibility.default                     |
>> default | table.split.threshold                                      | 1G
>> default | table.walog.enabled                                        | true
>> --------+------------------------------------------------------------+------------------------------------------------------------------
>>
>> More Table Info:
>>
>> GUIDIndexTable <http://107.23.12.24:50095/tables?t=f> | ONLINE | 2 | 0 | 82.56M | 810.00K | 159
>>
>> Please let me know if I am doing something wrong or if there is more
>> information you need.
>>
>> V/r,
>>
>> -Daniel
>>
>
