Date: Wed, 12 Aug 2015 17:08:14 -0400
From: Christopher
To: Accumulo User List
Subject: Re: Fetch Taking Longer Than Expected

The schema shown above doesn't quite look like it's well-suited for locality groups, though.
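The schema concern can be made concrete with a small, self-contained Java sketch. It is a toy model under stated assumptions, not Accumulo code. Each key is flattened into a string and kept in a `TreeMap`, which sorts keys the way an Accumulo table does, and visibility and timestamps are ignored. The row is assumed to be a GUID: the table is named GUIDIndexTable, but the thread does not show its row layout. The sketch compares that layout (CF = `name=value`, e.g. `vesselmmsitext=2706758566`) with a hypothetical index layout whose row starts with `name=value`. For each layout it counts the distinct column families, which is the number of locality groups a per-family scheme would need, and the keys each lookup has to examine. The GUIDs, the `flag` attribute and the second MMSI value are made up.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model: each key is flattened into a String ("\0" separates fields) and kept
// in a TreeMap, which sorts like an Accumulo table. Visibility and timestamp are
// ignored. All sample data in main() is made up.
class SchemaSketch {

    // One attribute of one record.
    static final class Attr {
        final String guid, name, value;
        Attr(String guid, String name, String value) {
            this.guid = guid; this.name = name; this.value = value;
        }
    }

    // Result of a lookup: matching GUIDs plus how many keys had to be examined.
    static final class Result {
        final Set<String> guids = new TreeSet<>();
        int keysExamined = 0;
    }

    // Distinct column families a layout produces.
    // valueInCf = true models CF = "name=value"; false models CF = "name" (value moved to the CQ).
    static int distinctFamilies(List<Attr> attrs, boolean valueInCf) {
        Set<String> cfs = new HashSet<>();
        for (Attr a : attrs) cfs.add(valueInCf ? a.name + "=" + a.value : a.name);
        return cfs.size();
    }

    // Layout A (assumed current layout): row = GUID, CF = "name=value".
    static SortedMap<String, String> guidRows(List<Attr> attrs) {
        SortedMap<String, String> t = new TreeMap<>();
        for (Attr a : attrs) t.put(a.guid + "\0" + a.name + "=" + a.value, "");
        return t;
    }

    // Layout B (hypothetical secondary index): row starts with "name=value", then the GUID.
    static SortedMap<String, String> indexRows(List<Attr> attrs) {
        SortedMap<String, String> t = new TreeMap<>();
        for (Attr a : attrs) t.put(a.name + "=" + a.value + "\0" + a.guid, "");
        return t;
    }

    // Fetching one CF from layout A: keys sort by GUID first, so matches are not
    // contiguous and every key in the table is examined before filtering.
    static Result fetchByFamily(SortedMap<String, String> table, String name, String value) {
        Result r = new Result();
        String wantedCf = name + "=" + value;
        for (String key : table.keySet()) {
            r.keysExamined++;
            int sep = key.indexOf('\0');
            if (key.substring(sep + 1).equals(wantedCf)) r.guids.add(key.substring(0, sep));
        }
        return r;
    }

    // Looking up layout B: all matches share the "name=value\0" prefix, so they form
    // one contiguous range of the sorted keys and nothing outside it is examined.
    static Result lookupIndex(SortedMap<String, String> index, String name, String value) {
        Result r = new Result();
        String prefix = name + "=" + value + "\0";
        for (String key : index.subMap(prefix, prefix + Character.MAX_VALUE).keySet()) {
            r.keysExamined++;
            r.guids.add(key.substring(prefix.length()));
        }
        return r;
    }

    public static void main(String[] args) {
        List<Attr> attrs = List.of(
                new Attr("guid-1", "vesselmmsitext", "2706758566"),
                new Attr("guid-2", "vesselmmsitext", "2706758566"),
                new Attr("guid-3", "vesselmmsitext", "1111111111"),
                new Attr("guid-4", "flag", "US"));
        System.out.println(distinctFamilies(attrs, true));  // 3
        System.out.println(distinctFamilies(attrs, false)); // 2
        Result scan = fetchByFamily(guidRows(attrs), "vesselmmsitext", "2706758566");
        Result idx = lookupIndex(indexRows(attrs), "vesselmmsitext", "2706758566");
        System.out.println(scan.guids + " examined " + scan.keysExamined); // [guid-1, guid-2] examined 4
        System.out.println(idx.guids + " examined " + idx.keysExamined);   // [guid-1, guid-2] examined 2
    }
}
```

Moving the value into the CQ shrinks the family count to the number of attribute names, which is what keeps per-name locality groups practical. Putting `name=value` at the front of the row is what turns the lookup into a bounded range instead of a pass over every key.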
The CF field looks like it's a composition of an attribute name and that attribute's value. To take advantage of locality groups with that schema, you'd need a locality group for every attribute name/value combination, which probably would not work well.

If you want to take advantage of locality groups, you'll probably want to make your CFs a small, discrete set (like just the attribute names). If you push the attribute value into the CQ, you could at the very least limit your search to the locality group containing the attribute name you are searching for.

If you really want efficient searches based on attribute name/value combinations, you're going to want to put them at the beginning of your row, so your data is ordered (indexed) by them. You could do this in a secondary index, which could be in a different table, a different segment of this table, or a separate locality group in this table.

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii

On Wed, Aug 12, 2015 at 11:20 AM, Josh Elser wrote:
> Yup, that would be expected.
>
> Remember that doing `scan -c ...` is an unbounded search over your entire
> table. So, it takes approximately 3 minutes to read your GUIDIndexTable.
> Because you have a single locality group, all of the columns in your table
> are grouped together.
>
> One exercise that may be interesting for yourself is to create a locality
> group that has your specific column family in it, compact your
> GUIDIndexTable, and rerun your `scan -c` query. The speed should be similar
> to your exact scan. Removing the locality group and re-compacting the table
> should return the query time to the slow 3 minutes.
>
> Does that make sense?
>
> Daniel Ruiz wrote:
>>
>> Hi All,
>>
>> I am having an issue where column fetches are taking over a minute on
>> 1.6.3. I don’t believe this should be the case, and my experience in the
>> past supports the idea that fetches should be very fast.
>>
>> For example, doing a scan on the table gives results instantly, but
>> doing a `scan -c vesselmmsitext=2706758566` takes 2 minutes and 44 seconds
>> (plus or minus 1 second).
>>
>> Figure 1.1. Generated Test Data on GUIDIndexTable
>>
>> Here is the table config:
>>
>> --------+------------------------------------------------------------+------------------------------------------------------------------
>> SCOPE   | NAME                                                       | VALUE
>> --------+------------------------------------------------------------+------------------------------------------------------------------
>> default | table.balancer                                             | org.apache.accumulo.server.master.balancer.DefaultLoadBalancer
>> default | table.bloom.enabled                                        | false
>> default | table.bloom.error.rate                                     | 0.5%
>> default | table.bloom.hash.type                                      | murmur
>> default | table.bloom.key.functor                                    | org.apache.accumulo.core.file.keyfunctor.RowFunctor
>> default | table.bloom.load.threshold                                 | 1
>> default | table.bloom.size                                           | 1048576
>> default | table.cache.block.enable                                   | false
>> default | table.cache.index.enable                                   | true
>> default | table.classpath.context                                    |
>> default | table.compaction.major.everything.idle                     | 1h
>> default | table.compaction.major.ratio                               | 3
>> default | table.compaction.minor.idle                                | 5m
>> default | table.compaction.minor.logs.threshold                      | 3
>> table   | table.constraint.1                                         | org.apache.accumulo.core.constraints.DefaultKeySizeConstraint
>> default | table.failures.ignore                                      | false
>> default | table.file.blocksize                                       | 0B
>> default | table.file.compress.blocksize                              | 100K
>> default | table.file.compress.blocksize.index                        | 128K
>> default | table.file.compress.type                                   | gz
>> default | table.file.max                                             | 15
>> default | table.file.replication                                     | 0
>> default | table.file.type                                            | rf
>> default | table.formatter                                            | org.apache.accumulo.core.util.format.DefaultFormatter
>> default | table.groups.enabled                                       |
>> default | table.interepreter                                         | org.apache.accumulo.core.util.interpret.DefaultScanInterpreter
>> table   | table.iterator.majc.AgeOffIterator##GUIDIndexTable         | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
>> table   | table.iterator.majc.AgeOffIterator##GUIDIndexTable.opt.ttl | 2592000000
>> table   | table.iterator.majc.vers                                   | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
>> table   | table.iterator.majc.vers.opt.maxVersions                   | 1
>> table   | table.iterator.minc.AgeOffIterator##GUIDIndexTable         | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
>> table   | table.iterator.minc.AgeOffIterator##GUIDIndexTable.opt.ttl | 2592000000
>> table   | table.iterator.minc.vers                                   | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
>> table   | table.iterator.minc.vers.opt.maxVersions                   | 1
>> table   | table.iterator.scan.AgeOffIterator##GUIDIndexTable         | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
>> table   | table.iterator.scan.AgeOffIterator##GUIDIndexTable.opt.ttl | 2592000000
>> table   | table.iterator.scan.vers                                   | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
>> table   | table.iterator.scan.vers.opt.maxVersions                   | 1
>> default | table.majc.compaction.strategy                             | org.apache.accumulo.tserver.compaction.DefaultCompactionStrategy
>> default | table.scan.max.memory                                      | 512K
>> table   |    @override                                               | 1M
>> default | table.security.scan.visibility.default                     |
>> default | table.split.threshold                                      | 1G
>> default | table.walog.enabled                                        | true
>> --------+------------------------------------------------------------+------------------------------------------------------------------
>>
>> More Table Info:
>>
>> GUIDIndexTable | ONLINE | 2 | 0 | 82.56M | 810.00K | 159
>>
>> Please let me know if I am doing something wrong or if there is more
>> information you need.
>>
>> V/r,
>>
>> -Daniel
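Josh's suggested exercise can be reasoned about with a toy model. This is an assumption for illustration, not Accumulo's RFile reader: it only counts entries, whereas real files are read in blocks, and actual timing also depends on indexes, caching and tablet layout. Entries are partitioned into locality groups by column family, and a scan that fetches one family reads only the group that stores it. With no groups configured, everything sits in the default group, so fetching a single family still reads every entry. That matches Josh's description of `scan -c` as an unbounded read of the whole table. On a real table, groups are configured with the shell's `setgroups` command or `TableOperations.setLocalityGroups` in the Java client. As Josh notes, existing data is only rewritten into the new groups by a compaction. The family names and the group name `g1` below are made up.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Toy model of locality groups: counts entries read, ignoring blocks, indexes and caching.
class LocalityGroupSketch {

    // Locality group that stores a column family; families not named in any
    // group live in the default group, modeled here as null.
    static String groupOf(Map<String, Set<String>> groups, String family) {
        for (Map.Entry<String, Set<String>> g : groups.entrySet()) {
            if (g.getValue().contains(family)) return g.getKey();
        }
        return null;
    }

    // Entries a scan fetching one column family has to read: every entry stored
    // in the same locality group as that family (matching or not).
    static int entriesRead(List<String> entryFamilies, Map<String, Set<String>> groups, String fetched) {
        String target = groupOf(groups, fetched);
        int read = 0;
        for (String cf : entryFamilies) {
            if (Objects.equals(groupOf(groups, cf), target)) read++;
        }
        return read;
    }

    public static void main(String[] args) {
        // Column family of each entry in a made-up table.
        List<String> entryFamilies = List.of("vessel", "other", "other", "other", "vessel", "other");
        // No groups configured (single default group): the fetch reads every entry.
        System.out.println(entriesRead(entryFamilies, Map.of(), "vessel")); // 6
        // Hypothetical group "g1" holding only "vessel": the fetch reads just that group.
        System.out.println(entriesRead(entryFamilies, Map.of("g1", Set.of("vessel")), "vessel")); // 2
    }
}
```

The saving comes entirely from the fetched family living in a small group of its own. This is also why Christopher's point about keeping the set of CFs small and discrete matters: a group per `name=value` family would not scale.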