accumulo-user mailing list archives

From "Lu.Qin"<luq.j...@gmail.com>
Subject Re: Reply: how can i optimize scan speed when use batch scan ?
Date Thu, 15 Jan 2015 02:42:23 GMT
Thanks for your help!


I compared the speed of Range.exact and followingKey like this. Is this right?
    Scanner scan = conn.createScanner("", new Authorizations());
    List<Range> list = new ArrayList<Range>();
    for (Map.Entry<Key, Value> entry : scan) {
      if (list.size() == resultNum * threadNum) {
        break;
      }
      Key indexKey = entry.getKey();
      Key rowKey = new Key(indexKey.getColumnQualifier());
      Text followRow = rowKey.followingKey(PartialKey.ROW).getRow();
      list.add(new Range(rowKey.getRow(), followRow));
//      list.add(Range.exact(entry.getKey().getColumnQualifier()));
    }
    scan.close();
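
For readers unfamiliar with followingKey(PartialKey.ROW): it is understood to return the smallest row that sorts after the given one, which amounts to appending a single zero byte to the row. The sketch below illustrates only that byte-level idea with plain arrays. It does not call the Accumulo API, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class FollowingRowSketch {

    // Hypothetical helper: the smallest byte string that sorts strictly after
    // `row` is `row` with one 0x00 byte appended.
    public static byte[] followingRow(byte[] row) {
        return Arrays.copyOf(row, row.length + 1); // copyOf pads with 0x00
    }

    // Unsigned lexicographic comparison; a shorter array sorts before its extensions.
    public static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // True if start <= candidate <= end (both ends inclusive).
    public static boolean contains(byte[] start, byte[] end, byte[] candidate) {
        return compare(start, candidate) <= 0 && compare(candidate, end) <= 0;
    }

    public static void main(String[] args) {
        byte[] row = "row1".getBytes(StandardCharsets.UTF_8);
        byte[] next = followingRow(row);
        // Check which row IDs fall inside [row, next].
        for (String candidate : new String[] {"row0", "row1", "row10"}) {
            boolean inside = contains(row, next, candidate.getBytes(StandardCharsets.UTF_8));
            System.out.println(candidate + " inside: " + inside);
        }
    }
}
```

Under these assumptions a range from a row to its following row admits that row and nothing that merely shares its prefix, such as "row10".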


But I found no big difference. I built a list of 5000 ranges, and the BatchScanner took
about 13 s with either approach.


I changed my configuration in accumulo-site.xml, and the #results=0 entries no longer appear.


This is my accumulo-site.xml:
<property>
  <name>tserver.cache.data.size</name>
  <value>4G</value>
</property>

<property>
  <name>tserver.cache.index.size</name>
  <value>16G</value>
</property>

<property>
  <name>tserver.memory.maps.native.enabled</name>
  <value>true</value>
</property>

<property>
  <name>tserver.metadata.readhead.concurrent.max</name>
  <value>65536</value>
</property>

<property>
  <name>tserver.readhead.concurrent.max</name>
  <value>65536</value>
</property>

<property>
  <name>tserver.scan.files.open.max</name>
  <value>65536</value>
</property>

<property>
  <name>table.cache.block.enable</name>
  <value>true</value>
</property>

<property>
  <name>table.cache.index.enable</name>
  <value>true</value>
</property>

Is it ok?
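
One rough sanity check for settings like these: in Accumulo 1.6 the data and index block caches are, as far as is known, allocated on the tablet server's JVM heap, so their combined size (4G + 16G = 20G here) has to fit inside the heap given to the tserver, which in turn has to fit in the machine's 32G. The sketch below only does that arithmetic. The class name, the helpers, and the 24G heap figure are hypothetical and not taken from this thread.

```java
public class CacheBudgetSketch {

    // Parses sizes written like "4G", "512K", "0B", or a plain byte count.
    public static long parseBytes(String size) {
        String s = size.trim().toUpperCase();
        char unit = s.charAt(s.length() - 1);
        if (Character.isDigit(unit)) {
            return Long.parseLong(s);
        }
        long number = Long.parseLong(s.substring(0, s.length() - 1));
        switch (unit) {
            case 'B': return number;
            case 'K': return number << 10;
            case 'M': return number << 20;
            case 'G': return number << 30;
            default: throw new IllegalArgumentException("Unknown size suffix: " + size);
        }
    }

    // True if the summed cache sizes fit within the given heap size.
    public static boolean fitsInHeap(String heap, String... caches) {
        long total = 0;
        for (String cache : caches) {
            total += parseBytes(cache);
        }
        return total <= parseBytes(heap);
    }

    public static void main(String[] args) {
        // 4G and 16G are the cache sizes from the accumulo-site.xml above;
        // the 24G heap is a made-up example value.
        System.out.println("fits in 24G heap: " + fitsInHeap("24G", "4G", "16G"));
    }
}
```

If the caches do not fit, the usual options are a larger tserver heap or smaller cache values. The heap also needs room for everything else the tablet server does, so an exact fit would still be too tight.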


Thanks


Original message
From: Josh Elser <josh.elser@gmail.com>
To: user <user@accumulo.apache.org>
Sent: Wednesday, January 14, 2015, 11:13
Subject: Re: Reply: how can i optimize scan speed when use batch scan ?


Thanks! That's very helpful. You probably meant to do the following:

    Key indexKey = entry.getKey();
    Key rowKey = new Key(indexKey.getColumnQualifier());
    Text followingRow = rowKey.followingKey(PartialKey.ROW).getRow();
    list.add(new Range(rowKey.getRow(), followingRow));

Range.exact(row) will only match a Key which has that exact row ID (empty column family and
qualifier). The above will match all keys with the provided row ID (all column families and
qualifiers). Does that make sense (and hopefully work)?

覃璐 wrote:

This is the code I use to get the row IDs, which are stored in the column qualifier:

    Scanner scan = conn.createScanner("t1", new Authorizations());
    List<Range> list = new ArrayList<Range>();
    for (Map.Entry<Key, Value> entry : scan) {
      if (list.size() == resultNum * threadNum) {
        break;
      }
      list.add(Range.exact(entry.getKey().getColumnQualifier()));
    }
    scan.close();

And then I use the row IDs to scan the data:

    BatchScanner bs = null;
    try {
      bs = conn.createBatchScanner("test.new_index", new Authorizations(), 10);
    } catch (TableNotFoundException e) {
      e.printStackTrace();
    }
    bs.setRanges(list);

Original message
From: Josh Elser <josh.elser@gmail.com>
To: user <user@accumulo.apache.org>
Sent: Wednesday, January 14, 2015, 10:32
Subject: Re: Reply: how can i optimize scan speed when use batch scan ?

You might need to set tserver.cache.data.size to a larger value. Depending on the amount of
data, you might just churn through the cache without getting much benefit. I think you have
to restart Accumulo after changing this property.

Can you show us the code you used to try to scan for a row ID and the data in the table
you expected to be returned that wasn't?

覃璐 wrote:

Yes, I had received all the results I wanted by the time the program ended.

But I do not know why the scan received 0 results when I am sure the row ID exists.

I set table.cache.block.enable=true, but I did not see a noticeable change.

Thanks

Original message
From: Eric Newton <eric.newton@gmail.com>
To: user@accumulo.apache.org
Sent: Wednesday, January 14, 2015, 00:17
Subject: Re: Reply: how can i optimize scan speed when use batch scan ?

You should have received at least 1390 Key/Value pairs (#results=1390).

If your application has many exact RowID look-ups, you may want to investigate Bloom filters.

Consider turning on data block caching to reduce latency on future look-ups.

-Eric

On Mon, Jan 12, 2015 at 8:15 PM, 覃璐 <luq.java@gmail.com> wrote:

I am sorry, I did not know about the image. The log is this:

    [17:50:38] TRACE [org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator] [org.apache.accumulo.core.util.OpTimer.start(OpTimer.java:39)] [21521] - tid=65 oid=675 Continuing multi scan, scanid=-152589127623326551
    [17:50:38] TRACE [org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator] [org.apache.accumulo.core.util.OpTimer.stop(OpTimer.java:49)] [21544] - tid=65 oid=675 Got more multi scan results, #results=1390 scanID=-152589127623326551 in 0.023 secs
    [17:50:38] TRACE [org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator] [org.apache.accumulo.core.util.OpTimer.start(OpTimer.java:39)] [21546] - tid=65 oid=676 Continuing multi scan, scanid=-152589127623326551
    [17:50:38] TRACE [org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator] [org.apache.accumulo.core.util.OpTimer.stop(OpTimer.java:49)] [21555] - tid=45 oid=644 Got more multi scan results, #results=0 scanID=-4477962012178388198 in 1.002 secs
    [17:50:38] TRACE [org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator] [org.apache.accumulo.core.util.OpTimer.start(OpTimer.java:39)] [21555] - tid=45 oid=677 Continuing multi scan, scanid=-4477962012178388198
    [17:50:38] TRACE [org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator] [org.apache.accumulo.core.util.OpTimer.stop(OpTimer.java:49)] [21596] - tid=57 oid=645 Got more multi scan results, #results=0 scanID=-8718025066902358141 in 1.003 secs
    [17:50:38] TRACE [org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator] [org.apache.accumulo.core.util.OpTimer.start(OpTimer.java:39)] [21596] - tid=57 oid=678 Continuing multi scan, scanid=-8718025066902358141

The scan takes a long time but returns no results.

I use 1.6.1, and the config output is this:

    default | table.balancer ............................ | org.apache.accumulo.server.master.balancer.DefaultLoadBalancer
    default | table.bloom.enabled ....................... | false
    default | table.bloom.error.rate .................... | 0.5%
    default | table.bloom.hash.type ..................... | murmur
    default | table.bloom.key.functor ................... | org.apache.accumulo.core.file.keyfunctor.RowFunctor
    default | table.bloom.load.threshold ................ | 1
    default | table.bloom.size .......................... | 1048576
    default | table.cache.block.enable .................. | false
    default | table.cache.index.enable .................. | true
    default | table.classpath.context ................... |
    default | table.compaction.major.everything.idle .... | 1h
    default | table.compaction.major.ratio .............. | 3
    default | table.compaction.minor.idle ............... | 5m
    default | table.compaction.minor.logs.threshold ..... | 3
    table   | table.constraint.1 ........................ | org.apache.accumulo.core.constraints.DefaultKeySizeConstraint
    default | table.failures.ignore ..................... | false
    default | table.file.blocksize ...................... | 0B
    default | table.file.compress.blocksize ............. | 100K
    default | table.file.compress.blocksize.index ....... | 128K
    default | table.file.compress.type .................. | gz
    default | table.file.max ............................ | 15
    default | table.file.replication .................... | 0
    default | table.file.type ........................... | rf
    default | table.formatter ........................... | org.apache.accumulo.core.util.format.DefaultFormatter
    default | table.groups.enabled ...................... |
    default | table.interepreter ........................ | org.apache.accumulo.core.util.interpret.DefaultScanInterpreter
    table   | table.iterator.majc.vers .................. | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
    table   | table.iterator.majc.vers.opt.maxVersions .. | 1
    table   | table.iterator.minc.vers .................. | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
    table   | table.iterator.minc.vers.opt.maxVersions .. | 1
    table   | table.iterator.scan.vers .................. | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
    table   | table.iterator.scan.vers.opt.maxVersions .. | 1
    default | table.majc.compaction.strategy ............ | org.apache.accumulo.tserver.compaction.DefaultCompactionStrategy
    default | table.scan.max.memory ..................... | 512K
    default | table.security.scan.visibility.default .... |
    default | table.split.threshold ..................... | 1G
    default | table.walog.enabled ....................... | true

My tablet server has 4 cores and 32G.

Thanks

Original message
From: Josh Elser <josh.elser@gmail.com>
To: user <user@accumulo.apache.org>
Sent: Monday, January 12, 2015, 23:52
Subject: Re: Reply: how can i optimize scan speed when use batch scan ?

FYI, images don't (typically) come across on the mailing list. Use some external hosting and
provide the link if it's important, please.

How many tabletservers do you have? What version of Accumulo are you running? Can you share
the output of `config -t your_table_name`?

Thanks.

覃璐 wrote:

I looked at the trace log.

Why does it receive 0 results and take so long?

Original message
From: 覃璐 <luq.java@gmail.com>
To: user <user@accumulo.apache.org>
Sent: Monday, January 12, 2015, 17:05
Subject: how can i optimize scan speed when use batch scan ?

Hi all,

I now have code like this:

    List<Range> rangeList = …..;
    BatchScanner bs = conn.createBatchScanner();
    bs.setRanges(rangeList);

The rangeList holds about 1000 ranges, and each range is built from a random row ID with
Range.exact(new Text(…)). But it is very slow: it can take 2-3 s. How can I optimize it?

Thanks