accumulo-dev mailing list archives

From Fikri Akbar <fikri.ak...@ci-mediatrac.com>
Subject Fwd: Data authorization/visibility limit in Accumulo
Date Fri, 08 Apr 2016 10:30:23 GMT
Hi Guys,

We're a group of Accumulo enthusiasts from Indonesia. We've been trying to
use Accumulo for several different types of data processing. We have
several questions about Accumulo that you might be able to help us with.
We ran into these issues while processing large amounts of data. Our
questions are as follows:

1. Let's say I have a file in HDFS of about 300 GB with a total of
1.6 billion rows, and the fields in each line are separated by "^". What
is the most effective way to move this data into Accumulo, assuming each
cell is structured as [rowkey cf:cq vis value] => [lineNumber
raw:columnName fileName columnValue]?
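To make the cell layout in question 1 concrete, here is a minimal sketch in plain Java (no Accumulo dependency) of how one "^"-delimited line could be turned into [lineNumber raw:columnName fileName columnValue] cells. The names `LineToCells`, `Cell` and `toCells`, and the assumption that column names come from a separate header array, are hypothetical. In a real ingest job each `Cell` would roughly correspond to a `Mutation.put(cf, cq, new ColumnVisibility(vis), new Value(bytes))` call on a `Mutation` for that row. Two details worth noting: `String.split`/`Pattern.split` take a regex, so the caret has to be quoted, and Accumulo sorts row IDs lexicographically, so the line number is zero-padded here (10 digits covers 1.6 billion) to keep rows in numeric order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: maps one "^"-delimited line to Accumulo-style cells
// following [lineNumber raw:columnName fileName columnValue].
final class LineToCells {

    // Simple holder for one key/value pair; not an Accumulo class.
    static final class Cell {
        final String row, family, qualifier, visibility, value;

        Cell(String row, String family, String qualifier, String visibility, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.visibility = visibility;
            this.value = value;
        }

        @Override
        public String toString() {
            return row + " " + family + ":" + qualifier + " [" + visibility + "] " + value;
        }
    }

    // "^" is a regex metacharacter, so quote it before splitting.
    private static final Pattern FIELD_SEP = Pattern.compile(Pattern.quote("^"));

    static List<Cell> toCells(long lineNumber, String line, String[] columnNames, String fileName) {
        // Zero-pad so lexicographic row order matches numeric line order.
        String row = String.format("%010d", lineNumber);
        // Limit -1 keeps trailing empty fields instead of silently dropping them.
        String[] fields = FIELD_SEP.split(line, -1);
        if (fields.length != columnNames.length) {
            throw new IllegalArgumentException(
                "line " + lineNumber + " has " + fields.length + " fields, expected " + columnNames.length);
        }
        List<Cell> cells = new ArrayList<>();
        for (int i = 0; i < fields.length; i++) {
            // Family is the fixed "raw", qualifier is the column name, visibility is the file name.
            cells.add(new Cell(row, "raw", columnNames[i], fileName, fields[i]));
        }
        return cells;
    }

    public static void main(String[] args) {
        String[] header = {"name", "city", "amount"}; // made-up column names
        for (Cell c : toCells(42, "Budi^Jakarta^1500", header, "one.txt")) {
            System.out.println(c);
        }
    }
}
```

With the made-up sample line above, `main` prints `0000000042 raw:name [one.txt] Budi` followed by the `city` and `amount` cells for the same row.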

2. What is the most effective way to ingest data if we receive more than
1 TB of data per day?
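For scale, a back-of-the-envelope calculation converts the daily volume into a sustained ingest rate. It assumes 1 TB = 10^12 bytes arriving evenly over 24 hours, and it reuses the average line size implied by question 1 (300 GB / 1.6 billion rows) on the assumption that the daily data has a similar shape. The class and method names are hypothetical; the figures are averages only, and each line produces one cell per column, so the cell rate is a multiple of the line rate.

```java
// Back-of-the-envelope ingest rates; all inputs are assumptions taken from the questions above.
final class IngestRate {

    static final double SECONDS_PER_DAY = 24 * 60 * 60; // 86,400

    // Average bytes per second needed to absorb bytesPerDay evenly over 24 hours.
    static double bytesPerSecond(double bytesPerDay) {
        return bytesPerDay / SECONDS_PER_DAY;
    }

    // Average size of one line, e.g. 300 GB spread over 1.6 billion rows.
    static double bytesPerLine(double totalBytes, double totalLines) {
        return totalBytes / totalLines;
    }

    public static void main(String[] args) {
        double oneTB = 1e12; // decimal terabyte
        double perLine = bytesPerLine(300e9, 1.6e9);
        double bps = bytesPerSecond(oneTB);
        System.out.printf("bytes/line: %.1f%n", perLine);
        System.out.printf("MB/s: %.2f%n", bps / 1e6);
        System.out.printf("lines/s: %.0f%n", bps / perLine);
    }
}
```

This prints 187.5 bytes per line, about 11.57 MB/s and about 61728 lines per second as the average sustained rate.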

3. We're currently testing Accumulo's data-level access control, but we
ran into a limit on dataset authorizations once the number of datasets
exceeded 20,000.

For example, let's say user X has a dataset called one.txt. User X is then
given an authorization for one.txt (let's call it X.one.txt). Now, if X
has more datasets (one.txt, two.txt, three.txt, ..., n.txt), user X ends
up with n authorizations, one per dataset. When we tried this with more
than 20,000 datasets (so that the user held more than 20,000
authorizations), we were not able to execute "get auth". We consider this
a critical issue, especially in cases where more than 20,000 datasets are
granted authorization at once.
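The per-dataset scheme described above can be sketched as follows to show how one user's authorization set grows. This is plain Java with hypothetical names (`PerFileAuths`, `labelsFor`, `isValidLabel`, `totalBytes`); it only builds and measures the label strings and does not talk to Accumulo. Since `getUserAuthorizations` returns the user's complete authorization set in one response, the summed label size is a rough lower bound on that response's payload, and it grows linearly with the number of datasets. The character check assumes the set Accumulo accepts for unquoted terms in a visibility expression (letters, digits, `_`, `-`, `:`, `.`, `/`); labels with other characters, such as spaces in file names, would need quoting (for example via `ColumnVisibility.quote`).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "one authorization per dataset" scheme: user X
// owning n files ends up holding n labels such as "X.one.txt".
final class PerFileAuths {

    // Builds one label per file, following the <user>.<fileName> naming from the question.
    static List<String> labelsFor(String user, List<String> fileNames) {
        List<String> labels = new ArrayList<>(fileNames.size());
        for (String f : fileNames) {
            labels.add(user + "." + f);
        }
        return labels;
    }

    // Assumed rule for unquoted visibility terms: letters, digits, _ - : . /
    static boolean isValidLabel(String label) {
        if (label.isEmpty()) return false;
        for (char c : label.toCharArray()) {
            boolean ok = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
                || c == '_' || c == '-' || c == ':' || c == '.' || c == '/';
            if (!ok) return false;
        }
        return true;
    }

    // Total UTF-8 bytes of all labels: a rough lower bound on what a single
    // getUserAuthorizations response has to carry for this user.
    static long totalBytes(List<String> labels) {
        long sum = 0;
        for (String l : labels) {
            sum += l.getBytes(StandardCharsets.UTF_8).length;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>();
        for (int i = 1; i <= 20_000; i++) {
            files.add("file" + i + ".txt"); // made-up file names
        }
        List<String> labels = labelsFor("X", files);
        System.out.println("labels: " + labels.size());
        System.out.println("bytes:  " + totalBytes(labels));
        System.out.println("valid:  " + isValidLabel("X.one.txt") + ", " + isValidLabel("X.my file.txt"));
    }
}
```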

The following are error logs from our system.

*Error log in shell:*

org.apache.accumulo.core.client.AccumuloException: org.apache.thrift.TApplicationException: Internal error processing getUserAuthorizations
        at org.apache.accumulo.core.client.impl.SecurityOperationsImpl.execute(SecurityOperationsImpl.java:83)
        at org.apache.accumulo.core.client.impl.SecurityOperationsImpl.getUserAuthorizations(SecurityOperationsImpl.java:182)
        at com.msk.auxilium.table.AuxUser.setUserAuth(AuxUser.java:310)
        at com.msk.auxilium.commons.UserSystem.getAuxUser(UserSystem.java:24)
        at com.msk.auxilium.tester.HDFSTest.main(HDFSTest.java:57)
Caused by: org.apache.thrift.TApplicationException: Internal error processing getUserAuthorizations
        at org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
        at org.apache.accumulo.core.client.impl.thrift.ClientService$Client.recv_getUserAuthorizations(ClientService.java:580)
        at org.apache.accumulo.core.client.impl.thrift.ClientService$Client.getUserAuthorizations(ClientService.java:565)
        at org.apache.accumulo.core.client.impl.SecurityOperationsImpl$6.execute(SecurityOperationsImpl.java:185)
        at org.apache.accumulo.core.client.impl.SecurityOperationsImpl$6.execute(SecurityOperationsImpl.java:182)
        at org.apache.accumulo.core.client.impl.ServerClient.executeRaw(ServerClient.java:90)
        at org.apache.accumulo.core.client.impl.SecurityOperationsImpl.execute(SecurityOperationsImpl.java:69)
        ... 4 more

*Error log in Accumulo master (web):*

*tserver:*

Zookeeper error, will retry
	org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /accumulo/281c3ac0-74eb-4135-bc63-3158eabe2c47/tables/1a/conf/table.split.threshold
		at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
		at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
		at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
		at org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:210)
		at org.apache.accumulo.fate.zookeeper.ZooCache.retry(ZooCache.java:132)
		at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:235)
		at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:190)
		at org.apache.accumulo.server.conf.TableConfiguration.get(TableConfiguration.java:130)
		at org.apache.accumulo.server.conf.TableConfiguration.get(TableConfiguration.java:118)
		at org.apache.accumulo.core.conf.AccumuloConfiguration.getMemoryInBytes(AccumuloConfiguration.java:100)
		at org.apache.accumulo.tserver.Tablet.findSplitRow(Tablet.java:2892)
		at org.apache.accumulo.tserver.Tablet.needsSplit(Tablet.java:3032)
		at org.apache.accumulo.tserver.TabletServer$MajorCompactor.run(TabletServer.java:2603)
		at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
		at java.lang.Thread.run(Thread.java:745)

*garbage collector:*

Zookeeper error, will retry
	org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /accumulo/281c3ac0-74eb-4135-bc63-3158eabe2c47/tables
		at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
		at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
		at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1468)
		at org.apache.accumulo.fate.zookeeper.ZooCache$1.run(ZooCache.java:169)
		at org.apache.accumulo.fate.zookeeper.ZooCache.retry(ZooCache.java:132)
		at org.apache.accumulo.fate.zookeeper.ZooCache.getChildren(ZooCache.java:180)
		at org.apache.accumulo.core.client.impl.Tables.getMap(Tables.java:126)
		at org.apache.accumulo.core.client.impl.Tables.getNameToIdMap(Tables.java:197)
		at org.apache.accumulo.core.client.impl.Tables._getTableId(Tables.java:173)
		at org.apache.accumulo.core.client.impl.Tables.getTableId(Tables.java:166)
		at org.apache.accumulo.core.client.impl.ConnectorImpl.getTableId(ConnectorImpl.java:84)
		at org.apache.accumulo.core.client.impl.ConnectorImpl.createScanner(ConnectorImpl.java:151)
		at org.apache.accumulo.gc.SimpleGarbageCollector$GCEnv.getCandidates(SimpleGarbageCollector.java:278)
		at org.apache.accumulo.gc.GarbageCollectionAlgorithm.getCandidates(GarbageCollectionAlgorithm.java:238)
		at org.apache.accumulo.gc.GarbageCollectionAlgorithm.collect(GarbageCollectionAlgorithm.java:272)
		at org.apache.accumulo.gc.SimpleGarbageCollector.run(SimpleGarbageCollector.java:544)
		at org.apache.accumulo.gc.SimpleGarbageCollector.main(SimpleGarbageCollector.java:154)
		at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
		at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
		at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
		at java.lang.reflect.Method.invoke(Method.java:606)
		at org.apache.accumulo.start.Main$1.run(Main.java:141)
		at java.lang.Thread.run(Thread.java:745)


We tried to find resources on this issue but couldn't find any that
mention a limit on the number of authorizations per user. FYI, we're using
Accumulo version 1.6.

Sorry for the long email :) and have a great day.

Regards,

*Fikri Akbar*
Technology


*PT Mediatrac Sistem Komunikasi*
Grha Tirtadi 2nd Floor   |   Jl. Senopati 71-73   |   Jakarta 12110   |   Indonesia   |   *Map* 6°13'57.37"S 106°48'42.29"E
*P* +62 21 520 2568   |   *F* +62 21 520 4180   |   *M*  +62 812 1243 4786   |   *www.mediatrac.co.id <http://www.mediatrac.co.id>*
