accumulo-user mailing list archives

From Kesten Broughton <kbrough...@21ct.com>
Subject Re: ingest problems
Date Tue, 11 Feb 2014 19:13:40 GMT
Hi Sean,
Thanks for your detailed questions.  We will add them to the automated log gathering bundle
we have.
Responses inline.  It’s not a ton to go on, but now we will be ready to capture every relevant
detail for our next ingest tests.


Hi Kesten!

Could you tell us:

1) Accumulo version

accumulo-native-1.5.0, running on CentOS 6.5, using the RPM distribution package from Apache,
on JDK 1.7.0_u25

[user@node-hdfs02:accumulo-1.5.0]$ rpm -q accumulo

accumulo-1.5.0-1.noarch

2) HDFS + ZooKeeper versions

Hadoop 1.2.0.1.3.2.0-111 and Zookeeper version: 3.4.5-111--1, built on 08/20/2013 01:42 GMT

[user@node-hdfs02:accumulo-1.5.0]$ echo status | nc 10.x.y.67 2181

Zookeeper version: 3.4.5-111--1, built on 08/20/2013 01:42 GMT

Clients:

 /10.x.y.67:48232[1](queued=0,recved=73830,sent=73830)

 /10.x.y.67:57837[0](queued=0,recved=1,sent=0)

 /10.x.y.67:49991[1](queued=0,recved=41486,sent=41492)

 /10.x.y.66:41154[1](queued=0,recved=245163,sent=245163)


Latency min/avg/max: 0/0/244

Received: 1943601

Sent: 1943610

Connections: 4

Outstanding: 0

Zxid: 0x1500088030

Mode: leader

Node count: 503

[user@node-hdfs02:accumulo-1.5.0]$ hadoop version

Hadoop 1.2.0.1.3.2.0-111

Subversion git://c64-s8/ on branch comanche-branch-1 -r 3e43bec958e627d53f02d2842f6fac24a93110a9

Compiled by jenkins on Mon Aug 19 18:34:32 PDT 2013

From source with checksum cf234891d3fd875413caf539bc5aa5ce

This command was run using /usr/lib/hadoop/hadoop-core-1.2.0.1.3.2.0-111.jar


3) are you using the BatchWriter API, or bulk ingest?

Using the BatchWriter API
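
For reference, here is a minimal sketch of BatchWriter-based ingest against the 1.5 client API
(not our actual client code); the instance name, ZooKeeper quorum, credentials, table name, and
tuning values are placeholders.

import java.util.concurrent.TimeUnit;

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

public class BatchWriterSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder instance name, ZooKeeper quorum, and credentials.
        Connector conn = new ZooKeeperInstance("myInstance", "zk1:2181,zk2:2181,zk3:2181")
                .getConnector("ingestUser", new PasswordToken("secret"));

        BatchWriterConfig cfg = new BatchWriterConfig();
        cfg.setMaxMemory(64 * 1024 * 1024);      // client-side mutation buffer
        cfg.setMaxWriteThreads(4);               // concurrent writes to tablet servers
        cfg.setMaxLatency(2, TimeUnit.MINUTES);  // flush at least this often

        BatchWriter writer = conn.createBatchWriter("graph", cfg);
        try {
            Mutation m = new Mutation(new Text("000c35b2-ee6c-339e-9e6a-65a9bccbfa2c"));
            m.put(new Text("A"), new Text("attribute1"), new Value("value1".getBytes()));
            writer.addMutation(m);
        } finally {
            writer.close();                      // flushes any buffered mutations
        }
    }
}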

4) what does your table design look like?

We are loading a directed graph model, with attributed vertices and unattributed edges.
Vertex identifiers are used as Row ID values, and the column family distinguishes between
attribute and relation types.

Key layout is the standard Accumulo (Row ID, Column Family, Column Qualifier, Visibility, Timestamp) plus Value:

Row ID                                 Column Family   Column Qualifier                                              Visibility   Timestamp   Value
000c35b2-ee6c-339e-9e6a-65a9bccbfa2c   A               attribute1                                                    <unset>      <default>   value1
000c35b2-ee6c-339e-9e6a-65a9bccbfa2c   A               attribute2                                                    <unset>      <default>   value2
000c35b2-ee6c-339e-9e6a-65a9bccbfa2c   R               relation_name_outgoing/87fd1ea5-0769-3086-b328-c5272ac7d65c   <unset>      <default>   1
000c35b2-ee6c-339e-9e6a-65a9bccbfa2c   RI              relation_name_incoming/4231ea5-c527-3086-d65c-0a6c2ac7b328    <unset>      <default>
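
To make the layout above concrete, here is a hypothetical sketch of assembling one vertex row as a
Mutation (the class name and the empty value on the incoming relation are assumptions for
illustration, not our actual loader code).

import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

public class GraphRowSketch {
    public static void main(String[] args) {
        String vertex = "000c35b2-ee6c-339e-9e6a-65a9bccbfa2c";
        Mutation m = new Mutation(new Text(vertex));

        // Vertex attributes under family "A": qualifier = attribute name, value = attribute value.
        m.put(new Text("A"), new Text("attribute1"), new Value("value1".getBytes()));
        m.put(new Text("A"), new Text("attribute2"), new Value("value2".getBytes()));

        // Outgoing edge under family "R": qualifier = "<relation name>/<destination vertex UUID>".
        m.put(new Text("R"),
              new Text("relation_name_outgoing/87fd1ea5-0769-3086-b328-c5272ac7d65c"),
              new Value("1".getBytes()));

        // Incoming edge under family "RI": qualifier = "<relation name>/<source vertex UUID>".
        m.put(new Text("RI"),
              new Text("relation_name_incoming/4231ea5-c527-3086-d65c-0a6c2ac7b328"),
              new Value(new byte[0]));

        System.out.println(m.size() + " column updates for row " + vertex);
    }
}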



5) what does your source data look like?

The dataset is 65 GB and we have about 200 million documents.  The source data comes from
files of JSON documents, with each document fully defining a vertex or edge.

The vertex files are gzipped with a line for each document, with each line containing a UUID
and JSON document pair separated by a space.

The edge files are gzipped with a line for each document, with each line containing the source
UUID, destination UUID, and label each separated by a space.
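
A rough sketch of reading one of the vertex files (plain JDK, assuming the single-space separator
described above; the file name is a placeholder):

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

public class VertexFileReader {
    public static void main(String[] args) throws Exception {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new FileInputStream("vertices-000.gz")),
                StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                int space = line.indexOf(' ');          // UUID and JSON separated by one space
                if (space < 0) continue;                // skip malformed lines
                String uuid = line.substring(0, space);
                String json = line.substring(space + 1);
                // hand (uuid, json) off to the BatchWriter-based loader
                System.out.println(uuid + " -> " + json.length() + " bytes of JSON");
            }
        }
    }
}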


6) what kind of hardware is on these 3 nodes? Memory, disks, CPU cores

For our datacenter baremetal cluster:

Dell 720s, 256 GB RAM, 7 TB as 24 x 300 GB disks, 24 hyperthreaded cores.  We have 2 disks
in RAID 10 for the root partition, /dev/sdw mounted at /var, and 8 disks mounted as /data/1/dfs/dn
to /data/8.

They are connected by 10 Gb Ethernet.

For our virtual clusters:

Each node is running as a VMware-deployed VM with the same configuration:

    *   4 CPU cores
    *   16 GB memory
    *   100 GB disk (ext4 fs, no LVM)
    *   CentOS 6.5

7) could you post your config files (minus any passwords, usernames, machine names, or instance
secrets) in a gist or pastebin so that I can see them?

Summary:  We use Ambari to deploy the HDFS cluster and usually the 2GB example accumulo-env.sh.
On one occasion we used the 3GB example and increased the in-memory map size and related settings.

I have attached the Ambari snapshots of the modified HDFS Java heap size settings.  In this
instance, we cranked them way up, to 4GB, but we have returned to 2GB, as there is indication
that for fewer than 2,000 blocks there is no need for such a large heap and it could be detrimental.
The second attachment shows the extra hdfs-site.xml settings being used, such as durable.sync.

Redacted config files gist:

https://gist.github.com/darKoram/8dcc63e212d052c70e29

8) could you describe what the failure mode looks like a bit? Does the monitor come up? Does
a table remain offline or with unrecovered tablets?

Typically, the monitor will be up, as long as the Accumulo master process didn't die.  The
dashboard log will show errors.  We have seen lots of ZooKeeper connection errors, IOExceptions,
ThriftTransport errors, errors referencing the !0 (metadata) table, and others, plus warnings
about GC collection times.  See the attachments for more.

If the cluster fails on ingest, when we bring it back up we may see the names of the tables,
but dashes for all other entries in the monitor's table listing.

Morgan

In the initial failure, one of the tablet server processes will disappear from one of the
cluster nodes, and the client will fail shortly after with org.apache.accumulo.core.client.TimedOutException
talking to the node that failed. It is possible to connect to the cluster with the accumulo
shell; however, any access to the target table will cause the shell to hang. Likewise, the
monitor web page hangs when attempting to load at this point.

Attempting to restart the failed node individually usually doesn't work at this point, as
the tablet server exits when started (start-here.sh). Stopping and restarting the cluster
of Accumulo processes (stop-all.sh, start-all.sh) will allow us to get the Accumulo processes
running again on all nodes. However, at this point any access to the table that was being
loaded will hang (scan, delete, load). The monitor doesn't show any errors and indicates that
the expected number of tablets are running.


From: Sean Busbey <busbey@clouderagovt.com<mailto:busbey@clouderagovt.com>>
Reply-To: "user@accumulo.apache.org<mailto:user@accumulo.apache.org>" <user@accumulo.apache.org<mailto:user@accumulo.apache.org>>
Date: Tuesday, February 11, 2014 at 10:24 AM
To: Accumulo User List <user@accumulo.apache.org<mailto:user@accumulo.apache.org>>
Subject: Re: ingest problems


Hi Kesten!

Could you tell us:

1) Accumulo version

2) HDFS + ZooKeeper versions

3) are you using the BatchWriter API, or bulk ingest?

4) what does your table design look like?

5) what does your source data look like?

6) what kind of hardware is on these 3 nodes? Memory, disks, CPU cores.

7) could you post your config files (minus any passwords, usernames, machine names, or instance
secrets) in a gist or pastebin so that I can see them?

8) could you describe what the failure mode looks like a bit? Does the monitor come up? Does
a table remain offline or with unrecovered tablets?

On Feb 11, 2014 10:11 AM, "Kesten Broughton" <kbroughton@21ct.com<mailto:kbroughton@21ct.com>>
wrote:
Hi there,

We have been experimenting with Accumulo for about two months now.  Our biggest pain point
has been ingest.
Often we will have an ingest process fail 2 or 3 times, 3/4 of the way through, and then
on a final try it works, without any changes.

Once the ingest works, the cluster is usually stable for querying for weeks or months, only
requiring the occasional start-all.sh if there is a problem.

Sometimes our ingest can be 24 hours long, and we need a stronger ingest story to be able
to commit to Accumulo.
Our cluster architecture has been:
3 HDFS datanodes, with the namenode, secondary namenode, and Accumulo master each co-located
with a datanode, and a ZooKeeper server on each.
We realize this is not optimal and are transitioning to separate hardware for the ZooKeepers and
the namenode/secondary/Accumulo-master nodes.
However, the big concern is that sometimes a failed ingest will bork the whole cluster, and
we have to re-init Accumulo with accumulo init, destroying all our data.
We have experienced this on at least three different clusters of this description.

The most recent attempt was on a 65GB dataset.   The cluster had been up for over 24 hours.
The ingest test takes 40 minutes, and about 5 minutes in, one of the datanodes failed.
There were no error logs on the failed node, and the other two nodes had logs filled with
ZooKeeper connection errors.  We were unable to recover the cluster and had to re-init.

I know a vague description of problems is difficult to respond to, and the next time we have
an ingest failure, I will bring specifics forward.  But I'm writing to know:
1.  Are ingest failures a known fail point for Accumulo, or are we perhaps unlucky/mis-configured?
2.  Are there any guidelines for capturing ingest failures / determining root causes when
errors don't show up in the logs?
3.  Are there any means of checkpointing a data ingest, so that if a failure were to occur
at hour 23.5 we could roll back to hour 23 and continue?  Client code could checkpoint and
restart at the last checkpoint (a rough sketch of that idea follows below), but if the underlying
Accumulo cluster can't be recovered, that's of no use.
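
(For illustration only, here is a hypothetical client-side checkpointing sketch, not an Accumulo
feature: it assumes that after a successful BatchWriter flush everything written so far is durable
on the tablet servers, so recording the last flushed line lets a restarted client skip ahead. The
file names and interval are made up.)

import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

public class CheckpointedIngest {
    static final int CHECKPOINT_EVERY = 100000;      // lines between flush + checkpoint

    public static void main(String[] args) throws Exception {
        File checkpoint = new File("vertices-000.ckpt");
        long startLine = 0;
        if (checkpoint.exists()) {
            try (BufferedReader ck = new BufferedReader(new FileReader(checkpoint))) {
                startLine = Long.parseLong(ck.readLine().trim());
            }
        }

        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new FileInputStream("vertices-000.gz")),
                StandardCharsets.UTF_8))) {
            long lineNo = 0;
            String line;
            while ((line = in.readLine()) != null) {
                lineNo++;
                if (lineNo <= startLine) continue;       // already ingested before the failure
                // writer.addMutation(toMutation(line));  // BatchWriter from the earlier sketch
                if (lineNo % CHECKPOINT_EVERY == 0) {
                    // writer.flush();                    // make everything so far durable
                    try (Writer w = new FileWriter(checkpoint)) {
                        w.write(Long.toString(lineNo));   // record progress
                    }
                }
            }
        }
    }
}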

thanks,

kesten
