From: Dave Mullins
Date: Fri, 1 Nov 2013 16:17:53 -0400
Subject: Accumulo Upgrade from 1.4.2 to 1.5.0 Issues
To: user@accumulo.apache.org

Hadoop version 0.20.2-cdh3u5
This was installed from the CDH RPMs but is not controlled by Cloudera Manager.

I read what documentation I could find on the upgrade.
I installed 1.5.0 from the tarball.
I made sure to include commons collections in the Accumulo library path.
I made sure to set dfs.support.append to true in the hdfs-site files.
I did a complete restart (including a reboot) of the system.
All of the tablet servers come online.
All of the master's services come online and seem to be working. (The monitor does show the correct number of tablets, tablet servers, and so forth.)

I am able to use some of the features of the Accumulo shell.
I can display the contents of a table.
I can't create or delete a table without getting the following error:
[impl.ThriftTransportPool] WARN: Thread "shell" stuck on io to x.x.x.x:9999:9999 (0) for at least 120040 ms

When I go digging in the logs I find very few errors. (These systems are not on a network I can cut and paste from, so I am trying to represent the issue as best I can.)

There are 4 errors reporting that the Repo runner [0-3] threads died.

Another error that springs up occasionally is: WARN: Thread "GC" stuck on io to x.x.x.x:9999:9999 (0) for at least 120040 ms

A netstat run before I start the master shows nothing running on port 9999, nor any connections to that port.
A netstat run after Accumulo starts shows about 16 connections in a TIME_WAIT state in the 35k-36k port range from the master. It also shows one connection in an established state in both directions (36783), and one inbound from port 9999 to port 47636, also from the master.

It seems that after this point anything that tries to connect to port 9999 goes into TIME_WAIT and never does anything.

I have checked all the permissions I can think of and everything seems to be correct.
HDFS is running correctly, and jobs not associated with Accumulo all seem to be working.
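
For anyone wanting to reproduce the port 9999 check without netstat, a plain TCP probe is one option. The sketch below is a hypothetical helper, not part of Accumulo or Hadoop; the function name `probe_port` and the `localhost` placeholder are assumptions made for illustration. It only reports whether a TCP connection can be opened, so a successful result does not show that the service behind the port is answering Thrift calls.

```python
import socket


def probe_port(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration only. It tests the TCP handshake,
    not whether the listening service responds to requests.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # "localhost" is a placeholder; substitute the master's address.
    print(probe_port("localhost", 9999))
```

Running it once before the master starts and once after gives a quick comparison with the netstat observations described above.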