Date: Thu, 19 Jun 2014 10:52:51 -0400
Subject: Re: Running Accumulo on the IBM JVM
From: Benson Margulies
To: "dev@accumulo.apache.org"

On Thu, Jun 19, 2014 at 10:49 AM, Mike Drob wrote:

> Hi Hayden! Welcome to Accumulo!
>
> Detailed responses are inline.
>
> Mike
>
>
> On Thu, Jun 19, 2014 at 6:14 AM, Vicky Kak wrote:
>
>> Hi Hayden,
>>
>> Most of the recommendations look okay to me. Since there are many changes
>> to be made, I think you should go ahead and create a main JIRA with
>> multiple subtasks addressing all the changes.
>> I am almost sure that you would run into similar issues if you ran
>> other Java-based NoSQL distributions, e.g. HBase/Cassandra, on the IBM JDK.
>> I personally had surprises in API calls related to ordering in my
>> application a long time ago. Your observations look reasonable to me.
>>
>> Regards,
>> Vicky
>>
>>
>> On Thu, Jun 19, 2014 at 3:47 PM, Hayden Marchant
>> wrote:
>>
>> > Hi there,
>> >
>> > I have been working on getting Accumulo running on the IBM JDK, in
>> > preparation for including Accumulo in an upcoming version of BigInsights
>> > (IBM's Hadoop distribution). I have come across a number of issues, for
>> > which I have made some local fixes in my own environment. Since I'm a
>> > newbie in Accumulo, I wanted to make sure that the approach that I have
>> > taken for resolving these issues is aligned with the design intent of
>> > Accumulo.
>> >
>> > Some of the issues are real defects, and some are instances in which the
>> > assumption that the Sun/Oracle JDK is the JVM in use is hard-coded into
>> > the source code.
>> >
>> > I have grouped the issues into 2 sections - Unit test failures and
>> > Sun-specific dependencies (though there is an overlap)
>> >
>> > 1. Unit Test failures - should run consistently no matter which OS, Java
>> > vendor/version etc...
>> > a.
>> > org.apache.accumulo.core.util.format.ShardedTableDistributionFormatterTest.testAggregate.
>> > This fails on IBM JRE, since the test is asserting order of elements in
>> > a HashMap. This consistently passes on Sun, and consistently fails on
>> > IBM. Proposal: Change ShardedTableDistributionFormatter.countsByDay to
>> > TreeMap
>> >
>
> This is probably a real defect. We should not be asserting order on a
> HashMap. Another possible solution is to change the test to check for
> unordered elements - Hamcrest matchers may be useful here.

You don't want to slow down the production code just to make a test case
pass, that's for sure. If order is not part of the contract, do as Mike
says, or copy it out and sort it.

>
>> >
>> > b.
>> > org.apache.accumulo.core.security.crypto.BlockedIOStreamTest.testGiantWrite.
>> > This test assumes a max heap of about 1GB. This fails on IBM JRE,
>> > since the default max heap is not specified, and on IBM JRE this depends
>> > on the OS (see
>> > http://www-01.ibm.com/support/knowledgecenter/SSYKE2_6.0.0/com.ibm.java.doc.diagnostics.60/diag/appendixes/defaults.html?lang=en
>> > ).
>> > Proposal: add -Xmx1g to the surefire maven plugin reference in
>> > parent maven pom.
>> >
>
> This might be https://issues.apache.org/jira/browse/ACCUMULO-2774
>
>> >
>> > c. Both org.apache.accumulo.core.security.crypto.CryptoTest &
>> > org.apache.accumulo.core.file.rfile.RFileTest have lots of failures due
>> > to calls to SecureRandom with Random Number Generator Provider hard-coded
>> > as Sun. The IBM JRE has its own built-in RNG Provider called IBMJCE.
>> > 2 issues - hard-coded calls to SecureRandom.getInstance(,"SUN") and
>> > also default value in Property class is "SUN".
>> > Proposal: Add mechanism to override default Property through
>> > System property through new annotator in Property class. Only usage will
>> > be by Property.CRYPTO_SECURE_RNG_PROVIDER
>> >
>
> I'm not sure about adding new annotators to Property. However, the
> CryptoTest should be getting the value from the conf instead of hard-coding
> it. Then you can specify the correct value in accumulo-site.xml
>
> I think another part of the issue is in
> CryptoModuleFactory::fillParamsObjectFromStringMap because it looks like
> that ignores the default setting.
>
>> >
>> > 2. Environment/Configuration
>> > a. The generated configuration files contain references to GC
>> > params that are specific to Sun JVM. In accumulo-env.sh, the
>> > ACCUMULO_TSERVER_OPTS contains -XX:NewSize and -XX:MaxNewSize, and also
>> > in ACCUMULO_GENERAL_OPTS,
>> > -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 are used.
>> > b. In bin/accumulo, I get a ClassNotFoundException due to
>> > specification of JAXP Doc Builder:
>> > -Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
>> > .
>> > The Sun implementation of Document Builder Factory does not exist
>> > in IBM JDK, so a ClassNotFoundException is thrown on running the accumulo
>> > script
>> >
>> > c. MiniAccumuloCluster - in the MiniAccumuloClusterImpl,
>> > Sun-specific GC params are passed as params to the java process (similar
>> > to section a.)
>> >
>> > Single proposal for solving all three above issues:
>> > Enhance bootstrap_config.sh with request to select Java vendor.
>> > Selecting this will set correct values for GC params (they differ between
>> > IBM and Sun), inclusion/omission of JAXP setting.
>> > The MiniAccumuloClusterImpl can read the same env variable that was set
>> > in code for the GC Params, and use in the exec command.
>> >
>
> I don't know enough about the IBM JDK to comment on this part
> intelligently. Go ahead and generate a patch, and we can use that as a
> starting point for discussion.
>
>> >
>> > So far, my work has been focused on getting unit tests working for all
>> > Java vendors in a clean manner. I have not yet run intensive testing of
>> > real clusters following these changes, and would be happy to get pointers
>> > to what else might need treatment.
>> >
>
> Unit tests are a good first pass. Integration tests (mvn verify) are
> probably the minimum that you want on your continuous integration once you
> have things set up.
>
> Accumulo also comes with a set of longer running, cluster based tests,
> since we know that there are some pieces too complex for unit tests to
> catch. Have a look in the test module for the Continuous Ingest test. Once
> you get to that point, we can help you set it up if the README is unclear.
>
>> > I would also like to hear if these changes make sense, and if so, should
>> > I go ahead and create some JIRAs, and attach my patches for commit
>> > approval?
>> >
>
> Filing JIRAs is going to be the most straightforward path, yes.
>
>> > Looking forward to hearing feedback!
>> >
>> > Regards,
>> > Hayden Marchant
>> > Software Architect
>> > IBM BigInsights, IBM
>> >
>>
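
On the HashMap ordering point in item 1a, here is a rough Java sketch of the "copy it out and sort it" idea. It is only an illustration: the class name UnorderedMapCheck, the sample data, and the helper methods are all made up for this example and are not taken from the Accumulo source or from ShardedTableDistributionFormatterTest.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UnorderedMapCheck {

    // Hypothetical stand-in for code that returns per-day counts in a HashMap.
    static Map<String, Integer> sampleCounts() {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        counts.put("20140619", 3);
        counts.put("20140617", 1);
        counts.put("20140618", 2);
        return counts;
    }

    // Copy the keys out and sort them, so the result does not depend on
    // the HashMap iteration order of a particular JVM.
    static List<String> sortedKeys(Map<String, Integer> counts) {
        List<String> keys = new ArrayList<String>(counts.keySet());
        Collections.sort(keys);
        return keys;
    }

    // Render the entries in sorted-key order, e.g. to compare against
    // an expected string in a test.
    static String render(Map<String, Integer> counts) {
        StringBuilder sb = new StringBuilder();
        for (String day : sortedKeys(counts)) {
            sb.append(day).append('=').append(counts.get(day)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(sampleCounts()));
    }
}
```

With something along these lines the map itself can stay a HashMap, and only the code that needs a stable order (here, the test-side comparison) pays for the sort.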