accumulo-dev mailing list archives

From Benson Margulies <bimargul...@gmail.com>
Subject Re: Running Accumulo on the IBM JVM
Date Thu, 19 Jun 2014 14:52:51 GMT
On Thu, Jun 19, 2014 at 10:49 AM, Mike Drob <mdrob@apache.org> wrote:
> Hi Hayden! Welcome to Accumulo!
>
> Detailed responses are inline.
>
> Mike
>
>
> On Thu, Jun 19, 2014 at 6:14 AM, Vicky Kak <vicky.kak@gmail.com> wrote:
>
>> Hi Hayden,
>>
>> Most of the recommendations look okay to me. Since there are many changes to
>> be made, I think you should go ahead and create a main JIRA with
>> multiple subtasks addressing all the changes.
>> I am almost sure that you would run into similar issues if you ran
>> other Java-based NoSQL distributions, e.g. HBase/Cassandra, on the IBM JDK. I
>> personally had surprises in API calls related to ordering in my application
>> a long time ago. Your observations look reasonable to me.
>>
>> Regards,
>> Vicky
>>
>>
>> On Thu, Jun 19, 2014 at 3:47 PM, Hayden Marchant <HAYDEN@il.ibm.com>
>> wrote:
>>
>> > Hi there,
>> >
>> > I have been working on getting Accumulo running on the IBM JDK, in
>> > preparation for including Accumulo in an upcoming version of BigInsights
>> > (IBM's Hadoop distribution). I have come across a number of issues, for
>> > which I have made some local fixes in my own environment. Since I'm a
>> > newbie in Accumulo, I wanted to make sure that the approach that I have
>> > taken for resolving these issues is aligned with the design intent of
>> > Accumulo.
>> >
>> > Some of the issues are real defects, and some are instances in which the
>> > assumption of Sun/Oracle JDK being the used JVM is hard-coded into the
>> > source-code.
>> >
>> > I have grouped the issues into 2 sections -  Unit test failures and
>> > Sun-specific dependencies (though there is an overlap)
>> >
>> > 1. Unit Test failures - should run consistently no matter which OS, Java
>> > vendor/version etc...
>> >         a.
>> > org.apache.accumulo.core.util.format.ShardedTableDistributionFormatterTest.testAggregate.
>> > This fails on the IBM JRE, since the test asserts the order of elements in
>> > a HashMap. It consistently passes on Sun/Oracle and consistently fails on
>> > IBM. Proposal: Change ShardedTableDistributionFormatter.countsByDay to a
>> > TreeMap.
>>
>
> This is probably a real defect. We should not be asserting order on a
> HashMap. Another possible solution is to change the test to check for
> unordered elements - Hamcrest matchers may be useful here.

You don't want to slow down the production code just to make a test
case pass, that's for sure. If order is not part of the contract, do
like Mike says, or copy it out and sort it.
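
To make the "copy it out and sort it" option concrete, here is a minimal sketch. The class name, the helper, and the sample data are made up for illustration and are not taken from the Accumulo source; the point is only that the comparison no longer depends on HashMap iteration order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Accumulo.
public class UnorderedMapCheck {

    // Copies the entries out as "key=value" strings and sorts them, so the
    // result is the same on any JVM regardless of HashMap iteration order.
    static List<String> sortedEntries(Map<String, Integer> counts) {
        List<String> lines = new ArrayList<String>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            lines.add(e.getKey() + "=" + e.getValue());
        }
        Collections.sort(lines);
        return lines;
    }

    public static void main(String[] args) {
        // Sample data standing in for whatever the formatter aggregates.
        Map<String, Integer> countsByDay = new HashMap<String, Integer>();
        countsByDay.put("20140619", 2);
        countsByDay.put("20140101", 5);
        countsByDay.put("20140301", 1);

        List<String> expected = Arrays.asList("20140101=5", "20140301=1", "20140619=2");
        if (!expected.equals(sortedEntries(countsByDay))) {
            throw new AssertionError("unexpected contents: " + sortedEntries(countsByDay));
        }
        System.out.println("contents match regardless of iteration order");
    }
}
```

The production code keeps its HashMap; only the test pays for the sort.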

>
>
>> >
>> >         b.
>> > org.apache.accumulo.core.security.crypto.BlockedIOStreamTest.testGiantWrite.
>> >         This test assumes a max heap of about 1GB. It fails on the IBM JRE
>> > because no max heap is specified, and the IBM JRE default depends
>> > on the OS (see
>> > http://www-01.ibm.com/support/knowledgecenter/SSYKE2_6.0.0/com.ibm.java.doc.diagnostics.60/diag/appendixes/defaults.html?lang=en
>> > ).
>> >         Proposal: add -Xmx1g to the surefire maven plugin reference in
>> > the parent maven pom.
>> >
>>
> This might be https://issues.apache.org/jira/browse/ACCUMULO-2774
>
>
>>  >         c. Both org.apache.accumulo.core.security.crypto.CryptoTest &
>> > org.apache.accumulo.core.file.rfile.RFileTest have lots of failures due to
>> > calls to SecureRandom with the Random Number Generator provider hard-coded
>> > as Sun. The IBM JRE has its own built-in RNG provider called IBMJCE. There
>> > are 2 issues - hard-coded calls to SecureRandom.getInstance(<algo>,"SUN")
>> > and the default value in the Property class being "SUN".
>> >         Proposal: Add a mechanism to override a default Property through a
>> > System property, via a new annotator in the Property class. The only usage
>> > will be by Property.CRYPTO_SECURE_RNG_PROVIDER
>>
>>
>>
> I'm not sure about adding new annotators to Property. However, the
> CryptoTest should be getting the value from the conf instead of hard-coding
> it. Then you can specify the correct value in accumulo-site.xml
>
> I think another part of the issue is in
> CryptoModuleFactory::fillParamsObjectFromStringMap because it looks like
> that ignores the default setting.
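
A rough sketch of what "get the provider from configuration instead of hard-coding it" could look like. The class, the method, and the system property name are hypothetical and are not Accumulo's API; the only real calls are the two SecureRandom.getInstance overloads. Falling back to the JVM's default provider is an assumption made for the sketch, and a real implementation might prefer to fail loudly instead.

```java
import java.security.NoSuchAlgorithmException;
import java.security.NoSuchProviderException;
import java.security.SecureRandom;

// Hypothetical helper, not part of Accumulo.
public class SecureRandomLookup {

    // Tries the configured provider first; if that provider is not installed
    // in this JVM (e.g. "SUN" on an IBM JRE), lets the JVM pick a provider.
    static SecureRandom create(String algorithm, String provider)
            throws NoSuchAlgorithmException {
        if (provider != null && !provider.isEmpty()) {
            try {
                return SecureRandom.getInstance(algorithm, provider);
            } catch (NoSuchProviderException e) {
                // Configured provider is unknown here; fall through.
            }
        }
        return SecureRandom.getInstance(algorithm);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // "example.rng.provider" is a made-up property name for illustration.
        String provider = System.getProperty("example.rng.provider", "SUN");
        SecureRandom random = create("SHA1PRNG", provider);
        System.out.println("Using provider: " + random.getProvider().getName());
    }
}
```

With something like this, a test can read the provider name from the site configuration and still run on a JVM that does not ship the "SUN" provider.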
>
>>  >
>> > 2. Environment/Configuration
>> >         a. The generated configuration files contain references to GC
>> > params that are specific to Sun JVM. In accumulo-env.sh, the
>> > ACCUMULO_TSERVER_OPTS contains -XX:NewSize and -XX:MaxNewSize , and also
>> > in ACCUMULO_GENERAL_OPTS,
>> > -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 are used.
>> >         b. in bin/accumulo, get ClassNotFoundException due to
>> > specification of JAXP Doc Builder:
>> >
>> >
>> -Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
>> > .
>> >         The Sun implementation of DocumentBuilderFactory does not exist
>> > in the IBM JDK, so a ClassNotFoundException is thrown when running the
>> > accumulo script.
>> >
>> >         c. MiniAccumuloCluster - in the MiniAccumuloClusterImpl,
>> > Sun-specific GC params are passed as params to the java process (similar
>> > to section a.)
>> >
>> >         Single proposal for solving all three above issues:
>> >         Enhance bootstrap_config.sh with a prompt to select the Java vendor.
>> > The selection will set the correct values for GC params (they differ
>> > between IBM and Sun) and the inclusion/omission of the JAXP setting. The
>> > MiniAccumuloClusterImpl can read the same env variable that was set
>> > for the GC params, and use it in the exec command.
>> >
>>
> I don't know enough about the IBM JDK to comment on this part
> intelligently. Go ahead and generate a patch, and we can use that as a
> starting point for discussion.
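
As one possible starting point for the MiniAccumuloCluster side, the vendor can also be detected at runtime from the standard java.vendor system property. The sketch below is an assumption-heavy illustration: the class and method names are invented, and -Xgcpolicy:gencon is only suggested here as a plausible IBM J9 counterpart to the CMS flags, not something proposed in this thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Accumulo.
public class JvmVendorOptions {

    // IBM JVMs report a java.vendor such as "IBM Corporation".
    static boolean isIbmVendor(String vendor) {
        return vendor != null && vendor.toUpperCase(Locale.ROOT).contains("IBM");
    }

    // Returns GC flags suited to the vendor. The IBM choice is an assumption;
    // the Sun/Oracle flags are the ones quoted from ACCUMULO_GENERAL_OPTS.
    static List<String> gcOptions(String vendor) {
        if (isIbmVendor(vendor)) {
            return Arrays.asList("-Xgcpolicy:gencon");
        }
        return Arrays.asList("-XX:+UseConcMarkSweepGC",
                "-XX:CMSInitiatingOccupancyFraction=75");
    }

    public static void main(String[] args) {
        String vendor = System.getProperty("java.vendor");
        System.out.println(vendor + " -> " + gcOptions(vendor));
    }
}
```

Runtime detection would avoid asking the user for the vendor, at the cost of the child process flags following whichever JVM launched the parent.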
>
>>  >
>> >  So far, my work has been focused on getting unit tests working for all
>> > Java vendors in a clean manner. I have not yet run intensive testing of
>> > real clusters following these changes, and would be happy to get pointers
>> > to what else might need treatment.
>>
>>
>>
> Unit tests are a good first pass. Integration tests (mvn verify) are probably
> the minimum that you want on your continuous integration once you have
> things set up.
>
> Accumulo also comes with a set of longer running, cluster based tests,
> since we know that there are some pieces too complex for unit tests to
> catch. Have a look in the test module for the Continuous Ingest test. Once
> you get to that point, we can help you set it up if the README is unclear.
>
>>  I would also like to hear if these changes make sense, and if so, should
>> > I go ahead and create some JIRAs, and attach my patches for commit
>> > approval?
>> >
>>
> Filing JIRAs is going to be the most straightforward path, yes.
>
>  >  Looking forward to hearing feedback!
>> >
>> >  Regards,
>> >  Hayden Marchant
>> >  Software Architect
>> >  IBM BigInsights, IBM
>> >
>>
