hbase-dev mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: [VOTE] The 1st hbase 0.94.15 release candidate is available for download
Date Fri, 20 Dec 2013 12:23:56 GMT
I agree. Nothing new with this release.

For the disk space, I have 370GB on the drive where the test is running
and 20GB in the tmp folder. I monitored that during the run and it used
only 1GB on both disks. So I don't think it's space related. That's strange,
because I tried on 2 different configs (hardware, OS, etc.) and got the
same result both times. I will retry on a 3rd computer to validate.
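
For anyone who wants to repeat that kind of disk-space check, here is a
minimal sketch in Python. It is not the tool used for the run described
above; the function names, the paths and the sampling interval are all
illustrative assumptions. It samples the used space on each filesystem and
reports the peak growth over the starting value.

```python
import shutil
import time


def used_gb(path):
    """Return the space currently used on the filesystem holding `path`, in GB."""
    return shutil.disk_usage(path).used / 1e9


def monitor(paths, samples, interval_s):
    """Sample used space `samples` times and return peak growth per path in GB.

    Growth is measured against the usage seen when monitoring started, so a
    value of 1.0 means the filesystem grew by at most about 1GB during the run.
    """
    baseline = {p: used_gb(p) for p in paths}
    peak = dict.fromkeys(paths, 0.0)
    for _ in range(samples):
        for p in paths:
            peak[p] = max(peak[p], used_gb(p) - baseline[p])
        time.sleep(interval_s)
    return peak
```

It could be started next to the integration test with something like
monitor(["/tmp", "/path/to/hbase/data"], samples=60, interval_s=60), where
both paths are placeholders for the tmp folder and the HBase data directory.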


2013/12/19 lars hofhansl <larsh@apache.org>

> Thanks again, JM.
>
>
> Yep, both IntegrationTestLoadAndVerify and IntegrationTestBigLinkedList
> pass for me in a local install every time I run them (many times by now).
> JDK 1.6.0_34-b04.
>
> One thing I found is that they do not clean up their data and fill up the
> disk. Once the disk is full, the tests simply time out for me, but they
> could fail in more "interesting ways" too when that happens... Maybe that's
> what you see?
>
>
> In any case nothing new with this release, right? Need to double-check the
> tests.
>
> -- Lars
>
>
>
> ----- Original Message -----
> From: Jean-Marc Spaggiari <jean-marc@spaggiari.org>
> To: lars hofhansl <larsh@apache.org>; dev <dev@hbase.apache.org>
> Cc:
> Sent: Thursday, December 19, 2013 10:45 AM
> Subject: Re: [VOTE] The 1st hbase 0.94.15 release candidate is available
> for download
>
> For the version, issue is the alter command I used. Sorry about that.
> Forget it.
>
> For IntegrationTestLoadAndVerify I have already reported the issue with
> 0.94.10 on July 23rd.
>
> Just retried with 0.94.14 and 0.94.13 and failed on both too. By failed I
> mean they give me REFERENCES_CHECKED=9855773 instead of 10000000. Are
> you getting 10000000?
>
> Single node is a local install on my laptop. No other HBase instances
> configured, using the local file system. The 7 node cluster is using
> Hadoop 1.0.4.
>
> In local mode I'm running with jdk 1.6.0_45. On the 7 nodes I'm running
> 1.7.0_5
>
> What's strange with the AbstractMethodError issue is that
> IntegrationTestsDriver is not the only one using ToolRunner, but it is the
> only one to fail. Strange.
>
> JM
>
>
>
> 2013/12/19 lars hofhansl <larsh@apache.org>
>
> > The single node cluster was just a local install, right? I.e. using the
> > local file system, rather than HDFS...?
> > On the 7 node cluster, which version of HDFS did you use? If not 1.0.4 I
> > assume you recompiled HBase :)
> >
> > I definitely do not see the AbstractMethodError issue. That looks very
> > much like a classpath setup issue.
> >
> > Ran IntegrationTestLoadAndVerify and IntegrationTestBigLinkedList in a
> > loop in local mode. Didn't fail once.
> >
> > Let's chat offline and figure out if/where your setup is different from
> > mine.
> >
> > -- Lars
> >
> > ________________________________
> > From: lars hofhansl <larsh@apache.org>
> > To: Jean-Marc Spaggiari <jean-marc@spaggiari.org>; dev <
> > dev@hbase.apache.org>
> > Sent: Thursday, December 19, 2013 8:53 AM
> > Subject: Re: [VOTE] The 1st hbase 0.94.15 release candidate is available
> > for download
> >
> >
> > Thanks JM.
> >
> >
> > You did a "raw" scan below. It'll return to you exactly what is there, so
> > you'll see the 3 versions before you compact, that is by design.
> > java.lang.AbstractMethodError looks like an issue local to your install.
> > I'll check.
> >
> >
> > IntegrationTestLoadAndVerify is interesting. Did that pass reliably in
> > older releases of 0.94 (0.94.14 or 0.94.13)?
> >
> > -- Lars
> >
> >
> > ________________________________
> >
> > From: Jean-Marc Spaggiari <jean-marc@spaggiari.org>
> > To: dev <dev@hbase.apache.org>; lars hofhansl <larsh@apache.org>
> > Sent: Thursday, December 19, 2013 7:01 AM
> > Subject: Re: [VOTE] The 1st hbase 0.94.15 release candidate is available
> > for download
> >
> >
> >
> > tl;dr: see arrow below.
> >
> >
> >
> > Downloaded and checked signature for both vanilla and secured. Passed.
> > Randomly checked documentation and CHANGES.txt. Passed.
> >
> >
> > On a single node cluster:
> > Ran the tests. All passed.
> > Ran IntegrationTestLoadAndVerify. Got  REFERENCES_CHECKED=9855424,
> > expected 10000000? Failed?
> > Ran IntegrationTestBigLinkedList. Passed.
> > Ran HBCK after those tests and got many errors about _original-evil-name
> > and clone tables.
> > Cleared everything, restarted HBase. Re-ran IntegrationTestBigLinkedList,
> > HBCK ok. Re-ran IntegrationTestLoadAndVerify, failed again:
> > 13/12/18 21:24:24 ERROR test.IntegrationTestBigLinkedList$Verify: Expected referenced count does not match with actual referenced count. expected referenced=3000000 ,actual=9000000
> > Exception in thread "main" java.lang.RuntimeException: Verify.verify failed
> >     at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Loop.runVerify(IntegrationTestBigLinkedList.java:724)
> >     at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Loop.run(IntegrationTestBigLinkedList.java:757)
> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >     at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList.run(IntegrationTestBigLinkedList.java:1069)
> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >     at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList.main(IntegrationTestBigLinkedList.java:1073)
> >
> > But now HBCK is clean. Figured that the HBCK issue is because of some
> > leftover from org.apache.hadoop.hbase.regionserver.TestStoreFile, which is
> > writing in the same directory as the default standalone HBase.
> >
> > From the shell, created a table with 15 regions, put, compact, scan, etc.
> > Table definition is VERSIONS => 2. However,
> > scan 't1', {RAW => true, VERSIONS => 10} still returns 3 versions even
> > after flush/compact/major_compact:
> > hbase(main):034:0> scan 't1', {RAW => true, VERSIONS => 10}
> > ROW                    COLUMN+CELL
> >  rowkey                column=f1:c1, timestamp=1387421969489, value=value
> >  rowkey                column=f1:c1, timestamp=1387421969337, value=value
> >  rowkey                column=f1:c1, timestamp=1387421969162, value=value
> > 1 row(s) in 0.0570 seconds
> >
> > I would have expected only 2 to be returned.
> >
> >
> >
> > Stopped HBase, checked the log, everything is fine.
> >
> >
> > Now on a 7 nodes cluster:
> >
> > Deployed jars and did rolling restart on a 0.94.14 cluster. Passed.
> >
> > Configured default balancer, merged a 60 region table to a single region,
> > restarted the cluster, all fine.
> >
> > Ran major_compact on the table to get it split into 60 regions, then ran
> > the balancer. All fine, except that the balancer needs to be run twice to
> > get correct balancing.
> >
> > Some "No serialized HRegionInfo in keyvalues" in the logs not related to
> > the tables I'm "playing" with.
> >
> > Restored customized balancer, restarted, rebalanced, all fine.
> > Ran IntegrationTestLoadAndVerify. Got  REFERENCES_CHECKED=9855645,
> > expected 10000000? Failed?
> >
> > Ran IntegrationTestBigLinkedList. Passed.
> >
> >
> > Last, I tried to run IntegrationTestsDriver but it failed. I need to look
> > at that.
> >
> > hbase@node3:~/hbase-0.94.3$ bin/hbase org.apache.hadoop.hbase.IntegrationTestsDriver
> > Exception in thread "main" java.lang.AbstractMethodError: org.apache.hadoop.hbase.util.AbstractHBaseTool.doWork()V
> >     at org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:103)
> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> >     at org.apache.hadoop.hbase.IntegrationTestsDriver.main(IntegrationTestsDriver.java:47)
> >
> >
> >
> >
> > =====> tl;dr:
> >
> > - Small issue with balancer when 60 regions assigned to a single server.
> > Need to run twice to get that correctly balanced;
> >
> > - Leftover in the wrong place from
> > org.apache.hadoop.hbase.regionserver.TestStoreFile;
> > - Table with VERSIONS => 2 returns 3 versions instead of 2;
> > - IntegrationTestsDriver not running.
> >
> >
> > I don't think there is anything here to stop the release, but there are
> > still a few things that need to be looked at.
> >
> >
> > JM
> >
> >
> >
> >
> > 2013/12/18 lars hofhansl <larsh@apache.org>
> >
> > The 1st 0.94.15 RC is available for download at
> > http://people.apache.org/~larsh/hbase-0.94.15-rc0/
> > >Signed with my code signing key: C7CFE328
> > >
> > >HBase 0.94.15 is a bug fix release along with some performance improvements:
> > >    [HBASE-7886] - [replication] hlog zk node will not be deleted if client roll hlog
> > >    [HBASE-9485] - TableOutputCommitter should implement recovery if we don't want jobs to start from 0 on RM restart
> > >    [HBASE-9995] - Not stopping ReplicationSink when using custom implementation for the ReplicationSink
> > >    [HBASE-10014] - HRegion#doMiniBatchMutation rollbacks the memstore even if there is nothing to rollback.
> > >    [HBASE-10015] - Replace intrinsic locking with explicit locks in StoreScanner
> > >    [HBASE-10026] - HBaseAdmin#createTable could fail if region splits too fast
> > >    [HBASE-10046] - Unmonitored HBase service could accumulate Status objects and OOM
> > >    [HBASE-10057] - TestRestoreFlushSnapshotFromClient and TestRestoreSnapshotFromClient fail to finish occasionally
> > >    [HBASE-10061] - TableMapReduceUtil.findOrCreateJar calls updateMap(null, ) resulting in thrown NPE
> > >    [HBASE-10064] - AggregateClient.validateParameters can throw NPE
> > >    [HBASE-10089] - Metrics intern table names cause eventual permgen OOM in 0.94
> > >    [HBASE-10111] - Verify that a snapshot is not corrupted before restoring it
> > >    [HBASE-10112] - Hbase rest query params for maxVersions and maxValues are not parsed
> > >    [HBASE-10117] - Avoid synchronization in HRegionScannerImpl.isFilterDone
> > >    [HBASE-10120] - start-hbase.sh doesn't respect --config in non-distributed mode
> > >    [HBASE-10179] - HRegionServer underreports readRequestCounts by 1 under certain conditions
> > >    [HBASE-10181] - HBaseObjectWritable.readObject catches DoNotRetryIOException and wraps it back in a regular IOException
> > >    [HBASE-9931] - Optional setBatch for CopyTable to copy large rows in batches
> > >    [HBASE-10001] - Add a coprocessor to help testing the performances without taking into account the i/o
> > >    [HBASE-10007] - PerformanceEvaluation: Add sampling and latency collection to randomRead test
> > >    [HBASE-10010] - eliminate the put latency spike on the new log file beginning
> > >    [HBASE-10048] - Add hlog number metric in regionserver
> > >    [HBASE-10049] - Small improvments in region_mover.rb
> > >    [HBASE-10093] - Unregister ReplicationSource metric bean when the replication source thread is terminated
> > >    [HBASE-9047] - Tool to handle finishing replication when the cluster is offline
> > >    [HBASE-10119] - Allow HBase coprocessors to clean up when they fail
> > >    [HBASE-9927] - ReplicationLogCleaner#stop() calls HConnectionManager#deleteConnection() unnecessarily
> > >    [HBASE-9986] - Incorporate HTTPS support for HBase (0.94 port)
> > >    [HBASE-10058] - Test for HBASE-9915 (avoid reading index blocks)
> > >    [HBASE-10189] - Intermittent TestReplicationSyncUpTool failure
> > >
> > >The list of changes is also available here:
> > >https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310753&version=12325559
> > >
> > >Here're the jenkins runs for this RC:
> > >https://builds.apache.org/job/HBase-0.94.15/2/ and
> > >https://builds.apache.org/job/HBase-0.94.15-security/1/
> > >
> > >Please try out the RC, check out the doc, take it for a spin, etc, and
> > >vote +1/-1 by EOD December 27th on whether we should release this as
> > >0.94.15. (9 days because of the holidays)
> > >
> > >Thanks.
> > >
> > >-- Lars
> > >
> >
>
>
