accumulo-dev mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: 1.7 release timeline
Date Wed, 08 Oct 2014 03:06:04 GMT
Some more information on the subject. A few of us got together to 
co-work today and had an informal discussion on our individual interests 
for 1.7. Summary incoming:

*Monitor re-write*
   - I was pushing this one; I think the monitor still has merit despite 
the desire of others to just integrate with external systems
   - I have some code in place, but it still needs more work.
   - Is a unified/stable "metrics" API necessary for integration w/ 
external tools? (or is JMX enough?)
   - An API would probably be a more usable interface than JMX
   - Such an API should be stateless (no log aggregation nor statistics 
over time)
   - Monitor still has uses for standalone/small deployments
   - If still being used, MVC approach would ease testing and addition 
of new data and views
   - Not necessary to hold up 1.7.0 from happening

*Revisit performance*
   - Eric mentioned that he wants to spend some time running some 
Accumulo benchmarks, specifically YCSB.
   - Lots of related topics were mentioned that might be relevant
     * Other HDFS block cache implementations (HBase has lots of nice 
benchmarks, could learn from them)
     * A WIP patch for metadata updates has some promise (ACCUMULO-2889)
     * Collapse iterator stack (ACCUMULO-3079)
     * Possible improvements to Scanner for single-batch cases (reduce a 
few RPCs to one RPC)
   - Actual changes made likely to be found via investigation
   - Changing default conf values where relevant also mentioned

*Distributed Tracing*
   - Billie has been spending some time working w/ some people on 
replacing Cloudtrace with HTrace
   - Mentioned that HTrace shares a remarkable amount of similarity with 
our existing tracing library
   - Upstream efforts in Hadoop-3 to integrate HTrace into DN/NN calls
   - Some consideration given to replacing the traceserver with Zipkin; 
however, that is not required for the first implementation

*Decouple MiniAccumuloCluster from ITs*
   - Another one I've started working on
   - ITs are really great; we have a lot of them covering really good cases
   - Running them against a real instance is infeasible right now
   - Would be good to express as many as possible in terms of only using 
Instance+Connector
   - Christopher mentioned possible benefit outside of tests to using 
the accumulo-maven-plugin as the "shim" between a real instance and a 
MiniAccumuloCluster
   - Some tests are written explicitly for MAC and must be ignored or 
run against a MAC when a real instance is available.
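
As a rough illustration of the Instance+Connector idea above, here is a minimal sketch. Every name in it (HarnessSketch, TableClient, ClusterHarness, MiniHarness) is hypothetical and is not an existing Accumulo API; TableClient only stands in for whatever a test would actually use from a Connector, and the in-memory map stands in for a MiniAccumuloCluster.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch only: none of these types exist in Accumulo.
public class HarnessSketch {

    /** Stand-in for the small slice of Connector functionality a test needs. */
    public interface TableClient {
        void put(String row, String value);
        String get(String row);
    }

    /** Hypothetical abstraction: tests ask for a client, never for a MAC directly. */
    public interface ClusterHarness {
        TableClient connect();
        /** Lets MAC-only tests decide whether to run or be skipped. */
        boolean isMini();
    }

    /** In-memory stand-in for a MiniAccumuloCluster-backed harness. */
    public static class MiniHarness implements ClusterHarness {
        private final Map<String, String> data = new TreeMap<>();

        @Override
        public TableClient connect() {
            return new TableClient() {
                @Override
                public void put(String row, String value) { data.put(row, value); }
                @Override
                public String get(String row) { return data.get(row); }
            };
        }

        @Override
        public boolean isMini() { return true; }
    }

    /** A test body written only against the client, so any harness can run it. */
    public static boolean roundTrip(ClusterHarness harness) {
        TableClient client = harness.connect();
        client.put("row1", "v1");
        return "v1".equals(client.get("row1"));
    }

    public static void main(String[] args) {
        ClusterHarness harness = new MiniHarness();
        System.out.println("mini=" + harness.isMini() + " roundTrip=" + roundTrip(harness));
    }
}
```

A harness backed by a real instance would implement the same interface and return false from isMini(), which is one way the MAC-only tests could be skipped.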

*Upgrade test script*
   - Keith mentioned there's some code from John McNamee that might help 
testing upgrade paths

*Hadoop Metrics2*
   - Metrics2 is the current library in use by Hadoop
   - Integration gives us a lot more flexibility, notably good 
integration with Ganglia provided (ACCUMULO-1817)
   - No one expressed interest in working on this directly (potential to 
slip)

*Deprecate MockAccumulo?*
   - Talked about this for 1.6, decided against
   - It's now 1.7. Is it time?
   - Remember, deprecate != removal

There are some outstanding things we need to investigate more:
   - Is improved JMX or metrics2 impl sufficient for integration with 
external monitoring tools? (considerations: nagios, ganglia, statsd, 
collectd, carbon, riemann... others?)
   - BatchWriter has some weird cases around error handling. It is intended 
to survive failures, but that's very much not the case. Should 
probably be fixed around a major release, but need to figure out how 
exactly to fix it (needs someone to get behind it)

If people want to continue discussion on these, let's break off 
individual topics into their own thread for clarity (and my sanity).

Also, anyone have a desire to be "release manager"?

- Josh

Josh Elser wrote:
> Thanks, John.
>
> I was thinking about trying to gun for January time-frame for a release.
> I'd love to say before 2014 is over, but that probably just won't happen
> for a major release with the holidays.
>
> For 1.7 right now, I see the following "bigger" items (correct me where
> I'm wrong):
>
> * Replication (done)
> * Upgrade rules/guarantees (proposed)
> * Replace cloudtrace (in-progress)
> * Rewrite monitor, include REST service (in-progress)
> * Drop Hadoop 1 support (proposed)
> * Decouple MiniAccumulo from ITs (in-progress)
> * Other minicluster types: in-process, shim to real instance (in-progress)
> * Support Hadoop metrics2 (proposed)
> * A few WAL/metadata related performance improvements (in-progress)
>
> Also, would be good to check the In-Progress state issues on JIRA. What
> do people think?
>
> John Vines wrote:
>> Moving this to its own thread...
>>
>> On Mon, Oct 6, 2014 at 5:54 PM, Mike Drob <madrob@cloudera.com> wrote:
>>
>>> Related: Do we have a release timeline for 1.7?
>>>
