accumulo-dev mailing list archives

From Josh Elser <>
Subject Re: GSOC: Monitor improvements - draft proposal
Date Tue, 30 Apr 2013 02:26:12 GMT

Thanks for the draft! Some feedback -- hopefully it's useful for your 
proposal in addition to giving you a better understanding of how 
Accumulo is typically run.

"These servers perform different functionalities"

Actually, most servers in an Accumulo cluster are identical to one 
another: most are running a TabletServer and, in <1.5, a Logger. The 
exceptions are the Master, Monitor, Tracer and GarbageCollector. Master, 
monitor and gc are typically run on the same node (monitor and gc are 
rather lightweight). Running a tracer on every TabletServer is probably 
overkill, but, again, this is another lightweight process, so it's not 
outside the realm of possibility.

"Create a JMX API for Monitor to gather statistics"

Any plans to include an example 3rd-party monitor that takes advantage 
of the internal change from Thrift to JMX? If so, which? I could see 
this being very useful for your own verification and validation, not to 
mention for 3rd parties (people other than yourself).
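For context, the nice thing about JMX is that any generic client (JConsole, 
Ganglia's JMX support, etc.) can read the stats without Accumulo-specific 
code. A minimal sketch of what exposing a Monitor stat as a Standard MBean 
could look like -- the interface, attribute name, and ObjectName here are 
all assumptions for illustration, not the actual Accumulo API:

```java
import javax.management.MBeanServer;
import javax.management.ObjectName;
import java.lang.management.ManagementFactory;

public class MonitorJmxSketch {

    // Hypothetical management interface; JMX derives the attribute
    // "IngestRate" from the getter name.
    public interface MonitorStatsMBean {
        long getIngestRate();
    }

    public static class MonitorStats implements MonitorStatsMBean {
        @Override
        public long getIngestRate() {
            return 42L; // placeholder; the real Monitor would report live stats
        }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("org.apache.accumulo:type=MonitorStats");
        server.registerMBean(new MonitorStats(), name);

        // A 3rd-party monitor would read the attribute the same way:
        long rate = (Long) server.getAttribute(name, "IngestRate");
        System.out.println("IngestRate = " + rate);
    }
}
```

Verifying against one such generic client during development would double 
as the validation I mentioned above.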

"Table Graphs"

I'd be rather interested to see how the amount of data being returned by 
a TabletServer correlates with query rate. It would be a neat plot to 
see how RFile index size and the size of each key-value returned correspond 
with query rate. Maybe it would be cool to have the ability to let users 
create composite graphs?

"Trace Visualization"

Not a lot to really see here. Currently you get some rudimentary 
information about how long it took to determine which files to delete, 
and how long deleting them took (I think). It would be nice to see this 
broken down by table, and include file size and other file metadata.

"Server Status Information"

I remember hearing that someone had done some work to actually pop a 
shell in the monitor when authenticated over HTTPS. Another cool feature 
might be to actually have some greater insight into a node (perhaps 
using JMX calls that we wouldn't want publicly available) when properly 
authenticated? I'm thinking about being able to view the list of running 
scans on a node... being able to introspect the actual scan 
options/data, ranges being run, etc.
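For what it's worth, the read pattern for that kind of per-node 
introspection already exists with the standard platform MBeans, and the 
same getAttribute call works remotely over a password-protected 
JMXConnector. A sketch against the built-in Threading MBean as a stand-in 
-- an Accumulo-side "ActiveScans" bean is hypothetical at this point:

```java
import javax.management.MBeanServer;
import javax.management.ObjectName;
import java.lang.management.ManagementFactory;

public class NodeIntrospection {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();

        // Standard platform MBean, used here as a stand-in; a hypothetical
        // Accumulo "ActiveScans" MBean (scan options, ranges, etc.) would be
        // queried the same way once registered.
        ObjectName threading = new ObjectName("java.lang:type=Threading");
        int live = (Integer) server.getAttribute(threading, "ThreadCount");
        System.out.println("Live threads on this node: " + live);
    }
}
```

The authentication story would just be whatever the JMX connector is 
configured with, which keeps the sensitive beans off the public monitor page.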

"Mock Stats Collector"

I would put money that this will pay off in spades as you move forward 
testing things.

Some more high-level things...

* Any thought/preference on the JMX library you would want to use?
* Re: Javascript, might want to look at DataTables (jQuery-based), 
d3.js, and/or nvd3. Lots of options here, but licensing can be a 
concern. Glad you thought about that already.

"Deliverables and Timeline"

I'd try to rethink your timeline a bit; it comes off very waterfall-y to 
me. The biggest red flag to me is "write documentation" as your last 
phase. Coming from experience, this doesn't work 95% of the time: 
something else always comes up or takes longer than expected, and suddenly 
you have some code that you just got working and no documentation. I know 
it's difficult to create a development schedule when you're not completely 
familiar with what will be required of you, but trying to lay out the 
work in such a way that you have some concrete, measurable results after 
each phase will help you and, I believe, make for a much more realistic 
schedule (not to mention make it easier for your advisor to see progress :P).

I hope this helps in one way or another.

- Josh

On 4/29/2013 10:46 AM, Supun Kamburugamuva wrote:
 > Hi all,
 > Here is the draft proposal for the Monitor Improvements project.
 > I would really appreciate your feedback.
 > Cheers,
 > Supun..
