accumulo-dev mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: GSOC: Monitor improvements - draft proposal
Date Fri, 03 May 2013 14:41:38 GMT
Great. Best of luck, Supun!

On 5/3/13 10:01 AM, Supun Kamburugamuva wrote:
> I've submitted the proposal to Google.
>
> http://www.google-melange.com/gsoc/proposal/review/google/gsoc2013/supun06/1
>
> Thanks,
> Supun..
>
>
> On Tue, Apr 30, 2013 at 10:31 AM, Supun Kamburugamuva <supun06@gmail.com> wrote:
>
>> Hi Josh,
>>
>> Thanks for the detailed feedback. I've integrated your suggestions into the
>> document.
>>
>> For a third-party monitoring tool I would like to use Zabbix. But I may
>> have to do more research on this one. For now I'll leave it with Zabbix.
>>
>> I couldn't find a library that helps with JMX development. I guess most
>> people use the Java API directly. Anyways I'll do more research on this one
>> and try to find one if possible.
>>
>> Thanks for the JavaScript library suggestions. d3.js seems impressive.
>>
>> I've updated the timeline and added a deliverable section. Hope this helps
>> a bit. Let me know if it needs further improvements. For each phase my plan
>> is to create a patch with the changes.
>>
>> Thanks,
>> Supun..
>>
>>
>>
>>
>> On Mon, Apr 29, 2013 at 10:26 PM, Josh Elser <josh.elser@gmail.com> wrote:
>>
>>> Supun,
>>>
>>> Thanks for the draft! Some feedback -- hopefully it's useful for your
>>> proposal in addition to giving you a better understanding of how Accumulo
>>> is typically run.
>>>
>>> "These servers perform different functionalities"
>>>
>>> Actually, most servers in an Accumulo cluster are identical to one
>>> another: most are running a TabletServer, and in <1.5, a Logger. The
>>> exceptions are the Master, Monitor, Tracer and GarbageCollector. Master,
>>> monitor and gc are typically run on the same node (monitor and gc are
>>> rather lightweight). Running a tracer on every TabletServer is probably
>>> overkill, but, again, this is another lightweight process, so not outside
>>> the realm of possibilities.
>>>
>>> "Create a JMX API for Monitor to gather statistics"
>>>
>>> Any plans to include an example 3rd-party monitor that takes advantage of
>>> the internal change from Thrift to JMX? If so, which? I could see this
>>> being very useful for your own verification and validation, not to mention
>>> for 3rd parties (people other than yourself).
>>>
>>> "Table Graphs"
>>>
>>> I'd be rather interested to see how the amount of data being returned by
>>> a TabletServer correlates with query rate. It would be a neat plot to see
>>> how RFile index size and the size of each key-value returned correspond with
>>> query rate. Maybe it would be cool to have the ability to let users create
>>> composite graphs?
>>>
>>> "Trace Visualization"
>>>
>>> Not a lot to really see here. Currently you get some rudimentary
>>> information about how long it took to determine which files to delete, and
>>> how long deleting them took (I think). It would be nice to see this broken
>>> down by table, and include file size and other file metadata.
>>>
>>> "Server Status Information"
>>>
>>> I remember hearing that someone had done some work to actually pop a
>>> shell in the monitor when authenticated over HTTPS. Another cool feature
>>> might be to actually have some greater insight into a node (perhaps using
>>> JMX calls that we wouldn't want publicly available) when properly
>>> authenticated? I'm thinking about being able to view the list of running
>>> scans on a node... being able to introspect the actual scan options/data,
>>> ranges being run, etc.
>>>
>>> "Mock Stats Collector"
>>>
>>> I would put money that this will pay off in spades as you move forward
>>> testing things.
>>>
>>> Some more high-level things...
>>>
>>> * Any thought/preference on the JMX library you would want to use?
>>> * Re: Javascript, might want to look at DataTables (jQuery-based), d3.js,
>>> and/or nvd3. Lots of options here, but licensing can be a concern. Glad you
>>> thought about that already.
>>>
>>> "Deliverables and Timeline"
>>>
>>> I'd try to rethink your timeline a bit; it comes off very waterfall-y to
>>> me. The biggest red flag to me is "write documentation" as your last
>>> phase. Coming from experience, this doesn't work 95% of the time. Something
>>> else always comes up, takes longer, w/e and suddenly you have some code
>>> that you just got working and no documentation. I know it's difficult to
>>> create a development schedule when you're not completely familiar with what
>>> will be required of you, but trying to lay out the work in such a way that
>>> you have some concrete, measurable results after each phase will help you
>>> and, I believe, make a much more realistic schedule (not to mention make
>>> the advisor's job easier to see progress :P).
>>>
>>> I hope this helps in one way or another.
>>>
>>> - Josh
>>>
>>>
>>> On 4/29/2013 10:46 AM, Supun Kamburugamuva wrote:
>>>> Hi all,
>>>>
>>>> Here is the draft proposal for the Monitor Improvements project.
>>>>
>>>> https://docs.google.com/document/d/1j1YHZJXuzxIrB1udt1RnWZUgZLeo-JX711gEv1l--r8/edit#heading=h.2r66wv56fsz
>>>>
>>>> I would really appreciate your feedback.
>>>>
>>>> Cheers,
>>>> Supun..
>>>>
>>>
>>
>> --
>> Supun Kamburugamuva
>> Member, Apache Software Foundation; http://www.apache.org
>> E-mail: supun06@gmail.com;  Mobile: +1 812 369 6762
>> Blog: http://supunk.blogspot.com
>>
>>
>

