accumulo-user mailing list archives

From Eric Newton <eric.new...@gmail.com>
Subject Re: A couple of anomalies
Date Sun, 15 Jul 2012 18:33:54 GMT
The monitor program, which runs the jetty server, is independent of
the master.  Your master can be down, and the page will still refresh.

But almost all of the data comes from the master, which is collecting
stats for everything, primarily to support load balancing.

The differences between the pages are really just different summaries
of the data coming from the master.  You can get the same data using
the GetMasterStats class.  There's a thrift RPC call which pulls the
current data.
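One way to picture this is a monitor running as its own process: it asks the master for a stats snapshot and renders several pages from that one snapshot. The Java below is a hypothetical sketch. `TableStats`, `MasterSnapshot` and the `Supplier` standing in for the master's thrift call are illustrative names, not Accumulo's API, and the sketch does not use `GetMasterStats`.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch: every monitor page is a different summary of the same
// stats snapshot pulled from the master. All names here are illustrative.
final class MonitorSketch {

    record TableStats(String table, int tablets, long entries) {}

    record MasterSnapshot(List<TableStats> tables) {}

    // The monitor runs separately from the master, so a refresh always produces
    // a page. An empty Optional models a master that is down or not answering.
    static String overviewPage(Supplier<Optional<MasterSnapshot>> master) {
        Optional<MasterSnapshot> snap = master.get();
        if (snap.isEmpty()) {
            return "master unavailable";
        }
        int tablets = 0;
        long entries = 0;
        for (TableStats t : snap.get().tables()) {
            tablets += t.tablets();
            entries += t.entries();
        }
        return "tablets=" + tablets + " entries=" + entries;
    }

    // A second "page": the same snapshot, summarized per table instead of in total.
    static List<String> tablesPage(Supplier<Optional<MasterSnapshot>> master) {
        return master.get()
                .map(s -> s.tables().stream()
                        .map(t -> t.table() + ": " + t.tablets() + " tablets, " + t.entries() + " entries")
                        .toList())
                .orElse(List.of("master unavailable"));
    }
}
```

When the supplier returns an empty `Optional`, the page still renders. It simply has nothing useful to show, just as when the master is down.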

The JMX data is really a completely separate monitoring interface,
which is not used by the monitor.  But all the same data should be
available.
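JMX is the JDK's standard management interface, so it can be queried without going through the monitor at all. Below is a minimal sketch that uses only the platform MBean server of the local JVM and its built-in `java.lang` MBeans. To reach another process, you would connect through `javax.management.remote.JMXConnectorFactory` with that process's JMX service URL. Which MBeans Accumulo registers is not covered in this thread, so none are assumed here.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

// Sketch: read MBeans through the standard JMX API in the local JVM.
// Only the JDK's own java.lang MBeans are used; no Accumulo names are assumed.
final class JmxSketch {

    private static final MBeanServer SERVER = ManagementFactory.getPlatformMBeanServer();

    // All registered MBean names in one JMX domain, e.g. "java.lang".
    static Set<ObjectName> namesInDomain(String domain) {
        try {
            return SERVER.queryNames(new ObjectName(domain + ":*"), null);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Heap bytes currently in use, read as an attribute of the Memory MBean.
    // Through the MBeanServer, HeapMemoryUsage comes back as CompositeData.
    static long heapUsedBytes() {
        try {
            CompositeData usage = (CompositeData) SERVER.getAttribute(
                    new ObjectName("java.lang:type=Memory"), "HeapMemoryUsage");
            return (Long) usage.get("used");
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }
}
```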

The benefit of the monitor is that it has been instrumented to
highlight unusual or unexpected values and conditions.  When I'm
providing support to other teams (typically by phone), it's a great
way to get critical information quickly.
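Flagging unusual values typically comes down to comparing each metric against a warning threshold and an error threshold. The sketch below is hypothetical. The thresholds and CSS class names are placeholders, not the Accumulo monitor's actual rules.

```java
// Hypothetical sketch of threshold-based highlighting. Thresholds and CSS class
// names are placeholders chosen for illustration.
final class HighlightSketch {

    enum Level { NORMAL, WARNING, ERROR }

    // Compare a value against a warning and an error threshold (warnAt <= errorAt).
    static Level classify(double value, double warnAt, double errorAt) {
        if (value >= errorAt) {
            return Level.ERROR;
        }
        if (value >= warnAt) {
            return Level.WARNING;
        }
        return Level.NORMAL;
    }

    // Render one HTML table cell. Unusual values get a class that a stylesheet
    // could color yellow or red.
    static String cell(double value, double warnAt, double errorAt) {
        String css = switch (classify(value, warnAt, errorAt)) {
            case ERROR -> " class=\"error\"";
            case WARNING -> " class=\"warning\"";
            case NORMAL -> "";
        };
        return "<td" + css + ">" + value + "</td>";
    }
}
```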

-Eric

On Sat, Jul 14, 2012 at 9:21 PM, Roger Lloyd <roger.lloyd.rl@gmail.com> wrote:
> Yeah, so it seems that our number one mistake is taking the Master down in
> response to having issues.  I guess you get so comfortable bringing the
> cluster up and down when you are first starting that it seems like a natural
> knee-jerk reaction.  This most recent time there was something in
> yellow/red, but I don't recall what it said and it didn't seem to make sense
> to me, so since I was having problems with the web console and wasn't sure
> of the actual state of the Master, I just tried to stop it.  When it pushed
> back on shutting down (running stop-all.sh) with something about access
> denied, I cancelled out of the shutdown script, so who knows where it ended
> up.
>
> Could you explain a little more about the Master's monitoring console?  It
> runs an embedded Jetty instance and renders data from JMX MBeans from the
> running Master?  I know there is an XML representation, and I thought I saw
> something about embedding it in a separate JMX console (or maybe I'm
> blurring it with my ZK and Hadoop reading).  Is there a data store that
> holds that data?  Is it accessible by some other means if the web console
> isn't responding?
>
> On Sat, Jul 14, 2012 at 8:09 PM, Eric Newton <eric.newton@gmail.com> wrote:
>>
>> Is there anything red or yellow on the monitor pages?
>>
>> There's a layering to availability:
>>
>> Most of the monitoring is done via the master, so if it has recently
>> restarted, you will see almost no useful information.
>>
>> The first tablet of the METADATA table needs to be assigned, recovered
>> and functional.  If you see only one tablet assigned... it needs to be
>> healthy before anything else can happen.
>>
>> Next, the rest of the METADATA table needs to be assigned, recovered
>> and functional.
>>
>> If you are seeing "-" then the METADATA table is not available for some
>> reason.
>>
>> Ensure that hadoop & zookeeper are not using /tmp for storage.
>>
>> -Eric
>>
>> On Sat, Jul 14, 2012 at 7:18 PM, Roger Lloyd <roger.lloyd.rl@gmail.com> wrote:
>> > I was looking for some insights regarding a couple of issues I have
>> > seen, and their likely causes and solutions.
>> >
>> > 1)  Tables go blank
>> >
>> > So, everything kicking along fine, I am loading data, works beautifully for
>> > days, even weeks, adding hundreds of millions of entries, splitting
>> > tablets, etc. - just smooth.  Suddenly, I run into an issue where under the
>> > web console all the tables just go to "-" for their values (except the
>> > !METADATA table).
>> >
>> > What could/would cause this?
>> >
>> > What is the smart way to react?  Our previous attempts have been 1) re-init
>> > and reload through the Client API and 2) re-init and recover the tables
>> > using the bulk loading scheme mentioned in this mailing list.  Not sure
>> > that we haven't taken more rash action than necessary, simply because we
>> > could afford to reload, etc.  When we increase our deployment, that will
>> > be less of an option.  Not sure if we are doing something wrong overall.
>> >
>> > 2) Client connections to Zookeeper
>> >
>> > When I am writing a client in Eclipse, we seem to have this issue where it
>> > cycles connections, creating and closing sessions (with no errors at all),
>> > but if I suspend the thread in Eclipse and start it again, then the session
>> > opens and stays open.  I realize this is probably a ZooKeeper problem, but
>> > can someone give me a quick rundown of what is happening there under the
>> > hood, so I could try running some zkCli commands to simulate the issue?
>> >
>> > We are running version 1.4.0-incubating-SNAPSHOT and ZooKeeper 3.4.3.  If
>> > we wanted to upgrade to 1.4.1, how involved would that be?  Just replace the
>> > jar files and the config files?  Or would we need to migrate data?
>> >
>> > Thanks for your help.
>> >
>> > Roger
>
>
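
The availability layering in the Jul 14 reply quoted above can be read as an ordered checklist: find the lowest layer that isn't healthy and work on that one first. Below is a hypothetical Java model. The layer names are illustrative, not Accumulo identifiers, and the sketch leaves open how you decide whether each layer is healthy (for example, from the monitor pages).

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical model of the availability layers. Enum order is dependency
// order: a layer cannot be healthy until every layer before it is.
// Layer names are illustrative, not Accumulo identifiers.
final class AvailabilityLayers {

    enum Layer {
        MASTER_STATS("master running; a recently restarted master reports little"),
        FIRST_METADATA_TABLET("first !METADATA tablet assigned, recovered and functional"),
        REST_OF_METADATA("remaining !METADATA tablets assigned, recovered and functional");

        final String check;

        Layer(String check) {
            this.check = check;
        }
    }

    // Returns the lowest layer not yet marked healthy. Layers missing from the
    // map count as unhealthy. Nothing above a broken layer can recover, so that
    // layer is the one to look at first.
    static Optional<Layer> firstBlocked(Map<Layer, Boolean> healthy) {
        for (Layer layer : Layer.values()) {
            if (!healthy.getOrDefault(layer, false)) {
                return Optional.of(layer);
            }
        }
        // All layers healthy: user tables should report values instead of "-".
        return Optional.empty();
    }
}
```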

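The advice to keep Hadoop and ZooKeeper storage out of /tmp can be checked mechanically. Many systems clear /tmp on reboot or periodically, and that cleanup would take the stored state with it. The minimal sketch below covers ZooKeeper only. `dataDir` and `dataLogDir` are real zoo.cfg keys, and the `zoo_sample.cfg` shipped with ZooKeeper sets `dataDir=/tmp/zookeeper`. Hadoop's settings live in XML site files, which this sketch does not parse. Note that `hadoop.tmp.dir` defaults to `/tmp/hadoop-${user.name}` and that HDFS storage directories default to locations under it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Sketch: flag ZooKeeper storage directories configured under /tmp.
// zoo.cfg is a plain key=value file, so java.util.Properties can read it.
final class TmpStorageCheck {

    private static final Path TMP = Path.of("/tmp");

    static Properties parse(String zooCfgText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(zooCfgText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the storage keys whose directory is /tmp itself or somewhere below it.
    static List<String> keysUnderTmp(Properties zooCfg) {
        List<String> flagged = new ArrayList<>();
        for (String key : List.of("dataDir", "dataLogDir")) {
            String dir = zooCfg.getProperty(key);
            // Path.startsWith compares whole path elements, so /tmpdata is not flagged.
            if (dir != null && Path.of(dir.trim()).normalize().startsWith(TMP)) {
                flagged.add(key);
            }
        }
        return flagged;
    }
}
```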