hbase-user mailing list archives

From Pun Intended <punintende...@gmail.com>
Subject Re: Very high read rate in a single RegionServer
Date Tue, 27 Jan 2015 20:31:34 GMT
On Tue, Jan 27, 2015 at 12:58 PM, Stack <stack@duboce.net> wrote:

> On Tue, Jan 27, 2015 at 9:34 AM, Pun Intended <punintendedyo@gmail.com>
> wrote:
>
> > Hi There,
> >
> > Thank you for your reply. The tasks are processing a lot of data. Each RS
> > is hosting 15 regions. I have a total of ~4000 regions.
> >
>
> 4000/15 = ~270 servers?
>
Correct!


>
> >  * The biggest ones are ~10Gbs Snappy-compressed (I think that's pretty
> > big and these are the slow ones). There are ~1000 of these.
> >
>
> It may take a while to churn through these, yeah, given say 3-5x
> compression.
>
> >  * Then there are about 2000 5Gb-compressed regions.
> >  * About 500 2-3-Gb-compressed ones.
> >  * And about 500 0Gb ones. (:( not sure how these got created, maybe
> > over-splitting at some point way back).
> >
> >
> OK.  We should clean these up but probably not an issue at the moment.
>
Cool. That's what I thought too.


> > The distribution is not ideal. I can try to split the big ones and then
> > merge the empty ones.
> >
> >
> Your table is 'lumpy' because you disabled splitting?
>
Correct.

>
> The longest running tasks are taking 2-3hrs and from initial observation
> are running on the RegionServers hosting the big regions.
>
>
It would make sense, given the default split is one map task per region.

You can get stats by region. What sort of monitoring setup do you have
here? What do you see for slow tasks? Are you doing lots of seeks on these
regions?  Are they CPU-bound? What sort of a MR job is it?  Lots of reading
or writing?
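
For per-region numbers, something like the below against the 1.x client API
should do it (only a rough sketch; the class name is made up). The hot
regions should jump out of the output:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.ClusterStatus;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.RegionLoad;
    import org.apache.hadoop.hbase.ServerName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    // Sketch: dump per-region read-request counts and store file sizes
    // so the hot/big regions stand out.
    public class RegionReadStats {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
          ClusterStatus status = admin.getClusterStatus();
          for (ServerName server : status.getServers()) {
            // RegionLoad carries the per-region metrics each RS reports.
            for (RegionLoad rl : status.getLoad(server).getRegionsLoad().values()) {
              System.out.println(server.getHostname()
                  + " " + rl.getNameAsString()
                  + " reads=" + rl.getReadRequestsCount()
                  + " storefileSizeMB=" + rl.getStorefileSizeMB());
            }
          }
        }
      }
    }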

The majority of the MR jobs are doing heavy reads/scans. Also doing
bulk-loading twice a day. The slow tasks are the ones that do the reads.
These jobs are definitely CPU bound. I can see that the CPU time at the
individual task level is close to the run time of the task.


> Unfortunately, I don't have very good data to see what the read rate on
> the RS hosting the .META. was when tasks were running faster awhile back,
> but I did decommission the node where it was running and the .META.
> automatically moved to another RS and the read requests there spiked up
> very high right after it moved.
>
>
High read rate against hbase:meta is going to happen. How many MR tasks do you
have running at any one time?  Each task on startup is going to go there to
figure out where the region it is to operate against is located.
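
Roughly, that startup lookup looks like the below with the 1.x client
(just a sketch; the class name, table name, and row key are placeholders).
The answer is cached client-side afterwards:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HRegionLocation;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.RegionLocator;
    import org.apache.hadoop.hbase.util.Bytes;

    // Sketch: the hbase:meta lookup every task does once per region it
    // touches; after that the location sits in the client-side cache.
    public class WhereIsMyRow {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             // "my_table" is a placeholder table name.
             RegionLocator locator =
                 conn.getRegionLocator(TableName.valueOf("my_table"))) {
          HRegionLocation loc = locator.getRegionLocation(Bytes.toBytes(args[0]));
          System.out.println("row " + args[0]
              + " -> region " + loc.getRegionInfo().getRegionNameAsString()
              + " on " + loc.getHostnamePort());
        }
      }
    }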

Makes sense. Most of the time all 1400 mappers are doing work, reading
data. The 500 empty regions are probably just also adding unnecessary reads
to the META. Will try to get rid of these at some point.

> * Do you think the size and number of regions may be the real issue here?

The issue being that your jobs are taking longer?
Yes.

Sounds like a few of your tasks are taking longer than other tasks to
complete, probably because these regions are bigger than others?  Is that
so?  Can you take a look at recent runs to see which tasks are stragglers
and then figure which region they were?  If big ones, then you could try
splitting these regions (since it seems like you have it disabled) so their
processing gets divided across more MR tasks.  But perhaps it is always a
particular region and it has a really big row or some other sort of anomaly
that is taking time to process.
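
If you go the splitting/merging route, a minimal sketch with the 1.x Admin
API (class name is made up; the region names are placeholders passed on the
command line):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.util.Bytes;

    // Sketch: split one oversized region and merge a pair of empty ones.
    // Both calls are asynchronous; the master carries them out.
    public class SplitAndMerge {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
          // args[0]: name of a big region (as shown in the master UI or
          // hbase:meta); HBase picks the split point itself if none is given.
          admin.splitRegion(Bytes.toBytes(args[0]));

          // args[1], args[2]: two empty regions to merge; 'false' means
          // only merge them if they are adjacent in the key space.
          admin.mergeRegions(Bytes.toBytes(args[1]), Bytes.toBytes(args[2]), false);
        }
      }
    }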

That's true. The tasks are slow on a bunch of hosts, where the RSs host big
regions. How can I see exactly which region served the task?!
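
One way, if it helps: the input split handed to each map task by
TableInputFormat is a TableSplit, so the mapper can log its key range and
region location in setup(). A rough sketch (the mapper name and output
types are placeholders):

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.hbase.mapreduce.TableSplit;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.NullWritable;

    // Sketch: log which region's key range each map task is scanning so a
    // straggler task can be matched to its region from the task logs.
    public class RegionLoggingMapper extends TableMapper<NullWritable, NullWritable> {
      @Override
      protected void setup(Context context) throws IOException, InterruptedException {
        TableSplit split = (TableSplit) context.getInputSplit();
        System.out.println("scanning ["
            + Bytes.toStringBinary(split.getStartRow()) + ", "
            + Bytes.toStringBinary(split.getEndRow()) + ") on "
            + split.getRegionLocation());
      }

      @Override
      protected void map(ImmutableBytesWritable key, Result value, Context context) {
        // ... the existing per-row work goes here ...
      }
    }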


> * Do you know if there is a way to host the .META. region on a dedicated
> machine?
>
>
There is not a means for doing this.  Is the high read rate to meta an
actual problem?

Not sure. It could very well be a red herring.

St.Ack

Thank you very much for the help and your methodical approach!


> Thanks in advance!
>
>
> On Mon, Jan 26, 2015 at 11:40 PM, Stack <stack@duboce.net> wrote:
>
> > On Sat, Jan 24, 2015 at 5:15 PM, Pun Intended <punintendedyo@gmail.com>
> > wrote:
> >
> > > Hello,
> > >
> > > I have noticed lately that my apps started running longer.
> >
> >
> > You are processing more data now?
> >
> >
> >
> > > The longest
> > > running tasks seem all to be requesting data from a single region
> > > server.
> > >
> >
> > Has the rate at which you access hbase:meta gone up since when the job
> > ran faster? Anything changed in your processing?  Is the trip to
> > hbase:meta what is slowing your jobs (you could add some printout in
> > your maptask, but meta lookups should be fast, usually out of cache).
> > How long do the tasks last?  Are they short or long?  If long running,
> > then they'll build up a local cache of locations and won't have to go to
> > hbase:meta.
> >
> > St.Ack
> >
> >
> >
> > > That region server read rate is very high in comparison to the read
> > > rate of all the other region servers (1000 reqs/sec vs 4-5 reqs/sec
> > > elsewhere). That region server has about the same number of regions as
> > > all the rest: 26-27 regions. Number of store files, total region size,
> > > everything else on the region server seems ok and in the same ranges
> > > as the rest of the region servers. The keys should be evenly
> > > distributed - randomly generated 38-digit numbers. I am doing a simple
> > > HBase scan from all my MR jobs.
> > >
> > > I'd appreciate any suggestions on what to look into it or if you have
> any
> > > ideas how I can solve this issue.
> > >
> > > Thanks!
> > >
> >
>
