hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: HTable.put hangs on bulk loading
Date Fri, 13 May 2011 15:04:32 GMT
>> when the number of the regions in this table rises (more than 100), they
>> are spread all over the cluster (good)
Can you clarify the above a bit more?
If you use the stock 0.90.2, the random selector doesn't guarantee that the
regions of this table are distributed evenly.
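
A quick way to see that skew from the client side is to count this table's
regions per region server - a rough sketch against the 0.90 client API
("docs" is a placeholder table name):

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HRegionInfo;
    import org.apache.hadoop.hbase.HServerAddress;
    import org.apache.hadoop.hbase.client.HTable;

    public class RegionSkew {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "docs"); // placeholder table name
        // Region -> hosting server, for this table only
        Map<HRegionInfo, HServerAddress> regions = table.getRegionsInfo();
        Map<HServerAddress, Integer> perServer = new HashMap<HServerAddress, Integer>();
        for (HServerAddress server : regions.values()) {
          Integer n = perServer.get(server);
          perServer.put(server, n == null ? 1 : n + 1);
        }
        System.out.println(perServer); // lopsided counts mean one hot RS
        table.close();
      }
    }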

On Fri, May 13, 2011 at 7:44 AM, Stan Barton <bartx007@gmail.com> wrote:

>
>
> stack-3 wrote:
> >
> > On Thu, Apr 28, 2011 at 6:54 AM, Stan Barton <bartx007@gmail.com> wrote:
> >>
> >> Yes, these high limits are for the user running the hadoop/hbase
> >> processes.
> >>
> >> The systems run on a cluster of 7 machines (1 master, 6 slaves): one
> >> processor, two cores and 3.5GB of memory each. I am using about 800MB
> >> for hadoop (version CDH3B2) and 2.1GB for HBase (version 0.90.2). There
> >> is 6TB on four disks per machine. Three zookeepers. The database
> >> contains more than 3500 regions, and the table being fed was already
> >> about 300 regions. The table was fed incrementally using HTable.put().
> >> The data are documents with sizes ranging from a few bytes to
> >> megabytes, with the upper limit set to 10MB per inserted doc.
> >>
> >
> > Are you swapping, Stan?  You are close to the edge with your RAM
> > allocations.  What do you have swappiness set to?  Is it the default?
> >
> > For writing you usually don't need that much memory, but you do have a
> > lot of regions, so you could be flushing a bunch of small files.
> >
>
> Due to various problems with swap, swap was turned off and memory
> overcommitment was turned on.
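>
> (For context, the feeder is essentially doing plain batched puts, roughly
> like this - the table name, family and buffer size are illustrative:)
>
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.hbase.HBaseConfiguration;
>     import org.apache.hadoop.hbase.client.HTable;
>     import org.apache.hadoop.hbase.client.Put;
>     import org.apache.hadoop.hbase.util.Bytes;
>
>     Configuration conf = HBaseConfiguration.create();
>     HTable table = new HTable(conf, "docs");    // placeholder table name
>     table.setAutoFlush(false);                  // buffer puts client-side
>     table.setWriteBufferSize(2 * 1024 * 1024);  // 2MB buffer, illustrative
>     for (String id : new String[] { "doc1", "doc2" }) { // stand-in for the corpus
>       Put put = new Put(Bytes.toBytes(id));
>       // "content:raw" is a placeholder family:qualifier; value is the doc body
>       put.add(Bytes.toBytes("content"), Bytes.toBytes("raw"), Bytes.toBytes("..."));
>       table.put(put); // sent only when the client-side buffer fills
>     }
>     table.flushCommits(); // push whatever remains in the buffer
>     table.close();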
>
>
> stack-3 wrote:
> >
> >> The configuration files:
> >>
> >> hadoop/core-site.xml http://pastebin.ca/2051527
> >> hadoop/hadoop-env.sh http://pastebin.ca/2051528
> >
> > Your HADOOP_CLASSPATH is a little odd.  You are doing * on jar
> > directories.  Does that work?
> >
> > This CLASSPATH mentions nutch and a bunch of other stuff.  Are you
> > running just datanodes on these machines, or tasktrackers and mapreduce
> > too?
> >
> > These are old IA stock machines?  Do they have ECC RAM?  (IIRC, they
> > used to not have ECC RAM.)
> >
>
> Strangely, on these machines with the Debian install, only this (star *)
> approach works. Originally, I was running the DB on the same cluster where
> the processing took place - mostly mapreduce jobs reading the data and
> doing some analysis. But when I started using nutchwax on the same cluster
> I started running out of memory (on the mapreduce side), and since the
> machines are so sensitive (no swap and overcommitment) that became a
> nightmare. So right now nutch is being run on a separate cluster - I have
> tweaked nutchwax to work with recent Hadoop APIs and also to take the
> HBase-stored content as input (instead of ARC files).
>
> The machines are somewhat renovated old red boxes (I don't know what
> configuration they were originally). The RAM is not ECC as far as I know,
> because the chipset on the motherboards does not support that technology.
>
>
> stack-3 wrote:
> >
> >> hadoop/hdfs-site.xml http://pastebin.ca/2051529
> >>
> >
> > Did you change the dfs block size?  Looks like it's 256M rather than
> > the usual 64M.  Any reason for that?  I would suggest going w/ defaults
> > at first.
> >
> > Remove dfs.datanode.socket.write.timeout == 0.  That's an old config
> > recommendation that should no longer be necessary and is likely
> > corrosive.
> >
>
> I changed the block size to diminish the overall number of blocks. I was
> following some advice I found in the forums about managing that large an
> amount of data in HDFS.
>
> As for dfs.datanode.socket.write.timeout, that was set because I was quite
> often observing timeouts on the DFS sockets. Digging around, I found that
> for some reason the internal Java timers of the connecting machines were
> not aligned (even though the hardware clocks were); I think there was a
> JIRA for that.
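>
> (Concretely, these are the two non-default hdfs-site.xml entries under
> discussion, expressed programmatically - the values are the ones from my
> config:)
>
>     import org.apache.hadoop.conf.Configuration;
>
>     Configuration conf = new Configuration();
>     conf.setLong("dfs.block.size", 256L * 1024 * 1024);  // 256M instead of the 64M default
>     conf.setInt("dfs.datanode.socket.write.timeout", 0); // disables the write timeout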
>
>
> stack-3 wrote:
> >
> >> hbase/hbase-site.xml http://pastebin.ca/2051532
> >
> > You are letting major compactions run every 24 hours.  You might want
> > to turn them off and instead manage major compactions to happen during
> > downtimes.  They have a knack of cutting in just when you don't want
> > them to, e.g. when you are under peak load.
> >
> > You have upped the flush size above the default, i.e.
> > hbase.hregion.memstore.flush.size.  This will put more pressure on RAM,
> > when I'd think you would want less pressure since you are short on RAM.
> >
> > You have upped your region size above the default.  That is good, I'd
> > say.  You might want to 4x it -- go to 4G -- since you are doing
> > relatively big stuff.
> >
> > You should send me the metrics shown at the top of the regionserver UI
> > when you are under load.  I'd like to see things like how much of your
> > RAM is given over to indices for these storefiles.
> >
> > I see you have hdfs.block.size specified in here at 256M, so stuff
> > written by hbase into hdfs will have a block size of 256M.  Any reason
> > to do this?  I'd say leave it at the default unless you have a really
> > good reason to do otherwise (remove this config. from this file).
> >
> >
> Again, the reason for raising the block size was the assumption that it
> would lower the overall number of blocks. If it imposes stress on the RAM,
> it makes sense to leave it at the defaults. I guess the default also helps
> parallelization.
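>
> (If I read the keys right, the suggested changes would amount to the
> following - a sketch, expressed programmatically rather than as
> hbase-site.xml entries:)
>
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.hbase.HBaseConfiguration;
>
>     Configuration conf = HBaseConfiguration.create();
>     // 0 disables periodic major compactions; run them manually off-peak
>     conf.setLong("hbase.hregion.majorcompaction", 0);
>     // back to the 64M default to ease RAM pressure
>     conf.setLong("hbase.hregion.memstore.flush.size", 64L * 1024 * 1024);
>     // ~4G regions, per the 4x suggestion
>     conf.setLong("hbase.hregion.max.filesize", 4L * 1024 * 1024 * 1024);
>     // ...and drop the hdfs.block.size override from hbase-site.xml entirely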
>
>
>
> stack-3 wrote:
> >
> >
> >
> >> hbase/hbase-env.sh http://pastebin.ca/2051535
> >>
> >
> > Remove this:
> >
> > -XX:+HeapDumpOnOutOfMemoryError
> >
> > Means it will dump heap if the JVM OOMEs.  This is probably of no
> > interest to you, and could actually cause you pain if you have a small
> > root filesystem and the heap dump fills it.
> >
> > The -XX:CMSInitiatingOccupancyFraction=90 is probably near useless
> > (the default is 92% or 88% -- I don't remember which).  Set it down to
> > 80% or 75% if you want it to actually make a difference.
> >
> > Are you having issues w/ GC'ing?  I see you have mslab enabled.
> >
> >
> On version 0.20.6 I saw long pauses during the importing phase and also
> when querying. I was measuring how many queries were processed per second
> and could see pauses in the throughput. The only culprit I could find was
> the GC, but I still could not figure out why it pauses the whole DB.
> Therefore I gave it a shot with mslab on 0.90, but I do still see those
> pauses in the throughput.
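>
> (For reference, mslab is enabled via the keys below; the chunk and
> max-allocation sizes shown are, as far as I know, the 0.90 defaults:)
>
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.hbase.HBaseConfiguration;
>
>     Configuration conf = HBaseConfiguration.create();
>     conf.setBoolean("hbase.hregion.memstore.mslab.enabled", true);
>     conf.setInt("hbase.hregion.memstore.mslab.chunksize", 2 * 1024 * 1024); // 2MB chunks
>     conf.setInt("hbase.hregion.memstore.mslab.max.allocation", 256 * 1024); // bigger cells bypass mslab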
>
>
> stack-3 wrote:
> >
> >
> >> Because the nproc count was high I inspected the .out files of the RSs
> >> and found one indicating that all the IPC handlers had OOMEd;
> >> unfortunately I don't have those files because they got overwritten
> >> after a cluster restart.
> >
> > This may have been because of  HBASE-3813 .  See if 0.90.3 helps
> > (There is an improvement here).
> >
> > Next time, let us see them.
> >
> >> So that means it was OK on the client side. The funny thing is that all
> >> the RS processes were up and running; only the one with OOMEd IPCs did
> >> not really communicate (after restarting the importing process, no
> >> inserts went through).
> >
> > An OOME'd process goes wonky thereafter and acts in irrational ways.
> > Perhaps this was why it stopped taking on requests.
> >
> >> So the cluster seemed OK - I was storing statistics that were
> >> apparently served by another RS, and those were also listed OK. As I
> >> mentioned, the log of the bad RS did not indicate that anything wrong
> >> had happened.
> >>
> >> My observation was: the regions were spread over all the RSs, but the
> >> crashed RS served the most - about half again more than any other - and
> >> was therefore accessed more than the others. I have discussed the load
> >> balancing in HBase 0.90.2 with Ted Yu already.
> >>
> >> The balancer needs to be tuned, I guess, because when the table is
> >> created and loaded from scratch, its regions are not balanced equally
> >> (in terms of numbers) across the cluster, and I guess the RS that
> >> hosted the very first region ends up serving the majority of the
> >> regions as they are split off. That imposes a larger load on that RS,
> >> which is then more prone to failures (like my OOME), and kills
> >> performance.
> >>
> >
> > OK.  Yeah, Ted has been arguing that the balancer should be
> > table-conscious, in that it should try and spread each table across the
> > cluster.  Currently it's not; all regions are equal in the balancer's
> > eyes.  0.90.2 didn't help?  (Others have since reported that they think,
> > as Ted does, that the table a region comes from should be considered
> > when balancing.)
> >
> >
> In fact, what I still see in 0.90.2 is that when I start inserting into an
> empty table, as the number of regions in this table rises (more than 100)
> they are spread all over the cluster (good), but one RS (the one holding
> the first region) serves remarkably more of this table's regions than the
> rest of the RSs, which kills the performance of the whole cluster and puts
> a lot of stress on this one RS (there was no RS downtime, and the overall
> region numbers are even on all RSs).
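>
> (A standard workaround would be to pre-split the table at creation time so
> that no single RS starts out holding everything - a sketch; the split
> points are illustrative and should match the real key distribution:)
>
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.hbase.HBaseConfiguration;
>     import org.apache.hadoop.hbase.HColumnDescriptor;
>     import org.apache.hadoop.hbase.HTableDescriptor;
>     import org.apache.hadoop.hbase.client.HBaseAdmin;
>     import org.apache.hadoop.hbase.util.Bytes;
>
>     Configuration conf = HBaseConfiguration.create();
>     HBaseAdmin admin = new HBaseAdmin(conf);
>     HTableDescriptor desc = new HTableDescriptor("docs"); // placeholder name
>     desc.addFamily(new HColumnDescriptor("content"));     // placeholder family
>     byte[][] splits = new byte[][] {
>       Bytes.toBytes("2"), Bytes.toBytes("4"), Bytes.toBytes("6"), Bytes.toBytes("8")
>     };
>     admin.createTable(desc, splits); // regions start out spread over the cluster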
>
>
> stack-3 wrote:
> >
> >
> >
> >> I have resumed the process after rebalancing the regions beforehand,
> >> and was achieving a higher data-ingestion rate and also did not run
> >> into the OOME with one RS. Right now I am trying to reproduce the
> >> incident.
> >>
> >> I know that my scenario would require better machines, but those are
> >> what I have now, and I am running stress tests before production. In
> >> comparison with 0.20.6, 0.90.2 is less stable regarding insertion, but
> >> it scales sub-linearly (v0.20.6 did not scale on my data) in terms of
> >> random access queries (including multi-versioned data) - I have done an
> >> extensive comparison regarding this.
> >>
> >
> > 0.90.2 scales sub-linearly?  You mean it's not linear, but more machines
> > help?
> >
> > Are your random reads truly random (I'd guess they are)?  Would cache
> > help?
> >
> > Have you tried the hdfs-237 patch?  There is a version that should work
> > for your version of CDH.  It could make a big difference (though, beware,
> > the latest posted patch does not do checksumming, and if your hardware
> > does not have ECC, that could be a problem).
> >
> >
> I have done some tests using random-access queries and multiversioned data
> (10 to 50 different timestamps per datum) and found that random access in
> v0.20.6 degrades linearly with the number of versions; in the case of 0.90
> some slowdown was recorded, but sub-linear - still while using the same
> number of machines.
>
> The reads were random; I pre-selected the rows from the whole collection.
> The cache helped: I could see in the throughput pattern the difference
> between serving a query's answer from disk and from cache.
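>
> (The read side of the test is essentially point gets over the pre-selected
> rows, roughly like this - names are placeholders:)
>
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.hbase.HBaseConfiguration;
>     import org.apache.hadoop.hbase.client.Get;
>     import org.apache.hadoop.hbase.client.HTable;
>     import org.apache.hadoop.hbase.client.Result;
>     import org.apache.hadoop.hbase.util.Bytes;
>
>     Configuration conf = HBaseConfiguration.create();
>     HTable table = new HTable(conf, "docs");     // placeholder table name
>     Get get = new Get(Bytes.toBytes("someRow")); // row from the pre-selected set
>     get.setMaxVersions(50);                      // up to 50 timestamps per row
>     Result result = table.get(get);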
>
> Are you sure that you have suggested the right patch (hdfs-237)? It
> mentions dfsadmin... And no, the machines do not have ECC-enabled RAM.
>
>
>
> stack-3 wrote:
> >
> >
> >
> > St.Ack
> >
> >
>
> Stan
> --
> View this message in context:
> http://old.nabble.com/HTable.put-hangs-on-bulk-loading-tp31338874p31612028.html
> Sent from the HBase User mailing list archive at Nabble.com.
>
>
