hbase-user mailing list archives

From Stan Barton <bartx...@gmail.com>
Subject Re: HTable.put hangs on bulk loading
Date Fri, 13 May 2011 14:44:02 GMT


stack-3 wrote:
> 
> On Thu, Apr 28, 2011 at 6:54 AM, Stan Barton <bartx007@gmail.com> wrote:
>>
>> Yes, these high limits are for the user running the hadoop/hbase
>> processes.
>>
>> The systems are run on a cluster of 7 machines (1 master, 6 slaves). One
>> processor, two cores and 3.5GB of memory. I am using about 800MB for
>> hadoop
>> (version CDH3B2) and 2.1GB for HBase (version 0.90.2). There is 6TB on
>> four
>> disks per machine. Three zookeepers. The database contains more than 3500
>> regions and the table that was fed was already about 300 regions. The
>> table
>> was fed incrementally using HTable.put().  The data are documents with
>> sizes ranging from a few bytes to megabytes, where the upper limit is set
>> to 10MB per inserted doc.
>>
> 
> Are you swapping, Stan?  You are close to the edge with your RAM
> allocations.  What do you have swappiness set to?  Is it default?
> 
> For writing you usually don't need that much memory, but you do have a lot
> of regions so you could be flushing a bunch - a bunch of small files.
> 

Due to various problems with swap, swap was turned off and memory
overcommitment was turned on.
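
(For reference, a trivial sketch that could be used to confirm those kernel
settings on each node - it just reads the standard /proc files; the expected
values noted in the comments are assumptions about what "swap off, overcommit
on" means here, not something verified on these boxes:)

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class VmSettingsCheck {
  private static String readFirstLine(String path) throws IOException {
    BufferedReader in = new BufferedReader(new FileReader(path));
    try {
      return in.readLine().trim();
    } finally {
      in.close();
    }
  }

  public static void main(String[] args) throws IOException {
    // vm.swappiness: how aggressively the kernel swaps (moot once swap is off)
    System.out.println("vm.swappiness = "
        + readFirstLine("/proc/sys/vm/swappiness"));
    // vm.overcommit_memory: 1 = always overcommit (assumed to be the setting here)
    System.out.println("vm.overcommit_memory = "
        + readFirstLine("/proc/sys/vm/overcommit_memory"));
  }
}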


stack-3 wrote:
> 
>> The configuration files:
>>
>> hadoop/core-site.xml http://pastebin.ca/2051527
>> hadoop/hadoop-env.sh http://pastebin.ca/2051528
> 
> Your HADOOP_CLASSPATH is a little odd.  You are doing * on jar
> directories.  Does that work?
> 
> This CLASSPATH  mentions nutch and a bunch of other stuff.  Are you
> running just datanodes on these machines or tasktrackers and mapreduce
> too?
> 
> These are old IA stock machines?  Do they have ECC RAM?  (IIRC, they
> used to not have ECC RAM).
> 

Strangely, on these machines and the Debian install, only this wildcard (*)
approach works. Originally, I was running the DB on the same cluster where
the processing took place - mostly MapReduce jobs reading the data and doing
some analysis. But when I started using NutchWAX on the same cluster, I
started running out of memory (on the MapReduce side), and since the
machines are so sensitive (no swap and overcommitment enabled) that became a
nightmare. So right now NutchWAX is run on a separate cluster - I have
tweaked it to work with the recent Hadoop APIs and also to take the
HBase-stored content as input (instead of ARC files).
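
Switching away from ARC files essentially means swapping the job's input
format. A rough sketch of reading HBase-stored documents into a MapReduce job
with the 0.90-era API is below; the "documents" table and "content" family
names are placeholders, not the real schema:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class DocsFromHBaseJob {

  // Mapper receives one stored document (row) per call instead of an ARC record.
  public static class DocMapper extends TableMapper<Text, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws java.io.IOException, InterruptedException {
      // hand the row key on to the downstream analysis (real code would use the cells)
      context.write(new Text(value.getRow()), NullWritable.get());
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "nutchwax-from-hbase");
    job.setJarByClass(DocsFromHBaseJob.class);

    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("content"));  // placeholder family
    scan.setCaching(100);                      // fewer RPC round trips on the scan
    scan.setCacheBlocks(false);                // don't pollute the block cache with a full scan

    TableMapReduceUtil.initTableMapperJob("documents", scan, DocMapper.class,
        Text.class, NullWritable.class, job);
    job.setNumReduceTasks(0);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}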

The machines are somewhat renovated old red boxes (I don't know what
configuration they had originally). The RAM is not ECC as far as I know,
because the chipset on the motherboards does not support that technology.


stack-3 wrote:
> 
>> hadoop/hdfs-site.xml http://pastebin.ca/2051529
>>
> 
> Did you change the dfs block size?   Looks like it's 256M rather than
> usual 64M.  Any reason for that?  Would suggest going w/ defaults at
> first.
> 
> Remove dfs.datanode.socket.write.timeout == 0.  That's an old config
> recommendation that should no longer be necessary and is likely
> corrosive.
> 

I changed the block size to reduce the overall number of blocks. I was
following some advice on managing that large an amount of data in HDFS that
I found in the forums.
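
To put rough numbers on that reasoning (my own back-of-the-envelope
estimate, assuming the 6TB per node end up mostly used):

  6 TB / 64 MB blocks   =  ~98,000 blocks per datanode
  6 TB / 256 MB blocks  =  ~24,600 blocks per datanode

So roughly a 4x reduction in block metadata, at the price of coarser units
for MapReduce splits and HDFS reads.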

As for dfs.datanode.socket.write.timeout, that was set because I was quite
often observing timeouts on the DFS sockets, and by digging around I found
out that for some reason the internal Java clocks of the connecting machines
were not aligned (even though the hardware clocks were); I think there was a
JIRA for that.


stack-3 wrote:
> 
>> hbase/hbase-site.xml http://pastebin.ca/2051532
> 
> You are letting major compactions run every 24 hours.  You might want
> to turn them off and then manage the major compactions to happen
> during downtimes.  They have a knack of cutting in just when you don't
> want them to; e.g. when you are under peak load.
> 
> You have upped the flush size above default; i.e.
> hbase.hregion.memstore.flush.size.  This will put more pressure on RAM
> when I'd think you would want less, since you are poor when it comes
> to RAM.
> 
> You have upped your regionsize above default.   That is good I'd say.
> You might want to 4x -- go to 4G -- since you are doing relatively big
> stuff.
> 
> You should send me the metrics that show on the top of the
> regionserver UI when you are under load.  I'd like to see things like
> how much of your RAM is given over to indices for these storefiles.
> 
> I see you have hdfs.block.size specified in here at 256M.  So stuff written
> by hbase into hdfs will have a block size of 256M.  Any reason to do
> this?  I'd say leave it at default unless you have a really good
> reason to do otherwise (Remove this config. from this file).
> 
> 
Again, raising the block size was motivated by the assumption that it would
lower the overall number of blocks. If it puts extra stress on the RAM, it
makes sense to leave it at the defaults. I guess the default also helps
parallelization.
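
So concretely, following your advice would come down to values like the ones
below. I sketch them programmatically just to make the numbers explicit - on
the cluster they of course live in hdfs-site.xml / hbase-site.xml, and the
"defaults" in the comments are what I believe they are for these versions:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class SuggestedSettings {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();

    // drop the 256M override, back to the stock HDFS block size
    conf.setLong("dfs.block.size", 64L * 1024 * 1024);

    // back to the (believed) default memstore flush size instead of the raised value
    conf.setLong("hbase.hregion.memstore.flush.size", 64L * 1024 * 1024);

    // 4G max region size, as suggested, instead of the current setting
    conf.setLong("hbase.hregion.max.filesize", 4L * 1024 * 1024 * 1024);

    // 0 disables time-based major compactions so they can be run manually off-peak
    conf.setLong("hbase.hregion.majorcompaction", 0);

    System.out.println("dfs.block.size = " + conf.getLong("dfs.block.size", -1));
  }
}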



stack-3 wrote:
> 
> 
> 
>> hbase/hbase-env.sh http://pastebin.ca/2051535
>>
> 
> Remove this:
> 
> -XX:+HeapDumpOnOutOfMemoryError
> 
> Means it will dump the heap if the JVM hits an OutOfMemoryError.  This is
> probably of no interest to you and could actually cause you pain if you
> have a small root file system that the heap dump fills up.
> 
> The -XX:CMSInitiatingOccupancyFraction=90 is probably near useless
> (default is 92% or 88% -- I don't remember which).  Set it down to 80%
> or 75% if you want it to actually make a difference.
> 
> Are you having issues w/ GC'ing?  I see you have mslab enabled.
> 
> 
On version 0.20.6 I saw long pauses during the importing phase and also when
querying. I was measuring how many queries were processed per second and
could see pauses in the throughput. The only culprit I could find was the
GC, but I still could not figure out why it pauses the whole DB. Therefore I
gave MSLAB a shot with 0.90, but I still see those pauses in the throughput.
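
A minimal version of such a throughput measurement might look like the
sketch below (table/family names and payload size are placeholders, not my
real loader); a stop-the-world GC shows up as one batch taking seconds while
its neighbours take tens of milliseconds:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class PutThroughput {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "documents");   // placeholder table name
    byte[] family = Bytes.toBytes("content");       // placeholder family
    byte[] payload = new byte[10 * 1024];           // dummy 10 KB document

    final int batch = 1000;
    long batchStart = System.currentTimeMillis();
    for (long i = 1; i <= 1000000; i++) {
      Put put = new Put(Bytes.toBytes("row-" + i));
      put.add(family, Bytes.toBytes("doc"), payload);
      table.put(put);
      if (i % batch == 0) {
        long now = System.currentTimeMillis();
        // print the per-batch timing; pauses appear as outliers in this column
        System.out.println(batch + " puts in " + (now - batchStart) + " ms");
        batchStart = now;
      }
    }
    table.close();
  }
}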


stack-3 wrote:
> 
> 
>> Because the nproc was high, I had inspected the .out files of the RSs and
>> found one which indicated that all the IPCs had OOMEd; unfortunately I
>> don't have those because they got overwritten after a cluster restart.
> 
> This may have been because of HBASE-3813.  See if 0.90.3 helps
> (There is an improvement here).
> 
> Next time, let us see them.
> 
>> So that means
>> that it was OK on the client side. The funny thing is that all RS processes
>> were up and running; only the one with the OOMEd IPCs did not really
>> communicate (after trying to restart the importing process, no inserts
>> went through).
> 
> An OOME'd process goes wonky thereafter and acts in irrational ways.
> Perhaps this was why it stopped taking on requests.
> 
>> So the
>> cluster seemed OK - I was storing statistics that were apparently served
>> by
>> another RS and those were also listed OK. As I mentioned, the log of the
>> bad
>> RS did not mention that anything wrong happened.
>>
>> My observation was: the regions were spread over all RSs, but the crashed
>> RS served the most of them - about half more than any other - and
>> therefore was accessed more than the others. I have discussed the load
>> balancing in HBase 0.90.2
>> with Ted Yu already.
>>
>> The balancer needs to be tuned, I guess, because when the table is created
>> and loaded from scratch, the regions of the table are not balanced equally
>> (in terms of numbers) across the cluster, and I guess the RS that hosted
>> the very first region is serving the majority of the regions as they are
>> being split. That imposes a larger load on that RS, which makes it more
>> prone to failures (like my OOME) and kills performance.
>>
> 
> OK.  Yeah, Ted has been arguing that the balancer should be
> table-conscious in that it should try and spread tables across the
> cluster.  Currently its not.  All regions are equal in the balancer's
> eyes.  0.90.2 didn't help?  (Others have since reported that they think,
> as Ted does, that the table a region comes from should be considered
> when balancing.)
> 
> 
In fact, what I still see in 0.90.2 is that when I start inserting into an
empty table and the number of regions in this table rises (to more than
100), they are spread all over the cluster (good), but one RS (the one
holding the first region) serves remarkably more of the table's regions than
the rest of the RSs, which kills the performance of the whole cluster and
puts a lot of stress on this one RS (there was no RS downtime, and the
overall region counts are even across all RSs).
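
One way around this (just a sketch, with invented table name, family and
split points) would be to pre-create the table with split keys, so that the
initial regions are spread over the RSs from the start instead of one RS
owning the single first region while it splits:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    HTableDescriptor desc = new HTableDescriptor("documents");  // placeholder name
    desc.addFamily(new HColumnDescriptor("content"));           // placeholder family

    // Invented split points - in practice derived from the real key distribution.
    byte[][] splits = new byte[][] {
        Bytes.toBytes("2"), Bytes.toBytes("4"),
        Bytes.toBytes("6"), Bytes.toBytes("8"),
    };
    // Creating the table pre-split hands regions to several RSs immediately,
    // so the early write load is not concentrated on the first region's RS.
    admin.createTable(desc, splits);
  }
}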


stack-3 wrote:
> 
> 
> 
>> I have resumed the process after rebalancing the regions beforehand and
>> was achieving a higher data ingestion rate, and also did not run into the
>> OOME with one RS. Right now I am trying to replay the incident.
>>
>> I know that my scenario would require better machines, but those are what
>> I have now, and I am running stress tests before production. In comparison
>> with 0.20.6, 0.90.2 is less stable regarding insertion, but it scales
>> sub-linearly (v0.20.6 did not scale on my data) in terms of random access
>> queries (including multi-versioned data) - I have done an extensive
>> comparison regarding this.
>>
> 
> 0.90.2 scales sub-linearly?  You mean it's not linear but more machines
> help?
> 
> Are your random reads truly random (I'd guess they are)?  Would cache help?
> 
> Have you tried the hdfs-237 patch?  There is a version that should work
> for your version of CDH.  It could make a big difference (though, beware,
> the latest posted patch does not do checksumming, and if your hardware
> does not have ECC, it could be a problem).
> 
> 
I have done some tests using random access queries and multi-versioned data
(10 to 50 different timestamps per row) and found that random access in
v0.20.6 degrades linearly with the number of versions; in the case of 0.90,
some slowdown was recorded, but it was sub-linear. Still, this was with the
same number of machines.

The reads were random; I pre-selected the rows from the whole collection.
The cache helped - I could see in the timing pattern the difference between
serving a query's answer from disk and from cache.
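
A minimal version of such a read test might look like the sketch below
(pre-selected row keys in random order, all versions requested; the table
and family names are placeholders). The disk-versus-cache difference shows
up directly in the per-Get latencies:

import java.util.Collections;
import java.util.List;
import java.util.Random;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class RandomReadTest {
  // 'rowKeys' is the pre-selected sample of rows from the whole collection.
  public static void run(List<String> rowKeys) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "documents");  // placeholder table name
    Collections.shuffle(rowKeys, new Random(42));  // randomize the access order

    for (String key : rowKeys) {
      Get get = new Get(Bytes.toBytes(key));
      get.setMaxVersions();                        // fetch all stored versions (10-50 here)
      long start = System.currentTimeMillis();
      Result r = table.get(get);
      long ms = System.currentTimeMillis() - start;
      // cache hits come back in a few ms; disk reads are noticeably slower
      System.out.println(key + "\t" + r.size() + " cells\t" + ms + " ms");
    }
    table.close();
  }
}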

Are you sure you have suggested the right patch (hdfs-237)? It mentions
dfsadmin... And no, the machines do not have ECC-enabled RAM.



stack-3 wrote:
> 
> 
> 
> St.Ack
> 
> 

Stan
-- 
View this message in context: http://old.nabble.com/HTable.put-hangs-on-bulk-loading-tp31338874p31612028.html
Sent from the HBase User mailing list archive at Nabble.com.

