hbase-user mailing list archives

From elsif <elsif.t...@gmail.com>
Subject Re: HBase Exceptions on version 0.20.1
Date Wed, 21 Oct 2009 21:36:12 GMT
stack wrote:
> On Wed, Oct 21, 2009 at 8:16 AM, elsif <elsif.then@gmail.com> wrote:
>
>   
>> There are 239 "Block blk_-xxx is not valid" errors, 522 "BlockInfo not
>> found in volumeMap" errors, and 208 "BlockAlreadyExistsException" errors
>> found in the hadoop logs over 12 hours of running the test.
>>
>>     
>
> Are the above from application-level (hbase) logs or datanode logs?  If you
> trace any of the above -- follow the block name -- are the blocks lost in
> the NN, or do you see replicas taking over or recoveries triggered?
>
>
>   
The above error counts are from the datanode logs.  It looks like blocks
are being requested after they are deleted (HBASE-1894):

hdfs:2009-10-21 11:30:55,580 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(10.1.29.12:50010,
storageID=DS-1209147803-10.1.29.12-50010-1651186044136, infoPort=50075,
ipcPort=50020):Transmitted block blk_-6090437519946022724_469173 to
/10.1.29.20:50010

hdfs:2009-10-21 11:30:59,938 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block
blk_-6090437519946022724_469173 file
/mnt/8d8aa987-017c-4f4c-822c-fe53f4163cee/current/subdir18/blk_-6090437519946022724

hdfs:2009-10-21 11:59:09,509 WARN
org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(10.1.29.12:50010,
storageID=DS-1209147803-10.1.29.12-50010-1651186044136, infoPort=50075,
ipcPort=50020):Got exception while serving
blk_-6090437519946022724_469173 to /10.1.29.14:
hdfs:java.io.IOException: Block blk_-6090437519946022724_469173 is not
valid.

>> I understand that I am loading the cluster - that is the point of the
>> test, but I don't think that this should result in data loss.  Failed
>> inserts at the client level I can handle, but loss of data that was
>> previously thought to be stored in hbase is a major issue.  Are there
>> plans to make hbase more resilient to load based failures?
>>
>>
>>     
> It looks like there'll be data loss, going by a few of the exceptions you
> provided originally.  Here are a couple of comments:
>
> "No live nodes contain current block"  Usually we see this if the
> client-side hadoop has not been patched with hdfs-127/hadoop-4681.  Your
> test program doesn't seem to have come across.  Mind attaching it to an
> issue so I can try it?  Going by the way you started your test program, you
> should have the hbase patched hadoop first in your CLASSPATH so you should
> be ok but maybe there is something about your environmnent frustrating
> hbase's using a patched hadoop?
>
>   
The test program is short and is included in-line at the end of this message.
In this last run, the client was running against the hadoop-0.20.1-core.jar
instead of the hadoop-0.20.1-hdfs127-core.jar.  I will retest using the
hdfs-127 jar.
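As a quick sanity check for the retest (just a sketch on my end; the class
name is made up), a tiny program can print which jar DFSClient was actually
loaded from, so the classpath ordering is confirmed rather than assumed:

import org.apache.hadoop.hdfs.DFSClient;

public class WhichHadoopJar {
    public static void main(String[] args) {
        // Prints the URL of the jar that supplied DFSClient; it should point
        // at the hdfs-127 patched hadoop jar, not the stock 0.20.1 core jar.
        System.out.println(DFSClient.class.getProtectionDomain()
                                          .getCodeSource().getLocation());
    }
}
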
> "java.io.IOException: TIMED OUT" Your regionserver or master timed out its
> zk session.  GC or swapping or disk used by zk is under heavy i/o loading?
>   
The test setup has five zk servers that share hardware with nodes that are
also datanodes and regionservers.  There are only 8 instances of this error,
and all of them occurred within 30 seconds of each other on the same node.
> "ClosedChannelException" Probably symptom of a RS shutdown because of events
> such as above.
>
> "Abandoning block..." Did this write to HLog fail? Its just an INFO level
> log out of DFSClient.
>   
Since these are just INFO-level messages, they are not an issue?

hbase:2009-10-21 11:40:57,082 INFO org.apache.hadoop.hdfs.DFSClient:
Abandoning block blk_-1365065510928989381_470690
hdfs:2009-10-21 11:40:57,080 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_-1365065510928989381_470690 src: /10.1.29.23:56117 dest:
/10.1.29.23:50010
hdfs:2009-10-21 11:40:57,081 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
blk_-1365065510928989381_470690 received exception
org.apache.hadoop.util.Shell$ExitCodeException: du: cannot access
`/mnt/eac8b34a-a4a4-495b-8241-43cd4b21c28f/current/blk_-8854288022995097022_462986.meta':
No such file or directory

hbase:2009-10-21 11:36:12,609 INFO org.apache.hadoop.hdfs.DFSClient:
Abandoning block blk_4473876596208463127_470012
hdfs:2009-10-21 11:35:44,842 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_4473876596208463127_470012 src: /10.1.29.22:57916 dest:
/10.1.29.14:50010
hdfs:2009-10-21 11:35:44,844 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
blk_4473876596208463127_470012 received exception java.io.EOFException
hdfs:2009-10-21 11:36:26,105 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_4473876596208463127_470012 src: /10.1.29.14:34537 dest:
/10.1.29.21:50010
hdfs:2009-10-21 11:36:26,105 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
blk_4473876596208463127_470012 received exception
org.apache.hadoop.util.Shell$ExitCodeException: du: cannot access
`/mnt/e9726517-c84c-4326-8859-44f8261a14fa/current/subdir26/subdir62/blk_4151284369367444478':
No such file or directory
hdfs:2009-10-21 11:36:12,605 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_4473876596208463127_470012 src: /10.1.29.22:34103 dest:
/10.1.29.22:50010
hdfs:2009-10-21 11:36:12,609 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
blk_4473876596208463127_470012 2 Exception java.io.EOFException
hdfs:2009-10-21 11:36:12,609 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for
block blk_4473876596208463127_470012 terminating

hbase:2009-10-21 06:47:10,189 INFO org.apache.hadoop.hdfs.DFSClient:
Abandoning block blk_-6810931018745175038_422166
hdfs:2009-10-21 06:47:10,171 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_-6810931018745175038_422166 src: /10.1.29.12:44933 dest:
/10.1.29.12:50010
hdfs:2009-10-21 06:47:10,188 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
blk_-6810931018745175038_422166 received exception java.io.EOFException
hdfs:2009-10-21 06:46:43,362 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_-6810931018745175038_422166 src: /10.1.29.12:48908 dest:
/10.1.29.14:50010
hdfs:2009-10-21 06:46:43,362 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
blk_-6810931018745175038_422166 received exception
org.apache.hadoop.util.Shell$ExitCodeException: du: cannot access
`/mnt/1a3e57e4-b514-4a85-83b1-ae9d12e3fb7a/current/subdir27/blk_-3769845572103484957_410554.meta':
No such file or directory

> "file system not available" What happened before this?  Was this just an
> emission on regionserver shutdown?
>   
I don't have any "file system not available" messages this time around.
> St.Ack
>
>   
Here is the source for the test program.  For my tests I run two
instances with the default settings:

--

package org.apache.hadoop.hbase.util;

import java.io.IOException;
import java.io.FileInputStream;
import java.lang.Integer;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

/* Table created beforehand in the hbase shell with:
   create 'fc_test', {NAME => 'json', VERSIONS => 1}, {NAME => 'data', VERSIONS => 1} */

public class LoadTest
    extends Thread
{
    protected int datasize = 4000000;

    public LoadTest(int datasize)
    {
        this.datasize = datasize;
    }

    public void run()
    {
        try {
            // Random bytes feed the row keys; a single random value of
            // datasize bytes is reused for every Put.
            FileInputStream random = new FileInputStream("/dev/urandom");
            byte[] key = new byte[32];
            byte[] data = new byte[datasize];

            HBaseConfiguration hbaseConfiguration = new HBaseConfiguration();
            HTable table = new HTable(hbaseConfiguration, "fc_test");
            random.read(data, 0, datasize);

            // Insert rows forever: each row gets a random 32-byte key
            // rendered as hex, with the value stored in the 'data' family.
            while (true)
            {
                random.read(key, 0, 32);
                Put update = new Put(Bytes.toBytes(getHexString(key)));
                update.add(Bytes.toBytes("data"), Bytes.toBytes(""), data);
                table.put(update);
            }
        } catch (Exception e) {
            System.out.println(e.toString());
        }

    }

    // Render a byte array as a lower-case hex string for use as a row key.
    public static String getHexString(byte[] b) throws Exception
    {
        String result = "";
        for (int i = 0; i < b.length; i++)
        {
            result += Integer.toString((b[i] & 0xff) + 0x100, 16).substring(1);
        }
        return result;
    }

    public static void main(String args[]) throws IOException
    {
        Thread t = null;
        int i;

        int threads = 30;
        int datasize = 400000;

        // Optional args: number of writer threads and value size in bytes;
        // fall back to the defaults above if they are missing or unparseable.
        try {
            threads = Integer.parseInt(args[0]);
            datasize = Integer.parseInt(args[1]);
        } catch (Exception e) {
        }

        System.out.println("Starting " + threads + " threads");

        for (i = 0; i< threads; i++)
        {
               t = new LoadTest(datasize);
               t.start();
        }
        // wait for last thread
        try {
           t.join();
        } catch (InterruptedException e) {
        }
    }
}
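
In case it helps, here is a minimal read-back check that could be run against
the same table (a sketch only; it assumes the row keys written above were also
recorded somewhere, since the test generates them randomly):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadCheck {
    public static void main(String[] args) throws Exception {
        // args[0] is a hex row key recorded when the row was written.
        HBaseConfiguration conf = new HBaseConfiguration();
        HTable table = new HTable(conf, "fc_test");
        Result result = table.get(new Get(Bytes.toBytes(args[0])));
        if (result.isEmpty()) {
            System.out.println("MISSING: " + args[0]);
        } else {
            byte[] value = result.getValue(Bytes.toBytes("data"),
                Bytes.toBytes(""));
            System.out.println("OK: " + args[0] + " (" + value.length
                + " bytes)");
        }
    }
}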

