Subject: Re: Replication
From: Jean-Daniel Cryans <jdcryans@gmail.com>
To: user@hbase.apache.org
Date: Wed, 14 Jul 2010 08:29:59 -0700

This is not related to replication; it's about a new feature added by
https://issues.apache.org/jira/browse/HBASE-2306

https://issues.apache.org/jira/browse/HBASE-2382 added the required
documentation. Basically, if you are running in pseudo-distributed mode, you
need to tell HBase not to expect the default number of replicas:

http://hbase.apache.org/docs/r0.89.20100621/apidocs/overview-summary.html#pseudo-distrib

See the discussion in HBASE-2382. Do you think it could be more user friendly?

J-D

On Wed, Jul 14, 2010 at 3:33 AM, Sebastian Bauer wrote:
> So replication is working, but after the hadoop update I see many of these
> on the slave:
>
> 2010-07-14 12:30:51,941 WARN org.apache.hadoop.hbase.regionserver.wal.HLog:
> HDFS pipeline error detected. Found 1 replicas but expecting 3 replicas.
> Requesting close of hlog.
> 2010-07-14 12:30:51,955 INFO
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs
> -- HDFS-200
> 2010-07-14 12:30:51,955 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
> Roll
> /hbase/.logs/db2b.goldenline.pl,60020,1279102137010/85.232.237.235%3A60020.1279103451914,
> entries=1, filesize=555. New hlog
> /hbase/.logs/db2b.goldenline.pl,60020,1279102137010/85.232.237.235%3A60020.1279103451944
> 2010-07-14 12:30:51,957 WARN org.apache.hadoop.hbase.regionserver.wal.HLog:
> HDFS pipeline error detected. Found 1 replicas but expecting 3 replicas.
> Requesting close of hlog.
> 2010-07-14 12:30:51,966 INFO
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs
> -- HDFS-200
> 2010-07-14 12:30:51,967 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
> Roll
> /hbase/.logs/db2b.goldenline.pl,60020,1279102137010/85.232.237.235%3A60020.1279103451944,
> entries=1, filesize=1195. New hlog
> /hbase/.logs/db2b.goldenline.pl,60020,1279102137010/85.232.237.235%3A60020.1279103451959
>
>
> and something like this on the master:
>
> 2010-07-14 12:25:10,939 WARN org.apache.hadoop.hbase.regionserver.wal.HLog:
> HDFS pipeline error detected. Found 1 replicas but expecting 3 replicas.
> Requesting close of hlog.
> 2010-07-14 12:25:10,940 WARN org.apache.hadoop.hbase.regionserver.wal.HLog:
> HDFS pipeline error detected. Found 1 replicas but expecting 3 replicas.
> Requesting close of hlog.
> 2010-07-14 12:25:10,940 WARN org.apache.hadoop.hbase.regionserver.wal.HLog:
> HDFS pipeline error detected. Found 1 replicas but expecting 3 replicas.
> Requesting close of hlog.
> 2010-07-14 12:25:11,399 INFO
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs
> -- HDFS-200
> 2010-07-14 12:25:11,400 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
> Roll
> /hbase/.logs/db2a.goldenline.pl,60020,1279102568601/85.232.237.234%3A60020.1279103110860,
> entries=81, filesize=22075. New hlog
> /hbase/.logs/db2a.goldenline.pl,60020,1279102568601/85.232.237.234%3A60020.1279103111379
> 2010-07-14 12:25:11,451 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Created
> /hbase/replication/rs/db2a.goldenline.pl,60020,1279102568601/test/85.232.237.234%3A60020.1279103111379 with data
> 2010-07-14 12:25:11,454 WARN org.apache.hadoop.hbase.regionserver.wal.HLog:
> HDFS pipeline error detected. Found 1 replicas but expecting 3 replicas.
> Requesting close of hlog.
>
>
> On 14.07.2010 00:08, Jean-Daniel Cryans wrote:
>>
>> Just looked at the head of 0.20-append and I see it contains the
>> missing patch (it was committed as part of HDFS-1057).
>>
>> So that would mean that the file is just empty :) If you insert a few
>> rows in the shell on the master cluster, do you see them some seconds
>> later on the slave?
>>
>> J-D
>>
>> On Tue, Jul 13, 2010 at 3:01 PM, Sebastian Bauer wrote:
>>>
>>> On 13.07.2010 23:50, Jean-Daniel Cryans wrote:
>>>>
>>>> Yeah, using an experimental feature can be "odd" :D
>>>
>>> I love bleeding-edge technologies :D
>>>>
>>>> So one of the following is happening:
>>>>
>>>> 1) You aren't using a version of hadoop patched enough to get
>>>> replication working fully. Trunk uses a special jar that I patched
>>>> myself. CDH3b2 also has everything needed. What this means is that
>>>> it's trying to open the log file but the first block isn't available
>>>> (it's actually a very small patch for the Namenode).
>>>
>>> I'm using hadoop from the 0.20-append branch released with hbase-0.89.xxxx
>>>
>>>> 2) The file is empty, because nothing was written to the log file.
>>>> What this means is that it's trying to open the log file but there's
>>>> not even a single block in it, so it fails on EOF.
>>>
>>> The problem could be that one, because when everything is running I see
>>> fewer of these traces.
>>>
>>>> J-D
>>>>
>>> Thanks for your help :)
>>>
>>>> On Tue, Jul 13, 2010 at 2:37 PM, Sebastian Bauer wrote:
>>>>>
>>>>> After trying to set up replication I got many of these errors:
>>>>>
>>>>> 2010-07-13 23:35:26,498 WARN
>>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
>>>>> Waited too long for this file, considering dumping
>>>>> 2010-07-13 23:35:26,498 DEBUG
>>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
>>>>> Unable to open a reader, sleeping 100 times 10
>>>>> 2010-07-13 23:35:27,111 INFO
>>>>> org.apache.hadoop.hbase.regionserver.Store:
>>>>> Completed compaction of 3 file(s) in c of
>>>>> CampaignToUsers,43-m_2010_5_750D70A83162FF54389D2CA67ADA0B86,1278610126054.6504d518fb224efe1530e79c198994cd.;
>>>>> new storefile is
>>>>> hdfs://db2a:50001/hbase/CampaignToUsers/6504d518fb224efe1530e79c198994cd/c/226233377281334567;
>>>>> store size is 19.6m
>>>>> 2010-07-13 23:35:27,111 INFO
>>>>> org.apache.hadoop.hbase.regionserver.HRegion:
>>>>> compaction completed on region
>>>>> CampaignToUsers,43-m_2010_5_750D70A83162FF54389D2CA67ADA0B86,1278610126054.6504d518fb224efe1530e79c198994cd.
>>>>> in 1sec
>>>>> 2010-07-13 23:35:27,111 INFO
>>>>> org.apache.hadoop.hbase.regionserver.HRegion:
>>>>> Starting compaction on region
>>>>> UsersToCampaign,,1278609821058.ecb7605434967e247ce14d525849495d.
>>>>> 2010-07-13 23:35:27,112 DEBUG
>>>>> org.apache.hadoop.hbase.regionserver.Store:
>>>>> Compaction size of c: 31.4m; Skipped 0 file(s), size: 0
>>>>> 2010-07-13 23:35:27,112 INFO
>>>>> org.apache.hadoop.hbase.regionserver.Store:
>>>>> Started compaction of 3 file(s) in c of
>>>>> UsersToCampaign,,1278609821058.ecb7605434967e247ce14d525849495d. into
>>>>> hdfs://db2a:50001/hbase/UsersToCampaign/ecb7605434967e247ce14d525849495d/.tmp, seqid=65302505
>>>>> 2010-07-13 23:35:27,498 INFO
>>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
>>>>> Opening log for replication 85.232.237.234%3A60020.1279056880911 at 0
>>>>> 2010-07-13 23:35:27,499 WARN
>>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: test
>>>>> Got:
>>>>> java.io.EOFException
>>>>>        at java.io.DataInputStream.readFully(DataInputStream.java:180)
>>>>>        at java.io.DataInputStream.readFully(DataInputStream.java:152)
>>>>>        at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1457)
>>>>>        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1435)
>>>>>        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
>>>>>        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1419)
>>>>>        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:51)
>>>>>        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:103)
>>>>>        at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:511)
>>>>>        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:422)
>>>>>        at
>>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:262)
>>>>> 2010-07-13 23:35:27,499 WARN
>>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
>>>>> Waited too long for this file, considering dumping
>>>>> 2010-07-13 23:35:27,499 DEBUG
>>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
>>>>> Unable to open a reader, sleeping 100 times 10
>>>>> 2010-07-13 23:35:28,499 INFO
>>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
>>>>> Opening log for replication 85.232.237.234%3A60020.1279056880911 at 0
>>>>> 2010-07-13 23:35:28,500 WARN
>>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: test
>>>>> Got:
>>>>> java.io.EOFException
>>>>>        at java.io.DataInputStream.readFully(DataInputStream.java:180)
>>>>>        at java.io.DataInputStream.readFully(DataInputStream.java:152)
>>>>>        at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1457)
>>>>>        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1435)
>>>>>        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
>>>>>        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1419)
>>>>>        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:51)
>>>>>        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:103)
>>>>>        at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:511)
>>>>>        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:422)
>>>>>        at
>>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:262)
>>>>>
>>>>> On 13.07.2010 20:18, Jean-Daniel Cryans wrote:
>>>>>>
>>>>>> No, but you can use the new mapreduce utility
>>>>>> org.apache.hadoop.hbase.mapreduce.CopyTable to copy whole tables
>>>>>> between clusters. It's like distcp for HBase.
>>>>>>
>>>>>> Oh, and looking at the documentation I just noticed that I changed the
>>>>>> name of the configuration that enables replication just before
>>>>>> committing and forgot to update the package.html file; it's now simply
>>>>>> hbase.replication (and it should stay like that). I'll fix that in the
>>>>>> scope of HBASE-2808.
>>>>>>
>>>>>> J-D
>>>>>>
>>>>>> On Tue, Jul 13, 2010 at 11:12 AM, Sebastian Bauer wrote:
>>>>>>>
>>>>>>> I have one more question: can I first create the master, connect the
>>>>>>> slave after loading data, or turn on replication on existing tables
>>>>>>> with data?
>>>>>>>
>>>>>>> On 13.07.2010 19:56, Jean-Daniel Cryans wrote:
>>>>>>>>>
>>>>>>>>> Thanks for the info. Where can I find some documentation? There is
>>>>>>>>> info about zookeeper saying it needs to run in standalone mode; is
>>>>>>>>> that true?
>>>>>>>>>
>>>>>>>> Well, you can run add_peer.rb while the clusters are running, but they
>>>>>>>> won't pick up the change live (that part isn't done yet). So if you
>>>>>>>> run the script while the cluster is running, restart it. Also take a
>>>>>>>> look at the region server log; it should output something like this
>>>>>>>> when starting up:
>>>>>>>>
>>>>>>>>     LOG.info("This cluster (" + thisCluster + ") is a "
>>>>>>>>         + (this.replicationMaster ? "master" : "slave") + " for replication" +
>>>>>>>>         ", compared with (" + address + ")");
>>>>>>>>
>>>>>>>> This will tell you if you used the right address for zookeeper.
>>>>>>>> If your region server on the master cluster thinks it's a slave, then
>>>>>>>> the addresses are wrong. Also, there's currently no reporting for
>>>>>>>> replication, since it's not done yet!
>>>>>>>>
>>>>>>>> For more in-depth documentation, check out
>>>>>>>> https://issues.apache.org/jira/browse/HBASE-2808
>>>>>>>>
>>>>>>>> Thanks for trying this out; as the author of most of that part of the
>>>>>>>> code, I'm thrilled!
>>>>>>>>
>>>>>>>> J-D
>>>>>>>>
>>>
>
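
[Editor's note: the thread mentions two settings without showing them. A
minimal hbase-site.xml sketch is below; the property names come from the
thread (hbase.replication) and from the standard HDFS replication setting
(dfs.replication), but the exact placement and values shown here are an
assumption for a single-datanode pseudo-distributed setup, not a quote from
the HBase docs.]

```xml
<!-- hbase-site.xml fragment (illustrative sketch, not verbatim from docs) -->
<configuration>
  <!-- Pseudo-distributed: only one datanode exists, so expect 1 replica
       instead of the default 3. This addresses the repeated
       "Found 1 replicas but expecting 3 replicas" warnings above. -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

  <!-- Enable cluster replication; per J-D in this thread, the property
       is now simply hbase.replication (package.html had a stale name). -->
  <property>
    <name>hbase.replication</name>
    <value>true</value>
  </property>
</configuration>
```

As noted in the thread, peer changes made with add_peer.rb are not picked up
live at this point, so a cluster restart is needed after changing these.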