Subject: Re: Replication
From: Jean-Daniel Cryans <jdcryans@gmail.com>
To: user@hbase.apache.org
Date: Wed, 14 Jul 2010 08:29:59 -0700

This is not related to replication; it's about a new feature added by
https://issues.apache.org/jira/browse/HBASE-2306

https://issues.apache.org/jira/browse/HBASE-2382 added the required
documentation. Basically, if you are running in pseudo-distributed mode, you
need to tell HBase not to expect the default number of replicas:

http://hbase.apache.org/docs/r0.89.20100621/apidocs/overview-summary.html#pseudo-distrib

See the discussion in HBASE-2382. Do you think it could be more user friendly?

J-D

On Wed, Jul 14, 2010 at 3:33 AM, Sebastian Bauer wrote:
> So replication is working, but after the hadoop update I see many of these
> on the slave:
>
> 2010-07-14 12:30:51,941 WARN org.apache.hadoop.hbase.regionserver.wal.HLog:
> HDFS pipeline error detected. Found 1 replicas but expecting 3 replicas.
> Requesting close of hlog.
> 2010-07-14 12:30:51,955 INFO
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs
> -- HDFS-200
> 2010-07-14 12:30:51,955 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
> Roll
> /hbase/.logs/db2b.goldenline.pl,60020,1279102137010/85.232.237.235%3A60020.1279103451914,
> entries=1, filesize=555. New hlog
> /hbase/.logs/db2b.goldenline.pl,60020,1279102137010/85.232.237.235%3A60020.1279103451944
> 2010-07-14 12:30:51,957 WARN org.apache.hadoop.hbase.regionserver.wal.HLog:
> HDFS pipeline error detected. Found 1 replicas but expecting 3 replicas.
> Requesting close of hlog.
> 2010-07-14 12:30:51,966 INFO
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs
> -- HDFS-200
> 2010-07-14 12:30:51,967 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
> Roll
> /hbase/.logs/db2b.goldenline.pl,60020,1279102137010/85.232.237.235%3A60020.1279103451944,
> entries=1, filesize=1195. New hlog
> /hbase/.logs/db2b.goldenline.pl,60020,1279102137010/85.232.237.235%3A60020.1279103451959
>
>
> and something like this on the master:
>
> 2010-07-14 12:25:10,939 WARN org.apache.hadoop.hbase.regionserver.wal.HLog:
> HDFS pipeline error detected. Found 1 replicas but expecting 3 replicas.
> Requesting close of hlog.
> 2010-07-14 12:25:10,940 WARN org.apache.hadoop.hbase.regionserver.wal.HLog:
> HDFS pipeline error detected. Found 1 replicas but expecting 3 replicas.
> Requesting close of hlog.
> 2010-07-14 12:25:10,940 WARN org.apache.hadoop.hbase.regionserver.wal.HLog:
> HDFS pipeline error detected. Found 1 replicas but expecting 3 replicas.
> Requesting close of hlog.
> 2010-07-14 12:25:11,399 INFO
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs
> -- HDFS-200
> 2010-07-14 12:25:11,400 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
> Roll
> /hbase/.logs/db2a.goldenline.pl,60020,1279102568601/85.232.237.234%3A60020.1279103110860,
> entries=81, filesize=22075. New hlog
> /hbase/.logs/db2a.goldenline.pl,60020,1279102568601/85.232.237.234%3A60020.1279103111379
> 2010-07-14 12:25:11,451 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Created
> /hbase/replication/rs/db2a.goldenline.pl,60020,1279102568601/test/85.232.237.234%3A60020.1279103111379 with data
> 2010-07-14 12:25:11,454 WARN org.apache.hadoop.hbase.regionserver.wal.HLog:
> HDFS pipeline error detected. Found 1 replicas but expecting 3 replicas.
> Requesting close of hlog.
>
>
> On 14.07.2010 00:08, Jean-Daniel Cryans wrote:
>>
>> Just looked at the head of 0.20-append and I see it contains the
>> missing patch (it was committed as part of HDFS-1057).
>>
>> So that would mean that the file is just empty :) If you insert a few
>> rows in the shell on the master cluster, do you see them some seconds
>> later on the slave?
>>
>> J-D
>>
>> On Tue, Jul 13, 2010 at 3:01 PM, Sebastian Bauer wrote:
>>>
>>> On 13.07.2010 23:50, Jean-Daniel Cryans wrote:
>>>>
>>>> Yeah, using an experimental feature can be "odd" :D
>>>
>>> I love bleeding-edge technologies :D
>>>>
>>>> So one of the following is happening:
>>>>
>>>> 1) You aren't using a version of hadoop patched enough to get
>>>> replication working fully. Trunk uses a special jar that I patched
>>>> myself. CDH3b2 also has everything needed. What this means is that
>>>> it's trying to open the log file but the first block isn't available
>>>> (it's actually a very small patch for the Namenode).
>>>
>>> I'm using hadoop from the 0.20-append branch released with hbase-0.89.xxxx
>>>
>>>> 2) The file is empty, because nothing was written to the log file.
>>>> What this means is that it's trying to open the log file but there's
>>>> not even a single block in it, so it fails on EOF.
>>>
>>> The problem could be that one, because when everything is running I see
>>> fewer of these traces.
>>>
>>>> J-D
>>>>
>>> Thanks for your help :)
>>>
>>>> On Tue, Jul 13, 2010 at 2:37 PM, Sebastian Bauer wrote:
>>>>>
>>>>> After trying to set up replication I got many of these errors:
>>>>>
>>>>> 2010-07-13 23:35:26,498 WARN
>>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
>>>>> Waited too long for this file, considering dumping
>>>>> 2010-07-13 23:35:26,498 DEBUG
>>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
>>>>> Unable to open a reader, sleeping 100 times 10
>>>>> 2010-07-13 23:35:27,111 INFO
>>>>> org.apache.hadoop.hbase.regionserver.Store:
>>>>> Completed compaction of 3 file(s) in c of
>>>>> CampaignToUsers,43-m_2010_5_750D70A83162FF54389D2CA67ADA0B86,1278610126054.6504d518fb224efe1530e79c198994cd.;
>>>>> new storefile is
>>>>> hdfs://db2a:50001/hbase/CampaignToUsers/6504d518fb224efe1530e79c198994cd/c/226233377281334567;
>>>>> store size is 19.6m
>>>>> 2010-07-13 23:35:27,111 INFO
>>>>> org.apache.hadoop.hbase.regionserver.HRegion:
>>>>> compaction completed on region
>>>>> CampaignToUsers,43-m_2010_5_750D70A83162FF54389D2CA67ADA0B86,1278610126054.6504d518fb224efe1530e79c198994cd.
>>>>> in 1sec
>>>>> 2010-07-13 23:35:27,111 INFO
>>>>> org.apache.hadoop.hbase.regionserver.HRegion:
>>>>> Starting compaction on region
>>>>> UsersToCampaign,,1278609821058.ecb7605434967e247ce14d525849495d.
>>>>> 2010-07-13 23:35:27,112 DEBUG
>>>>> org.apache.hadoop.hbase.regionserver.Store:
>>>>> Compaction size of c: 31.4m; Skipped 0 file(s), size: 0
>>>>> 2010-07-13 23:35:27,112 INFO
>>>>> org.apache.hadoop.hbase.regionserver.Store:
>>>>> Started compaction of 3 file(s) in c of
>>>>> UsersToCampaign,,1278609821058.ecb7605434967e247ce14d525849495d. into
>>>>> hdfs://db2a:50001/hbase/UsersToCampaign/ecb7605434967e247ce14d525849495d/.tmp, seqid=65302505
>>>>> 2010-07-13 23:35:27,498 INFO
>>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
>>>>> Opening log for replication 85.232.237.234%3A60020.1279056880911 at 0
>>>>> 2010-07-13 23:35:27,499 WARN
>>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: test
>>>>> Got:
>>>>> java.io.EOFException
>>>>>        at java.io.DataInputStream.readFully(DataInputStream.java:180)
>>>>>        at java.io.DataInputStream.readFully(DataInputStream.java:152)
>>>>>        at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1457)
>>>>>        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1435)
>>>>>        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
>>>>>        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1419)
>>>>>        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:51)
>>>>>        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:103)
>>>>>        at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:511)
>>>>>        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:422)
>>>>>        at
>>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:262)
>>>>> 2010-07-13 23:35:27,499 WARN
>>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
>>>>> Waited too long for this file, considering dumping
>>>>> 2010-07-13 23:35:27,499 DEBUG
>>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
>>>>> Unable to open a reader, sleeping 100 times 10
>>>>> 2010-07-13 23:35:28,499 INFO
>>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
>>>>> Opening log for replication 85.232.237.234%3A60020.1279056880911 at 0
>>>>> 2010-07-13 23:35:28,500 WARN
>>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: test
>>>>> Got:
>>>>> java.io.EOFException
>>>>>        at java.io.DataInputStream.readFully(DataInputStream.java:180)
>>>>>        at java.io.DataInputStream.readFully(DataInputStream.java:152)
>>>>>        at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1457)
>>>>>        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1435)
>>>>>        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
>>>>>        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1419)
>>>>>        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:51)
>>>>>        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:103)
>>>>>        at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:511)
>>>>>        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:422)
>>>>>        at
>>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:262)
>>>>>
>>>>> On 13.07.2010 20:18, Jean-Daniel Cryans wrote:
>>>>>>
>>>>>> No, but you can use the new mapreduce utility
>>>>>> org.apache.hadoop.hbase.mapreduce.CopyTable to copy whole tables
>>>>>> between clusters. It's like distcp for HBase.
>>>>>>
>>>>>> Oh, and looking at the documentation I just noticed that I changed the
>>>>>> name of the configuration that enables replication just before
>>>>>> committing and forgot to update the package.html file; it's now simply
>>>>>> hbase.replication (and it should stay like that). I'll fix that in the
>>>>>> scope of HBASE-2808.
>>>>>>
>>>>>> J-D
>>>>>>
>>>>>> On Tue, Jul 13, 2010 at 11:12 AM, Sebastian Bauer wrote:
>>>>>>>
>>>>>>> I have one more question: can I first create the master, connect the
>>>>>>> slave after loading data, or turn on replication on existing tables
>>>>>>> with data?
>>>>>>>
>>>>>>> On 13.07.2010 19:56, Jean-Daniel Cryans wrote:
>>>>>>>>>
>>>>>>>>> Thanks for the info. Where can I find some documentation? There is
>>>>>>>>> info about zookeeper saying it needs to run in standalone mode; is
>>>>>>>>> that true?
>>>>>>>>>
>>>>>>>> Well, you can run add_peer.rb while the clusters are running, but they
>>>>>>>> won't pick up the change live (that part isn't done yet). So if you
>>>>>>>> run the script while the cluster is running, restart it. Also take a
>>>>>>>> look at the region server log; it should output something like this
>>>>>>>> when starting up:
>>>>>>>>
>>>>>>>>     LOG.info("This cluster (" + thisCluster + ") is a "
>>>>>>>>         + (this.replicationMaster ? "master" : "slave") + " for replication" +
>>>>>>>>         ", compared with (" + address + ")");
>>>>>>>>
>>>>>>>> This will tell you if you used the right address for zookeeper.
>>>>>>>> If your region server on the master cluster thinks it's a slave, then
>>>>>>>> the addresses are wrong. Also, there's currently no reporting for
>>>>>>>> replication, since it's not done yet!
>>>>>>>>
>>>>>>>> For more in-depth documentation, check out
>>>>>>>> https://issues.apache.org/jira/browse/HBASE-2808
>>>>>>>>
>>>>>>>> Thanks for trying this out; as the author of most of that part of the
>>>>>>>> code, I'm thrilled!
>>>>>>>>
>>>>>>>> J-D
>>>>>>>>
>>>
>
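
[Editor's note: the thread mentions two settings without showing them. A
minimal hbase-site.xml sketch is below; the property names come from the
thread (hbase.replication) and from the standard HDFS replication setting
(dfs.replication), but the exact placement and values shown here are an
assumption for a single-datanode pseudo-distributed setup, not a quote from
the HBase docs.]

```xml
<!-- hbase-site.xml fragment (illustrative sketch, not verbatim from docs) -->
<configuration>
  <!-- Pseudo-distributed: only one datanode exists, so expect 1 replica
       instead of the default 3. This addresses the repeated
       "Found 1 replicas but expecting 3 replicas" warnings above. -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

  <!-- Enable cluster replication; per J-D in this thread, the property
       is now simply hbase.replication (package.html had a stale name). -->
  <property>
    <name>hbase.replication</name>
    <value>true</value>
  </property>
</configuration>
```

As noted in the thread, peer changes made with add_peer.rb are not picked up
live at this point, so a cluster restart is needed after changing these.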