hbase-dev mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Data disappears and re-appears again after HBase cluster restart
Date Fri, 23 Jul 2010 00:43:49 GMT
Data doesn't disappear; it's probably just hidden behind a delete or
something like that. The user mailing list contains reports of similar
events that were fixed by running NTP on all machines, as required
by the Getting Started guide:
http://hbase.apache.org/docs/r0.20.5/api/overview-summary.html#requirements

This article gives good info about timestamps in HBase:
http://outerthought.org/blog/417-ot.html
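
To illustrate how a skewed clock can hide data behind a delete, here is a toy
sketch in Python. It is not HBase code: the class ToyCellStore, the function
simulate, and the scenario (a delete stamped by a server whose clock runs
fast, followed by a write stamped by a server with a correct clock) are all
assumptions made for illustration. It relies only on the rule that a delete
marker masks every version whose timestamp is at or below the marker's own.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCellStore:
    """Toy model of a single cell: versions and one delete marker, by timestamp."""
    puts: dict = field(default_factory=dict)  # timestamp (ms) -> value
    delete_ts: int = -1                       # newest delete marker, -1 if none

    def put(self, value, ts):
        self.puts[ts] = value

    def delete(self, ts):
        # A delete marker masks every version with timestamp <= ts.
        self.delete_ts = max(self.delete_ts, ts)

    def get(self):
        # Only versions stamped strictly after the delete marker are visible.
        visible = [ts for ts in self.puts if ts > self.delete_ts]
        return self.puts[max(visible)] if visible else None


def simulate(skew_ms):
    """Hypothetical scenario: server A's clock runs skew_ms ahead of server B's."""
    true_time = 1_000_000
    store = ToyCellStore()
    # Old data is deleted through server A, which stamps with its fast clock.
    store.delete(ts=true_time + skew_ms)
    # 30 s later in real time, new data is written through server B.
    store.put("new-row", ts=true_time + 30_000)
    return store.get()


print(simulate(2_000))    # a few seconds of skew: the new row is visible
print(simulate(120_000))  # 2 minutes of skew: the new row is masked
```

With 2 minutes of skew the write lands "before" the delete in timestamp order,
so a read returns nothing even though the data was stored.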

J-D

On Thu, Jul 22, 2010 at 5:29 PM, Vladimir Rodionov
<vrodionov@carrieriq.com> wrote:
> Yes, I just checked all 3 servers and their clocks are not synchronized (up to 2 min diff).
> Can you please elaborate a little bit more: how can this result in data disappearance?
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
>
> ________________________________________
> From: jdcryans@gmail.com [jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans [jdcryans@apache.org]
> Sent: Thursday, July 22, 2010 4:38 PM
> To: dev@hbase.apache.org
> Subject: Re: Data disappears and re-appears again after HBase cluster restart
>
> I would guess clock skew. Do all the machines have approximately the same time?
> A few seconds is acceptable, but not more.
>
> J-D
>
> On Thu, Jul 22, 2010 at 4:34 PM, Vladimir Rodionov
> <vrodionov@carrieriq.com> wrote:
>> Has anybody encountered this particular bug before?
>> We have been seeing this intermittently in our small QA cluster.
>>
>> We run a flow which is basically a custom ETL process over data stored in HDFS. Yes, it is a bunch of M/R jobs.
>> One of the jobs stores data into HBase (0.20.3); the next one loads data from HBase (using a scan), performs additional transformations,
>> and finally stores the data into an RDBMS.
>>
>> The flow works fine (most of the time). That means new HBase tables are created, data is loaded, and it can be read during the next M/R job.
>>
>> After the flow finishes, data from the tables (but not the tables themselves) sometimes mysteriously disappears. This is not deterministic, and to get the data back we need to RESTART the HBase cluster.
>> So an HBase restart fixes the problem.
>>
>> The cluster is small (3 servers). RAM is limited (8GB), with only 2 CPU cores per server, but the input data size is small as well, and the disappearing tables average several thousand rows;
>> they are small. Hadoop is from CDH2. I cannot give you any additional helpful information at this time (no log files), but maybe somebody has encountered this
>> before and has an idea how to fix it.
>>
>>
>> Best regards,
>> Vladimir Rodionov
>> Principal Platform Engineer
>> Carrier IQ, www.carrieriq.com
>> e-mail: vrodionov@carrieriq.com
>>
>
