hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Data disappears and re-appears again after HBase cluster restart
Date Tue, 27 Jul 2010 18:17:20 GMT
I logged https://issues.apache.org/jira/browse/HBASE-2882

On Tue, Jul 27, 2010 at 10:41 AM, Stack <stack@duboce.net> wrote:

> On Tue, Jul 27, 2010 at 10:26 AM, Vladimir Rodionov
> <vrodionov@carrieriq.com> wrote:
> > Yes, we set timestamps on all Puts. The vast majority of timestamps are
> > in the past (several minutes before now()) and only a small fraction are
> > in the future (and this future will never come; it's pretty close to
> > Long.MAX_VALUE).
>
> When you run a scan, do you set the starttime to include these Puts
> that are in the future?
>
> > But our clocks are now synced on all servers, so I do not think this can
> > explain the issue. Besides this, we do not set timestamps when we do inserts
> > into one particular table, and this table disappears as well (and reappears
> > after restart).
> >
>
> This I cannot explain.  I don't see this phenomenon at all.  Restart
> should have no effect on the data being carried by the cluster.  Can
> you dig around some more and get us some more data points?
>
> St.Ack
>
> > Best regards,
> > Vladimir Rodionov
> > Principal Platform Engineer
> > Carrier IQ, www.carrieriq.com
> > e-mail: vrodionov@carrieriq.com
> >
> > ________________________________________
> > From: saint.ack@gmail.com [saint.ack@gmail.com] On Behalf Of Stack [stack@duboce.net]
> > Sent: Monday, July 26, 2010 11:05 PM
> > To: dev@hbase.apache.org
> > Subject: Re: Data disappears and re-appears again after HBase cluster restart
> >
> > Vladimir:
> >
> > Are you setting times on cells you add to HBase?  If so, could these
> > be in the future as far as the regionserver is concerned?  For
> > example, perhaps you are setting the version/timestamp on a client
> > whose clock is different from that over on the RegionServer, then when
> > we scan, we miss these future values?
> >
> > Do you have to restart the cluster?  What happens if you just wait?
> > Does the data come back then?
> >
> > St.Ack
> >
> >
> > On Mon, Jul 26, 2010 at 6:14 PM, Vladimir Rodionov
> > <vrodionov@carrieriq.com> wrote:
> >> We are running ntpd on all servers and clocks are in sync now, but it has
> >> not fixed the problem.
> >> I run the flow, then check
> >>
> >> hbase shell
> >>> count 'tableX'
> >> 0 rows
> >>
> >> After HBase restart I am able to get the 'right' number of rows in a
> >> table.
> >>
> >> For some tables I get a wrong number of rows that is always less than the
> >> actual number of rows; for others I get 0 rows.
> >> It always goes away after HBase restart. All tables are small in size
> >> and all are newly created during our flow execution.
> >>
> >> I have checked the Master and Region server log files many times, but
> >> apart from NotServingRegionException for -META- (or -ROOT-) I can see
> >> nothing suspicious.
> >>
> >> In Region servers log files I see a lot of messages like this one:
> >> 2010-07-26 17:05:43,751 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~114.4k for region 10__HB_NOINC_ORCL_JDBC_0726_MEJOMEJO-ERROR_COUNTS-1280187791424-0,,1280187802112 in 985ms, sequence id=309833, compaction requested=false
> >>
> >> This is during the cluster's shutdown operation.
> >>
> >> Best regards,
> >> Vladimir Rodionov
> >> Principal Platform Engineer
> >> Carrier IQ, www.carrieriq.com
> >> e-mail: vrodionov@carrieriq.com
> >>
> >> ________________________________________
> >> From: jdcryans@gmail.com [jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans [jdcryans@apache.org]
> >> Sent: Thursday, July 22, 2010 5:43 PM
> >> To: dev@hbase.apache.org
> >> Subject: Re: Data disappears and re-appears again after HBase cluster restart
> >>
> >> Data doesn't disappear, it's probably just hidden behind a delete or
> >> something like that (the user mailing list contains reports of events
> >> like that which were fixed by running NTP on all machines, as required
> >> by the Getting Started guide:
> >> http://hbase.apache.org/docs/r0.20.5/api/overview-summary.html#requirements
> >> ).
> >>
> >> This article gives good info about timestamps in HBase:
> >> http://outerthought.org/blog/417-ot.html
> >>
> >> J-D
> >>
> >> On Thu, Jul 22, 2010 at 5:29 PM, Vladimir Rodionov
> >> <vrodionov@carrieriq.com> wrote:
> >>> Yes, I just checked all 3 servers and their clocks are not synchronized
> >>> (up to 2 min diff).
> >>> Can you please elaborate a little bit more: how can this result in
> >>> data disappearance?
> >>>
> >>> Best regards,
> >>> Vladimir Rodionov
> >>> Principal Platform Engineer
> >>> Carrier IQ, www.carrieriq.com
> >>> e-mail: vrodionov@carrieriq.com
> >>>
> >>> ________________________________________
> >>> From: jdcryans@gmail.com [jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans [jdcryans@apache.org]
> >>> Sent: Thursday, July 22, 2010 4:38 PM
> >>> To: dev@hbase.apache.org
> >>> Subject: Re: Data disappears and re-appears again after HBase cluster restart
> >>>
> >>> I would guess clock skew. Do all the machines have approximately the
> >>> same time? A few seconds is acceptable, but not more.
> >>>
> >>> J-D
> >>>
> >>> On Thu, Jul 22, 2010 at 4:34 PM, Vladimir Rodionov
> >>> <vrodionov@carrieriq.com> wrote:
> >>>> Has anybody encountered this particular bug before?
> >>>> We have been seeing this intermittently in our small QA cluster.
> >>>>
> >>>> We run a flow which is basically a custom ETL process over data stored
> >>>> in HDFS. Yes, it is a bunch of M/R jobs.
> >>>> One of the jobs stores data into HBase (0.20.3); the next one loads
> >>>> data from HBase (using scan), performs additional transformations,
> >>>> and finally stores the data into an RDBMS.
> >>>>
> >>>> The flow works fine (most of the time). This means that new HBase tables
> >>>> are created, data is loaded, and it can be read after that during the
> >>>> next M/R job.
> >>>>
> >>>> After the flow finishes, data from the tables (but not the tables
> >>>> themselves) sometimes mysteriously disappears. This is not deterministic,
> >>>> and to get the data back we need to RESTART the HBase cluster.
> >>>> So an HBase restart fixes the problem.
> >>>>
> >>>> The cluster is small (3 servers). RAM is limited: 8GB. There are only 2
> >>>> CPU cores per server, but the input data size is small as well, and the
> >>>> average size of the disappearing tables is several thousand rows;
> >>>> they are small. Hadoop is from CDH2. I cannot give you any additional
> >>>> helpful information at this time (no log files), but maybe somebody has
> >>>> encountered this before and has an idea how to fix it.
> >>>>
> >>>>
> >>>> Best regards,
> >>>> Vladimir Rodionov
> >>>> Principal Platform Engineer
> >>>> Carrier IQ, www.carrieriq.com
> >>>> e-mail: vrodionov@carrieriq.com
> >>>>
> >>>
> >>
> >
>
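
For readers of this archive, J-D's remark that data is "probably just hidden behind a delete" can be illustrated with a toy model. This is a sketch in plain Python, not HBase code: the `Cell` class, the `visible_cells` helper, and the timestamps are all hypothetical, and it assumes only the general rule that a delete marker hides cells whose timestamp is at or below the marker's timestamp. It does not attempt to explain why a restart made the data visible again, which the thread leaves open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    ts: int      # cell timestamp in milliseconds
    value: str


def visible_cells(puts, delete_markers):
    """Return the puts that are not masked by a delete marker.

    delete_markers maps row -> marker timestamp. In this toy model a
    marker hides every cell of that row with ts <= marker timestamp.
    """
    result = []
    for cell in puts:
        marker_ts = delete_markers.get(cell.row)
        if marker_ts is None or cell.ts > marker_ts:
            result.append(cell)
    return result


# Hypothetical scenario: the delete was stamped by a machine whose clock
# runs two minutes ahead, so it carries a later timestamp than a put that
# actually happened afterwards on a machine with a correct clock.
true_time = 1_280_187_000_000
skew_ms = 2 * 60 * 1000

delete_markers = {"row1": true_time + skew_ms}
puts = [
    Cell("row1", true_time + 50_000, "written after the delete"),
    Cell("row2", true_time + 50_000, "no delete on this row"),
]

visible = visible_cells(puts, delete_markers)
visible_rows = [c.row for c in visible]
print(visible_rows)
```

In this model the put to `row1` is stored but never returned, which would look like missing data to a scan or a `count`.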

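Stack's question about whether the scan's time range includes the future-dated Puts can be sketched the same way. The model below is plain Python with hypothetical names (`in_time_range`, `scan`), and it assumes a half-open range [min, max), which is how I understand the HBase client's scan time range to be documented; treat that as an assumption rather than a statement about 0.20.3 behavior.

```python
LONG_MAX = 2**63 - 1  # Java's Long.MAX_VALUE


def in_time_range(ts, min_ts=0, max_ts=LONG_MAX):
    """Half-open range check: min_ts inclusive, max_ts exclusive."""
    return min_ts <= ts < max_ts


def scan(cells, min_ts=0, max_ts=LONG_MAX):
    """Return the (row, ts) pairs whose timestamp falls inside the range."""
    return [c for c in cells if in_time_range(c[1], min_ts, max_ts)]


now = 1_280_187_000_000  # hypothetical "current" time in milliseconds

cells = [
    ("past-row", now - 5 * 60 * 1000),   # a few minutes in the past
    ("future-row", LONG_MAX - 1000),     # "pretty close to Long.MAX_VALUE"
]

# A scan bounded at the current time misses the future-dated cell.
bounded = [row for row, _ in scan(cells, 0, now + 1)]

# A scan with the widest range in this model sees both cells.
unbounded = [row for row, _ in scan(cells)]

print(bounded, unbounded)
```

If the reading job bounds its scan at the current time, the future-dated cells would be skipped no matter how well the clocks are synchronized.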