accumulo-user mailing list archives

From Eric Newton <eric.new...@gmail.com>
Subject Re: Failing to BulkIngest [SEC=UNOFFICIAL]
Date Tue, 18 Feb 2014 14:20:45 GMT
The "LeaseExpiredException" is part of the recovery process.  The master
determines that a tablet server has lost its lock, or it is unresponsive
and has been halted, possibly indirectly by removing the lock.

The master then steals the write lease on the WAL file, which causes any
future writes to that WAL to fail.  The message you are seeing is part of
that failure.  You should have seen a tablet server failure associated with
this message on the machine with <ip>.
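
If you want to confirm that, a quick check on that machine (a sketch; the
log location is an assumption based on a default tarball install, where
tablet server logs land under $ACCUMULO_HOME/logs):

    # look for the lease/lock failure in the tablet server logs on <ip>
    grep -iE 'lease|lock' $ACCUMULO_HOME/logs/tserver_*.log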

Having 50K FATE IN_PROGRESS lines is bad.  That is preventing your bulk
imports from running.
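
A quick way to see the scale of the backlog, using the same Admin utility
you already ran (the grep filter is just an illustration):

    # count the IN_PROGRESS FATE transactions
    ./accumulo org.apache.accumulo.server.fate.Admin print | grep -c IN_PROGRESS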

Are there any lines that show locked: [W:3n]?  The other FATE transactions
are waiting to get a READ lock on table id 3n.
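
You can filter the same output for the write lock, e.g. (adjust the quoting
for your shell):

    # look for the transaction holding the write lock on table 3n
    ./accumulo org.apache.accumulo.server.fate.Admin print | grep 'locked: \[W:3n\]'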

-Eric



On Sun, Feb 16, 2014 at 7:59 PM, Dickson, Matt MR <
matt.dickson@defence.gov.au> wrote:

> UNOFFICIAL
>
> Josh,
>
> Zookeeper - 3.4.5-cdh4.3.0
> Accumulo - 1.5.0
> Hadoop - cdh 4.3.0
>
> In the Accumulo console we are getting:
>
> ERROR RemoteException(...LeaseExpiredException): Lease mismatch on
> /accumulo/wal/<ip>+9997/<uid> owned by DFSClient_NONMAPREDUCE_699577321_12
> but is accessed by DFSClient_NONMAPREDUCE_903051502_12
>
> We can scan the table without issues and can load rows directly, i.e. not
> using bulk import.
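>
> For illustration, the distinction in the Accumulo shell (the row, column,
> and directory names here are made up):
>
>   # direct write: works
>   insert row1 fam qual value
>   # bulk import: hangs
>   importdirectory /tmp/bulk /tmp/bulk_failures false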
>
> A bit more information - we recently extended how we manage old tablets in
> the system. We load data by date, creating splits for each day, and then
> age data off using the age-off filters.  This leaves empty tablets, so we
> now merge these old tablets together to effectively remove them (a sketch
> of those commands is below).  I mention it because I'm not sure whether
> this might have introduced another issue.
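>
> A minimal sketch of that lifecycle in the Accumulo shell (the table name
> and date split points are made up for illustration):
>
>   # create per-day splits ahead of a load
>   addsplits 20140101 20140102 20140103 -t mytable
>   # later, once age-off has emptied the old tablets, merge them away
>   merge -t mytable -b 20140101 -e 20140103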
>
> Matt
>
> -----Original Message-----
> From: Josh Elser [mailto:josh.elser@gmail.com]
> Sent: Monday, 17 February 2014 11:32
> To: user@accumulo.apache.org
> Subject: Re: Failing to BulkIngest [SEC=UNOFFICIAL]
>
> Matt,
>
> Can you provide Hadoop, ZK and Accumulo versions? Does the cluster appear
> to be functional otherwise (can you scan that table you're bulk importing
> to? any other errors on the monitor? etc)
>
> On 2/16/14, 7:07 PM, Dickson, Matt MR wrote:
> > *UNOFFICIAL*
> >
> > I have a situation where bulk ingests are failing with a 'Thread "shell"
> > stuck on IO to xxx:9999:99999 ...' error.
> > From the management console, the table we are loading to has no
> > compactions running, yet we ran "./accumulo
> > org.apache.accumulo.server.fate.Admin print" and can see 50,000 lines
> > stating:
> >
> > txid: xxxx     status: IN_PROGRESS     op: CompactRange     locked: []
> > locking: [R:3n]     top: CompactRange
> >
> > Does this mean there are actually compactions running, or are old
> > compaction locks still hanging around that are preventing the bulk
> > ingest from running?
> > Thanks in advance,
> > Matt
>
