accumulo-user mailing list archives

From Eric Newton <eric.new...@gmail.com>
Subject Re: Failing to BulkIngest [SEC=UNOFFICIAL]
Date Fri, 21 Feb 2014 03:26:42 GMT
Sorry... I should have been clearer.

"-e" is for ephemeral, and these are not ephemeral nodes. "-s" is for
sequential, which you don't want here either. With neither option you get a
plain persistent node, so you don't need to specify any flag.

You can put anything in for the data; it is unimportant:

cli>  create /accumulo/xx.../fate foo
cli>  create /accumulo/xx.../table_locks bar

I think that you can give the zkCli.sh shell quotes for an empty string:

cli> create /accumulo/xx.../fate ""

But I can't remember if that works.  Accumulo never reads the contents of
those nodes, so anything you put in there will be ignored.

The master may even re-create these nodes on start-up, but I did not test
it.
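
A minimal Python sketch of the create lines above, assuming the instance ID
is passed in as a string. The helper name zk_create_commands and the
placeholder data value "x" are illustrative assumptions, not part of Accumulo
or ZooKeeper. It only builds the text to paste at the zkCli.sh prompt and
does not talk to ZooKeeper itself.

```python
def zk_create_commands(instance_id, nodes=("fate", "table_locks"), data="x"):
    """Build zkCli 'create' lines for plain persistent nodes.

    No -s (sequential) or -e (ephemeral) flag is emitted. The data value is
    a throwaway placeholder, since Accumulo never reads it.
    """
    return [f"create /accumulo/{instance_id}/{node} {data}" for node in nodes]


if __name__ == "__main__":
    # "<instance-id>" is a placeholder; substitute the UUID shown at the top
    # of the monitor page.
    for line in zk_create_commands("<instance-id>"):
        print(line)
```
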

-Eric



On Thu, Feb 20, 2014 at 6:18 PM, Dickson, Matt MR <
matt.dickson@defence.gov.au> wrote:

>  *UNOFFICIAL*
> After running the zkCli.sh rmr on the directories, we are
> having difficulties recreating the nodes.
>
> The zookeeper create command has 2 options, -s and -e, but it's not clear
> what each of these does and which one to use to recreate the accumulo
> node.  Also, the create command requires 'data' to be specified; however,
> when we look at our QA system, the accumulo node has no data within it.
>
> What is the zookeeper command to run to recreate the /accumulo/xx.../fate
> and /accumulo/xx.../table_locks nodes?
>
>  ------------------------------
> *From:* Eric Newton [mailto:eric.newton@gmail.com]
> *Sent:* Friday, 21 February 2014 07:31
>
> *To:* user@accumulo.apache.org
> *Subject:* Re: Failing to BulkIngest [SEC=UNOFFICIAL]
>
>  No, xxx... is your instance id.  You can find it at the top of the
> monitor page. It's the ugly UUID there.
>
> -Eric
>
>
>
> On Thu, Feb 20, 2014 at 3:26 PM, Dickson, Matt MR <
> matt.dickson@defence.gov.au> wrote:
>
>>  *UNOFFICIAL*
>> Is the xxx... the transaction id returned by the 'fate.Admin print'?
>>
>> What's involved with recreating a node?
>>
>> Matt
>>
>>  ------------------------------
>>  *From:* Eric Newton [mailto:eric.newton@gmail.com]
>> *Sent:* Friday, 21 February 2014 01:35
>>
>> *To:* user@accumulo.apache.org
>> *Subject:* Re: Failing to BulkIngest [SEC=UNOFFICIAL]
>>
>>   You can use the zkCli.sh utility to "rmr" /accumulo/xx.../fate and
>> /accumulo/xx.../table_locks, and then recreate those nodes.
>>
>> -Eric
>>
>>
>>
>> On Wed, Feb 19, 2014 at 5:58 PM, Dickson, Matt MR <
>> matt.dickson@defence.gov.au> wrote:
>>
>>>  *UNOFFICIAL*
>>> Thanks for your help on this Eric.
>>>
>>> I've started deleting the transactions by running "./accumulo
>>> ...fate.Admin delete <txid>", and noticed this takes about 20 seconds per
>>> transaction.  With 7500 to delete this is going to take a long time (almost
>>> 2 days), so I tried running several threads, each with a separate range of
>>> IDs to delete.  Unfortunately this seemed to cause some contention and I
>>> kept receiving an InvocationTargetException .... Caused by
>>> zookeeper.KeeperException: KeeperErrorCode = noNode for
>>> /accumulo/xxxxx-xxxx-xxxx-xxxx/table_locks/3n/lock-xxxxxx
>>>
>>> When I go back to one thread this error disappears.
>>>
>>> Is there a better way to run this?
>>>
>>> Thanks in advance,
>>> Matt
>>>
>>>  ------------------------------
>>> *From:* Eric Newton [mailto:eric.newton@gmail.com]
>>> *Sent:* Wednesday, 19 February 2014 01:21
>>>
>>> *To:* user@accumulo.apache.org
>>> *Subject:* Re: Failing to BulkIngest [SEC=UNOFFICIAL]
>>>
>>>   The "LeaseExpiredException" is part of the recovery process.  The
>>> master determines that a tablet server has lost its lock, or it is
>>> unresponsive and has been halted, possibly indirectly by removing the lock.
>>>
>>> The master then steals the write lease on the WAL file, which causes
>>> future writes to the WALog to fail.  The message you have seen is part of
>>> that failure.  You should have seen a tablet server failure associated with
>>> this message on the machine with <ip>.
>>>
>>> Having 50K FATE IN_PROGRESS lines is bad.  That is preventing your bulk
>>> imports from getting run.
>>>
>>> Are there any lines that show locked: [W:3n] ?  The other FATE
>>> transactions are waiting to get a READ lock on table id 3n.
>>>
>>> -Eric
>>>
>>>
>>>
>>> On Sun, Feb 16, 2014 at 7:59 PM, Dickson, Matt MR <
>>> matt.dickson@defence.gov.au> wrote:
>>>
>>>> UNOFFICIAL
>>>>
>>>> Josh,
>>>>
>>>> ZooKeeper - 3.4.5-cdh4.3.0
>>>> Accumulo - 1.5.0
>>>> Hadoop - cdh 4.3.0
>>>>
>>>> In the Accumulo console we are getting
>>>>
>>>> ERROR RemoteException(...LeaseExpiredException): Lease mismatch on
>>>> /accumulo/wal/<ip>+9997/<uid> owned by DFSClient_NONMAPREDUCE_699577321_12
>>>> but is accessed by DFSClient_NONMAPREDUCE_903051502_12
>>>>
>>>> We can scan the table without issues and can load rows directly, ie not
>>>> using bulk import.
>>>>
>>>> A bit more information - we recently extended how we manage old tablets
>>>> in the system. We load data by date, creating splits for each day and then
>>>> ageoff using the ageoff filters.  This leaves empty tablets so we now merge
>>>> these old tablets together to effectively remove them.  I mention it
>>>> because I'm not sure if this might have introduced another issue.
>>>>
>>>> Matt
>>>>
>>>> -----Original Message-----
>>>> From: Josh Elser [mailto:josh.elser@gmail.com]
>>>> Sent: Monday, 17 February 2014 11:32
>>>> To: user@accumulo.apache.org
>>>> Subject: Re: Failing to BulkIngest [SEC=UNOFFICIAL]
>>>>
>>>> Matt,
>>>>
>>>> Can you provide Hadoop, ZK and Accumulo versions? Does the cluster
>>>> appear to be functional otherwise (can you scan that table you're bulk
>>>> importing to? any other errors on the monitor? etc)
>>>>
>>>> On 2/16/14, 7:07 PM, Dickson, Matt MR wrote:
>>>> > *UNOFFICIAL*
>>>> >
>>>> > I have a situation where bulk ingests are failing with a "Thread
>>>> > "shell" stuck on IO to xxx:9999:99999 ...
>>>> > From the management console, the table we are loading to has no
>>>> > compactions running, yet when we ran "./accumulo
>>>> > org.apache.accumulo.server.fate.Admin print" we can see 50,000 lines
>>>> > stating
>>>> > txid: xxxx     status:IN_PROGRESS op: CompactRange     locked: []
>>>> > locking: [R:3n]     top: Compact:Range
>>>> > Does this mean there are actually compactions running, or are old
>>>> > compaction locks still hanging around that will prevent the bulk
>>>> > ingest from running?
>>>> > Thanks in advance,
>>>> > Matt
>>>>
>>>
>>>
>>
>
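
For the question earlier in the thread about whether any FATE transaction
shows locked: [W:3n], a rough Python sketch for scanning saved "fate.Admin
print" output might look like the following. It assumes each transaction is
printed on a single line in the txid/status/op/locked/locking layout quoted
above. The function names and the sample lines are made up for illustration,
and the real output format may differ between Accumulo versions.

```python
import re

# Pattern for one transaction line, based on the layout quoted in the thread.
LINE_RE = re.compile(
    r"txid:\s*(?P<txid>\S+)\s+"
    r"status:\s*(?P<status>\S+)\s+"
    r"op:\s*(?P<op>\S+)\s+"
    r"locked:\s*\[(?P<locked>[^\]]*)\]\s+"
    r"locking:\s*\[(?P<locking>[^\]]*)\]"
)


def parse_fate_line(line):
    """Return a dict for one transaction line, or None if it does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    record = match.groupdict()
    # Split the bracketed lock lists into items such as "R:3n" or "W:3n".
    for key in ("locked", "locking"):
        record[key] = [item.strip() for item in record[key].split(",") if item.strip()]
    return record


def write_lock_holders(lines, table_id):
    """Return txids whose 'locked' list contains a write lock on table_id."""
    wanted = f"W:{table_id}"
    holders = []
    for line in lines:
        record = parse_fate_line(line)
        if record is not None and wanted in record["locked"]:
            holders.append(record["txid"])
    return holders


if __name__ == "__main__":
    # Made-up sample lines; the txids and the second op name are placeholders.
    sample = [
        "txid: aaa  status:IN_PROGRESS op: CompactRange  locked: []  locking: [R:3n]  top: CompactRange",
        "txid: bbb  status:IN_PROGRESS op: SomeOp  locked: [W:3n]  locking: []  top: SomeOp",
    ]
    print(write_lock_holders(sample, "3n"))
```
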
