From: Alex Karasulu <akarasulu@gmail.com>
To: Apache Directory Developers List <dev@directory.apache.org>
Date: Thu, 10 May 2012 20:24:59 +0300
Subject: Re: Release troubles and failing tests

I am in agreement with Selcuk's analysis. I did not presume just how nasty the inconsistency handling would get.

On Thu, May 10, 2012 at 8:18 PM, Selcuk AYA <ayaselcuk@gmail.com> wrote:
On Thu, May 10, 2012 at 5:51 AM, Emmanuel Lécharny <elecharny@gmail.com> wrote:
> On 5/10/12 9:58 AM, Emmanuel Lécharny wrote:
>
>> On 5/10/12 7:57 AM, Selcuk AYA wrote:
>>>
>>> The problem seems to be caused by the test
>>> testPagedSearchWrongCookie(). This tests paged search failure by
>>> sending a bad cookie. After failing, it relies on ctx.close() to
>>> clean up the session. Cleanup of the session will close all the cursors
>>> related to paged searches through the session.
>>>
>>> It seems that somehow ctx.close() does not result in an unbind message
>>> on the server side from time to time. I do not know what causes this, but
>>> it leaves a cursor open (specifically a NoDups cursor on the Rdn index).
>>> Eventually, as changes happen to the Rdn index, we run out of freeable
>>> cache headers. After ignoring this test, pagedsearchit and searchit
>>> pass fine together. It would be good to understand why the arrival of
>>> the unbind message is hit and miss in this test.
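For reference, the failure pattern described above boils down to roughly the
following JNDI sequence (a minimal sketch, not the actual test code; the port,
bind DN and password are just the usual defaults used here as placeholders):

import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;
import javax.naming.ldap.Control;
import javax.naming.ldap.InitialLdapContext;
import javax.naming.ldap.LdapContext;
import javax.naming.ldap.PagedResultsControl;

public class BadCookiePagedSearch
{
    public static void main( String[] args ) throws Exception
    {
        Hashtable<String, Object> env = new Hashtable<String, Object>();
        env.put( Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory" );
        env.put( Context.PROVIDER_URL, "ldap://localhost:10389" );      // placeholder
        env.put( Context.SECURITY_PRINCIPAL, "uid=admin,ou=system" );   // placeholder
        env.put( Context.SECURITY_CREDENTIALS, "secret" );              // placeholder

        LdapContext ctx = new InitialLdapContext( env, null );

        try
        {
            // Ask for a paged search with a deliberately bogus cookie: the
            // server rejects it and the search fails on the client side.
            byte[] badCookie = new byte[] { 0x01, 0x02, 0x03 };
            ctx.setRequestControls( new Control[] {
                new PagedResultsControl( 5, badCookie, Control.CRITICAL ) } );

            NamingEnumeration<SearchResult> results =
                ctx.search( "ou=system", "(objectClass=*)", new SearchControls() );

            while ( results.hasMore() )
            {
                results.next();
            }
        }
        catch ( NamingException expected )
        {
            // The test expects this failure; cleanup is delegated to close().
        }
        finally
        {
            // This is the call under discussion: close() is expected to send an
            // UnbindRequest so the server can release the cursors tied to the
            // session, but apparently it does not always do so.
            ctx.close();
        }
    }
}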
>>
>>
>> It's absolutly strange... Neither an UnbindRequest nor an Aban= donRequest
>> is sent by JNDI when closing the context, which is a huge bug.
>>
>> I have checked the other tests, and an Ubind request is always sen= t when
>> we close teh context, except when we get an UnwillingToPerform exc= eption.
>> It seems like the context is in a state where it considers that no= unbind
>> should be send after an exception. Although I can do a lookup (and= get back
>> the correct response from the server after this excption), the con= nection is
>> still borked :/
>>
>> I'll try to rewite the test using our API to see if it works b= etter, and
>> investigate with som Sun guys to see if there is an issue in JNDI.=
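A rewrite against our own LDAP API would at least make the unbind explicit
instead of relying on a side effect of close(); roughly something like this
(a sketch only: package names are those of the client API and may not match
the exact milestone we are on, and the connection settings are placeholders):

import org.apache.directory.api.ldap.model.cursor.EntryCursor;
import org.apache.directory.api.ldap.model.entry.Entry;
import org.apache.directory.api.ldap.model.message.SearchScope;
import org.apache.directory.ldap.client.api.LdapConnection;
import org.apache.directory.ldap.client.api.LdapNetworkConnection;

public class ExplicitUnbindSketch
{
    public static void main( String[] args ) throws Exception
    {
        LdapConnection connection = new LdapNetworkConnection( "localhost", 10389 );

        try
        {
            connection.bind( "uid=admin,ou=system", "secret" );

            EntryCursor cursor = connection.search( "ou=system",
                "(objectClass=*)", SearchScope.ONELEVEL, "*" );

            try
            {
                while ( cursor.next() )
                {
                    Entry entry = cursor.get();
                    System.out.println( entry.getDn() );
                }
            }
            finally
            {
                // The cursor is closed explicitly rather than as a side
                // effect of tearing down a JNDI context.
                cursor.close();
            }

            // An explicit UnbindRequest, so there is no guessing about whether
            // the server got the chance to release the session state.
            connection.unBind();
        }
        finally
        {
            connection.close();
        }
    }
}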
>>
>>
>>
> Ok, we have had a long discussion with Alex about this problem...
>
> The thing is that even for a standard PagedSearch, where everything goes fine
> (i.e., when the client is done, it has correctly closed the connection, which
> sends an UnbindRequest, which closes the cursor, etc.), we may have dozens of
> open cursors for some extended period of time.
>
> At some point, we may have an exhausted cache, with no way to evict any
> elements from it, leading to a server freeze.
>
> Not something we can accept from an LDAP server...
>
> A suggestion would be to add some parameter in the OperationContext telling
> the underlying layer that a search is done outside of any transaction. When
> we fetch an ID from an index and we try to get the associated Entry from
> the master table, if we get an error because the ID does not exist anymore,
> then we should just ignore the error and continue the search.
>
> But we still want to be sure that in some cases, inside the server, we still
> can have transactions over some searches.
>
> Thoughts ?
>
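If I read the suggestion right, it amounts to something like the following
(purely illustrative: the flag, the method and the Map standing in for the
master table are all made-up names, not actual server code):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LenientFetchSketch
{
    /**
     * Fetches entries for the candidate IDs produced by an index scan. When
     * ignoreMissing is true (the proposed "outside any transaction" mode),
     * IDs whose entry has vanished from the master table are simply skipped
     * instead of aborting the whole search.
     */
    static List<String> fetchEntries( List<Long> candidateIds,
        Map<Long, String> masterTable, boolean ignoreMissing )
    {
        List<String> results = new ArrayList<String>();

        for ( Long id : candidateIds )
        {
            String entry = masterTable.get( id );

            if ( entry == null )
            {
                if ( ignoreMissing )
                {
                    continue; // entry deleted since the index was read: skip it
                }

                throw new IllegalStateException( "No entry for ID " + id );
            }

            results.add( entry );
        }

        return results;
    }

    public static void main( String[] args )
    {
        Map<Long, String> masterTable = new HashMap<Long, String>();
        masterTable.put( 1L, "cn=one,ou=system" );
        masterTable.put( 3L, "cn=three,ou=system" );

        // The index still lists ID 2, but its entry is already gone.
        List<Long> fromIndex = Arrays.asList( 1L, 2L, 3L );

        System.out.println( fetchEntries( fromIndex, masterTable, true ) );
    }
}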

I don't think having a non-transactional search is a good idea. I agree
there is a problem with unclosed cursors, but I don't think this is
the right way to solve it. We currently do not have transactions for
the search, but a cursor over the JDBM B-tree gets a snapshot view.
This snapshot is not only of the data
but also of the structure itself. If you do not have this (and, on top of
this, if you don't have txns):

 - you will have to deal with inconsistencies in the B-tree data structure
 - you might get data as NULL from the B-tree and you might have to
deal with it. Or you might have to deal with cases where you counted 10
children but actually end up with 9 children while doing a DFS
search over your data structure. This might look easy, but I think it is
not.
 - you might get not only stale data but complete garbage. This
garbage might confuse the code completely (for example, if the garbage
you read was supposed to be a B-tree redirect).

Code from the LDAP protocol handlers down to the search is written
assuming cursors see consistent data. I don't think it is impossible to
write code expecting all kinds of inconsistencies, but it is very
difficult and the code will be brittle.


As for the paged search, one way to deal with it would be to read all
the data from the cursors at the beginning of the paged search and
close the cursor right away. This would be similar to a normal search. If we are
worried about the memory consumption of this, the entries to be returned
could be spilled over to temp files. You might say this could lead to
temp files that are never reclaimed, but if there are not many of them
it is no big deal. Users are supposed to clean up their
contexts. Not doing so is similar to opening file handles or socket
connections and never closing them. Such things are bound to create
problems.
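Something along these lines, I suppose (a sketch only: a plain iterator stands
in for the real cursor, the offset-as-cookie is an assumption, and the
temp-file spill is left as a comment):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class MaterializedPagedSearch
{
    private final List<String> entries = new ArrayList<String>();

    /** Drain the "cursor" completely up front so it can be closed right away. */
    MaterializedPagedSearch( Iterator<String> cursor )
    {
        while ( cursor.hasNext() )
        {
            entries.add( cursor.next() );
        }
        // A real implementation would close the cursor here, and could spill
        // the collected entries to a temp file past some size threshold
        // instead of keeping them all in memory.
    }

    /** Serve one page; here the cookie is just an offset into the saved results. */
    List<String> page( int offset, int pageSize )
    {
        int end = Math.min( offset + pageSize, entries.size() );
        return offset >= end ? new ArrayList<String>() : entries.subList( offset, end );
    }

    public static void main( String[] args )
    {
        Iterator<String> cursor = Arrays.asList(
            "cn=a,ou=system", "cn=b,ou=system", "cn=c,ou=system" ).iterator();

        MaterializedPagedSearch search = new MaterializedPagedSearch( cursor );
        System.out.println( search.page( 0, 2 ) ); // first page
        System.out.println( search.page( 2, 2 ) ); // second page
    }
}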


>
>
> --
> Regards,
> Cordialement,
> Emmanuel Lécharny
> www.iktek.com

thanks
Selcuk



--
Best Regards,
-- Alex
