Date: Mon, 30 Mar 2009 20:55:36 -0700
To: lucy-dev@lucene.apache.org
Subject: Re: threads
Message-ID: <20090331035536.GA12844@rectangular.com>
From: Marvin Humphrey

On Mon, Mar 30, 2009 at 06:08:28PM -0400, Michael McCandless wrote:

> >  * The VTable_registry Hash which is used to associate VTables with class
> >    names is stateful.
>
> What is this hash used for? Does it map name -> VTable?

Yes. It is used to support dynamic subclassing, and also to support
deserialization (when the appropriate deserialization function must be
selected based on the class name).

> It seems like most references within core could be done by the
> compiler/linker?

I think you could say that most references within core rely on referring to
the VTable structs generated at compile-time by Boilerplater.

                       vvvv
    if (!OBJ_IS_A(obj, HASH)) {
        THROW("That's not a Hash, it's a %o", Obj_Get_Class_Name(obj));
    }

> And then when host needs to refer to an object inside the core,
> shouldn't the bindings be exposed however the host would do it (eg as
> a Python module(s) with bindings), also compiled/linked statically.

I don't think I follow that... Are you saying that dynamic subclassing would
never be needed under Python?

> > The VTables themselves are stateful because they are refcounted.
> > Furthermore, dynamically created VTables are reclaimed once the last object
> > which needs them goes away -- so the "MyScorer" VTable at one point in the
> > program might not be the same as the "MyScorer" VTable at another point in
> > the program.
>
> This is neat: so you can make a new subclass at runtime, instantiate a
> bunch of objects off of it, and once all objects are gone, and nobody
> directly refers to the subclass (vtable). Can you define new vtables
> from the host language?

Yes, transparently, in Perl at least. Standard Perl subclassing techniques
work.

    package MyObj;
    use base qw( Lucy::Obj );

    package main;

    my $obj = MyObj->new;   # MyObj vtable created via inherited constructor.
    undef $obj;             # Triggers DESTROY. Last ref to dynamic VTable for
                            # MyObj disappears, so it gets reclaimed.

At least that's how things work now. To make things thread-safe, we'll create
the VTable for "MyObj" dynamically, but once created it will never go away.

> > However, if we stop refcounting VTables (by making Inc_RefCount and
> > Dec_RefCount no-ops) and accept that dynamically created VTables will leak,
> > then those concerns go away. I think this would be a reasonable approach.
> > The leak ain't gonna matter unless somebody does something psycho like cycle
> > through lots of unique class names.
>
> That seems tentatively OK. What do you see dynamic vtables being used
> for in Lucy?

They are required for subclassing in Perl, at least.

> So then we don't have to worry about individual objects being thread
> safe as long as we can ensure threads never share objects.

Yes. :) [Pending resolution of issues around remaining shared globals.]

> It might get weird if the host also doesn't maintain separate worlds,
> eg the host can think it created object X in Lucy but then if it
> checks back later and gets the wrong world, object X is gone.

I don't understand how this could happen, so perhaps I'm not following.

> > Then there's search-time, where the multi-world approach isn't always
> > adequate. Here, things get tricky because we have to worry about the
> > statefulness of objects that could be operating in multiple search threads.
> >
> > I think we'll be generally OK if we do most of our scoring at the segment
> > level. Individual SegReaders don't share much within a larger PolyReader, so
> > Scorers derived from those SegReaders won't either.
>
> OK even for concurrency within a single search, it sounds like?

All of the core Scorers can be made thread-safe for single-search concurrency.
We just need to avoid using stateful objects which are shared by multiple
SegReaders. The list of those is reasonably small.
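
The per-segment idea can be sketched roughly as follows. This is only an illustration under assumptions: it uses POSIX threads, and DemoSegment, DemoSegJob, DemoSegJob_run, and demo_best_score are hypothetical names invented for the example, not Lucy APIs. Each thread owns its own cursor and result slot, the segment data is only read, and results are merged after the threads have been joined. Scores are assumed to be non-negative.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define DEMO_MAX_SEGS 8

/* Hypothetical stand-in for one segment's data; read-only once built. */
typedef struct DemoSegment {
    const int *doc_scores;
    size_t     num_docs;
} DemoSegment;

/* Per-thread job: all mutable state here belongs to exactly one thread. */
typedef struct DemoSegJob {
    const DemoSegment *segment;  /* shared, but never written to */
    size_t             cursor;   /* private read position */
    int                best;     /* private result slot */
} DemoSegJob;

/* Thread entry point: scan one segment using only private state. */
static void *DemoSegJob_run(void *arg) {
    DemoSegJob *job = arg;
    assert(job != NULL);
    job->best = 0;
    for (job->cursor = 0; job->cursor < job->segment->num_docs; job->cursor++) {
        int score = job->segment->doc_scores[job->cursor];
        if (score > job->best) { job->best = score; }
    }
    return NULL;
}

/* Score each segment in its own thread, then merge on the calling thread.
 * Returns the best score, or -1 if there are too many segments or a
 * thread could not be started. */
static int demo_best_score(const DemoSegment *segments, size_t num_segs) {
    pthread_t  threads[DEMO_MAX_SEGS];
    DemoSegJob jobs[DEMO_MAX_SEGS];
    size_t     started = 0;
    int        best    = 0;
    int        failed  = 0;

    if (num_segs > DEMO_MAX_SEGS) { return -1; }

    for (size_t i = 0; i < num_segs; i++) {
        jobs[i].segment = &segments[i];
        jobs[i].cursor  = 0;
        jobs[i].best    = 0;
        if (pthread_create(&threads[i], NULL, DemoSegJob_run, &jobs[i]) != 0) {
            failed = 1;
            break;
        }
        started++;
    }

    /* Joining before reading jobs[i].best means no result slot is ever
     * touched by two threads at once. */
    for (size_t i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
        if (jobs[i].best > best) { best = jobs[i].best; }
    }
    return failed ? -1 : best;
}
```

Nothing in the sketch needs a lock, because the only data visible to more than one thread is never modified while the threads run.
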

Individual DataReaders (PostingsReader, LexReader, DocReader, DeletionsReader,
etc) might or might not be stateful, but not in ways that are shared (other
than refcounting).

For instance, consider a PostingsReader. It holds open at least one InStream
which is stateful on 32-bit systems because of its sliding window. It would
not be safe to share that InStream across multiple threads -- but there's no
reason that InStream would ever be shared across threads if we are scoring
per-segment and, for that matter, cloning the InStream whenever we spawn a
PostingList object.

That PostingsReader also holds a reference to a shared Schema instance, which
may have Analyzers that aren't thread safe. For that matter, Schema itself
technically isn't thread safe, because you might call Schema_Spec_Field() at
any moment. However, just because you *can* use those shared elements in a
non-thread-safe way doesn't mean there's a reason you would do so. No core
Scorer would, at least.

> > Then there's refcounting. For most objects operating in a per-segment scoring
> > thread, it wouldn't be necessary to synchronize on Inc_RefCount and
> > Dec_RefCount, but I don't see how we'd be able to pick and choose our sync
> > points.
>
> It sounds like we either 1) try to guarantee threads won't share
> objects, in which case you don't need to lock in the object nor in
> incref/decref,

... good enough for indexing...

> or 2) allow for the possibility of sharing.

... required for searching.

> You could use atomic increment/decrement, though I'm unsure how costly
> those really are at the CPU level.

For C under pthreads, where the refcount will presumably be a plain old
integer, that's probably what we want.

Marvin Humphrey
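
For illustration, atomic increment/decrement on an integer refcount might look something like the sketch below. It assumes a compiler that provides C11 <stdatomic.h>, which is an assumption of the example and not something settled in this thread. DemoObj and its functions are hypothetical names, not Lucy's actual Obj or its Inc_RefCount and Dec_RefCount methods.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted core object. */
typedef struct DemoObj {
    atomic_uint refcount;
} DemoObj;

/* Create an object holding a single reference. */
static DemoObj *DemoObj_new(void) {
    DemoObj *self = malloc(sizeof(DemoObj));
    if (self == NULL) { abort(); }
    atomic_init(&self->refcount, 1);
    return self;
}

/* Atomically add a reference; returns the same pointer for convenience. */
static DemoObj *DemoObj_inc_refcount(DemoObj *self) {
    atomic_fetch_add_explicit(&self->refcount, 1, memory_order_relaxed);
    return self;
}

/* Atomically drop a reference. Frees the object and returns 0 when the
 * last reference goes away; otherwise returns the remaining count. */
static unsigned DemoObj_dec_refcount(DemoObj *self) {
    unsigned prev = atomic_fetch_sub_explicit(&self->refcount, 1,
                                              memory_order_acq_rel);
    assert(prev > 0);  /* catches a release without a matching reference */
    if (prev == 1) {
        free(self);
        return 0;
    }
    return prev - 1;
}
```

The decrement uses acquire-release ordering so that whichever thread drops the last reference sees all earlier writes to the object before freeing it. The increment can be relaxed because the caller already holds a reference.
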