From: Gregg Wonderly
Date: Fri, 17 Dec 2010 12:22:56 -0600
To: river-dev@incubator.apache.org
CC: Patricia Shanahan
Subject: Re: datastructure classes

On 12/16/2010 9:35 PM, Patricia Shanahan wrote:
> I would love to be able to base it on a ConcurrentMap, but I don't see how it
> would work.
>
> As far as I can tell, there is no clean split between key and non-key among the
> entry's public fields.
> A template can specify values for any subset of the
> fields, and different templates used in reading from the same space may choose
> different sets of fields.
>
> Any ideas how to work around that?

One of the things I did in Griddle was to create a CHM (ConcurrentHashMap) for
each key field name. I then put every entry into the map for each key name for
which it has a key/value pair. Matching is then a matter of querying each map
with the provided key's value, and a match only begins processing if the same
entry exists in all of the maps.

key->map->put(entry,entry). Thus an entry is keyed on its uniqueness as an
object. There is currently no list of objects, but instead a map, so that
remove is easy and does not, in general, interfere with reads. In a
high-bandwidth environment, this could ultimately leave some items unprocessed,
depending on how iteration works out. So I need to do something different, such
as using a skip list or some other queue-like mechanism that makes sure entries
are eventually processed. A subtle aside: since the objects are themselves the
keys, some degree of eventual completion is possible, because at some point the
"hash" will be ordered so as to expose all entries.

> We can't just use a separate ConcurrentMap for each field, and take the
> intersection of results, because a map requires unique keys. A JavaSpace may
> have many entries with the same value of a given field.

The map described above is about compartmentalizing the entries into "sets"
that are the most likely to then be matched. In Outrigger, this might get you
to a FastList of entries that all expose a value for the same key. You would
then need to scan them for the matching value. From a scale perspective, the
number of "in transit" objects has the biggest impact on linear scan time.
Heavily loaded spaces (ones with slower transaction/transition speeds) will add
more latency to each individual operation.
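The per-key-name maps described above can be sketched roughly as follows. This
is only an illustration under stated assumptions, not Griddle's actual code:
the class name KeyIndex and its methods put, remove and candidates are
hypothetical, entries are assumed not to override equals()/hashCode() (so they
are keyed by object identity), and the index narrows candidates by key name
only, leaving the value comparison to a later matching step.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: one concurrent map per key name, entries used as their own keys. */
final class KeyIndex<E> {
    // key name -> (entry -> entry); each inner map acts as a concurrent set of entries
    private final Map<String, Map<E, E>> byKey = new ConcurrentHashMap<>();

    /** Files the entry under every key name it exposes. */
    void put(E entry, Collection<String> keyNames) {
        for (String name : keyNames) {
            byKey.computeIfAbsent(name, n -> new ConcurrentHashMap<>()).put(entry, entry);
        }
    }

    /** Removes the entry from every per-key map it was filed under. */
    void remove(E entry, Collection<String> keyNames) {
        for (String name : keyNames) {
            Map<E, E> m = byKey.get(name);
            if (m != null) {
                m.remove(entry);
            }
        }
    }

    /** Returns the entries that are present in the maps of all requested key names. */
    List<E> candidates(Collection<String> keyNames) {
        List<E> result = new ArrayList<>();
        Map<E, E> smallest = null;
        for (String name : keyNames) {
            Map<E, E> m = byKey.get(name);
            if (m == null) {
                return result; // no stored entry exposes this key name
            }
            if (smallest == null || m.size() < smallest.size()) {
                smallest = m; // scan the smallest map to keep the intersection cheap
            }
        }
        if (smallest == null) {
            return result; // no key names requested, nothing to intersect
        }
        for (E e : smallest.keySet()) {
            boolean inAll = true;
            for (String name : keyNames) {
                Map<E, E> m = byKey.get(name);
                if (m == null || !m.containsKey(e)) {
                    inAll = false;
                    break;
                }
            }
            if (inAll) {
                result.add(e);
            }
        }
        return result;
    }
}
```

Because many entries can sit in the same per-key map, this layout sidesteps the
unique-key objection for key names; entries sharing a field value still have to
be told apart by scanning the candidates.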
JavaSpaces doesn't unmarshal entry data, which avoids code downloading. It
demands that keys be Object types so that a MarshalledInstance can be created
in the client when the EntryRep is created. The comparison is then about the
Entry serializing to the same form as is held internally in the EntryRep.
EntryRep's MarshalledInstance would be exactly the right key to use for further
compartmentalization, for example to keep a list of the EntryRep objects that
have exactly the same value. The one side issue is that EntryRep doesn't store
field names. It only stores values in an array, which is ordered based on the
order of the fields returned from reflection. So some work would be needed to
actually have the name of the field in the server.

Let me talk a little bit more about Griddle's TemplateEntry interface shown
here:

    import java.io.Serializable;
    import java.util.List;

    public interface TemplateEntry extends Serializable {
        // Called with the entry to see if you match. The called entry will
        // in general be a 'wider' view of the match space than the passed
        // entry. I.e. ent may have getKeys() return 10 keys, while the
        // called entry may be looking for matches with 3 keys.
        public boolean matches( TemplateEntry ent );
        public List<String> getKeys();
    }

This is everything Griddle uses for storing and matching. It leaves it up to
the implementation to manage the marshal/unmarshal of the entry's value. The
matches() method is what is used to see if an entry matches another. When you
create a query, you need to decide what getKeys() returns, because that
controls which entries matches() is called with.

So, for example, if you have a basic value object already in your codebase,
you'd tack on the behavior of a TemplateEntry by adding an interface, although
with more knowledge of the implementation you'd perhaps not need the interface.

    public interface KeyProvider {
        public List<String> allKeyNames();
        public Map<String,Serializable> valueMap();
    }

    public class MyBackingObject implements KeyProvider {
        // This is the object that you want to fly around between clients using
        // the space (griddle). It must supply allKeyNames() and valueMap().
    }

Implementing the interface exposes the things that you want to be considered
for matching in the space (griddle).

    import java.io.IOException;
    import java.io.Serializable;
    import java.rmi.MarshalledObject;
    import java.util.Arrays;
    import java.util.List;
    import java.util.Map;

    public class MyTemplateEntry<T extends KeyProvider> implements TemplateEntry {
        // we marshal the value to carry it to the space.
        private MarshalledObject<T> val;
        // we provide a set of String based key names
        private List<String> keys;
        // We provide the values for those keys as
        // Serializable values. You do not want to
        // use "random" types for values, only something
        // that the server will have no problem having a
        // class definition for without downloading code, unless
        // that is what you need, and then you just have to
        // think about the impact that has on the server's
        // lifecycle if you need to change the definition
        // of that class.
        private Map<String,Serializable> keyVals;

        // MarshalledObject's constructor can throw IOException.
        public MyTemplateEntry( T value ) throws IOException {
            val = new MarshalledObject<T>( value );
            // these are separated so that null values
            // can be part of what matches. Otherwise
            // keys could just be valueMap().keySet().
            keys = value.allKeyNames();
            keyVals = value.valueMap();
        }

        // Required by TemplateEntry; queries override this.
        public List<String> getKeys() {
            return keys;
        }

        public boolean matches( TemplateEntry ent ) {
            // A more concrete type would not require this
            // check.
            if( ent instanceof MyTemplateEntry == false )
                return false;
            MyTemplateEntry<?> mte = (MyTemplateEntry<?>)ent;
            // get the keys for the entry that we are going to see
            // if we match.
            List<String> tkeys = ent.getKeys();
            for( String k : keys ) {
                // we need to find that the passed entry provides
                // a value for the same key.
                boolean found = false;
                for( String tk : tkeys ) {
                    if( k.equals(tk) ) {
                        // we want to check this value.
                        // (need better logic here if we
                        // want null == null to work.)
                        if( keyVals.get(k).equals( mte.keyVals.get(k) ) ) {
                            found = true;
                        }
                    }
                }
                if( !found ) {
                    // this entry does not provide a matching
                    // value for the key we have.
                    return false;
                }
            }
            return true;
        }
    }

With all of that in place, a query might then either be a subclass, or it might
just need to be created using a separate constructor of the base entry class.

    // The type argument was lost in the archived message;
    // MyBackingObject is assumed here.
    public class OwnerMyTemplateEntryQuery extends MyTemplateEntry<MyBackingObject> {
        public OwnerMyTemplateEntryQuery( MyBackingObject value ) throws IOException {
            super( value );
        }
        public List<String> getKeys() {
            return Arrays.asList( "first", "last" );
        }
    }

The end result I was aiming for was unlimited matching logic, the possibility
of minimizing or eliminating downloaded code, and separation of the keys from
the "value", so that anything serializable could be carried around as the
value, and key/value pairs could be derived or otherwise managed separately.

With Outrigger, I think that some of the "not visible" data could be more
readily included in the EntryRep, to make some of the storage/matching
activities easier while eliminating some contention through further
compartmentalization of the data.

Gregg Wonderly
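
The matching loop in the message leaves null handling open ("need better logic
here if we want null == null to work"). One possible way to close that gap is
sketched below. It is an assumption-laden illustration, not Griddle code: the
class name SubsetMatch is hypothetical, plain maps stand in for the key/value
pairs of the two entries, and a null value in the template is treated as a
literal value to match rather than as a wildcard (which is how JavaSpaces
templates treat null fields).

```java
import java.util.Map;
import java.util.Objects;

/** Hypothetical helper, not part of Griddle: null-safe subset matching. */
final class SubsetMatch {
    private SubsetMatch() {}

    /**
     * Returns true when every key in the template is also present in the
     * candidate with an equal value. Objects.equals treats null == null
     * as a match and never throws on a null value.
     */
    static boolean matches(Map<String, ?> template, Map<String, ?> candidate) {
        for (Map.Entry<String, ?> want : template.entrySet()) {
            if (!candidate.containsKey(want.getKey())) {
                return false; // candidate does not expose this key at all
            }
            if (!Objects.equals(want.getValue(), candidate.get(want.getKey()))) {
                return false; // key is present but the value differs
            }
        }
        return true;
    }
}
```

The containsKey() check keeps "key absent" distinct from "key present with a
null value", which is the same reason the message keeps the key list separate
from valueMap().keySet().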