Message-ID: <4D0BAA80.9010709@wonderly.org>
Date: Fri, 17 Dec 2010 12:22:56 -0600
From: Gregg Wonderly <gregg@wonderly.org>
To: river-dev@incubator.apache.org
CC: Patricia Shanahan <pats@acm.org>
Subject: Re: datastructure classes
In-Reply-To: <4D0ADA8E.4050002@acm.org>

On 12/16/2010 9:35 PM, Patricia Shanahan wrote:
> I would love to be able to base it on a ConcurrentMap, but I don't see how it
> would work.
>
> As far as I can tell, there is no clean split between key and non-key among the
> entry's public fields. A template can specify values for any subset of the
> fields, and different templates used in reading from the same space may choose
> different sets of fields.
>
> Any ideas how to work around that?

One of the things I did in Griddle was to create a ConcurrentHashMap (CHM) for
each key field name. I then put every entry into each map whose name matches
one of the entry's key/value pairs.

Matching is then a matter of querying each map with the provided key's value, 
and only if the same entry exists in all tables does a match begin processing.

key->map->put(entry,entry).

Thus an entry is keyed based on its uniqueness as an object.  There is not, 
currently, a list of objects, but instead a map so that remove is easy, and does 
not, in general, interfere with reads.
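
A minimal sketch of the per-key-name maps described above, assuming entries
are simple bags of named values. NamedEntry and KeyNameIndex are hypothetical
names made up for illustration; this is not Griddle's actual code. Each entry
is stored under itself as the key (identity equals/hashCode), so remove is a
plain map removal, and a template only matches entries that are present in the
map of every key name it mentions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for a stored entry: named key values, identity-based equality.
final class NamedEntry {
    final Map<String, Object> fields;
    NamedEntry(Map<String, Object> fields) { this.fields = fields; }
}

// One concurrent map per key field name; each map is used as a set (entry -> entry).
final class KeyNameIndex {
    private final ConcurrentMap<String, ConcurrentMap<NamedEntry, NamedEntry>> byKeyName =
            new ConcurrentHashMap<>();

    void put(NamedEntry e) {
        for (String name : e.fields.keySet()) {
            byKeyName.computeIfAbsent(name, n -> new ConcurrentHashMap<>()).put(e, e);
        }
    }

    void remove(NamedEntry e) {
        for (String name : e.fields.keySet()) {
            ConcurrentMap<NamedEntry, NamedEntry> m = byKeyName.get(name);
            if (m != null) m.remove(e);
        }
    }

    // Returns entries found in the map of every template key name whose values also agree.
    List<NamedEntry> match(Map<String, Object> template) {
        List<NamedEntry> out = new ArrayList<>();
        List<ConcurrentMap<NamedEntry, NamedEntry>> maps = new ArrayList<>();
        for (String name : template.keySet()) {
            ConcurrentMap<NamedEntry, NamedEntry> m = byKeyName.get(name);
            if (m == null) return out;      // no entry exposes this key name at all
            maps.add(m);
        }
        if (maps.isEmpty()) return out;     // empty template: nothing to query in this sketch
        for (NamedEntry e : maps.get(0).keySet()) {
            boolean inAll = true;
            for (ConcurrentMap<NamedEntry, NamedEntry> m : maps) {
                if (!m.containsKey(e)) { inAll = false; break; }
            }
            if (inAll && valuesEqual(e, template)) out.add(e);
        }
        return out;
    }

    private static boolean valuesEqual(NamedEntry e, Map<String, Object> template) {
        for (Map.Entry<String, Object> t : template.entrySet()) {
            if (!Objects.equals(t.getValue(), e.fields.get(t.getKey()))) return false;
        }
        return true;
    }
}
```

The sketch scans the first map it finds; picking the smallest map first would
cut the scan, at the cost of a size() call per map.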

This, ultimately, could result in items not being processed in a high-bandwidth
environment, depending on how iteration works out. So, I need to do something
different, like use a skip list or some other queue-like mechanism which will
make sure entries are eventually processed. A subtle aside: since the objects
are the keys themselves, some degree of eventual completion is possible,
because at some point the "hash" will be ordered to expose all entries.
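
One possible shape for such a queue-like mechanism, sketched here as an
assumption rather than anything Griddle does today: keep each compartment in a
ConcurrentSkipListMap keyed by an arrival sequence number, so scans always
start from the oldest entry. ArrivalOrderedBucket is a hypothetical name.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Predicate;

// Hypothetical bucket that hands out entries oldest-first.
final class ArrivalOrderedBucket<E> {
    private final AtomicLong nextSeq = new AtomicLong();
    private final ConcurrentSkipListMap<Long, E> bySeq = new ConcurrentSkipListMap<>();

    // Stores the entry under the next sequence number and returns that number.
    long add(E entry) {
        long seq = nextSeq.getAndIncrement();
        bySeq.put(seq, entry);
        return seq;
    }

    // Removes and returns the oldest entry accepted by the predicate, or null if none.
    E takeOldest(Predicate<? super E> wanted) {
        // entrySet() iterates in ascending key order, i.e. arrival order.
        for (Map.Entry<Long, E> e : bySeq.entrySet()) {
            // remove(key, value) succeeds for only one caller, so a concurrent
            // taker cannot receive the same entry twice.
            if (wanted.test(e.getValue()) && bySeq.remove(e.getKey(), e.getValue())) {
                return e.getValue();
            }
        }
        return null;
    }

    int size() {
        return bySeq.size();
    }
}
```

Because the scan always begins at the lowest sequence number, an entry that
matches a template cannot be passed over indefinitely by newer arrivals.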

> We can't just use a separate ConcurrentMap for each field, and take the
> intersection of results, because a map requires unique keys. A JavaSpace may
> have many entries with the same value of a given field.

The map described above is about compartmentalizing the entries into "sets"
that are most likely to be matched. In Outrigger, this might get you to a
FastList of entries that all expose a value for the same key. You would then
need to scan them for the matching value. From a scale perspective, the number
of "in transit" objects has the biggest impact on how long a linear scan takes.
Heavily loaded spaces (ones with slower transaction/transition speeds) will add
more latency to each individual operation.

JavaSpaces doesn't unmarshal entry data, which avoids code downloading. It
demands keys be Object types so that a MarshalledInstance can be created in the
client when the EntryRep is created. The comparison is then about the Entry
serializing to the same form as the one held internally in the EntryRep.

EntryRep's MarshalledInstance would be exactly the right key to use for further
compartmentalization, for example to keep a list of EntryRep objects that all
have exactly the same value. The one side issue is that EntryRep doesn't store
field names. It only stores values in an array, ordered to match the order of
the fields returned from reflection. So some work would be needed to actually
have the name of the field in the server.
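
A rough sketch of that second level of compartmentalization, under two
assumptions: the field name is somehow available on the server (passed in
explicitly here), and java.rmi.MarshalledObject stands in for Jini's
MarshalledInstance, since its equals() and hashCode() also work on the
serialized form. FieldValueIndex is a hypothetical name, not Outrigger code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.rmi.MarshalledObject;
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical index: field name -> marshalled field value -> entries holding that value.
final class FieldValueIndex<E> {
    private final Map<String, Map<MarshalledObject<Object>, Set<E>>> index =
            new ConcurrentHashMap<>();

    // Marshals the value; two values with the same serialized form give equal keys.
    private static MarshalledObject<Object> marshal(Object value) {
        try {
            return new MarshalledObject<>(value);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    void put(String field, Object value, E entry) {
        index.computeIfAbsent(field, f -> new ConcurrentHashMap<>())
             .computeIfAbsent(marshal(value), v -> ConcurrentHashMap.newKeySet())
             .add(entry);
    }

    // Returns the entries whose given field serializes to the same bytes as value.
    Set<E> lookup(String field, Object value) {
        Map<MarshalledObject<Object>, Set<E>> byValue = index.get(field);
        if (byValue == null) {
            return Collections.emptySet();
        }
        Set<E> hits = byValue.get(marshal(value));
        if (hits == null) {
            return Collections.emptySet();
        }
        return hits;
    }
}
```

With this layout a lookup for one field value is a pair of hash probes instead
of a linear scan over every entry that merely has the field.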

Let me talk a little bit more about Griddle's TemplateEntry interface shown here:

public interface TemplateEntry<K extends Serializable> extends Serializable {
	// Called with the entry to see if you match.  The called entry will
	// in general be a 'wider' view of the match space than the passed
	// entry.  I.e. ent may have getKeys() return 10 keys, while the
	// called entry may be looking for matches with 3 keys.
	public boolean matches( TemplateEntry<K> ent );
	public List<? extends K> getKeys();
}

This is everything Griddle uses for storing and matching. It leaves it up to
the implementation to manage the marshal/unmarshal of the entry's value. The
matches() method is what is used to see if an entry matches another. When you
create a query, you need to decide what getKeys() returns, because that
controls which entries matches() is called with.
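
To make the division of labor concrete, here is a small self-contained sketch
of how a space could use the two methods. TinyGriddle and MapEntry are
hypothetical names invented for this example, the interface is repeated only
so the sketch compiles on its own, and calling matches() on the query with the
stored entry as the argument is an assumption about the intended direction.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

interface TemplateEntry<K extends Serializable> extends Serializable {
    boolean matches(TemplateEntry<K> ent);
    List<? extends K> getKeys();
}

// Hypothetical space: getKeys() picks the candidates, matches() has the final say.
final class TinyGriddle<K extends Serializable> {
    private final ConcurrentMap<K, Set<TemplateEntry<K>>> byKey = new ConcurrentHashMap<>();

    void write(TemplateEntry<K> entry) {
        for (K key : entry.getKeys()) {
            byKey.computeIfAbsent(key, k -> ConcurrentHashMap.newKeySet()).add(entry);
        }
    }

    // Returns some stored entry the query matches, or null if there is none.
    TemplateEntry<K> read(TemplateEntry<K> query) {
        List<? extends K> keys = query.getKeys();
        if (keys.isEmpty()) return null;
        Set<TemplateEntry<K>> candidates = byKey.get(keys.get(0));
        if (candidates == null) return null;
        for (TemplateEntry<K> c : candidates) {
            // Only entries exposing every query key are offered to matches().
            if (c.getKeys().containsAll(keys) && query.matches(c)) return c;
        }
        return null;
    }
}

// Hypothetical entry whose keys and values are plain Strings.
final class MapEntry implements TemplateEntry<String> {
    private final HashMap<String, String> vals;

    MapEntry(Map<String, String> vals) { this.vals = new HashMap<>(vals); }

    public List<String> getKeys() { return new ArrayList<>(vals.keySet()); }

    public boolean matches(TemplateEntry<String> ent) {
        if (!(ent instanceof MapEntry)) return false;
        MapEntry other = (MapEntry) ent;
        for (Map.Entry<String, String> e : vals.entrySet()) {
            if (!other.vals.containsKey(e.getKey())) return false;
            if (!Objects.equals(e.getValue(), other.vals.get(e.getKey()))) return false;
        }
        return true;
    }
}
```

A query with keys "first" and "last" would then be compared only against
stored entries that expose both of those keys, however many other keys those
entries carry.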

So, for example, if you have a basic value object already in your codebase, 
you'd tack on behavior to be a TemplateEntry by adding an interface, but with 
more knowledge of the implementation you'd perhaps not need the interface.

public interface KeyProvider<K,T> {
	public List<K> allKeyNames();
	public Map<K,T> valueMap();
}

public class MyBackingObject implements KeyProvider<String,Object> {
	// This is the object that you want to fly around between clients using
	// the space (griddle)
}

Implementing the interface would expose the things that you'd want to be
considered for matching in the space (griddle).

import java.io.IOException;
import java.io.Serializable;
import java.rmi.MarshalledObject;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MyTemplateEntry<T extends MyBackingObject>
		 implements TemplateEntry<String> {

	// we marshal the value to carry it to the space.
	private MarshalledObject<T> val;
	// we provide a set of String based key names
	private List<String> keys;
	// We provide the values for those keys as
	// Serializable values.  You do not want to
	// use "random" types for values, only something
	// that the server will have no problem having a
	// class definition for without downloading code, unless
	// that is what you need, and then you just have to
	// think about the impact that has on the server's
	// lifecycle if you need to change the definition
	// of that class.
	private Map<String,Serializable> keyVals;

	public MyTemplateEntry( T value ) throws IOException {
		// the MarshalledObject constructor can throw IOException
		val = new MarshalledObject<T>( value );
		// these are separated so that null values
		// can be part of what matches.  Otherwise
		// keys could just be valueMap().keySet().
		keys = value.allKeyNames();
		// valueMap() is declared with Object values, so copy
		// and cast; a non-Serializable value fails here.
		keyVals = new HashMap<String,Serializable>();
		for( Map.Entry<String,?> e : value.valueMap().entrySet() ) {
			keyVals.put( e.getKey(), (Serializable)e.getValue() );
		}
	}

	public List<String> getKeys() {
		return keys;
	}

	public boolean matches( TemplateEntry<String> ent ) {
		// A more concrete type would not require this
		// check.
		if( !(ent instanceof MyTemplateEntry) )
			return false;
		MyTemplateEntry<?> mte = (MyTemplateEntry<?>)ent;
		// get the keys for the entry that we are going to see
		// if we match.
		List<? extends String> tkeys = ent.getKeys();
		for( String k : keys ) {
			// we need to find that the passed entry provides
			// a value for the same key.
			if( !tkeys.contains(k) ) {
				return false;
			}
			// we want to check this value; null only
			// matches null.
			Serializable mine = keyVals.get(k);
			Serializable theirs = mte.keyVals.get(k);
			if( mine == null ? theirs != null : !mine.equals(theirs) ) {
				// this entry does not provide a matching
				// value for the key we have.
				return false;
			}
		}
		return true;
	}
}

With all of that in place, a query might then either be a subclass, or it might 
just need to be created using a separate constructor of the base entry class.

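public class OwnerMyTemplateEntryQuery extends
		 MyTemplateEntry<MyBackingObject> {
	public OwnerMyTemplateEntryQuery( MyBackingObject value )
			throws IOException {
		super( value );
	}
	// query only on the owner's name (needs java.util.Arrays)
	public List<String> getKeys() {
		return Arrays.asList( "first", "last" );
	}
}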

The end result I was aiming for was unlimited matching logic, the ability to
minimize or eliminate downloaded code, and separation of keys and the "value",
so that anything serializable could be carried around as the value, and
key/value pairs could be derived or otherwise managed separately.

With Outrigger, I think that some of the "not visible" data could be more
readily included in the EntryRep, to make some of the storage/matching
activities easier to do while eliminating some contention through further
compartmentalization of the data.

Gregg Wonderly