cocoon-dev mailing list archives

From Stefano Mazzocchi <stef...@apache.org>
Subject Re: Announce: Momento
Date Wed, 18 Feb 2004 14:02:31 GMT
Alan wrote:

> * Stefano Mazzocchi <stefano@apache.org> [2004-02-17 04:22]:
> 
>>Alan wrote:
>>
>>
>>>Momento is a native XML persistence engine. It supports XSLT 2.0 and
>>>   XQuery 1.0 (via Saxon), and it supports XUpdate.
>>>
>>>   It is transactional and ACID.
>>>
>>>   It is designed with Cocoon in mind.
>>>
>>>   I am considering an open source path for future development.
>>>
>>>   http://engrm.com/project/com.agtrz.momento/
>>>
>>>Thoughts?
>>
>>Very interesting. Can you tell us more about how it works? Do you have any
>>numbers on how fast/scalable it gets? What's the difference between Momento
>>and a native XML database like Xindice?
> 
> 
> Stefano
> 
> Thanks for asking.
> 
>     3) I can't compare Momento to Xindice at the moment. The last time I
>     looked at Xindice was last November. I'm announcing to get such insight.
> 
>         (I am willing to say that Momento isn't wedded to a specific
>             API, such as XML::DB. It works with Saxon to provide
>             XQuery and XSLT. 

What is the interfacing API then? JAXP? or Saxon's?

>             I'm implementing XUpdate currently. A
>             read-only W3C DOM is a simple matter, if there is call
>             for it. A read-write W3C DOM is not so simple, or
>             desirable, but entirely feasible.
>             
>             I could be way off base. Let me know.

I'm not sure I get what you mean here with read/write DOM being desirable.

>             )
> 
>     2) No numbers. I've not designed any benchmarks. Momento is to the
>     point where I need an example application to focus my energy to
>     get Momento to beta. I was going to use Linotype, actually, and
>     use Momento to store my blog.

Uh, awesome!

> 
> 
>         (Linotype is a good application since I would want to backup
>             Momento after an update for the time being. That extra
>             step is acceptable for a blog, provided it is a single
>             user blog.)
>             
> 
>     Towards scalability, Momento supports concurrent reads, and, with
>     some educated decisions by the application developer, concurrent
>     updates. 

Uh, this sentence is worrisome. Can you elaborate more?

>     Momento works as a data store for a multi-threaded
>     server. It would work nicely as a servlet, or in a Cocoon pipeline.
> 
>     For further scalability, I plan on supporting multi-process
>     operation, and indices.  (That's right, no indices yet. They
>     are not at the core of Momento.)

ok

> 
>     1) Momento has three concepts that rise to the surface when I
>     consider it.
> 
>         * Zero, I always forget that I spent a month writing a
>           journaling file data structure. It just hums along quietly
>           in the background now. It splits a random access file into
>           pages and reads those pages in and out of memory. It uses
>           weak references to implement a page cache. It's pretty
>           cool, but it's pretty much done, so...

uh, sounds useful. We are having problems with JISP. Care to tell us 
more about this?
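
To make the page cache idea concrete, here is a minimal Python sketch of a
paged random access file whose cache holds pages through weak references.
It is an illustration only, not Momento's code: the names PagedFile, Page,
write_back and the 4096-byte PAGE_SIZE are assumptions, and the journaling
part is left out entirely.

```python
import weakref

PAGE_SIZE = 4096  # hypothetical page size in bytes


class Page:
    """One fixed-size slice of the file, held in memory."""

    def __init__(self, number, data):
        self.number = number
        self.data = bytearray(data)


class PagedFile:
    """Random access file split into pages, with a weak-reference cache."""

    def __init__(self, path):
        self._file = open(path, "r+b")
        # Entries disappear once no caller holds the Page any more, so the
        # cache never keeps a page alive on its own.
        self._cache = weakref.WeakValueDictionary()

    def page(self, number):
        """Return the cached page, or read it from disk on a miss."""
        page = self._cache.get(number)
        if page is None:
            self._file.seek(number * PAGE_SIZE)
            # Pad short reads (past end of file) with zero bytes.
            data = self._file.read(PAGE_SIZE).ljust(PAGE_SIZE, b"\0")
            page = Page(number, data)
            self._cache[number] = page
        return page

    def write_back(self, page):
        """Write a page's bytes to its slot in the file."""
        self._file.seek(page.number * PAGE_SIZE)
        self._file.write(page.data)
        self._file.flush()

    def close(self):
        self._file.close()
```

While any caller still holds a Page, repeated lookups return the same
object; once the last reference is dropped, the entry can be reclaimed and
the next lookup goes back to disk.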

>         * First, Momento maintains a version axis.
>         
>           Rather than updating a node, Momento links a new version
>           of that node to its version axis. An XSLT transform
>           navigates a Momento document with a version number in
>           hand. When you get the first child, say, you check to see
>           if there is a new version of that child, and iterate, but
>           never past the maximum version.
> 
>           The older versions are kept around until any queries
>           referencing them terminate. At that point, the older
>           versions of the nodes can be collected.
> 
>           Thus a newer version of the document can be assembled
>           while the existing version is queried. That newer version
>           can even be discarded and it will be ignored (iterate
>           beyond to the next good version, or stick with the last
>           good version). Voilà: commit and rollback.
> 
>           The version axis allows any number of concurrent
>           queries; they will not have to wait for updates.

Interesting approach. I don't see how you can do rollback if you garbage 
collect a node, though.
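
One way to read the version axis description is sketched below in Python.
This is a guess at the idea, not Momento's implementation: VersionedNode,
Store and all method names are hypothetical, and the sketch assumes only
one update is in flight at a time. A reader carries a snapshot version
number and never looks past it; a rolled-back version is simply skipped,
so the older committed value is still there to fall back on.

```python
class VersionedNode:
    """A node whose successive values are linked along a version axis."""

    def __init__(self, value, version=0):
        self.versions = [(version, value)]  # ascending by version number

    def write(self, version, value):
        # Link a new version instead of overwriting the old value.
        self.versions.append((version, value))

    def read(self, max_version, discarded=frozenset()):
        """Return the newest value at or below max_version, skipping
        versions that belong to discarded (rolled-back) updates."""
        for version, value in reversed(self.versions):
            if version <= max_version and version not in discarded:
                return value
        raise LookupError("no visible version")


class Store:
    """Assumes a single update in flight at a time (simplification)."""

    def __init__(self):
        self.committed = 0      # highest committed version
        self.next_version = 1
        self.discarded = set()  # version numbers that were rolled back
        self.nodes = {}

    def begin(self):
        version = self.next_version
        self.next_version += 1
        return version

    def commit(self, version):
        self.committed = max(self.committed, version)

    def rollback(self, version):
        self.discarded.add(version)

    def snapshot(self):
        # A query keeps this number "in hand" for its whole lifetime.
        return self.committed

    def read(self, name, snapshot):
        return self.nodes[name].read(snapshot, self.discarded)
```

A query that took its snapshot before an update commits keeps seeing the
old value, and a later snapshot sees the new one. Collecting old versions
once no query references them is not shown here.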

> 
>         * Second, Momento organizes its nodes in clusters.
>         
>           An application developer tunes their application for
>           performance by specifying which nodes will be clustered
>           on the same pages. This ought to be an obvious decision for
>           the most part. Consider a bug database.
> 
>           <bug-document xmlns="http://engrm.com/bugs">
>             <project name="Momento">
>               <issue name="Won't do this">
>                 <comment />
>                 <comment />
>                 .
>                 .
>               </issue>
>               <issue name="Doesn't do that">
>                 <comment />
>                 <comment />
>                 .
>                 .
>               </issue>
>               .
>               .
>               .
>               .
>             </project>
>           </bug-document>
> 
>          In the above document, it is likely that most of the data
>          manipulation will occur within an issue. If the nodes are
>          clustered by issue, then Momento can place a read lock
>          at the issue node, allowing other issue nodes to be updated
>          concurrently.
>          
>          There will be aggregate queries, but most queries
>          (transforms) will manipulate an issue. It makes sense then
>          to cluster the nodes in an issue on the same pages.
> 
>          It makes sense to cluster the issue nodes themselves
>          together since the most likely axis of traversal is the
>          next-sibling axis, used to find a specific issue by name.
> 
>          Clusters are akin to files on the file system, really.

What is the strategy you use to clusterize nodes?
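
The locking consequence of clustering by issue can be sketched as follows.
This is only an illustration under assumptions: ClusteredIssues and its
methods are hypothetical names, the cluster key is taken to be the issue
name, and a plain mutex per cluster stands in for the read lock mentioned
above, since the Python standard library has no reader-writer lock.

```python
import threading


class ClusteredIssues:
    """Keeps one lock per cluster so different issues update concurrently."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the cluster table only
        self._clusters = {}             # issue name -> (lock, comments)

    def _cluster(self, issue):
        with self._guard:
            if issue not in self._clusters:
                self._clusters[issue] = (threading.Lock(), [])
            return self._clusters[issue]

    def add_comment(self, issue, text):
        lock, comments = self._cluster(issue)
        with lock:  # blocks only other work on the same issue
            comments.append(text)

    def comments(self, issue):
        lock, comments = self._cluster(issue)
        with lock:
            return list(comments)
```

Two threads adding comments to "Won't do this" and "Doesn't do that" never
contend for the same cluster lock; only work on the same issue serializes.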

> 
>        * Third, Momento maintains performance through node proximity
>          by reorganizing clusters, and performance of updates by
>          reorganizing clusters as a separate step apart from
>          updating them.
> 
>          This is tricky to explain. Not that it's a complicated
>          concept; I just haven't tried to document it yet, so bear
>          with me.
> 
>          In its organized state, a cluster contains its nodes in
>          document order. This puts first children on the same page
>          as a parent and a next sibling on the same page as a previous
>          sibling, that is, not too far away.
> 
>          When a cluster mutates, new nodes are allocated from a
>          scrap page of nodes and linked to the version axis. Now the
>          document order is wonky. The newer version will cause
>          queries to iterate into the scrap pages, and proximity starts
>          to suffer.
> 
>          Therefore, as a second step to updating the document,
>          Momento must organize itself by copying the last committed
>          version of a cluster to a new set of pages, retaining only
>          the latest version of each node.
>           
>          This can occur in a separate thread, or more likely at the
>          prompting of the application developer. If a user is
>          pecking away at an interactive form, it may make sense for
>          the user to press Okay before going to the trouble to
>          organize the cluster.
> 
>          In most XML applications, there are going to be natural
>          candidates for a Momento cluster. Often this will map
>          directly to a file on the file system, in an existing
>          application.
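
The reorganizing step described in this third point can be sketched in a
few lines of Python. Everything here is assumed for illustration: the
reorganize function, the PAGE_CAPACITY of four nodes per page, and the
representation of a cluster as a list of node ids in document order plus a
per-node list of (version, value) pairs. The sketch copies the last
committed version of each node, in document order, onto fresh pages.

```python
PAGE_CAPACITY = 4  # hypothetical number of nodes per page


def reorganize(document_order, versions, committed):
    """Copy the last committed version of a cluster to a new set of pages.

    document_order: node ids in document order.
    versions: node id -> list of (version, value), ascending by version.
    committed: highest committed version number.
    Returns the new pages as lists of (node_id, value).
    """
    pages = []
    for node_id in document_order:
        visible = [value for version, value in versions[node_id]
                   if version <= committed]
        if not visible:
            continue  # node exists only in an uncommitted update
        if not pages or len(pages[-1]) == PAGE_CAPACITY:
            pages.append([])  # start a fresh page
        # Retain only the latest committed version of each node.
        pages[-1].append((node_id, visible[-1]))
    return pages
```

After this step, neighbours in document order sit on the same or adjacent
pages again, and superseded versions are left behind on the old pages.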
> 
>     
>     My communication skills are getting stretched by all the
>     announcing. Please let me know if this is a good explanation.
>     I can use it to create a better overview document.

What you outline sounds like an interesting approach, but it's kinda
foggy. A better outline document would be very useful (to me, at least) 
to understand if your approach could be used as a persistent native XML 
database that could scale.

>     I'll try not to be so prolix in the follow up. They were broad
>     questions. With these concepts, everything else in Momento is
>     obvious. It is pretty simple.
> 
> I was planning on announcing all week, but I'm traveling instead. 

Announcing all week? One announcement is enough :-) The rest can be a
regular email exchange, don't you think?

> I
>     do hope to foster discussion of this project. I'll check in at
>     the airport. More questions, please.
>     
>     There is a mailing list too. Please join to participate or observe.
> 
>     momento-subscribe@engrm.com

I won't subscribe to a new mailing list until I see there is a reason for
it, and for sure not before there is any code to take a look at.

For the licensing issue, keep in mind that we wouldn't be able to 
distribute your software (due to ASF policies) if you choose a license 
of the GPL family.

> 
> Thanks again for asking.

You are welcome.

-- 
Stefano.

