cocoon-dev mailing list archives

From Alan <alan-coc...@engrm.com>
Subject Re: Announce: Momento
Date Tue, 17 Feb 2004 06:42:28 GMT
* Stefano Mazzocchi <stefano@apache.org> [2004-02-17 04:22]:
> Alan wrote:
> 
> >Momento is a native XML persistence engine. It supports XSLT 2.0 and
> >    XQuery 1.0 (via Saxon), it supports XUpdate.
> >
> >    It is transactional and ACID.
> >
> >    It is designed with Cocoon in mind.
> >
> >    I am considering an open source path for future development.
> >
> >    http://engrm.com/project/com.agtrz.momento/
> >
> >Thoughts?
> 
> Very interesting. Can you tell us more on how it works? have any numbers 
> on how fast/scalable it gets? what's the difference between Momento and 
> a native xml database like XIndice?

Stefano

Thanks for asking.

    3) I can't compare Momento to Xindice at the moment. I last looked
    at Xindice last November. I'm announcing partly to get such insight.

        (I am willing to say that Momento isn't wedded to a specific
            API, such as XML:DB. It works with Saxon to provide
            XQuery and XSLT. I'm implementing XUpdate currently. A
            read-only W3C DOM is a simple matter, if there is call
            for it. A read-write W3C DOM is not so simple, or
            desirable, but entirely feasible.
            
            I could be way off base. Let me know.
            )

    2) No numbers. I've not designed any benchmarks. Momento is at the
    point where I need an example application to focus my energy and
    get Momento to beta. I was going to use Linotype, actually, and
    use Momento to store my blog.


        (Linotype is a good application since I would want to back up
            Momento after an update for the time being. That extra
            step is acceptable for a blog, provided it is a single
            user blog.)
            

    Toward scalability, Momento supports concurrent reads and, with
    some educated decisions by the application developer, concurrent
    updates. Momento works as a data store for a multi-threaded
    server. It would work nicely as a servlet, or in a Cocoon pipeline.

    For further scalability, I plan on supporting multi-process
    operation, and indices. (That's right, no indices yet. They
    are not at the core of Momento.)

    1) Momento has three concepts that rise to the surface when I
    consider it.

        * Zero, I always forget that I spent a month writing a
          journaling file data structure. It just hums along quietly
          in the background now. It splits a random access file into
          pages and reads those pages in and out of memory. It uses
          weak references to implement a page cache. It's pretty
          cool, but it's pretty much done, so...
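
          A minimal sketch of the paged file with a weak-reference
          page cache, in Python for brevity. It is an illustration
          only, not Momento's code: the names PagedFile, Page and
          PAGE_SIZE are made up, the page size is an assumption, and
          the journaling part is left out entirely.

```python
import os
import tempfile
import weakref

PAGE_SIZE = 4096  # assumed page size, not taken from Momento


class Page:
    """One fixed-size slice of the file, held in memory."""

    def __init__(self, number, data):
        self.number = number
        self.data = bytearray(data)


class PagedFile:
    """Splits a random access file into pages and caches them weakly."""

    def __init__(self, path):
        self._file = open(path, "r+b")
        # An entry disappears once nobody holds a strong reference
        # to its page, so the cache never pins pages in memory.
        self._cache = weakref.WeakValueDictionary()

    def page(self, number):
        page = self._cache.get(number)
        if page is None:
            self._file.seek(number * PAGE_SIZE)
            data = self._file.read(PAGE_SIZE).ljust(PAGE_SIZE, b"\0")
            page = Page(number, data)
            self._cache[number] = page
        return page

    def write(self, page):
        self._file.seek(page.number * PAGE_SIZE)
        self._file.write(page.data)
        self._file.flush()

    def close(self):
        self._file.close()


# Usage: write through one page, drop it, then read it back from disk.
fd, path = tempfile.mkstemp()
os.close(fd)
store = PagedFile(path)
first = store.page(0)
shared = store.page(0) is first   # cache hit while `first` is alive
first.data[:5] = b"hello"
store.write(first)
del first                         # no strong reference left to the page
reread = store.page(0)            # loaded again, from the cache or the file
store.close()
os.remove(path)
```

          The weak dictionary means a page stays cached only while
          some query or update still holds it, which is the property
          described above.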
        
        * First, Momento maintains a version axis.
        
          Rather than updating a node, Momento links a new version
          of that node to its version axis. An XSLT transform
          navigates a Momento document with a version number in
          hand. When you get the first child, say, you check to see
          if there is a new version of that child, and iterate, but
          never past the maximum version.

          The older versions are kept around until any queries
          referencing them terminate. At that point, the older
          versions of the nodes can be collected.

          Thus a newer version of the document can be assembled
          while the existing version is queried. That newer version
          can even be discarded and it will be ignored (iterate
          beyond to the next good version, or stick with the last
          good version). Voilà: commit and rollback.

          The version axis allows any number of concurrent queries,
          and they never have to wait for updates.
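
          The version axis can be sketched roughly as follows, again
          in Python as an illustration rather than Momento's actual
          code. The Node class, its read() method and the set of
          discarded versions are hypothetical names, and the sketch
          assumes versions are linked in ascending order.

```python
class Node:
    """A node plus its version axis: a list of (version, value) links."""

    def __init__(self, version, value):
        self.versions = [(version, value)]

    def link(self, version, value):
        # Updates never overwrite; they append a newer link.
        self.versions.append((version, value))

    def read(self, max_version, discarded=()):
        """Return the newest value at or below max_version, skipping
        any version that was rolled back."""
        result = None
        for version, value in self.versions:
            if version > max_version:
                break  # never iterate past the query's maximum version
            if version not in discarded:
                result = value
        return result


title = Node(1, "Won't do this")
discarded = set()

title.link(2, "Won't do that")            # version 2 is being assembled
during_update = title.read(1, discarded)  # a query pinned at version 1

discarded.add(2)                          # roll back version 2
title.link(3, "Will do this")             # version 3 commits
after_rollback = title.read(2, discarded)
latest = title.read(3, discarded)
```

          The query pinned at version 1 never sees version 2, and
          after the rollback a reader at version 2 falls back to the
          last good version.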

        * Second, Momento organizes its nodes in clusters.
        
          An application developer tunes their application for
          performance by specifying which nodes will be clustered
          on the same pages. This ought to be an obvious decision for
          the most part. Consider a bug database.

          <bug-document xmlns="http://engrm.com/bugs">
            <project name="Momento">
              <issue name="Won't do this">
                <comment />
                <comment />
                .
                .
              </issue>
              <issue name="Doesn't do that">
                <comment />
                <comment />
                .
                .
              </issue>
              .
              .
              .
              .
            </project>
          </bug-document>

         In the above document, it is likely that most of the data
         manipulation will occur within an issue. If the nodes are
         clustered by issue, then Momento can place a read lock
         at the issue node, allowing other issue nodes to be updated
         concurrently.
         
         There will be aggregate queries, but most queries
         (transforms) will manipulate an issue. It makes sense then
         to cluster the nodes in an issue on the same pages.

         It makes sense to cluster the issue nodes themselves
         together since the most likely axis of traversal is the
         next-sibling axis, used to find a specific issue by name.

         Clusters are akin to files on the file system, really.
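
         To illustrate locking per cluster, here is a small Python
         sketch in which each issue is its own cluster with its own
         lock, so updates to different issues do not block each
         other. ClusterLocks and add_comment are made-up names, and
         the sketch uses one plain mutex per issue because the Python
         standard library has no reader/writer lock; it is not how
         Momento itself is implemented.

```python
import threading


class ClusterLocks:
    """Hands out one lock per cluster key (here, an issue name)."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}

    def lock_for(self, cluster):
        with self._guard:
            if cluster not in self._locks:
                self._locks[cluster] = threading.Lock()
            return self._locks[cluster]


locks = ClusterLocks()
comments = {"Won't do this": [], "Doesn't do that": []}


def add_comment(issue, text):
    # Only the touched issue is locked; other issues stay available.
    with locks.lock_for(issue):
        comments[issue].append(text)


threads = [
    threading.Thread(target=add_comment, args=(issue, f"comment {i}"))
    for i in range(50)
    for issue in comments
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```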

       * Third, Momento maintains read performance through node
         proximity by reorganizing clusters, and maintains update
         performance by reorganizing clusters as a separate step,
         apart from updating them.

         This is tricky to explain. It's not a complicated concept,
         I just haven't tried to document it yet, so bear with me.

         In its organized state, a cluster contains its nodes in
         document order. This puts a first child on the same page
         as its parent, and a next sibling on the same page as its
         previous sibling, that is, not too far away.

         When a cluster mutates, new nodes are allocated from a
         scrap page of nodes and linked to the version axis. Now the
         document order is wonky. The newer version will cause
         queries to iterate into the scrap pages, and proximity
         starts to suffer.

         Therefore, as a second step to updating the document,
         Momento must organize itself by copying the last committed
         version of a cluster to a new set of pages, retaining only
         the latest version of each node.
          
         This can occur in a separate thread, or more likely at the
         prompting of the application developer. If a user is
         pecking away at an interactive form, it may make sense for
         the user to press Okay before going to the trouble to
         organize the cluster.

         In most XML applications, there are going to be natural
         candidates for a Momento cluster. Often this will map
         directly to a file on the file system, in an existing
         application.
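
         The reorganizing step described above can be sketched as a
         copy that keeps only the newest committed link of each
         node, in document order. This is a simplified Python model
         under stated assumptions, not Momento's code: a cluster is
         a plain list of version axes, each axis is a list of
         (version, value) links in ascending order, a value of None
         marks a deleted node, and reorganize() is a made-up name.

```python
def reorganize(cluster, committed):
    """Copy a cluster, keeping only each node's newest committed value.

    Links newer than `committed` are ignored, so the copy reflects the
    last committed version of the cluster.
    """
    fresh = []
    for axis in cluster:
        live = [(v, value) for v, value in axis if v <= committed]
        if not live:
            continue  # node exists only in an uncommitted version
        version, value = live[-1]
        if value is not None:
            fresh.append([(version, value)])
    return fresh


cluster = [
    [(1, "issue"), (3, "issue, renamed")],
    [(1, "first comment"), (2, None)],                 # deleted in version 2
    [(2, "second comment"), (4, "second comment, edited")],  # 4 not committed
]
compacted = reorganize(cluster, committed=3)
```

         After the copy every version axis is one link long again,
         so a query no longer has to wander into scrap pages.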

    
    My communication skills are getting stretched by all the
    announcing. Please let me know if this is a good explanation.
    I can use it to create a better overview document.

    I'll try not to be so prolix in the follow-up. They were broad
    questions. With these concepts, everything else in Momento is
    obvious. It is pretty simple.

I was planning on announcing all week, but I'm traveling instead. I
    do hope to foster discussion of this project. I'll check in at
    the airport. More questions, please.
    
    There is a mailing list too. Please join to participate or observe.

    momento-subscribe@engrm.com

Thanks again for asking.

-- 
Alan / alan@engrm.com / http://engrm.com/
    aim/yim: alanengrm - icq: 228631855 - msn: alanengrm@hotmail.com
