directory-dev mailing list archives

From Ole Ersoy <ole_er...@yahoo.com>
Subject Re: Streaming / Serializing Big Objects
Date Fri, 08 Sep 2006 19:00:44 GMT
Cool - 

OK, suppose we had a StateManager.

The StateManager has a decode method that reads a persistent
file and recreates the directory tree.

The StateManager's encode method takes a list of references
to directory tree objects, builds a concatenated String out
of the string representations of all those objects, and then
writes that string to a file once the concatenation is done.

Am I getting any warmer?
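In code, something like this minimal sketch (the class and
method names are hypothetical, and plain one-line-per-entry
text stands in for whatever encoding ADS would actually use):

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;

// Hypothetical StateManager: encode() concatenates the string form of
// every tree entry into one file; decode() reads that file back into a
// list from which the tree could be rebuilt.
class StateManager {
    private final Path stateFile;

    StateManager(Path stateFile) { this.stateFile = stateFile; }

    // Write the string representation of each entry, one per line.
    void encode(List<String> entries) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (String e : entries) {
            sb.append(e).append('\n');
        }
        Files.write(stateFile, sb.toString().getBytes(StandardCharsets.UTF_8));
    }

    // Rebuild the in-memory representation from the persistent file.
    List<String> decode() throws IOException {
        return Files.readAllLines(stateFile, StandardCharsets.UTF_8);
    }
}
```

On a clean shutdown encode() would produce the single
consolidated file; on startup decode() restores the tree.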

I read a little about Prevayler.  As I understand it, it
serializes all the Java objects that need to be persisted
as soon as it becomes aware of them, and then keeps them
updated as the objects mutate.  So if the application
crashes, on reboot it reads the persistent files and is
back up.  To make reboot more efficient, the persistent
files can be consolidated on a clean shutdown by the
StateManager I described above, which I think is what you
are describing.

The reason I mention this is that as the directory tree
mutates, we would not want to persist the entire tree on
every mutation, right?  So we would either have to use
relational persistence, or write a single file containing
just the mutation.

That would put us in more of an rsync-like mode, where if
the server crashes, we load the original directory tree
file plus any mutation files.

If the directory shuts down cleanly, we encode all the
directory objects to one file and delete all the
"temporary" mutation files.
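A minimal sketch of that recovery path (all names are
hypothetical, and mutations are modeled as one-line
"add"/"del" commands purely for illustration):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;

// Hypothetical recovery scheme: one snapshot file plus per-change
// mutation files.  After a crash, the state is the snapshot with each
// mutation file re-applied in order; after a clean shutdown, only the
// consolidated snapshot exists.
class Recovery {
    // Apply one mutation line of the form "add <value>" or "del <value>".
    static void apply(Set<String> tree, String mutation) {
        String[] parts = mutation.split(" ", 2);
        if (parts[0].equals("add")) tree.add(parts[1]);
        else if (parts[0].equals("del")) tree.remove(parts[1]);
    }

    static Set<String> recover(Path snapshot, List<Path> mutations)
            throws IOException {
        Set<String> tree =
            new TreeSet<>(Files.readAllLines(snapshot, StandardCharsets.UTF_8));
        for (Path m : mutations) {
            for (String line : Files.readAllLines(m, StandardCharsets.UTF_8)) {
                apply(tree, line);
            }
        }
        return tree;
    }
}
```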

Incidentally, EMF can be used for any type of
serialization: a concatenated file like the one I just
described, XML, relational persistence, etc.  One of the
benefits of EMF is that if, for whatever reason, someone
wanted to serialize to XML, implementing a function to do
so would be very straightforward.  If someone wanted to
serialize to a relational source, that's easy too.

There's also the EMF Technology project's Object
Constraint Language, which can be used to query the EMF
model... I would think it would be very useful for
creating directory-like queries and coding the query
API.

There's a recently written article on the Eclipse site on
how to use it.

Cheers,
- Ole


--- Emmanuel Lecharny <elecharny@gmail.com> wrote:

> Ole,
> 
> just keep in mind that we are talking about byte[]s or
> Strings, not complex Java objects :)
> 
> What we need is a simple mechanism that will allow the
> server to stream those two kinds of objects.  The main
> issue, if we stream to disk, is to avoid creating
> zillions of small files.  We need a storage that can
> keep those blobs in a single file, even if it's 10 GB
> large.
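One way to get a single-file store like the one described
above is an append-only data file plus an in-memory
(offset, length) index; a minimal sketch, with all names
hypothetical:

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

// Hypothetical append-only blob store: every blob is appended to one
// data file and located later through an (offset, length) index, so the
// store stays a single file on disk no matter how many blobs it holds.
// (Individual blobs are limited to 2 GB here by the int-sized array.)
class BlobStore {
    private final RandomAccessFile file;
    private final Map<String, long[]> index = new HashMap<>(); // key -> {offset, length}

    BlobStore(String path) throws IOException {
        file = new RandomAccessFile(path, "rw");
    }

    // Append the blob and remember where it landed.
    void put(String key, byte[] blob) throws IOException {
        long offset = file.length();
        file.seek(offset);
        file.write(blob);
        index.put(key, new long[] { offset, blob.length });
    }

    // Seek to the recorded offset and read the blob back.
    byte[] get(String key) throws IOException {
        long[] entry = index.get(key);
        byte[] blob = new byte[(int) entry[1]];
        file.seek(entry[0]);
        file.readFully(blob);
        return blob;
    }
}
```

A real version would also persist the index and reclaim
space from deleted blobs; this only shows the single-file
layout.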
> 
> Another point is that we can't do XML: it's overkill.
> You will have structures like:
> <jpegPhoto name="MyFace.jpg">
>   Ar45tYU...Rt==  (2Mbytes of base64 data)
> </jpegPhoto>
> 
> Don't over(ab)use XML ;)
> 
> (OK, I know: compared to the disk access, it's at least
> two orders of magnitude faster, but the less CPU we eat,
> the more can be used by other threads.)
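The base64 overhead behind this objection is easy to
quantify: base64 emits 4 output characters for every 3
input bytes, so a 2 MB jpegPhoto becomes roughly 2.8
million characters before any XML framing is added. A
small sketch:

```java
import java.util.Base64;

// Base64 encodes every 3 input bytes as 4 output characters, so the
// encoded form is ceil(n / 3) * 4 characters long.
public class Base64Overhead {
    public static int encodedLength(int rawBytes) {
        return ((rawBytes + 2) / 3) * 4;
    }

    public static void main(String[] args) {
        int raw = 2 * 1024 * 1024;               // a 2 MB jpegPhoto
        System.out.println(encodedLength(raw));  // ~2.8 million characters
        // Sanity check against the JDK encoder on a small buffer:
        byte[] sample = new byte[1000];
        System.out.println(Base64.getEncoder().encodeToString(sample).length());
    }
}
```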
> 
> Any idea is welcome, and maybe we can start a page on
> confluence with those ideas.  Atm, we are just in a
> 
> Emmanuel.
> 
> On 9/8/06, Ole Ersoy <ole_ersoy@yahoo.com> wrote:
> >
> >
> > 1- Decoder
> > So if the decoded request object is above the
> > configured threshold, then ADS would need to persist
> > it per the configured persistence mechanism
> > (Prevayler, ...); otherwise we store it in memory.
> >
> > The myfaces upload component looks at its size
> > threshold and serializes the uploaded file if it's
> > above the specified threshold.  I'm sure it just
> > uses Java serialization straight up, but the
> > component can naturally be hooked up to any
> > integration/persistence layer.
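The threshold behavior described above can be sketched
roughly as follows (class and method names are
hypothetical, and temp files stand in for a real
persistence layer):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical threshold policy: small values stay in memory, anything
// above the configured size is spilled to disk and only a handle kept.
class ValueHolder {
    private final byte[] inMemory;  // null when the value was spilled
    private final Path onDisk;      // null when the value is in memory

    private ValueHolder(byte[] m, Path d) { inMemory = m; onDisk = d; }

    static ValueHolder of(byte[] value, int threshold) throws IOException {
        if (value.length <= threshold) {
            return new ValueHolder(value, null);
        }
        // Over the threshold: write to disk instead of keeping the bytes.
        Path spill = Files.createTempFile("ads-value", ".bin");
        Files.write(spill, value);
        return new ValueHolder(null, spill);
    }

    boolean isSpilled() { return onDisk != null; }

    byte[] read() throws IOException {
        return inMemory != null ? inMemory : Files.readAllBytes(onDisk);
    }
}
```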
> >
> > Suppose the whole directory tree was stored using
> > the Eclipse EMF API.
> >
> > Then the decoder would map the request object
> > directly to an EMF object, and EMF's persistence
> > mechanism could be invoked to persist to XML or
> > straight-up object serialization, the Service Data
> > Objects API could be invoked to serialize to
> > databases, etc.  Web services could be invoked; it's
> > a pretty sexy API, with a lot of possibilities.
> >
> > When it comes to streaming images, resources, etc.,
> > I would think the Tomcat APIs should be really good
> > for that....
> >
> > --- Emmanuel Lecharny <elecharny@gmail.com> wrote:
> >
> > > Here is what we have to do to stream large
> > > objects:
> > >
> > > 1- Decoder:
> > > When we read the user request, we decode it from
> > > ASN.1 BER to a byte[] or to a String, depending on
> > > the object type.  But basically, we get a byte[].
> > > Whatever, we have two concerns:
> > >  A- If the length of this object - which is always
> > > known - is above a certain size (let's say 1K),
> > > then we must store the object somewhere other than
> > > in memory.  To do so, we must have a storage which
> > > can handle Strings, byte[] and StreamedObject[].
> > > This has an impact on all messages (we can't just
> > > work on some attributes, we have to be generic).
> > > So this is a huge refactoring, with accessors for
> > > those objects, and especially a Stream.read()
> > > accessor.
> > >  B- If we have to store a String (even a big one),
> > > we have to convert the byte[] to a String.  If the
> > > String is big, then we must find a way to apply
> > > the byte[] -> String UTF-8 conversion from a
> > > stream, and stream back the result.  Not so
> > > easy...
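For point B, java.io can already do the UTF-8 decoding
incrementally; a sketch of streaming the conversion
through a small buffer rather than materializing either
side in full:

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

// Streaming UTF-8 decode: bytes are pulled through a small buffer, so
// neither the full byte[] nor the full String has to sit in memory at
// once.  InputStreamReader internally handles multi-byte sequences that
// straddle a buffer boundary, which is the fiddly part.
public class StreamingDecode {
    static void decode(InputStream in, Writer out) throws IOException {
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            out.write(buf, 0, n);  // hand each chunk on, never concatenate
        }
    }
}
```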
> > >
> > > 2- Database storage:
> > > Well, we have now decoded a request, and we have
> > > to store the value.  The backend is not
> > > Stream-ready at all.  It should be able to handle
> > > a Stream and store data without having to allocate
> > > a huge bunch of byte[]s.
> > > Another problem is the reverse operation: we read
> > > an entry from the backend, and we want streamed
> > > data to remain streamed.  Again, a huge
> > > modification.
> > >
> > > 3- Encoder:
> > > Now, let's suppose that we successfully get some
> > > data from the backend, and let's suppose that
> > > those data are streamed.  We want to send them
> > > back to the client without having to create a big
> > > byte[].  That means we must be able to ask MINA to
> > > send chunks of data until we are done with the
> > > streamed data.
> > > ATM, what we do is write a full PDU - the result
> > > of the encode() method - and MINA sends it all.
> > > Here, the mechanism will be totally different: we
> > > should tell MINA to send some data as soon as we
> > > have a block of bytes ready (if we send 1500-byte
> > > blocks, then we may have to call MINA many times
> > > for a jpegPhoto).
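The chunking in point 3 is, at bottom, a loop over
fixed-size blocks. The sketch below uses a plain
OutputStream where the real code would hand each block to
MINA, so the session parameter is a stand-in, not the
MINA API:

```java
import java.io.*;

// Send a streamed value in fixed-size blocks (e.g. 1500 bytes) rather
// than building one big byte[] for the whole PDU.
public class ChunkedSend {
    static int sendInChunks(InputStream data, OutputStream session,
                            int blockSize) throws IOException {
        byte[] block = new byte[blockSize];
        int blocks = 0;
        int n;
        while ((n = data.read(block)) != -1) {
            session.write(block, 0, n);  // in ADS: one write to the transport
            blocks++;
        }
        return blocks;  // how many times the transport was invoked
    }
}
```

For a 2 MB jpegPhoto and 1500-byte blocks, that loop
would hand the transport roughly 1400 blocks.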
> > >
> > > I may have forgotten some issues, so please tell
> > > me!  Regarding using an existing piece of code, I
> > > have to say: "well, why not?".  Right now, I think
> > > we should think seriously about the points I
> > > mentioned, maybe on a confluence page.  Streaming
> > > will take at least two weeks to write... Any
> > > already written piece of code that can help is
> > > OK :)
> > >
> > > Emmanuel
> > >
> > > On 9/8/06, Ole Ersoy <ole_ersoy@yahoo.com>
> wrote:
> > > >
> > > > I accidentally deleted the original message...
> > > >
> > > > The myfaces file upload component can be
> > > configured to
> > > > serialize objects larger than a specified
> size.
> > > >
> > > > If that sounds useful, I can extract some
> code...
> 
=== message truncated ===

