From dev-return-13481-apmail-directory-dev-archive=directory.apache.org@directory.apache.org Fri Sep 08 17:48:25 2006
Return-Path:
Delivered-To: apmail-directory-dev-archive@www.apache.org
Received: (qmail 54658 invoked from network); 8 Sep 2006 17:48:24 -0000
Received: from hermes.apache.org (HELO mail.apache.org) (209.237.227.199)
 by minotaur.apache.org with SMTP; 8 Sep 2006 17:48:24 -0000
Received: (qmail 19480 invoked by uid 500); 8 Sep 2006 17:48:22 -0000
Delivered-To: apmail-directory-dev-archive@directory.apache.org
Received: (qmail 19447 invoked by uid 500); 8 Sep 2006 17:48:22 -0000
Mailing-List: contact dev-help@directory.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: "Apache Directory Developers List"
Delivered-To: mailing list dev@directory.apache.org
Received: (qmail 19436 invoked by uid 99); 8 Sep 2006 17:48:22 -0000
Received: from asf.osuosl.org (HELO asf.osuosl.org) (140.211.166.49)
 by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 08 Sep 2006 10:48:22 -0700
X-ASF-Spam-Status: No, hits=0.5 required=10.0 tests=DNS_FROM_RFC_ABUSE,HTML_MESSAGE,SPF_PASS
X-Spam-Check-By: apache.org
Received-SPF: pass (asf.osuosl.org: domain of elecharny@gmail.com designates 64.233.162.202 as permitted sender)
Received: from [64.233.162.202] (HELO nz-out-0102.google.com) (64.233.162.202)
 by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 08 Sep 2006 10:48:20 -0700
Received: by nz-out-0102.google.com with SMTP id i11so320964nzh
 for ; Fri, 08 Sep 2006 10:48:00 -0700 (PDT)
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com;
 h=received:message-id:date:from:reply-to:to:subject:in-reply-to:mime-version:content-type:references;
 b=Bo9nj1fuFWK1T8px1v0/rt+zIr44YPTByeU1flExPnajg2DjWkH7Ml4PIF08jS/8ygfC9wAWpm4neuF06/0ZCpXySQHOkANnCU7139riiXcjgHp+63I5dC4Zd1od+L5xlNbVEymENg9mxWtMhQSE4BelPVwyFRWgfpYn69k+IqU=
Received: by 10.65.237.19 with SMTP id o19mr2185532qbr; Fri, 08 Sep 2006 10:47:59 -0700 (PDT)
Received: by 10.65.84.17 with HTTP; Fri, 8 Sep 2006 10:47:59 -0700 (PDT)
Message-ID:
Date: Fri, 8 Sep 2006 19:47:59 +0200
From: "Emmanuel Lecharny"
Reply-To: elecharny@apache.org
To: "Apache Directory Developers List"
Subject: Re: Streaming / Serializing Big Objects
In-Reply-To: <20060908170104.80785.qmail@web60717.mail.yahoo.com>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----=_Part_134016_8946372.1157737679783"
References: <20060908170104.80785.qmail@web60717.mail.yahoo.com>
X-Virus-Checked: Checked by ClamAV on apache.org
X-Spam-Rating: minotaur.apache.org 1.6.2 0/1000/N

------=_Part_134016_8946372.1157737679783
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

Ole,

just keep in mind that we are talking about byte[] or String, not complex
Java objects :)

What we need is a simple mechanism that will allow the server to stream
those two kinds of objects. The main issue, if we stream to disk, is to
avoid creating zillions of small files. We need a storage which will be
able to store those blobs in a single file, even if it's 10 GB large.

Another point is that we can't do XML : it's overkill. You will have
structures like :
<jpegPhoto name="MyFace.jpg">
  Ar45tYU...Rt==  (2Mbytes of base64 data)
</jpegPhoto>

Don't over(ab)use XML ;)

(ok, I know : compared to the disk access, it's at least 2 orders of
magnitude faster, but the less CPU we eat, the more can be used by other
threads).

Any idea is welcome, and maybe we can start a page on Confluence with those
ideas. Atm, we are just in a

Emmanuel.

On 9/8/06, Ole Ersoy wrote:
>
>
> 1-Decoder
> So if the decoded request object is above the
> configured threshold, then ADS would need to persist
> it per the configured persistence mechanism (Prevayler,
> ...), otherwise we store it in memory.
>
> The myfaces upload component looks at its size
> threshold and serializes the uploaded file if it's
> above the specified threshold.  I'm sure it just
> uses Java serialization straight up, but the component
> can be hooked up to any integration/persistence layer
> naturally.
>
> Suppose the whole directory tree was stored using the
> Eclipse EMF API.
>
> Then the decoder would map the request object directly
> to an EMF object, and EMF's persistence mechanism could
> be invoked to persist to xml, straight up object
> serialization, the Service Data Object API could be
> invoked to serialize to databases, etc.  Web Services
> could be invoked, it's a pretty sexy API, with a lot
> of possibilities.
>
> When it comes to streaming images, resources, etc. I
> would think the Tomcat APIs should be really good for
> that....
>
> --- Emmanuel Lecharny <elecharny@gmail.com> wrote:
>
> > Here is what we have to do to stream large objects :
> >
> > 1- Decoder :
> > When we read the user request, we decode it from ASN.1 BER to a byte[]
> > or to a String, depending on the object type. But basically, we get a
> > byte[]. Either way, we have two concerns :
> >  A- If the length of this object - which is always known - is above a
> > certain size (let's say 1K), then we must store the object somewhere
> > other than in memory. To do so, we must have a storage which can handle
> > Strings, byte[] and StreamedObject[]. This has an impact on all messages
> > (we can't just work on some attributes, we have to be generic). So this
> > is a huge refactoring, with accessors for those objects, and especially
> > a Stream.read() accessor.
> >  B- If we have to store a String (even a big one), we have to convert
> > the byte[] to a String. If the String is big, then we must find a way to
> > apply the byte[] -> String UTF-8 conversion from a stream, and stream
> > back the result. Not so easy ...
> >
> > 2- Database storage :
> > Well, we now have decoded a request, and we have to store the value. The
> > backend is not Stream ready at all. It should be able to handle a Stream
> > and store data without having to allocate a huge bunch of byte[].
> > Another problem is the opposite operation : we read an entry from the
> > backend, and we want streamed data to remain streamed. Again, huge
> > modification.
> >
> > 3- Encoder :
> > Now, let's suppose that we successfully get some data from the backend,
> > and let's suppose that those data are streamed. We want to send them
> > back to the client without having to create a big byte[]. That means we
> > must be able to ask MINA to send chunks of data until we are done with
> > the streamed data. ATM, what we do is that we write a full PDU - the
> > result of the encode() method - and MINA sends it all. Here, the
> > mechanism will be totally different : we should tell MINA to send some
> > data as soon as we have a block of bytes ready (if we send 1500-byte
> > blocks, then we may have to call MINA many times for a jpegPhoto).
> >
> > I may have forgotten some issues, so please tell me ! Regarding using an
> > existing piece of code, I have to say : "well, why not ?". Right now, I
> > think we should think seriously about the points I mentioned, maybe on a
> > Confluence page. Streaming will take at least 2 weeks to write... Any
> > already written piece of code that can help is ok :)
> >
> > Emmanuel
> >
> > On 9/8/06, Ole Ersoy wrote:
> > >
> > > I accidentally deleted the original message...
> > >
> > > The myfaces file upload component can be configured to
> > > serialize objects larger than a specified size.
> > >
> > > If that sounds useful, I can extract some code...
> > >
> > > Cheers,
> > > - Ole
> > >
> > > __________________________________________________
> > > Do You Yahoo!?
> > > Tired of spam?  Yahoo! Mail has the best spam protection around
> > > http://mail.yahoo.com
> > >
> >
> >
> >
> > --
> > Cordialement,
> > Emmanuel Lécharny
> >
>
>

-- 
Cordialement,
Emmanuel Lécharny

------=_Part_134016_8946372.1157737679783
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

Ole,

just keep in mind that we are talking about byte[] or String, not complex Java objects :)

What we need is a simple mechanism that will allow the server to stream those two kinds of objects. The main issue, if we stream to disk, is to avoid creating zillions of small files. We need a storage which will be able to store those blobs in a single file, even if it's 10 GB large.
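As a rough sketch of the single-file idea: blobs are appended to one file and addressed by offset and length, so no per-blob file is ever created. Nothing here is ADS code; `SingleFileBlobStore` and `Ref` are names made up for illustration, and a real store would also need an index, space reclamation and a streaming read path.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;

/** Hypothetical append-only store: every blob lives in one file, addressed by (offset, length). */
final class SingleFileBlobStore implements AutoCloseable {
    /** Location of one blob inside the store file. */
    static final class Ref {
        final long offset;
        final int length;

        Ref(long offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    private final RandomAccessFile file;

    SingleFileBlobStore(Path path) throws IOException {
        this.file = new RandomAccessFile(path.toFile(), "rw");
    }

    /** Appends the blob at the end of the file and returns where it was written. */
    synchronized Ref append(byte[] blob) throws IOException {
        long offset = file.length();   // long offsets, so the file may grow past 2 GB
        file.seek(offset);
        file.write(blob);
        return new Ref(offset, blob.length);
    }

    /** Reads one blob back in full; a streaming variant would return a bounded InputStream instead. */
    synchronized byte[] read(Ref ref) throws IOException {
        byte[] out = new byte[ref.length];
        file.seek(ref.offset);
        file.readFully(out);
        return out;
    }

    @Override
    public synchronized void close() throws IOException {
        file.close();
    }
}
```

The entry kept in the backend would then hold only the small `Ref`, not the value itself.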

Another point is that we can't do XML : it's overkill. You will have structures like :
<jpegPhoto name="MyFace.jpg">
  Ar45tYU...Rt==  (2Mbytes of base64 data)
</jpegPhoto>

Don't over(ab)use XML ;)

(ok, I know : compared to the disk access, it's at least 2 orders of magnitude faster, but the less CPU we eat, the more can be used by other threads).
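To put a number on the jpegPhoto example above: base64 emits 4 characters for every 3 input bytes, so 2 Mbytes of base64 text carries only about 1.5 Mbytes of photo. The sketch below checks that arithmetic with the JDK's `java.util.Base64`; the class name `Base64Overhead` and the 1,500,000-byte stand-in value are assumptions for illustration.

```java
import java.util.Base64;

final class Base64Overhead {
    /** Padded base64 emits 4 output characters for every 3 input bytes, rounded up. */
    static long encodedLength(long rawBytes) {
        return 4 * ((rawBytes + 2) / 3);
    }

    public static void main(String[] args) {
        byte[] photo = new byte[1_500_000];   // stand-in for a jpegPhoto value
        int actual = Base64.getEncoder().encode(photo).length;
        System.out.println(photo.length + " raw bytes -> " + actual + " base64 bytes");
        System.out.println("formula agrees: " + (actual == encodedLength(photo.length)));
    }
}
```

That is a third more bytes to write and read back, on top of the CPU spent encoding and decoding.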

Any idea is welcome, and maybe we can start a page on Confluence with those ideas. Atm, we are just in a

Emmanuel.

On 9/8/06, Ole Ersoy <ole_ersoy@yahoo.com> wrote:

1-Decoder
So if the decoded request object is above the
configured threshold, then ADS would need to persist
it per the configured persistence mechanism (Prevayler,
...), otherwise we store it in memory.

The myfaces upload component looks at its size
threshold and serializes the uploaded file if it's
above the specified threshold.  I'm sure it just
uses Java serialization straight up, but the component
can be hooked up to any integration/persistence layer
naturally.

Suppose the whole directory tree was stored using the
Eclipse EMF API.

Then the decoder would map the request object directly
to an EMF object, and EMF's persistence mechanism could
be invoked to persist to xml, straight up object
serialization, the Service Data Object API could be
invoked to serialize to databases, etc.  Web Services
could be invoked, it's a pretty sexy API, with a lot
of possibilities.

When it comes to streaming images, resources, etc. I
would think the Tomcat APIs should be really good for
that....










--- Emmanuel Lecharny <elecharny@gmail.com> wrote:
> Here is what we have to do to stream large objects :
>
> 1- Decoder :
> When we read the user request, we decode it from ASN.1 BER to a byte[]
> or to a String, depending on the object type. But basically, we get a
> byte[]. Either way, we have two concerns :
>  A- If the length of this object - which is always known - is above a
> certain size (let's say 1K), then we must store the object somewhere
> other than in memory. To do so, we must have a storage which can handle
> Strings, byte[] and StreamedObject[]. This has an impact on all messages
> (we can't just work on some attributes, we have to be generic). So this
> is a huge refactoring, with accessors for those objects, and especially
> a Stream.read() accessor.
>  B- If we have to store a String (even a big one), we have to convert
> the byte[] to a String. If the String is big, then we must find a way to
> apply the byte[] -> String UTF-8 conversion from a stream, and stream
> back the result. Not so easy ...
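For point B, one possible starting point is the JDK itself: an `InputStreamReader` built on a `CharsetDecoder` keeps the decoder state between reads, so a multi-byte UTF-8 sequence split across two reads is still decoded correctly. A minimal sketch, assuming the value is available as an `InputStream` and the result goes to a `Writer`; the class name `StreamingUtf8` is hypothetical and none of this is existing ADS code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StreamingUtf8 {
    /**
     * Decodes UTF-8 bytes from {@code in} and writes the characters to {@code out},
     * holding only one small buffer in memory. Returns the number of chars written.
     */
    static long decode(InputStream in, Writer out) throws IOException {
        // REPORT makes malformed input fail with an IOException instead of
        // being silently replaced by U+FFFD.
        Reader reader = new InputStreamReader(in,
                StandardCharsets.UTF_8.newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT));
        char[] buffer = new char[4096];
        long total = 0;
        int n;
        while ((n = reader.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }
}
```

The hard part that remains is everything around it: the accessors and the storage that must accept a stream instead of a String.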
>
> 2- Database storage :
> Well, we now have decoded a request, and we have to store the value. The
> backend is not Stream ready at all. It should be able to handle a Stream
> and store data without having to allocate a huge bunch of byte[].
> Another problem is the opposite operation : we read an entry from the
> backend, and we want streamed data to remain streamed. Again, huge
> modification.
>
> 3- Encoder :
> Now, let's suppose that we successfully get some data from the backend,
> and let's suppose that those data are streamed. We want to send them
> back to the client without having to create a big byte[]. That means we
> must be able to ask MINA to send chunks of data until we are done with
> the streamed data. ATM, what we do is that we write a full PDU - the
> result of the encode() method - and MINA sends it all. Here, the
> mechanism will be totally different : we should tell MINA to send some
> data as soon as we have a block of bytes ready (if we send 1500-byte
> blocks, then we may have to call MINA many times for a jpegPhoto).
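The chunked send described in point 3 could look roughly like the loop below. `ChunkSink` is a hypothetical stand-in for whatever call hands bytes to the network layer; it is not a MINA interface, and `ChunkedSender` is likewise an illustrative name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical stand-in for the network layer; not a MINA interface. */
interface ChunkSink {
    void send(byte[] chunk) throws IOException;
}

final class ChunkedSender {
    /**
     * Pushes the streamed value to the sink in blocks of at most chunkSize bytes,
     * never holding more than one block in memory. Returns the number of blocks sent.
     */
    static int sendAll(InputStream value, ChunkSink sink, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        byte[] buffer = new byte[chunkSize];
        int chunks = 0;
        int n;
        while ((n = value.read(buffer)) != -1) {
            // Copy so the sink may keep the block while the buffer is reused.
            sink.send(Arrays.copyOf(buffer, n));
            chunks++;
        }
        return chunks;
    }
}
```

With blocks of 1500 bytes, a 4000-byte value read from memory goes out as three blocks (1500, 1500 and 1000 bytes); a stream that returns short reads would produce more, smaller blocks.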
>
> I may have forgotten some issues, so please tell me ! Regarding using an
> existing piece of code, I have to say : "well, why not ?". Right now, I
> think we should think seriously about the points I mentioned, maybe on a
> Confluence page. Streaming will take at least 2 weeks to write... Any
> already written piece of code that can help is ok :)
>
> Emmanuel
>
> On 9/8/06, Ole Ersoy <ole_ersoy@yahoo.com> wrote:
> >
> > I accidentally deleted the original message...
> >
> > The myfaces file upload component can be configured to
> > serialize objects larger than a specified size.
> >
> > If that sounds useful, I can extract some code...
> >
> > Cheers,
> > - Ole
> >
> > __________________________________________________
> > Do You Yahoo!?
> > Tired of spam?  Yahoo! Mail has the best spam protection around
> > http://mail.yahoo.com
> >
>
>
>
> --
> Cordialement,
> Emmanuel Lécharny
>





--
Cordialement,
Emmanuel Lécharny

------=_Part_134016_8946372.1157737679783--