poi-user mailing list archives

From "Sanjiv Jivan" <sanjiv.ji...@gmail.com>
Subject Re: HSSF - Generating large spreadsheets in streaming manner?
Date Fri, 12 Jan 2007 21:47:47 GMT
That seems to make sense. Thanks.

On 1/12/07, David Fisher <dfisher@jmlafferty.com> wrote:
>
> Accepting state and following Michael's suggestion (which is how we do
> it with our dynamic PPT, XLS and PDF) means building the file and then
> serving the file's bytes. While the content is being created we show a
> progress indicator with a "continue working" button. If that is clicked,
> a menu bar spot tracks the progress. When the build thread has
> completed, if the progress indicator is still showing, the download
> proceeds automatically; if they have continued working, the menu bar
> updates to let them know the document is ready and they can select it
> for download at their leisure.
>
> Heck, according to ComputerWorld, this approach defeats the recently
> discovered Adobe Reader web browsing vulnerability issue.
>
> Other technical reasons:
>
> IE tends to think it knows what type of file it is being served by
> looking at the beginning, guessing and then re-requesting the file.
> Do you want to be building that 50MB file twice? Most of us don't.
> This is the standards flogging trick that Microsoft used to beat
> Netscape with their first IE.
>
> Alternatively, if you are doing nothing except serving 50MB of data
> w/o any style attributes then why not just serve up a CSV file? If you
> set the mime-type correctly that will stream out just fine.
>
> Regards,
> Dave
>
> On Jan 12, 2007, at 2:33 PM, Sanjiv Jivan wrote:
>
> > When you go to download a file from the web you see a download dialog
> > with a progress bar. Would you prefer that when you want to download,
> > say, a 50 MB zip file (which will take more than a second or two) off
> > the internet that it follow the workflow you describe?
> >
> > The fact that the Excel spreadsheet is generated dynamically is an
> > internal detail. Again, we're talking about a download that takes a few
> > minutes and not hours. Why should the end user go through a different
> > workflow to download a file?
> >
> >
> > On 1/12/07, Donahue, Michael <michael.donahue@pearson.com> wrote:
> >>
> >> Sanjiv -
> >>
> >> Strictly speaking as a web developer, this is a very bad approach to
> >> dealing with a task that may take more than a second or two to complete.
> >> Typically, a web application should never make the user wait for more
> >> than a few seconds to completely load the next page.  I don't think I
> >> would want to take your approach for a thick client either.
> >>
> >> In the situation you described, it would be better to tell the user
> >> that their request has been accepted and as soon as it is complete they
> >> will be notified through some other mechanism that they can download or
> >> view the results.  This could be through a screen/window pop.
> >>
> >> There still might be a few good places where it makes sense to have
> >> a streaming API, but I'd prefer to see effort spent on tasks that have
> >> a broader utilization curve, like the recently added comments support.
> >>
> >> Lastly; "THANK YOU!!" to all of the POI Project developers for all of
> >> their efforts to make POI better.
> >>
> >> -----Original Message-----
> >> From: Sanjiv Jivan [mailto:sanjiv.jivan@gmail.com]
> >> Sent: Friday, January 12, 2007 12:41 PM
> >> To: POI Users List; acoliver@apache.org
> >> Subject: Re: HSSF - Generating large spreadsheets in streaming
> >> manner?
> >>
> >> I think that having a streaming API would be very useful, and it's not
> >> because of trying to generate a massive non-human-readable spreadsheet.
> >> You have to factor in the time it takes to build the data to be used
> >> for the spreadsheet too.
> >>
> >> Consider a use case where a user is trying to download a spreadsheet
> >> with 500 - 1000 rows but the logic involved in getting the data for the
> >> spreadsheet takes around a minute. Without a streaming API, when a user
> >> tries to download such a file they click on the link and basically the
> >> browser waits for 1 minute and only then pops up a save dialog, since
> >> the contents of the spreadsheet can only be written out to the response
> >> stream after the entire spreadsheet has been generated. Had there been
> >> a streaming API, the contents could have been written to the response
> >> stream on the fly and a nice download dialog with a progress bar would
> >> have been displayed by the browser.
> >>
> >>
> >> On 3/10/06, Andrew C. Oliver <acoliver@apache.org> wrote:
> >> >
> >> > Not yet.  Demand for the Cocoon serializer hasn't been very high so
> >> > it is mostly deprecated (unless there is some massive uptake of
> >> > support for it).
> >> >
> >> > Okay, it's time for my yearly rant on this subject (not aimed at
> >> > you...you just reminded me I hadn't done it this year):
> >> >
> >> > I'm always a little curious about this.  XLS is a HORRIBLE format
> >> > (which is why I started POI, I wanted to do something difficult).  It
> >> > is a HORRIBLY inefficient format and WAS NOT DESIGNED to stream.  Yet
> >> > people generate massive sheets in it.  My pensiveness is that no human
> >> > is likely to read such a large sheet or be able to do anything
> >> > particularly useful with it.  So who are these sheets for?  Often it
> >> > turns out they are some kind of data transfer, which is frankly
> >> > BAFFLING.  Why?  Because I could do the same transfer with like 1/10th
> >> > of the storage, bandwidth, CPU, etc. in a more well-thought-out (or at
> >> > least lightweight) format.  Yet I saw a spreadsheet today that was
> >> > 100 MB.  The power of Excel is that it can style the data and use some
> >> > formulas.  This is good for what is to me a summary report and not RAW
> >> > 100 MB or gigs of data.  Of course this comes from someone who knows
> >> > how to hack the underlying binary structures but barely knows how to
> >> > run the Excel GUI.   :-)
> >> >
> >> > We now return you to your previously scheduled mail list activity.
> >> >
> >> > -Andy
> >> >
> >> > PS.  I wish the open office GUI wasn't so crappy, sluggish and
> >> > well...cruddy looking, and that it printed nicely.  Their file formats
> >> > make so much more sense (and with compression they're reasonably
> >> > efficient) and the brilliance of text is that it works nicely with
> >> > revision control and revision control tags.
> >> >
> >> > PPS.  I also wish the open office developers would either learn C++,
> >> > convert all of their code to C and/or port open office to a language
> >> > they know how to write better structured code in.
> >> >
> >> > Brule, Jon wrote:
> >> > > Is it possible to generate a very large spreadsheet (e.g. several
> >> > > thousand rows) in a low-memory, streaming manner? I am looking for a
> >> > > corollary to the event model used to parse large spreadsheets.
> >> > >
> >> > > If not, I assume that the Cocoon serializer, which I understand uses
> >> > > HSSF, would not operate in a streaming manner either...
> >> > >
> >> > > Thank you.
> >> > >
> >> > > Regards,
> >> > > Jon
> >> > > _________________
> >> > > Jon R. Brule
> >> > > Paramount Computing Associates
> >> > >
> >> > >
> >> ---------------------------------------------------------------------
> >> > > To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
> >> > > Mailing List:     http://jakarta.apache.org/site/mail2.html#poi
> >> > > The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/
>
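
For readers who want to try the build-then-serve workflow David describes in this thread, here is a minimal Java sketch, not anyone's actual implementation. The class and method names (`BuildThenServeSketch`, `BuildJob`, `percentDone`) are hypothetical, and the loop that writes "row N" lines is only a stand-in for the slow data lookup and the real document generation. The idea is that a worker thread builds the complete file in memory while the request thread polls a progress counter, and the bytes are served only once the build has finished.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class BuildThenServeSketch {

    /** One document build: runs on a worker thread and publishes its progress. */
    static final class BuildJob implements Callable<byte[]> {
        private final int totalRows;
        private final AtomicInteger rowsDone = new AtomicInteger();

        BuildJob(int totalRows) {
            this.totalRows = totalRows;
        }

        /** Percentage complete; safe to read from another thread. */
        int percentDone() {
            return totalRows == 0 ? 100 : rowsDone.get() * 100 / totalRows;
        }

        @Override
        public byte[] call() {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (int i = 0; i < totalRows; i++) {
                // Hypothetical stand-in for the slow query plus file generation.
                byte[] line = ("row " + i + "\n").getBytes(StandardCharsets.UTF_8);
                out.write(line, 0, line.length);
                rowsDone.incrementAndGet();
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            BuildJob job = new BuildJob(1000);
            Future<byte[]> result = pool.submit(job);

            // The "progress" display: poll until the build thread finishes.
            while (!result.isDone()) {
                System.out.println(job.percentDone() + "% built");
                Thread.sleep(50);
            }

            // Only now are the finished bytes available to send to the client.
            byte[] bytes = result.get();
            System.out.println("ready to serve " + bytes.length + " bytes");
        } finally {
            pool.shutdown();
        }
    }
}
```

In a web application the polling loop would more likely be a status request from the browser, with the `Future` kept in the user's session; that wiring is left out here because it depends on the framework in use.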

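David's CSV alternative from this thread can be sketched in plain Java as follows. This is an illustration under assumptions, not code from the thread: the names `CsvStreamSketch`, `writeCsv` and `escape` are made up for the example, and the quoting rule is the common convention of wrapping a field in double quotes when it contains a comma, a quote or a line break. Each row is written as soon as it is available, so memory use does not grow with the number of rows.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class CsvStreamSketch {

    /** Quotes a field only when it contains a comma, a quote, or a line break. */
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    /** Writes rows one at a time; nothing is held in memory beyond the current row. */
    static void writeCsv(Iterator<List<String>> rows, OutputStream out) throws IOException {
        Writer w = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        while (rows.hasNext()) {
            List<String> row = rows.next();
            for (int i = 0; i < row.size(); i++) {
                if (i > 0) {
                    w.write(',');
                }
                w.write(escape(row.get(i)));
            }
            w.write("\r\n");
        }
        // Flush but do not close: the caller owns the stream.
        w.flush();
    }

    public static void main(String[] args) throws IOException {
        List<List<String>> demo = Arrays.asList(
                Arrays.asList("id", "note"),
                Arrays.asList("1", "plain"),
                Arrays.asList("2", "has, comma"));
        writeCsv(demo.iterator(), System.out);
    }
}
```

In a servlet, `out` would presumably be the response's output stream, with the content type set to `text/csv` before the first row is written; the iterator could wrap a database cursor so rows are fetched lazily.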