poi-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From MSB <markbrd...@tiscali.co.uk>
Subject Re: Sorting by column(s)
Date Tue, 19 Jan 2010 16:23:26 GMT

Hello again.

Am back in from work now and with cup of tea in hand so feeling far more
'human'. When I said that I could not really comment re performance of the
two solutions, I hope I did not sound too sniffy; that was not my intention
and I am sorry if my reply came across that way. All I wanted to do was say
that I could not offer any informed opinion either way.

Fully understand your point about non Windows environments and it sort of
negates the suggestion I was going to make which was automating the process
using macros and Excel but that can be put aside for now.

Whilst I was out and about today, I thought did occur to me. Are you
targetting the older binary file (.xls) or the newer xml (.xlsx) based
formats? If the latter, then it may be worth looking at Yegor's BigGridDemo
http://svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/poi/xssf/usermodel/examples/BigGridDemo.java.
The intention was to demonstarte how to create very large workbooks without
running into memory issues and, as I understand it, the process is to stream
data into a temporary file and then use this to populate a workbook. It may
be that you could adapt this approach but I must emphasise that it will only
work for the newer xml based files as it relies upon the XSSF stream of the
API.

Glad you decided to open a new thread for the xls2csv as I know nothing
about it and it would be too easy for other list members to miss if it were
tacked onto this discussion.

Yours

Mark B


f.dcg wrote:
> 
>>
>> I can only hazard a guess at which method would be 'better' (whatever
>> that
>> means) but I do lean more towards the first option as it would require
>> the
>> creation of fewer in memory objectsm reuse the existing worksheet and
>> would
>> be cleaner - I can imagine the carnage created by a few faulty links.
>>
> Point well taken ..
> 
> 
>> Now, having said that, you should be fine writing new values into the
>> existing cells following the data sort, and it will not be necessary to
>> clear out the existing values in my opinion, simply call the
>> setCellValue()
>> method and pass the 'new' value. The only caveat is that the data type
>> should be the same - i.e. write numbers into numeric cells etc - so that
>> you
>> will not have the change the type of the existing worksheet cell. This
>> could
>> be the one problem you will face as once the data has been written to the
>> csv file, you will lose all type information and everything will become a
>> String; it should be easy to overcome this however by 'remembering' the
>> type
>> of the data in each column so that you can convert back before populating
>> the worksheet cells. Of course, the other advantage of this technique is
>> that you will retain any and all style information originally applied to
>> the
>> cells.
>>
> Was already thinking about that as well. The fomatting bit is a nice
> side-benefit but not crucial (yet ..)
> 
> 
>> It seems from your last post as if you are simply using POI to strip the
>> data from an existing Workbook, sort the data and then re-populate that
>> same
>> workbook. I am forced to ask why? Would it not be far easier to simply
>> use
>>
> ;-)
> Because i do not fancy doing all this in a non-windows environment on a
> daily basis with more than 1 workbook. In addition the wb needs to undergo
>>3 sorting rounds and that will probably increase ..
> Too much going on to do it manually .. ;-)
> 
> 
>> Excel to open the file, perform the sort and then save the file back to
>> disk
>> again? Of course, this assumes that you have access to Excel, that the
>> file
>> does already exist and you are not creating it from scratch using POI.
>>
> The latter is also a very real possibility so i would rather set it all up
> with an eye to the future.
> 
> I do have another question re. the usage of the xls2csv code but i'll put
> that in a separate thread.
> 
> Tia,
> 
> Fermin DCG
> 
> 
>> > Thx for the extensive reply.
>> >
>> > I have thought about using a collection but the effect on memory can be
>> > quite detrimental ..
>> > Esp since one never knows how big the files will be.
>> >
>> > So i had already started work on an alternate solution:
>> > - Export file to csv format
>> > - Do a *nix sort on it (fastest there is, afaik)
>> > - clear the workbook-sheet (or would overwriting the old entries be
>> > cheaper?)
>> > - Repopulate by means of file import and write the workbook back to
>> disk
>> > (or
>> > just use it outright)
>> >
>> > OR (and this just popped into my head ;-) ) would the following be more
>> > efficient and faster:
>> > - Add a column to the sheet to be sorted that makes each row uniquely
>> > identifiable
>> > - Push the columns to be sorted on into a collection (incl the unique
>> id
>> > column)
>> > - Sort the collection
>> > - Use the collection as a 'view' for the sheet or create a new sheet
>> and
>> > fill it with links to the original rows (assuming this works?)
>> > * If the collection becomes to large one could do the csv export/
>> import
>> > thing as well (just with the required cols however.)
>> >
>> > I know the 1st solution will work, even with 50G files. But would the
>> > second
>> > solution be feasible as well, referring to the row links.
>> >
>> > Tia,
>> >
>> > Fermin DCG
>> >
>> >
>> > On Mon, Jan 18, 2010 at 5:40 PM, MSB <markbrdsly@tiscali.co.uk> wrote:
>> >
>> >>
>> >> As far as I am aware, it is not possible to sort the data in columns
>> >> using
>> >> POI.
>> >>
>> >> Have you considered using Java's support for collections to help you
>> >> overcome this problem - the Arrays.sort() method may be of some help
>> and
>> >> if
>> >> not, it is always possible to create your own sorting algorithms. Then
>> >> all
>> >> you would need to do is write the data into one of the collections
>> >> classes,
>> >> sort that and then retrieve the data to populate the worksheet.
>> Thinking
>> >> about this further, the JTable class may be the way to go; I know that
>> it
>> >> is
>> >> possible to sort this on columns and I can see that it should be
>> possible
>> >> to
>> >> create a JTable, populate it with data, sort the model and then
>> retrieve
>> >> the
>> >> sorted data to populate the worksheet. Sadly, it has been a long time
>> >> since
>> >> I used Java to create GUIs and so I cannot remember all of the details
>> >> but
>> >> I
>> >> am confident it is possible, here is a good place to start
>> >> http://www.java2s.com/Tutorial/Java/0240__Swing/1100__JTable-Sort.htm.
>> >> You
>> >> may not even need to use the complex JTable object but can simply use
>> the
>> >> model part of it's MCV design pattern, the
>> >> javax.swing.table.DefaultTableModel class.
>> >>
>> >> Yours
>> >>
>> >> Mark B
>> >>
>> >>
>> >> dcg wrote:
>> >> >
>> >> > Hi,
>> >> >
>> >> > How does one sort by one or more columns in POI?
>> >> >
>> >> > Undoubtfully this is a very basic (and simple?) question but for
>> some
>> >> > reason
>> >> > i can't find any mention of it anywhere, nor am i able to locate a
>> >> method
>> >> > in
>> >> > the API the suggests column-sorting.
>> >> > I sincerely hope its due to the fact that i'm overlooking something
>> >> > bacause
>> >> > sorting a sheet seems quite basic.
>> >> >
>> >> > Any suggestions/ links as to how to sort columns would be much
>> >> > appreciated.
>> >> >
>> >> > Tia,
>> >> >
>> >> > Fermin DCG
>> >> >
>> >> >
>> >>
>> >> --
>> >> View this message in context:
>> >>
>> http://old.nabble.com/Sorting-by-column%28s%29-tp27209292p27213160.html
>> >> Sent from the POI - User mailing list archive at Nabble.com.
>> >>
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
>> >> For additional commands, e-mail: user-help@poi.apache.org
>> >>
>> >>
>> >
>> >
>> > --
>> > “The reasonable man adapts himself to the world; the unreasonable one
>> > persists in trying to adapt the world to himself. Therefore all
>> progress
>> > depends on the unreasonable man.”
>> > - George Bernard Shaw (1856 - 1950)
>> >
>> >
>>
>> --
>> View this message in context:
>> http://old.nabble.com/Sorting-by-column%28s%29-tp27209292p27222034.html
>> Sent from the POI - User mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
>> For additional commands, e-mail: user-help@poi.apache.org
>>
>>
> 
> 
> -- 
> “The reasonable man adapts himself to the world; the unreasonable one
> persists in trying to adapt the world to himself. Therefore all progress
> depends on the unreasonable man.”
> - George Bernard Shaw (1856 - 1950)
> 
> 

-- 
View this message in context: http://old.nabble.com/Sorting-by-column%28s%29-tp27209292p27228430.html
Sent from the POI - User mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Mime
View raw message