poi-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From John Keyes <j...@mac.com>
Subject Re: [HSSF] extra white space
Date Mon, 12 May 2003 13:33:55 GMT
The comment for the addToStringTable method reads:

  "Okay, we are doing some major cheating here. Because we can't 
   handle rich text strings properly we end up getting duplicate 
   strings.  To get around this I'm doing two things: 1. Converting 
   rich text to normal text and 2. If there's a duplicate I'm adding 
   a space onto the end.  Sneaky perhaps but it gets the job done 
   until we can handle this a little better."

Now, I think its more important to support normal text well
than to support rich text poorly (I don't know the history
of this though and there may well be a good reason for this).

The workaround I have is to store an index to the real value
as the value for the key:

e.g. if there were three values 'aaa' in the sheet then the
mapping would be:

 key  value
 1    aaa
 2    #1#
 3    #2#

So this means no duplicate values will appear.  The search algorithm
is not the best though as it means each index has to be resolved to
get the actual value.  This involved changes to SSTRecord.getString
(to resolve the indicies) and SSTDeserializer.addToStringTable (to 
create the indicies).

If there is a column for a FLAG, e.g. Paid/Unpaid, where 'p' = paid
and 'u' = unpaid, there will be many duplicates.  If this sheet needs
to be exported as a CSV, then the actual values will differ to the
ones that are output.  It can be argued that trailing whitespace
is irrelevant, but that is only an assumption.

I don't know enough about the design and implementation of POI to be
able to suggest a way to resolve this, but I thought the information
might prove useful to the developers.

Cheers,
-John K

On Mon, 2003-05-12 at 12:13, John Keyes wrote:
> I've found the offending piece of code in SSTDeserializer:
> 
>     static public void addToStringTable( BinaryTree strings, Integer
> integer, UnicodeString string )
>     {
> 
> 	...
>         boolean added = false;
>         while ( added == false )
>         {
>             try
>             {
>                 strings.put( integer, string );
>                 added = true;
>             }
>             catch ( Exception ignore )
>             {
>                 string.setString( string.getStrings() + " " );
>             }
>         }
> 
>     }
> 
> with the exception message reading:
> 
>   "Cannot store a duplicate value ("XXX") in this Map"
> 
> I'll investigate a fix and post it if I resolve it.
> 
> -John K
> 
> 
> On Mon, 2003-05-12 at 10:17, John Keyes wrote:
> > Hi,
> > 
> > This is my first post to this list.  I've only been using POI for
> > a day (the HSSF side of things) and I must say its a very easy
> > piece of kit to use, so congrats to the developers.
> > 
> > I have encountered a problem while writing a toCSV method for an
> > Excel spreadsheet (I'm sure other people have done this and if
> > there is a Free version available I would glady use that rather
> > than making my own dogfood).
> > 
> > Anyway say I have the following columns in my spreadsheet:
> > 
> > ---------
> > |a|word1|
> > |-|-----|
> > |a|word2|
> > |-|-----|
> > |a|word3|
> > ---------
> > 
> > and this code processes it:
> > 
> > for (int i=0; i < numOfSheets; i++) {
> > 
> >   HSSFSheet sheet = wb.getSheetAt(i);
> >   System.out.println( "==" +  wb.getSheetName(i) + "==");
> >             
> >   for (int j = 0; j < sheet.getLastRowNum(); j++) {
> >     HSSFRow row = sheet.getRow(j);
> >                
> >     if(row == null) {
> >       continue;
> >     }
> >                 
> >     for (short k = 0; k < row.getLastCellNum(); k++) {
> >       HSSFCell cell = row.getCell(k);
> > 
> >       if (cell == null) {
> >         System.out.print(",");
> >         continue;
> >       }
> >                     
> >       switch (cell.getCellType()) {
> >         case HSSFCell.CELL_TYPE_NUMERIC :
> >           System.out.print(cell.getNumericCellValue());         
> >            break;
> >         case HSSFCell.CELL_TYPE_STRING :
> >           System.out.print(cell.getStringCellValue());
> >           break;
> >         case HSSFCell.CELL_TYPE_BLANK :
> >           break;
> >         default :
> >           System.err.println("unsuported sell type {" 
> >               + cell.getCellType() 
> >               + "}");
> >           break;
> >         }
> > 
> >         if (k+1 != row.getLastCellNum()) {
> >           System.out.print(",");
> >         }
> >       }
> >     System.out.println();
> >   }
> > }
> > 
> > (date handling left out for bevity).
> > 
> > The output is:
> > 
> > a,word1
> > a ,word2
> > a  ,word3
> > -- 
> > John Keyes <jbjk@mac.com>
> > 
> > I have tested this also with words and not just characters:
> > 
> > ------------
> > |word|word1|
> > |----|-----|
> > |word|word2|
> > |----|-----|
> > |word|word3|
> > ------------
> > 
> > results in:
> > 
> > word,word1
> > word ,word2
> > word  ,word3
> > 
> > and if the same value appears in different columns:
> > 
> > ---------------
> > |word|word1|10|
> > |----|-----|--|
> > |word|word|20|
> > |----|-----|--|
> > |word|word3|30|
> > ---------------
> > 
> > results in:
> > 
> > word,word1,10
> > word ,word  ,20
> > word   ,word3,30
> > 
> > It appears that N-1 spaces are appended to the cell value where N is the
> > number of identical strings in any column.  Has anyone else seen this
> > behaviour before?
> > 
> > I am using POI-0.10.0-dev, on Red Hat 9, JDK 1.4.1_02 (I've also tried
> > on Win2K).
> > 
> > Thanks,
> > -John K
-- 
John Keyes <jbjk@mac.com>


Mime
View raw message