poi-user mailing list archives

From "Elvira Gurevich" <Elvira_Gurev...@ibi.com>
Subject RE: sheet names and string format read garbled on EBCDIC machine
Date Wed, 02 Apr 2003 22:09:42 GMT
In org.apache.poi.hssf.record.BoundSheetRecord, around line 143,
the original code was:

        if ( ( field_4_compressed_unicode_flag & 0x01 ) == 1 )
        {
            field_5_sheetname = StringUtil.getFromUnicodeHigh( data, 8 + offset, nameLength );
        }
        else
        {
            field_5_sheetname = new String( data, 8 + offset, nameLength );
        }

As you can see, if the flag is not on, the String is constructed using
the native machine encoding. I am not familiar with the record structure
or with which versions of Excel set this flag. In any case, in my
scenario, the Excel file was created on an ASCII machine and POI was run
on an EBCDIC machine, so the string was decoded as native EBCDIC, which of
course resulted in garbage. Giving the String constructor an encoding fixed
this particular scenario, as follows:


        if ( ( field_4_compressed_unicode_flag & 0x01 ) == 1 )
        {
            field_5_sheetname = StringUtil.getFromUnicodeHigh( data, 8 + offset, nameLength );
        }
        else
        {
            try
            {
                field_5_sheetname = new String( data, 8 + offset, nameLength, "UTF-8" );
            }
            catch ( java.io.UnsupportedEncodingException e )
            {
                throw new RecordFormatException( "Unsupported Encoding UTF-8" );
            }
        }
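
To make the failure mode concrete, here is a small standalone sketch (it is
not POI code). It decodes the same ASCII bytes once with an explicitly named
charset and once with an EBCDIC code page, which approximates what
new String(data) does on a JVM whose default encoding is EBCDIC. The class
name, the helper decodeWith, and the choice of IBM1047 are assumptions made
for illustration; the default code page on a given mainframe may differ, and
IBM1047 may be missing from some JREs, so the helper checks for it first.
The sketch also assumes Java 7 or later for StandardCharsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingMismatchDemo {
    // Decodes the bytes with the named charset, or returns null if this
    // JRE does not provide that charset.
    static String decodeWith(byte[] data, String charsetName) {
        if (!Charset.isSupported(charsetName)) {
            return null;
        }
        return new String(data, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // "Sheet1" as an ASCII-based machine would store it, one byte per char.
        byte[] data = "Sheet1".getBytes(StandardCharsets.US_ASCII);

        // Explicit charset: the result does not depend on the host platform.
        String explicit = new String(data, StandardCharsets.ISO_8859_1);
        System.out.println("explicit charset: " + explicit);

        // Stand-in for the platform default on an EBCDIC host (assumed IBM1047).
        String ebcdic = decodeWith(data, "IBM1047");
        if (ebcdic == null) {
            System.out.println("IBM1047 is not available in this JRE");
        } else {
            System.out.println("same bytes read as EBCDIC differ: "
                    + !ebcdic.equals(explicit));
        }
    }
}
```

The point of the sketch is only that the bytes are identical in both calls;
the difference comes entirely from which charset the decoder is told to use.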

If you tell me that UTF-8 is not the right encoding to use, I agree. Is
there a universal encoding to plug into this constructor for this case?
Probably not. The solution to me would be setting an encoding into a
workbook through a new HSSFWorkbook(String encoding) constructor (or a
setEncoding() method) which would make this field available to a lower
level class, like org.apache.poi.hssf.record.BoundSheetRecord.
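
One way the idea could look, as a sketch only: SheetNameDecoder, its
constructors, and decode are hypothetical names and are not part of POI. The
ISO-8859-1 default rests on an assumption that the non-Unicode branch stores
one byte per character holding the low byte of the Unicode code point. If
that assumption holds, a fixed charset might be enough and no per-workbook
setting would be needed; the second constructor is there for the case where
it does not.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not a POI class: carries an encoding chosen at the
// workbook level down to the code that decodes 8-bit record strings.
final class SheetNameDecoder {
    private final Charset encoding;

    // Assumed default: one byte per character, equal to the low byte of the
    // Unicode code point, which is exactly what ISO-8859-1 decodes.
    SheetNameDecoder() {
        this(StandardCharsets.ISO_8859_1);
    }

    // Lets a caller override the charset, as a setEncoding() method might.
    SheetNameDecoder(Charset encoding) {
        this.encoding = encoding;
    }

    // Never consults the platform default, so EBCDIC and ASCII hosts agree.
    String decode(byte[] data, int offset, int length) {
        return new String(data, offset, length, encoding);
    }

    public static void main(String[] args) {
        // Made-up record bytes: two header bytes, the name, one trailing byte.
        byte[] record = {0, 0, 'D', 'a', 't', 'a', 0};
        System.out.println(new SheetNameDecoder().decode(record, 2, 4)); // prints "Data"
    }
}
```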

For that matter, org.apache.poi.hssf.record.FormatRecord has the same
problem around line 133.


Elvira.



-----Original Message-----
From: Elvira Gurevich [mailto:Elvira_Gurevich@ibi.com] 
Sent: Monday, March 24, 2003 10:12 AM
To: 'POI Users List'
Subject: RE: sheet names and string format read garbled on EBCDIC
machine

Apparently Excel 2000 uses Unicode internally for all strings. All the
cell content strings are read correctly. I converted the sheet name
string that came from the wb.getSheetName(sheet) call to bytes (as in
sheetName.getBytes()) and traced that. The byte codes correspond to ASCII
characters, which tells me that whoever reads the string reads it in
the default machine encoding, but the string is already in Unicode.
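
A trace like the one described above can be produced with a few lines. This
is a sketch with made-up names (ByteTrace, toHex), and the literal "Sheet1"
stands in for the value returned by wb.getSheetName(sheet). Unlike a bare
sheetName.getBytes(), it passes an explicit charset, so the dump shows the
string's own UTF-16 code units and does not depend on the platform default.

```java
import java.nio.charset.StandardCharsets;

final class ByteTrace {
    // Formats each byte as two uppercase hex digits, separated by spaces.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String sheetName = "Sheet1"; // stand-in for wb.getSheetName(sheet)
        // UTF-16BE exposes the code units the String holds, two bytes each.
        System.out.println(toHex(sheetName.getBytes(StandardCharsets.UTF_16BE)));
        // prints: 00 53 00 68 00 65 00 65 00 74 00 31
    }
}
```

If the name had been decoded with the wrong charset, the code units in this
dump would not match the expected characters, whatever the console shows.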

-----Original Message-----
From: Joshua Davis [mailto:joshua.davis@kiodex.com] 
Sent: Thursday, March 20, 2003 6:40 AM
To: 'POI Users List'
Subject: RE: sheet names and string format read garbled on EBCDIC
machine

Elvira,

Wulf is right, as this is an odd use case. I'm guessing you are using Java
on a mainframe, and hence your need for EBCDIC support.  Maybe you could
write an EBCDIC->ASCII stream filter and contribute it?

BTW, I used to work for IBI... My group had to write an EBCDIC->ASCII filter
so that we could make use of some third party libraries.  Mainframes are a
giant PITA.

-----Original Message-----
From: Wulf Wechsung [mailto:ww@contexo.de] 
Sent: Thursday, March 20, 2003 6:29 AM
To: POI Users List
Subject: RE: sheet names and string format read garbled on EBCDIC
machine



What he is trying to say, I think, is this: support is what you are *not*
paying for, hence it comes down to what little or much people around here
are willing and able to provide. C'mon, you've got Java developers (or even
are one), I am sure; just fix it yourself. POI should have saved you enough
time to do it.

-----Original Message-----
From: Elvira Gurevich [mailto:Elvira_Gurevich@ibi.com]
Sent: Wednesday, March 19, 2003 11:25 PM
To: 'POI Users List'
Subject: RE: sheet names and string format read garbled on EBCDIC
machine


This is not a murmur. This is a specific problem and I tried to provide as
much info on it as I could. If you need more info, I would be happy to get
it to you. It just seems that nobody even looked at the problem so far. It
could be a simple fix for someone familiar with the code...

If you really need a test setup and will use it, please provide me with more
details. I am not a decision maker, but will have to present this to people.

Thanks,
Elvira.


-----Original Message-----
From: Andrew C. Oliver [mailto:acoliver@apache.org] 
Sent: Wednesday, March 19, 2003 1:06 PM
To: POI Users List
Subject: Re: sheet names and string format read garbled on EBCDIC
machine

Elvira_Gurevich@iwaysoftware.com wrote:

>Hello,
>
>On 3/7/03, I submitted bug #17791.
>To that bug report, I attached the relevant Excel files and BiffViewer
>traces.
>
>We re-ran the test with jakarta-poi-1.11.0-dev-20030317.jar, with no
>changes in results.
>
>We can provide an EBCDIC system for your testing, if that's the reason
>for the problem to have been neglected so far.
>  
>
That would help if you're willing to set up an automated process to run 
our unit tests and send back info to our mailing list.  I've asked for 
this for a while and I've only heard murmurs of volunteers and no 
takers.  We could really use automated testing on systems other than 
Windows and Linux.  I know a number of people who use POI on Solaris 
without trouble but I've heard murmurs about problems on various 
mainframes and minis.

>I really need this resolved.
>
The best way to get things done urgently and for "free" is to provide 
patches which resolve the issue.   Since the project is run on a 
volunteer basis, things get fixed "when we have time". 

Currently, when not trekking around the country, I myself have been 
working on a number of client-funded POI projects and haven't had time 
to work on much else. 

Thanks,

-Andy

>Thank you.
>Elvira Gurevich
>iWay Software
>
>
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: poi-user-help@jakarta.apache.org
>
>
>  
>




 



