poi-dev mailing list archives

From "Andrew C. Oliver" <acoli...@apache.org>
Subject Re: sheet names and string format read garbled on EBCDIC machine
Date Wed, 02 Apr 2003 22:20:50 GMT
No, it should universally be ISO-8859-1 (Latin1).

Submit a patch, and make sure the existing unit tests pass (along with
any relevant new unit tests).

-Andy
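
(A minimal sketch of the kind of change suggested above, not the actual
patch. It assumes a modern JDK where java.nio.charset.StandardCharsets is
available; the class name SheetNameDecoding and the method
decodeCompressed are hypothetical and are not part of POI.)

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only; not a POI class.
final class SheetNameDecoding {
    private SheetNameDecoding() {}

    // Decodes an 8-bit ("compressed") name as ISO-8859-1 so the result
    // does not depend on the platform default encoding (e.g. EBCDIC).
    static String decodeCompressed(byte[] data, int offset, int length) {
        return new String(data, offset, length, StandardCharsets.ISO_8859_1);
    }
}
```

Decoding the bytes 0x43 0x61 0x66 0xE9 this way yields "Café" on any
platform, whereas new String(data, offset, length) without a charset
depends on the machine's native encoding.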

Elvira Gurevich wrote:

>In org.apache.poi.hssf.record.BoundSheetRecord, around line 143,
>the original code was:
>
>        if ( ( field_4_compressed_unicode_flag & 0x01 ) == 1 )
>        {
>            field_5_sheetname = StringUtil.getFromUnicodeHigh( data, 8 + offset, nameLength );
>        }
>        else
>        {
>            field_5_sheetname = new String( data, 8 + offset, nameLength );
>        }
>
>As you can see, if the flag is not on, the String is constructed using
>the native machine encoding. I am not familiar with the record structure
>or with which versions of Excel will have this flag on. In any case, in my
>scenario, the Excel file was created on an ASCII machine and POI was run
>on an EBCDIC machine, creating a string in native EBCDIC, which of course
>resulted in garbage. Giving the String constructor an encoding fixed
>this particular scenario, as follows:
>
>
>        if ( ( field_4_compressed_unicode_flag & 0x01 ) == 1 )
>        {
>            field_5_sheetname = StringUtil.getFromUnicodeHigh( data, 8 + offset, nameLength );
>        }
>        else
>        {
>            try
>            {
>                field_5_sheetname = new String( data, 8 + offset, nameLength, "UTF-8" );
>            }
>            catch ( java.io.UnsupportedEncodingException e )
>            {
>                throw new RecordFormatException( "Unsupported Encoding UTF-8" );
>            }
>        }
>
>If you tell me that UTF-8 is not the right encoding to use, I agree. Is
>there a universal encoding to plug into this constructor for this case?
>Probably not. The solution to me would be setting an encoding into a
>workbook through a new HSSFWorkbook(String encoding) constructor (or a
>setEncoding() method) which would make this field available to a lower
>level class, like org.apache.poi.hssf.record.BoundSheetRecord.
>
>For that matter, org.apache.poi.hssf.record.FormatRecord has the same
>problem around line 133.
>
>
>Elvira.
>
>
>
>-----Original Message-----
>From: Elvira Gurevich [mailto:Elvira_Gurevich@ibi.com] 
>Sent: Monday, March 24, 2003 10:12 AM
>To: 'POI Users List'
>Subject: RE: sheet names and string format read garbled on EBCDIC
>machine
>
>Apparently Excel 2000 uses Unicode internally for all strings. All the
>cell content strings are read correctly. I converted the sheet name
>string that came from the wb.getSheetName(sheet) call to bytes (as in
>sheetName.getBytes()) and traced that. The byte codes correspond to ASCII
>characters, which tells me that whoever reads the string reads it in
>the default machine encoding, but the string is already in Unicode.
>
>-----Original Message-----
>From: Joshua Davis [mailto:joshua.davis@kiodex.com] 
>Sent: Thursday, March 20, 2003 6:40 AM
>To: 'POI Users List'
>Subject: RE: sheet names and string format read garbled on EBCDIC
>machine
>
>Elvira,
>
>Wulf is right, as this is an odd use case. I'm guessing you are using Java
>on a mainframe, hence your need for EBCDIC support.  Maybe you could
>write an EBCDIC->ASCII stream filter and contribute it?
>
>BTW, I used to work for IBI... My group had to write an EBCDIC->ASCII filter
>so that we could make use of some third party libraries.  Mainframes are a
>giant PITA.
>
>-----Original Message-----
>From: Wulf Wechsung [mailto:ww@contexo.de] 
>Sent: Thursday, March 20, 2003 6:29 AM
>To: POI Users List
>Subject: AW: sheet names and string format read garbled on EBCDIC
>machine
>
>
>
>What he is trying to say, I think, is this: support is what you are *not*
>paying for, hence it comes down to what little or much people around here
>are willing and able to provide. C'mon, you've got Java developers (or even
>are one), I am sure; just fix it yourself. POI should have saved you enough
>time to do it.
>
>-----Original Message-----
>From: Elvira Gurevich [mailto:Elvira_Gurevich@ibi.com]
>Sent: Wednesday, March 19, 2003 11:25 PM
>To: 'POI Users List'
>Subject: RE: sheet names and string format read garbled on EBCDIC
>machine
>
>
>This is not a murmur. This is a specific problem and I tried to provide as
>much info on it as I could. If you need more info, I would be happy to get
>it to you. It just seems that nobody even looked at the problem so far. It
>could be a simple fix for someone familiar with the code...
>
>If you really need a test setup and will use it, please provide me with more
>details. I am not a decision maker, but will have to present this to people.
>
>Thanks,
>Elvira.
>
>
>-----Original Message-----
>From: Andrew C. Oliver [mailto:acoliver@apache.org] 
>Sent: Wednesday, March 19, 2003 1:06 PM
>To: POI Users List
>Subject: Re: sheet names and string format read garbled on EBCDIC
>machine
>
>Elvira_Gurevich@iwaysoftware.com wrote:
>
>>Hello,
>>
>>On 3/7/03, I submitted bug #17791.
>>To that bug report, I attached the relevant Excel files and BiffViewer
>>traces.
>>
>>We re-ran the test with jakarta-poi-1.11.0-dev-20030317.jar, with no changes
>>in results.
>>
>>We can provide an EBCDIC system for your testing, if that's the reason for
>>the problem to have been neglected so far.
>>
>That would help if you're willing to set up an automated process to run 
>our unit tests and send back info to our mailing list.  I've asked for 
>this for a while and I've only heard murmurs of volunteers and no 
>takers.  We could really use automated testing on systems other than 
>Windows and Linux.  I know a number of people who use POI on Solaris 
>without trouble but I've heard murmurs about problems on various 
>mainframes and minis.
>
>>I really need this resolved.
>>
>The best way to get things done urgently and for "free" is to provide 
>patches which resolve the issue.   Since the project is run on a 
>volunteer basis, things get fixed "when we have time". 
>
>Currently, when not trekking around the country, I myself have been 
>working on a number of client-funded POI projects and haven't had time 
>to work on much else. 
>
>Thanks,
>
>-Andy
>
>>Thank you.
>>Elvira Gurevich
>>iWay Software
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
>>For additional commands, e-mail: poi-user-help@jakarta.apache.org