poi-dev mailing list archives

From "Elvira Gurevich" <Elvira_Gurev...@ibi.com>
Subject RE: sheet names and string format read garbled on EBCDIC machine
Date Fri, 04 Apr 2003 15:16:05 GMT
Do you mean that the Excel file itself knows, and can communicate, which
encoding is used? Is that somewhere in its record structure? Otherwise,
how are you going to know which encoding to use, unless you let the
client application pass it in?

-----Original Message-----
From: Andrew C. Oliver [mailto:acoliver@apache.org] 
Sent: Thursday, April 03, 2003 1:30 PM
To: POI Developers List
Cc: 'POI Users List'
Subject: Re: sheet names and string format read garbled on EBCDIC machine

Quite possibly.  Good point.  Perhaps you can work with some of the 
Japanese folks on the list in order to create appropriate patches/unit 
tests.

Remember, it's not only the right encoding that's at work, but also what
Excel will accept.

Elvira Gurevich wrote:

>Sure, I'll do that.
>But wouldn't it be a problem if the original Excel file was created on
>a Japanese version of Windows, with a Japanese worksheet name?
>
>-----Original Message-----
>From: Andrew C. Oliver [mailto:acoliver@apache.org] 
>Sent: Wednesday, April 02, 2003 5:21 PM
>To: POI Users List; POI Developers List
>Subject: Re: sheet names and string format read garbled on EBCDIC machine
>
>No, it should universally be ISO-8859-1 (Latin1).
>
>Submit a patch and make the unit tests pass (including any relevant new
>unit tests).
>
>-Andy
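For reference, here is a minimal sketch of the rule described above, assuming the BIFF8 string layout in which bit 0 of a string's option flags marks an uncompressed (16-bit) string. `Biff8StringDecoder` and its `decode` method are hypothetical illustrations, not POI API: compressed characters are decoded as ISO-8859-1 and uncompressed ones as UTF-16LE, so the result no longer depends on the platform's default charset. Because the compressed form can only hold U+0000 to U+00FF, a Japanese sheet name has to arrive through the UTF-16LE branch.

```java
import java.nio.charset.StandardCharsets;

class Biff8StringDecoder {

    // Hypothetical helper, not POI's API. Decodes the character bytes of a
    // BIFF8 string: if bit 0 of the option flags is set, each character is a
    // 16-bit little-endian UTF-16 code unit; otherwise the string is
    // "compressed" and each byte is a code unit whose high byte is zero,
    // which is exactly ISO-8859-1.
    static String decode(byte[] data, int offset, int charCount, int optionFlags) {
        if ((optionFlags & 0x01) != 0) {
            // Uncompressed: two bytes per character.
            return new String(data, offset, charCount * 2, StandardCharsets.UTF_16LE);
        }
        // Compressed: one byte per character, never the platform default charset.
        return new String(data, offset, charCount, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] compressed = {'S', 'h', 'e', 'e', 't', '1'};
        byte[] uncompressed = {'S', 0, 'h', 0, 'e', 0, 'e', 0, 't', 0, '1', 0};
        byte[] japanese = {0x42, 0x30}; // U+3042 (HIRAGANA LETTER A), little-endian
        System.out.println(decode(compressed, 0, 6, 0x00));                 // Sheet1
        System.out.println(decode(uncompressed, 0, 6, 0x01));               // Sheet1
        System.out.println(decode(japanese, 0, 1, 0x01).equals("\u3042")); // true
    }
}
```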
>
>Elvira Gurevich wrote:
>
>>In org.apache.poi.hssf.record.BoundSheetRecord, around line 143,
>>the original code was:
>>
>>       if ( ( field_4_compressed_unicode_flag & 0x01 ) == 1 )
>>       {
>>           field_5_sheetname = StringUtil.getFromUnicodeHigh( data, 8 + offset, nameLength );
>>       }
>>       else
>>       {
>>           field_5_sheetname = new String( data, 8 + offset, nameLength );
>>       }
>>
>>As you can see, if the flag is not set, the String is constructed using
>>the platform's native encoding. I am not familiar with the record
>>structure or with which versions of Excel will set this flag. In any
>>case, in my scenario the Excel file was created on an ASCII machine and
>>POI was run on an EBCDIC machine, so the bytes were decoded as native
>>EBCDIC, which of course resulted in garbage. Passing an explicit
>>encoding to the String constructor fixed this particular scenario, as
>>follows:
>>
>>
>>       if ( ( field_4_compressed_unicode_flag & 0x01 ) == 1 )
>>       {
>>           field_5_sheetname = StringUtil.getFromUnicodeHigh( data, 8 + offset, nameLength );
>>       }
>>       else
>>       {
>>           try
>>           {
>>               field_5_sheetname = new String( data, 8 + offset, nameLength, "UTF-8" );
>>           }
>>           catch ( java.io.UnsupportedEncodingException e )
>>           {
>>               throw new RecordFormatException( "Unsupported Encoding UTF-8" );
>>           }
>>       }
>>
>>If you tell me that UTF-8 is not the right encoding to use, I agree. Is
>>there a universal encoding to plug into this constructor for this case?
>>Probably not. The solution, to me, would be to set an encoding on the
>>workbook through a new HSSFWorkbook(String encoding) constructor (or a
>>setEncoding() method), which would make it available to lower-level
>>classes such as org.apache.poi.hssf.record.BoundSheetRecord.
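A minimal sketch of the configurable-encoding idea, assuming a hypothetical `CompressedStringReader` class (not part of POI) that holds a caller-supplied `Charset` and defaults to ISO-8859-1. The `main` method also illustrates why a hard-coded UTF-8 is risky for compressed strings: a Latin-1 byte such as 0xE9 ("é") is not valid UTF-8 on its own and decodes to the replacement character U+FFFD.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CompressedStringReader {

    // Hypothetical class, not part of POI: holds the charset used for
    // compressed (8-bit) strings, in the spirit of the proposed
    // HSSFWorkbook(String encoding) / setEncoding() idea.
    private final Charset charset;

    CompressedStringReader() {
        this(StandardCharsets.ISO_8859_1); // default: Latin-1, independent of the platform
    }

    CompressedStringReader(Charset charset) {
        this.charset = charset;
    }

    String read(byte[] data, int offset, int length) {
        return new String(data, offset, length, charset);
    }

    public static void main(String[] args) {
        // Compressed bytes 'c', 'a', 'f', 0xE9 (e-acute in Latin-1)
        byte[] name = {'c', 'a', 'f', (byte) 0xE9};
        String latin1 = new CompressedStringReader().read(name, 0, name.length);
        String utf8 = new CompressedStringReader(StandardCharsets.UTF_8).read(name, 0, name.length);
        System.out.println(latin1.equals("caf\u00E9")); // true
        System.out.println(utf8.equals(latin1));         // false
        System.out.println(utf8.indexOf('\uFFFD') >= 0); // true: 0xE9 alone is malformed UTF-8
    }
}
```

The default constructor keeps existing callers on the platform-independent behavior; only a caller that explicitly passes another charset changes it.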
>>
>>For that matter, org.apache.poi.hssf.record.FormatRecord has the same
>>problem around line 133.
>>
>>
>>Elvira.
>>
>>
>>
>>-----Original Message-----
>>From: Elvira Gurevich [mailto:Elvira_Gurevich@ibi.com] 
>>Sent: Monday, March 24, 2003 10:12 AM
>>To: 'POI Users List'
>>Subject: RE: sheet names and string format read garbled on EBCDIC machine
>>
>>Apparently Excel 2000 uses Unicode internally for all strings. All the
>>cell content strings are read correctly. I converted the sheet name
>>string returned by the wb.getSheetName(sheet) call to bytes (as in
>>sheetName.getBytes()) and traced them. The byte codes correspond to
>>ASCII characters, which tells me that whoever reads the string reads it
>>in the default machine encoding, but the string is already in Unicode.
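Note that `String.getBytes()` without an argument also uses the platform default charset, so on an EBCDIC machine, decoding with EBCDIC and then re-encoding with EBCDIC can hand back the original ASCII bytes even though the characters in the String are wrong. A minimal sketch of a charset-independent check, assuming a hypothetical `CodeUnitDump` helper, prints the String's UTF-16 code units directly; the IBM037 part only runs where that extended charset is installed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CodeUnitDump {

    // Hypothetical helper: formats each UTF-16 code unit of s as U+XXXX.
    // It inspects the String itself, so no default charset is involved.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump("Sheet1")); // U+0053 U+0068 U+0065 U+0065 U+0074 U+0031

        // Simulate the EBCDIC-machine round trip, if IBM037 is installed.
        if (Charset.isSupported("IBM037")) {
            Charset ebcdic = Charset.forName("IBM037");
            byte[] ascii = "Sheet1".getBytes(StandardCharsets.US_ASCII);
            String misread = new String(ascii, ebcdic); // ASCII bytes decoded as EBCDIC
            byte[] traced = misread.getBytes(ebcdic);   // roughly what getBytes() returns there
            System.out.println(dump(misread));          // differs from the code units of "Sheet1"
            System.out.println(Arrays.equals(ascii, traced));
        }
    }
}
```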
>>
>>-----Original Message-----
>>From: Joshua Davis [mailto:joshua.davis@kiodex.com] 
>>Sent: Thursday, March 20, 2003 6:40 AM
>>To: 'POI Users List'
>>Subject: RE: sheet names and string format read garbled on EBCDIC machine
>>
>>Elvira,
>>
>>Wulf is right, as this is an odd use case. I'm guessing you are using
>>Java on a mainframe, hence your need for EBCDIC support. Maybe you could
>>write an EBCDIC->ASCII stream filter and contribute it?
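A minimal sketch of such a filter, assuming a hypothetical `Transcoder.transcode` helper built on `InputStreamReader` and `OutputStreamWriter`. It only makes sense for text streams: an .xls file is binary, so running it through a character transcoder would corrupt it, and the sheet-name problem in this thread still has to be fixed where POI decodes the bytes. The demo uses IBM1047, a common z/OS EBCDIC code page, and is skipped if the JRE does not ship that charset.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Transcoder {

    // Hypothetical helper: decodes text from 'in' with 'from' and re-encodes
    // it to 'out' with 'to'. Characters that 'to' cannot represent are
    // replaced by that charset's replacement bytes. Text only: never pass a
    // binary .xls file through this.
    static void transcode(InputStream in, Charset from, OutputStream out, Charset to)
            throws IOException {
        Reader reader = new BufferedReader(new InputStreamReader(in, from));
        Writer writer = new BufferedWriter(new OutputStreamWriter(out, to));
        char[] buf = new char[8192];
        int n;
        while ((n = reader.read(buf)) != -1) {
            writer.write(buf, 0, n);
        }
        writer.flush(); // flush, but leave both streams open for the caller
    }

    public static void main(String[] args) throws IOException {
        if (!Charset.isSupported("IBM1047")) {
            System.out.println("IBM1047 is not available in this JRE");
            return;
        }
        Charset ebcdic = Charset.forName("IBM1047");
        byte[] ebcdicBytes = "Sheet1".getBytes(ebcdic);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transcode(new ByteArrayInputStream(ebcdicBytes), ebcdic, out, StandardCharsets.US_ASCII);
        System.out.println(out.toString("US-ASCII")); // Sheet1
    }
}
```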
>>
>>BTW, I used to work for IBI... My group had to write an EBCDIC->ASCII
>>filter so that we could make use of some third-party libraries.
>>Mainframes are a giant PITA.
>>
>>-----Original Message-----
>>From: Wulf Wechsung [mailto:ww@contexo.de] 
>>Sent: Thursday, March 20, 2003 6:29 AM
>>To: POI Users List
>>Subject: RE: sheet names and string format read garbled on EBCDIC machine
>>
>>
>>
>>What he is trying to say, I think, is this: support is what you are
>>*not* paying for, so it comes down to how much (or how little) people
>>around here are willing and able to provide. C'mon, you've got Java
>>developers (or even are one), I am sure; just fix it yourself. POI
>>should have saved you enough time to do it.
>>
>>-----Original Message-----
>>From: Elvira Gurevich [mailto:Elvira_Gurevich@ibi.com]
>>Sent: Wednesday, March 19, 2003 23:25
>>To: 'POI Users List'
>>Subject: RE: sheet names and string format read garbled on EBCDIC machine
>>
>>
>>This is not a murmur. This is a specific problem, and I tried to provide
>>as much information about it as I could. If you need more, I would be
>>happy to get it to you. It just seems that nobody has even looked at the
>>problem so far. It could be a simple fix for someone familiar with the
>>code...
>>
>>If you really need a test setup and will use it, please give me more
>>details. I am not a decision maker, but I will have to present this to
>>people.
>>
>>Thanks,
>>Elvira.
>>
>>
>>-----Original Message-----
>>From: Andrew C. Oliver [mailto:acoliver@apache.org] 
>>Sent: Wednesday, March 19, 2003 1:06 PM
>>To: POI Users List
>>Subject: Re: sheet names and string format read garbled on EBCDIC machine
>>
>>Elvira_Gurevich@iwaysoftware.com wrote:
>>
>>>Hello,
>>>
>>>On 3/7/03, I submitted bug #17791.
>>>To that bug report, I attached the relevant Excel files and BiffViewer
>>>traces.
>>>
>>>We re-ran the test with jakarta-poi-1.11.0-dev-20030317.jar, with no
>>>changes in results.
>>>
>>>We can provide an EBCDIC system for your testing, if that's why the
>>>problem has been neglected so far.
>>>
>>That would help if you're willing to set up an automated process to run
>>our unit tests and send the results back to our mailing list. I've asked
>>for this for a while, and I've only heard murmurs of volunteers and no
>>takers. We could really use automated testing on systems other than
>>Windows and Linux. I know a number of people who use POI on Solaris
>>without trouble, but I've heard murmurs about problems on various
>>mainframes and minis.
>>
>>>I really need this resolved.
>>>
>>The best way to get things done urgently and for "free" is to provide
>>patches that resolve the issue. Since the project is run on a
>>volunteer basis, things get fixed "when we have time".
>>
>>Currently, when not trekking around the country, I myself have been
>>working on a number of client-funded POI projects and haven't had time
>>to work on much else.
>>
>>Thanks,
>>
>>-Andy
>>
>>
>>>Thank you.
>>>Elvira Gurevich
>>>iWay Software
>>>
>>>---------------------------------------------------------------------
>>>To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
>>>For additional commands, e-mail: poi-user-help@jakarta.apache.org
>>
>
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: poi-dev-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: poi-dev-help@jakarta.apache.org