poi-dev mailing list archives

From "Andrew C. Oliver" <acoli...@apache.org>
Subject Re: sheet names and string format read garbled on EBCDIC machine
Date Fri, 04 Apr 2003 15:38:25 GMT
Start here:  
http://jakarta.apache.org/poi/javadocs/javasrc/org/apache/poi/hssf/record/CodepageRecord_java.html#CodepageRecord
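For anyone following along, here is a minimal sketch of the underlying issue. It is not POI code: the class and method names are hypothetical, and ISO-8859-1 is only the encoding suggested earlier in this thread, not something read from the codepage record. It shows the same 8-bit bytes decoded once with an explicit charset and once with whatever the platform default happens to be.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration only; not part of POI. */
class SheetNameDecoding {

    /**
     * Decodes nameLength bytes starting at start as ISO-8859-1, so the
     * result does not depend on the JVM's default charset.
     * (ISO-8859-1 is an assumption taken from the thread.)
     */
    public static String decodeLatin1(byte[] data, int start, int nameLength) {
        return new String(data, start, nameLength, StandardCharsets.ISO_8859_1);
    }

    /** Decodes the same bytes with the platform default charset. */
    public static String decodePlatformDefault(byte[] data, int start, int nameLength) {
        return new String(data, start, nameLength, Charset.defaultCharset());
    }

    public static void main(String[] args) {
        // Bytes as an ASCII machine would have written them into the file.
        byte[] data = "Sheet1".getBytes(StandardCharsets.US_ASCII);

        // Explicit charset: same result on every platform.
        System.out.println(decodeLatin1(data, 0, data.length));

        // Platform default: result depends on where the JVM runs.
        System.out.println(decodePlatformDefault(data, 0, data.length));

        // Simulate an EBCDIC default, if this JVM ships the IBM037 charset.
        if (Charset.isSupported("IBM037")) {
            System.out.println(new String(data, Charset.forName("IBM037")));
        }
    }
}
```

On a JVM whose default charset is an EBCDIC codepage, the second call would be expected to return the kind of garbage described below, while the first would not. Whether a fixed charset is right for every file is exactly the open question in this thread.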

Elvira Gurevich wrote:

>Do you mean that the Excel file itself knows and can communicate what
>encoding is used? Is it somewhere in its record structure? Otherwise,
>how are you going to know what encoding to use, unless you let the
>client application pass it in?
>
>-----Original Message-----
>From: Andrew C. Oliver [mailto:acoliver@apache.org] 
>Sent: Thursday, April 03, 2003 1:30 PM
>To: POI Developers List
>Cc: 'POI Users List'
>Subject: Re: sheet names and string format read garbled on EBCDIC
>machine
>
>Quite possibly.  Good point.  Perhaps you can work with some of the 
>Japanese folks on the list in order to create appropriate patches/unit 
>tests.
>
>Remember, it's not only the right encoding that's at work, but what Excel 
>will accept.
>
>Elvira Gurevich wrote:
>
>>Sure, I'll do that.
>>But wouldn't it be a problem if the original Excel file was created on
>>a Japanese version of Windows? With a Japanese worksheet name?
>>
>>-----Original Message-----
>>From: Andrew C. Oliver [mailto:acoliver@apache.org] 
>>Sent: Wednesday, April 02, 2003 5:21 PM
>>To: POI Users List; POI Developers List
>>Subject: Re: sheet names and string format read garbled on EBCDIC
>>machine
>>
>>No, it should universally be ISO-8859-1 (Latin1).
>>
>>Submit a patch and make the unit tests pass (as well as any relevant 
>>unit tests)
>>
>>-Andy
>>
>>Elvira Gurevich wrote:
>>
>>>In org.apache.poi.hssf.record.BoundSheetRecord, around line 143,
>>>the original code was:
>>>
>>>      if ( ( field_4_compressed_unicode_flag & 0x01 ) == 1 )
>>>      {
>>>          field_5_sheetname = StringUtil.getFromUnicodeHigh( data, 8 + offset, nameLength );
>>>      }
>>>      else
>>>      {
>>>          field_5_sheetname = new String( data, 8 + offset, nameLength );
>>>      }
>>>
>>>As you can see, if the flag is not on, the String is constructed using
>>>the native machine encoding. I am not familiar with the record structure
>>>or with which versions of Excel set this flag. In any case, in my
>>>scenario, the Excel file was created on an ASCII machine and POI was run
>>>on an EBCDIC machine, which decoded the bytes as native EBCDIC and of
>>>course produced garbage. Giving the String constructor an explicit
>>>encoding fixed this particular scenario, as follows:
>>>
>>>
>>>      if ( ( field_4_compressed_unicode_flag & 0x01 ) == 1 )
>>>      {
>>>          field_5_sheetname = StringUtil.getFromUnicodeHigh( data, 8 + offset, nameLength );
>>>      }
>>>      else
>>>      {
>>>          try
>>>          {
>>>              field_5_sheetname = new String( data, 8 + offset, nameLength, "UTF-8" );
>>>          }
>>>          catch ( java.io.UnsupportedEncodingException e )
>>>          {
>>>              throw new RecordFormatException( "Unsupported Encoding UTF-8" );
>>>          }
>>>      }
>>>
>>>If you tell me that UTF-8 is not the right encoding to use, I agree. Is
>>>there a universal encoding to plug into this constructor for this case?
>>>Probably not. The solution to me would be setting an encoding into a
>>>workbook through a new HSSFWorkbook(String encoding) constructor (or a
>>>setEncoding() method) which would make this field available to a lower
>>>level class, like org.apache.poi.hssf.record.BoundSheetRecord.
>>>
>>>For that matter, org.apache.poi.hssf.record.FormatRecord has the same
>>>problem around line 133.
>>>
>>>
>>>Elvira.
>>>
>>>
>>>
>>>-----Original Message-----
>>>From: Elvira Gurevich [mailto:Elvira_Gurevich@ibi.com] 
>>>Sent: Monday, March 24, 2003 10:12 AM
>>>To: 'POI Users List'
>>>Subject: RE: sheet names and string format read garbled on EBCDIC
>>>machine
>>>
>>>Apparently Excel 2000 uses Unicode internally for all strings. All the
>>>cell content strings are read correctly. I converted the sheet name
>>>string that came from the wb.getSheetName(sheet) call to bytes (as in
>>>sheetName.getBytes()) and traced that. The byte codes correspond to ASCII
>>>characters, which tells me that whoever reads the string reads it in
>>>the default machine encoding, but the string is already in Unicode.
>>>
>>>-----Original Message-----
>>>From: Joshua Davis [mailto:joshua.davis@kiodex.com] 
>>>Sent: Thursday, March 20, 2003 6:40 AM
>>>To: 'POI Users List'
>>>Subject: RE: sheet names and string format read garbled on EBCDIC
>>>machine
>>>
>>>Elvira,
>>>
>>>Wulf is right, as this is an odd use case. I'm guessing you are using
>>>Java on a mainframe, and hence your need for EBCDIC support.  Maybe you
>>>could write an EBCDIC->ASCII stream filter and contribute it?
>>>
>>>BTW, I used to work for IBI... My group had to write an EBCDIC->ASCII
>>>filter so that we could make use of some third party libraries.
>>>Mainframes are a giant PITA.
>>>
>>>-----Original Message-----
>>>From: Wulf Wechsung [mailto:ww@contexo.de] 
>>>Sent: Thursday, March 20, 2003 6:29 AM
>>>To: POI Users List
>>>Subject: AW: sheet names and string format read garbled on EBCDIC
>>>machine
>>>
>>>
>>>
>>>What he is trying to say, I think, is this: support is what you are *not*
>>>paying for, hence it comes down to what little or much people around here
>>>are willing and able to provide. C'mon, you've got Java developers (or even
>>>are one), I am sure; just fix it yourself. POI should have saved you enough
>>>time to do it.
>>>
>>>-----Original Message-----
>>>From: Elvira Gurevich [mailto:Elvira_Gurevich@ibi.com]
>>>Sent: Wednesday, March 19, 2003 23:25
>>>To: 'POI Users List'
>>>Subject: RE: sheet names and string format read garbled on EBCDIC
>>>machine
>>>
>>>
>>>This is not a murmur. This is a specific problem and I tried to provide
>>>as much info on it as I could. If you need more info, I would be happy to
>>>get it to you. It just seems that nobody even looked at the problem so
>>>far. It could be a simple fix for someone familiar with the code...
>>>
>>>If you really need a test setup and will use it, please provide me with
>>>more details. I am not a decision maker, but will have to present this to
>>>people.
>>>
>>>Thanks,
>>>Elvira.
>>>
>>>
>>>-----Original Message-----
>>>From: Andrew C. Oliver [mailto:acoliver@apache.org] 
>>>Sent: Wednesday, March 19, 2003 1:06 PM
>>>To: POI Users List
>>>Subject: Re: sheet names and string format read garbled on EBCDIC
>>>machine
>>>
>>>Elvira_Gurevich@iwaysoftware.com wrote:
>>>
>>>>Hello,
>>>>
>>>>On 3/7/03, I submitted bug #17791.
>>>>To that bug report, I attached the relevant Excel files and BiffViewer
>>>>traces.
>>>>
>>>>We re-ran the test with jakarta-poi-1.11.0-dev-20030317.jar, with no
>>>>changes in results.
>>>>
>>>>We can provide an EBCDIC system for your testing, if that's the reason
>>>>for the problem to have been neglected so far.
>>>>
>>>That would help if you're willing to set up an automated process to run
>>>our unit tests and send back info to our mailing list.  I've asked for
>>>this for a while and I've only heard murmurs of volunteers and no
>>>takers.  We could really use automated testing on systems other than
>>>Windows and Linux.  I know a number of people who use POI on Solaris
>>>without trouble, but I've heard murmurs about problems on various
>>>mainframes and minis.
>>>
>>>>I really need this resolved.
>>>>
>>>The best way to get things done urgently and for "free" is to provide
>>>patches which resolve the issue.  Since the project is run on a
>>>volunteer basis, things get fixed "when we have time".
>>>
>>>Currently, when not trekking around the country, I myself have been
>>>working on a number of client-funded POI projects and haven't had time
>>>to work on much else.
>>>
>>>Thanks,
>>>
>>>-Andy
>>>
>>>>Thank you.
>>>>Elvira Gurevich
>>>>iWay Software
>>>>
>>>>
>>>>
>>>>
>>>>---------------------------------------------------------------------
>>>>To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
>>>>For additional commands, e-mail: poi-user-help@jakarta.apache.org
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: poi-dev-unsubscribe@jakarta.apache.org
>>For additional commands, e-mail: poi-dev-help@jakarta.apache.org