poi-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From MSB <markbrd...@tiscali.co.uk>
Subject Re: Reading Excel files: character encoding / whole number issues
Date Tue, 24 Nov 2009 17:45:40 GMT

Hello Felix

I missed your reply for some reason; not to matter thgough.

Pleased that you managed to get things working but it is very odd that you
are experiencing problems running under Linux. I am fairly confident that we
have other list members running that operating system and I do not think
they experience any issues. Sadly, I have never used Linux and so cannot
comment but I hope we will hear from someone wuth more experience soon.

All the best with the API and if you need any more help simply post to this
list.

Yours

Mark B


Felix Dreher-2 wrote:
> 
> Hi Chris, Hi Mark,
> 
> thank you very much for the quick response!
> 
> MSB wrote:
>> Just to expand a little on Chris's reply, when you are reading data from
>> the
>> sheet, the correct way to use the HSSFDataFormatter class is something
>> like
>> this;
>>
>> HSSFDataFormatter formatter = new HSSFDataFormatter();
>> HSSFWorkbok workbook = new HSSFWorkbook...........
>> HSSFSheet sheet = workbook.getSheet(0);
>> HSSFRow row = sheet.getRow(0);
>> HSSFCell cell = row.getCell(0);
>> System.out.println(formatter.formatCellValue(cell));
>>
>> That way, the HSSFDataFormatter should use the cell style the user
>> applied
>> to the cell when the created the workbook/worksheet and you ought to see
>> what you expect output to the screen.
>>
>> Yours
>>
>> Mark B
>>
>>
>>   
> The HSSFDataFormatter.formatCellValue() function works fine and gives 
> the expected results. However, again on Linux (plus OpenOffice) it 
> didn't work for some reason. So I decided to reduce the number of 
> complicating factors and to simply use a Windows machine for this project.
> 
> 
>> ChrisLott wrote:
>>   
>>> Please don't assume that the view shown on the screen is a close match 
>>> to the data stored internally!  M$FT-Excel stores all numeric data as 
>>> floating-point numbers.  Then it applies suitable formatting and other 
>>> rules when it displays to you.  Working in POI you see the raw internal 
>>> data.  Storing and reading date/time values is a particular joy!
>>>
>>> Re character encoding: POI has cell encoding features, and copes with 
>>> UTF-8 and UTF-16.  Java is perfectly capable of dealing with special 
>>> characters.  You don't mention where on Linux the data is "converted to 
>>> question marks".  Did you use an IDE like Eclipse and view the data 
>>> right there immediately when it's read?
> No, I ran the Java program (jar) on the command-line and printed the 
> output to a text file. To see how it looks like directly in Netbeans, I 
> just tried it out and there it is the same output (questions marks 
> instead of special characters).
> Anyway, as I already said, the easiest workaround for now will be to run 
> the program on a Windows machine... :-)
> 
> Best regards,
> Felix
> 
> 
> 
> 
>>>   Also search Nabble for UTF-8 
>>> and encoding.  I got this hit:
>>>
>>> http://old.nabble.com/how-to-read-unicode-data-from-xls-sheet-td13601249.html#a13604018
>>>
>>> chris...
>>>
>>> p.s. Gruess aus New Jersey!
>>>
>>>
>>> Felix Dreher wrote:
>>>     
>>>> Hi all,
>>>>
>>>> I'm trying to read Excel files (classes HSSFWorkbook, HSSFSheet, 
>>>> HSSFCell etc.). There are two issues that I'm currently struggling
>>>> with:
>>>>
>>>> 1) non-ASCII characters like é or ü are read correctly on Windows, but

>>>> not on Linux (where these characters are all converted to question 
>>>> marks). I tried to use the "Locale.setDefault(myLocale)" method, but 
>>>> with no success.
>>>>
>>>> 2) The tables I would like to read contain lots of whole numbers, which 
>>>> I would like to treat as strings because they are IDs or order numbers 
>>>> (i.e. I would like to simply read all cells in the same way as Excel 
>>>> displays them). However, they get read as decimal numbers, e.g. the 
>>>> number 1234 in an Excel table (formatted as 'General') becomes 1234.0
>>>> in 
>>>> its POI representation.
>>>> I tried two workarounds, but they didn't work:
>>>> a) open the Excel file, format all cells as Text, save.
>>>> b) use the HSSFDataFormatter class to define a 'Decimal format' with 
>>>> actually no decimals (e.g. DecimalFormat df = new DecimalFormat ("#"))
>>>>
>>>> Any help would be greatly appreciated!
>>>>
>>>> Thank you very much,
>>>> Felix
>>>>       
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
>>> For additional commands, e-mail: user-help@poi.apache.org
>>>
>>>
>>>
>>>     
>>
>>   
> 
> 
> -- 
>  ___________________________________________________________
> 
>  Felix Dreher
>  Max Planck Institute for Molecular Genetics
>  Department of Vertebrate Genomics
>  Bioinformatics Group
> 
>  Ihnestr. 73 (mail) | Fabeckstr. 60-62 (visitors)
>  D-14195 Berlin, Germany
>  phone: +49-30-84131745 | mobile: +49-163-7542426
>  ___________________________________________________________
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
> For additional commands, e-mail: user-help@poi.apache.org
> 
> 
> 

-- 
View this message in context: http://old.nabble.com/Reading-Excel-files%3A-character-encoding---whole-number-issues-tp26494243p26499540.html
Sent from the POI - User mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Mime
View raw message