incubator-ooo-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andreas S├Ąger <>
Subject Re: [DISCUSS][CALC] String content to numeric value
Date Sat, 28 Jul 2012 12:13:50 GMT
Am 27.07.2012 08:20, Dennis E. Hamilton wrote:
> There are several matters here.
> First, the rules for OpenFormula apply to literals as argument and to references to cells
where the value is of type text and the expected type is Number.

The current rules for implicit conversions used to be clear and 
consistent in OOo 2. The current compromise which converts integers and 
ISO dates is still reasonable. Excel gives different results depending 
on the language version.

> It does not apply to how *entered* values are converted.  In fact, there is no guidance
at all about how *entered* values are converted.  Likewise, how cell values are formatted
for the spreadsheet's presentation, for export in CSV, etc., are independent of these rules
in OpenFormula.
> It strikes me that entry of values through an user interface is the time to deal with
these issues and that may resolve most cases.
> For use of literals in formulas, there are ways for users to ensure that the desired
interpretation is made by explicit conversions.  The same conditions apply when a cell value
is text and it is intended to be used as a number.  Safe practice could be encouraged even
if a given product provides the intended result, since there is no assurance of consistent
treatment in an interchange situation.
> (Multi-cell operandsare more complicated, when some cell entries in a range have values
of different type.)
> Without too much thought, I image three things that can be done, along with user guidance:
>   1. The rules for conversion of entered values based on the specified cell format need
to be clear, and that might also include understanding the cell display formatting in properly
handling the cell input conversion.  Since the storage of Number types is canonical and not
subject to locale considerations, this can isolate much of the problem.  The same applies
to export into non-OpenFormula forms, such as CSV.

The rules for German and English locale are unclear ("intuitive") but 
easy, reasonable and flexible enough to get adjusted to.

Things become weird in multi-lingual sheets because the input rules 
differ from cell to cell. IMHO, the interpretation of input strings 
should always follow the globally set locale regardless of the current 
cell formatting. Formula literals behave that way, constant cells do not.

>   2. Users need to be cautioned about what happens when cells and literals are of different
types than the expected values.  It might also be useful for there to be warnings when text
for Number expected arises.  There can also be user guidance on how to detect and prevent
pathological cases.

Users want this or that instantly without thinking nor reading. Next 
time they want something completely different.
Most serious issue is that new spreadsheet users don't understand the 
difference between (literal) string and (formatted) number which are the 
only 2 types in Calc. This accounts for a huge part of all support 
questions since decades. There is no way to do it right other than 
insisting on clear technical distinction.
The documentation is unclear. There must not be any use of literal 
strings nor mention of "text formatting".
The excelish user interface is extremely misleading because a switch 
that turns off input interpretation without applying any number format 
at all is hidden in the number format settings. Number format "@" is not 
a number format at all nor does it convert anything between number and 
text (which is what even advanced users believe).

> I recommend making the default behavior consistent with widespread practice, including
consideration of the importance of Excel interoperability.  If there seems to be no simple
way out of it, I think Error values are the appropriate response, so that an user is given
the opportunity to provide an explicit resolution rather than contend with silent discrepancies
in their own use or when their spreadsheet is interchanged.

As far as I know, Calc's input methods are the same as in Excel.
The LibreOffice folks are about to shoot themselves into the foot. In 
version 3.6 with German locale I can not enter dates  without leaving 
the num-pad while needing at least 2 more key strokes for recent dates. 
A date needs to be entered with 2 or 3 numbers and 2 distinct 
separators. Apart from the extra key strokes, this means that German 
users can not type dates on the num-pad anymore.
The intention is to calm down the users who enter/paste/import sport 
results such as 12/9 or 12-9. For some reason they refuse to 
enter/import/paste text even when you explain them how to do it. They 
want the software to decide what a text is.

>   3. There are probably additional functions that could be introduced into OpenFormula,
even as implementation-specified extensions (in accordance with provisions for that), to provide
functions that will pass Numbers intact and will apply some formatting rule or function when
text is found instead.  (The conversion of cells of explicit date and time types is already
straightforward although it might need review.)  A few new functions may be sufficient to
achieve complete, straightforward control on behalf of users, including forcing of an Error
result.  The extensions could be agreed among the different implementations of OpenFormula,
including in Apache OpenOffice, LibreOffice, Gnumeric, Excel 2013, Google Docs, etc.  (But
first, it should be verified whether there are means enough already.)

IMHO, it is a major problem that virtually nobody writes any add-ins.
The SUMIFS COUNTIFS of OOXML are trivial and they offer nothing new. 
These should be introduced for compatibility.

The current evaluation of string input by keyboard, clipboard or csv is 
Let me summarize it from a user's point of view:

1) By default, everything is evaluated as number if possible.
The = prefix triggers formula evaluation.
The ' prefix supresses evaluation all together.

2) We enter decimals with point or comma depending on the global locale 
which may be overridden by the number format locale.
23/ always enters the 23th day of the month which is very much 
appreciated by num-pad typers, but some people hate it. Please ignore 
the haters.
23/4 enters this year's 23th of April except for US locale where this 
has to be entered as 4/23.
Locale specific date separators may be used instead of the slash.
The colon is used as time separator except for the Italian locales which 
use a point.
ISO dates work with any locale setting.
The words TRUE and FALSE are recognized in the respective locale context 
as synonyms for 1 and 0. This is very different from Excel where boolean 
is a third data type apart from number and string.
Excel: =1=TRUE ==> FALSE
Calc: =1=TRUE ==> TRUE
Input 1.234 may give different values on the same sheet depending on the 
individual cell's number format locale whereas the formula constant 
=1.234 follows the global locale.
IMHO, the evaluation should be consistent throughout the office suite 
depending on the global locale only.
Input 1.234 should always yield the same value within the same locale 
context even if an explicitly set number format locale may display 1,234.
Interestingly, I have never read any complaint about booleans beeing 
different from Excel.

3) The general number format shows decimal numbers without leading nor 
trailing zeros and a configurable maximum of decimals behind the 
point/comma. The general number format automagically applies one 
particular number format when the input string indicates boolean, date, 
time, date+time, currency or scientific number format.
This is wanted behaviour for the general number format. IMHO, this works 
perfectly well for English and German numerals.
Some people expect crude ways of auto-formatting where the number gets 
interpreted and formatted as entered. This would make the spreadsheet 
effectively unusable.

4) Some number format categories change the evaluation method:
4a) Fraction: 3/4 yields 0.75 instead of 3rd April.
4b) Percent: A trailing % is added to all keyboard input effectively 
dividing the entered numeral by 100.
4c) Text (code @), which is no number format at all: It is more like a 
switch to turn off all evaluation for new input analog to the ' prefix. 
Newly entered numbers and prefixes are not evaluated. Any existing 
number or numeric formula result will be shown in the general number 
format. Even advanced users tend to believe that this number format 
converts to text and they expect that the existing string will be 
evaluated when removing the text attribute.

I think, all this is adequate, well established, very well thought, as 
easy as possible and as complicated as necessary given that each sheet 
cell is supposed to take any value of any type and sub-type for any locale.

As already mentioned, the input method should be the same for all office 
numerals depending on the global locale only.
That is issue

The default date of the general number format should have a 4-digit year 
to prevent certain input mistakes.

If AOO implements something analog to the latest LibreOffice date 
evaluation, then it should include a generous evaluation for format 
category "Date". Keep the old behaviour (day number with slash) when the 
cell is explicitly formatted as date.

On top of all number format options the number format dialog (which is 
the same in Calc, Writer and Base) should have a check box instead of 
number format category "Text". When you turn on this option, all the 
other options should be disabled (which is exactly what number format 
"@" actually does). The option should be labeled "New input as literal 
text" or something like that. Alternatively, it could be an evaluation 
switch which enables number formatting when evaluation is on.
The second most important option on that dialog is the locale option. 
All number formatting depends on the locale which should appear On top 
of the options besides the evaluation switch.

CSV export has an option "export as shown" which is what most people 
want. Usually we want to turn it off in order to get the maximum of 
decimal digits but this also removes all other formatting attributes. 
IMHO 0/1 booleans and decimal fractions are perfectly acceptable but 
dates and times should be exported as ISO strings rather than decimals 
based on 1899-12-30 00:00 which makes them unusable for any other 

I try to explain to confused users that csv is a database exchange 
format in plain text and that Calc is neither database nor text editor. 
Calc has to evaluate strings in order to do what it is supposed to do.
If HSQLDB accepted more than point decimals and ISO dates we could have 
a very powerful csv editing tool on board. HSQLDB can link database 
fields to csv which preserves the decimals, dates, delimiters and 
encoding regardless of the displayed format in Base.

That's it for now.


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message