commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From sebb <seb...@gmail.com>
Subject Re: [csv] CSVFormat API names
Date Tue, 16 Oct 2012 15:43:00 GMT
On 16 October 2012 16:34, Gary Gregory <garydgregory@gmail.com> wrote:
> On Tue, Oct 16, 2012 at 11:29 AM, Gary Gregory <garydgregory@gmail.com>wrote:
>
>> On Tue, Oct 16, 2012 at 11:04 AM, Matt Benson <gudnabrsam@gmail.com>wrote:
>>
>>> Random thoughts--no real context here, so no way to inline:
>>>
>>> - "line separator" concept, while harmonizing with the line.separator
>>> system property, might be better represented as "row separator" so as
>>> not to imply that the parameter should be in any way limited to \r or
>>> \n .  I would think the default for this would be the line.separator
>>> property, however, and thus should take a String or CharSequence
>>> (perhaps it already does, but there's been so much talk about char
>>> parameters...).
>>>
>>
>> Now that you mention it, this should have been obvious as soon as we wrote
>> the test cases where a record is split over more than one line.
>>
>> There is a difference between line number and record number which the API
>> tracks.
>>
>> I propose to change "line separator" to "record separator". The default
>> can be line.separator.
>>
>> Gary
>>
>>
>>>
>>> - with* methods:  just something to think about here, but while we're
>>> creating a fluent API, would e.g. #delimitedBy('\t') read more
>>> fluently than #withDelimiter('\t') ?  #escapingWith('\\') vs.
>>> #withEscape('\\') ?
>>>
>>
> I find that the combination of the fluent API style AND immutability of the
> format class ugly because of the PRISTINE & DISABLED internal crud.
>
> Why not just have DEFAULT and dump PRISTINE? Other formats should be based
> on DEFAULT.

No, because DEFAULT includes several settings that may not be required.

> With PRISTINE, the door is open for a future format to not override
> DISABLED and create a bug, as unlikely as it is.

With DEFAULT, the door is *already* open for bugs due to failure to
reset the unwanted settings.

It's not possible currently to create an instance without overriding
the DISABLED delimiter.

> Gary
>
>
>
>
>>> $0.02,
>>> Matt
>>>
>>> On Tue, Oct 16, 2012 at 8:53 AM, Jörg Schaible
>>> <Joerg.Schaible@scalaris.com> wrote:
>>> > Gary Gregory wrote:
>>> >
>>> >> On Tue, Oct 16, 2012 at 9:14 AM, Jörg Schaible
>>> >> <Joerg.Schaible@scalaris.com>wrote:
>>> >>
>>> >>> Hi Gary,
>>> >>>
>>> >>> Gary Gregory wrote:
>>> >>>
>>> >>> > Hi All:
>>> >>> >
>>> >>> > The format object can configure various aspects of input and
output
>>> >>> > formatting.
>>> >>> >
>>> >>> > With my recent addition of the Quote enum for [CSV-53], there
are
>>> now
>>> >>> > two aspects of quoting to configure: the quote character and
the
>>> quote
>>> >>> > policy (minimal, all, non-numeric, and none.) FYI, 'none' is
>>> currently
>>> >>> > not implemented.
>>> >>> >
>>> >>> > First, I changed (without consulting this list, and please
accept my
>>> >>> > apologies for this) the - IMO - cryptic and burdensome terminology
>>> of
>>> >>> > "encapsulator" to "quote char", and added "quote policy":
>>> >>> >
>>> >>> > - withQuoteChar(char)
>>> >>> > - withQuotePolicy(Quote)
>>> >>> >
>>> >>> > My intention here is that all Quote APIs start with "withQuote"
>>> >>> > followed by what aspect of quoting is being configured.
>>> >>> >
>>> >>> > Alternatively, we could have:
>>> >>> >
>>> >>> > - withQuote(char)
>>> >>> > - withQuotePolicy(Quote)
>>> >>>
>>> >>> or
>>> >>>
>>> >>> - withQuote(char)
>>> >>> - withQuote(Quote)
>>> >>>
>>> >>> ;-)
>>> >>>
>>> >>
>>> >> Darn, I wish I knew you better to know if you were joking! :)
>>> >>
>>> >> This would not be good IMO because you are configuring two different
>>> >> aspects of the behavior. When I see the same API name with different
>>> >> parameters, I think that they are the same and that the API just does
>>> >> conversions.
>>> >>
>>> >> We could consider making Quote a class instead of an enum and have it
>>> >> carry a char and an enum, such that one object defines all quoting
>>> >> aspects. This might be too normalized a design for something so simple
>>> >> though.
>>> >
>>> > Actually I did not had a closer look to the API. You're definitely
>>> right to
>>> > use different names for different aspects. It does not make sense to
>>> > overload just for fun.
>>> >
>>> >>
>>> >>
>>> >>>
>>> >>> > Which makes the API more consistent with the other char/Character
>>> based
>>> >>> > properties:
>>> >>> >
>>> >>> > - withEscape
>>> >>> > - withDelimiter
>>> >>> > - withLineSeparator
>>> >>> > - withCommentStart
>>> >>> >
>>> >>> > none of the above are post-fixed with a "Char" in the name.
>>> >>> >
>>> >>> > As far as reading, for me, the "-r" names are OK because the
they
>>> are
>>> >>> > nouns (things): "a delimiter", "a line separator." But I do
not talk
>>> >>> about
>>> >>> > "an escape" because that would be an act (think Alcatraz) as
>>> opposed to
>>> >>> > what we have here: a character used to /perform/ escapes.
>>> >>> >
>>> >>> > So I propose to change "escape" to "escape char" because "escaper"
>>> is
>>> >>> > not a word.
>>> >>> >
>>> >>> > The name "comment start" is not great also because it implies
(to
>>> me)
>>> >>> that
>>> >>> > there is a "comment end" missing. So plain "comment" or "comment
>>> char"
>>> >>> > would be better.
>>> >>>
>>> >>> Who said it has to be a single char?
>>> >>>
>>> >>
>>> >> The current implementation does. ;)
>>> >>
>>> >> Are comments even in any RFC?
>>> >
>>> > Not that I am aware of.
>>> >
>>> >>> .withEOLComment("//")
>>> >>>
>>> >>>
>>> >>> Same applies to the line separator:
>>> >>>
>>> >>> .withLineSeparator("\n\r")
>>> >>>
>>> >>> > Circling back to "quote char" which I have the way it is now
for the
>>> >>> > same reason as for the "escape" property.
>>> >>> >
>>> >>> > In summary, using *Char names is better IMO.
>>> >>>
>>> >>> Only if it can be a single char only. If it can either be a single
>>> char
>>> >>> or a
>>> >>> String, I normally tend to use overloaded methods:
>>> >>>
>>> >>> - withEOLComment(char)
>>> >>> - withEOLComment(CharSequence)
>>> >>>
>>> >>
>>> >> If you want to add // to the mix, please start a different thread. I'm
>>> not
>>> >> sure this is really needed. Do you have a real life use case?
>>> >
>>> > People come up with all kind of "solutions" they are used to. CSV is
>>> brittle
>>> > anyway, just because there is no "real" standard.
>>> >
>>> > Cheers,
>>> > Jörg
>>> >
>>> >
>>> > ---------------------------------------------------------------------
>>> > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>> > For additional commands, e-mail: dev-help@commons.apache.org
>>> >
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>
>>>
>>
>>
>> --
>> E-Mail: garydgregory@gmail.com | ggregory@apache.org
>> JUnit in Action, 2nd Ed: <http://goog_1249600977>http://bit.ly/ECvg0
>> Spring Batch in Action: <http://s.apache.org/HOq>http://bit.ly/bqpbCK
>> Blog: http://garygregory.wordpress.com
>> Home: http://garygregory.com/
>> Tweet! http://twitter.com/GaryGregory
>>
>
>
>
> --
> E-Mail: garydgregory@gmail.com | ggregory@apache.org
> JUnit in Action, 2nd Ed: <http://goog_1249600977>http://bit.ly/ECvg0
> Spring Batch in Action: <http://s.apache.org/HOq>http://bit.ly/bqpbCK
> Blog: http://garygregory.wordpress.com
> Home: http://garygregory.com/
> Tweet! http://twitter.com/GaryGregory

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message