commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gary Gregory <garydgreg...@gmail.com>
Subject Re: [CSV] CSVMutableRecord
Date Fri, 18 Aug 2017 19:21:45 GMT
On Fri, Aug 18, 2017 at 12:50 PM, Gilles <gilles@harfang.homelinux.org>
wrote:

> On Fri, 18 Aug 2017 12:41:01 -0600, Gary Gregory wrote:
>
>> On Wed, Aug 16, 2017 at 6:28 PM, Gilles <gilles@harfang.homelinux.org>
>> wrote:
>>
>> On Wed, 16 Aug 2017 11:27:53 -0600, Gary Gregory wrote:
>>>
>>> Let's summarize the options:
>>>>
>>>> 0) do nothing
>>>> 1) Add two put methods to CVSRecord making the class mutable
>>>> 2) Add a "mutableRecord" boolean option to CVSRecord and CSVFormat such
>>>> that a new boolean in CVSRecord allow method from 1) above to either
>>>> work
>>>> or throw an exception.
>>>> 3) Add a "mutableRecord" boolean option to CVSRecord and CSVFormat such
>>>> that subclass of CVSRecord called CVSMutableRecord is created which
>>>> contains two new put methods
>>>>
>>>> What else?
>>>>
>>>>
>>> 4) The factory method:
>>>  /**
>>>   * @param orig Original to be copied.
>>>   * @param replace Fields to be replaced.
>>>   * @return a copy of "orig", except for the fields in "replace".
>>>   */
>>>  public static CSVRecord createRecord(CSVRecord orig,
>>>                                       Pair<Integer, String> ... replace)
>>>
>>> Could also be:
>>>  public static CSVRecord createRecord(CSVRecord orig,
>>>                                       int[] replaceIndices,
>>>                                       String[] replaceValues)
>>>
>>
>>
>> To me, that feels like bending over backwards and hard to very ugly to
>> use,
>> building an array of ints, building an array of Strings. Yikes. But, hey
>> that's just me.
>>
>
> What about the "Pair" alternative above?
>
> Another alternative would be to have a "CSVRecordBuilder" (with "put"
> methods):
> ---CUT---
> CSVRecord rec = readRecord(source);
> rec = new CSVRecordBuilder(rec).put(1, "Change").put(4, "this").build();
> ---CUT---
>

For my money, the above is convoluted for no reason from a user's POV,
again compared to the simple:

rec.put(1, "Change");
rec.put(4, "this");

Gary

>
>
> Gilles
>
>
>
>> Gary
>>
>>
>>>
>>> Gilles
>>>
>>>
>>>
>>> I like the simplest option: 1.
>>>>
>>>> Gary
>>>>
>>>> On Tue, Aug 15, 2017 at 6:01 PM, Gilles <gilles@harfang.homelinux.org>
>>>> wrote:
>>>>
>>>> On Tue, 15 Aug 2017 17:43:26 -0600, Gary Gregory wrote:
>>>>
>>>>>
>>>>> On Tue, Aug 15, 2017 at 5:32 PM, Gilles <gilles@harfang.homelinux.org>
>>>>>
>>>>>> wrote:
>>>>>>
>>>>>> On Tue, 15 Aug 2017 22:52:32 +0000, nitin mahendru wrote:
>>>>>>
>>>>>>
>>>>>>> On Tue, 15 Aug 2017 12:02:20 -0600, Gary Gregory wrote:
>>>>>>>
>>>>>>>
>>>>>>>> On Tue, Aug 15, 2017 at 10:38 AM, nitin mahendru
>>>>>>>>
>>>>>>>> <nitin.mahendru88@gmail.com
>>>>>>>>>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> How about having a state in the class itself which
says that it's
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> mutable
>>>>>>>>>
>>>>>>>>>> or not.
>>>>>>>>>> If we call a setter on an immutable then it throws
an exception.
>>>>>>>>>> By default the records are immutable and you need
to make them
>>>>>>>>>> mutable
>>>>>>>>>> using a new API.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> A code example would be useful...
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Below is the pull request I added.
>>>>>>>> https://github.com/apache/commons-csv/pull/21
>>>>>>>>
>>>>>>>>
>>>>>>>> As I indicated in the previous message, this is functionally
>>>>>>>>
>>>>>>> breaking. [I'm diverting this discussion over to the "dev"
>>>>>>> mailing list.]
>>>>>>>
>>>>>>>
>>>>>>> Saying that making record mutable is "breaking" is a bit unfair
when
>>>>>>>
>>>>>> we do
>>>>>> NOT document the mutability of the class in the first place.
>>>>>>
>>>>>>
>>>>>> I'm stating a fact: class is currently immutable, change
>>>>> would make it mutable; it is functionally breaking.
>>>>> I didn't say that you are forbidden to do it; just that
>>>>> it would be unwise, particularly if it would be to save
>>>>> a few bytes.
>>>>>
>>>>> Gilles
>>>>>
>>>>>
>>>>>
>>>>> Gary
>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> The following should be an interesting read:
>>>>>>
>>>>>>>   http://markmail.org/message/6ytvmxvy2ndsfp7h
>>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Gilles
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Aug 15, 2017 at 11:17 AM Gilles <
>>>>>>> gilles@harfang.homelinux.org>
>>>>>>>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> On Tue, 15 Aug 2017 12:02:20 -0600, Gary Gregory wrote:
>>>>>>>>
>>>>>>>> > On Tue, Aug 15, 2017 at 10:38 AM, nitin mahendru
>>>>>>>>
>>>>>>>>> > <nitin.mahendru88@gmail.com
>>>>>>>>> >> wrote:
>>>>>>>>> >
>>>>>>>>> >> How about having a state in the class itself
which says that
>>>>>>>>> it's
>>>>>>>>> >> mutable
>>>>>>>>> >> or not.
>>>>>>>>> >> If we call a setter on an immutable then it
throws an exception.
>>>>>>>>> >> By default the records are immutable and you
need to make them
>>>>>>>>> >> mutable
>>>>>>>>> >> using a new API.
>>>>>>>>>
>>>>>>>>> A code example would be useful...
>>>>>>>>>
>>>>>>>>> >> pros: Saves memory, Keeps the immutability benefits
>>>>>>>>>
>>>>>>>>> What kind of usage are you considering that a single
transient
>>>>>>>>> record matters (as compared to the ~300 MB of the JVM
itself...)?
>>>>>>>>>
>>>>>>>>> >> cons: people using "mutable" records need to
be careful.(While
>>>>>>>>> >> threading
>>>>>>>>> >> maybe)
>>>>>>>>> >>
>>>>>>>>> >
>>>>>>>>> > Interesting idea!
>>>>>>>>> >
>>>>>>>>> > But I think I like the idea of a subclass better
if we are going
>>>>>>>>> to
>>>>>>>>> > split
>>>>>>>>> > the behavior b/w mutable and immutable.
>>>>>>>>>
>>>>>>>>> Once you have a subclass that is able to modify the state
of
>>>>>>>>> its parent, it's a mutable object. Period.
>>>>>>>>> There is no such thing as a "split".
>>>>>>>>>
>>>>>>>>> >
>>>>>>>>> > For my money and the KISS principle, I would just
add the put
>>>>>>>>> method
>>>>>>>>> > in
>>>>>>>>> > CSVRecord.
>>>>>>>>>
>>>>>>>>> Then, any use that assumes immutability will be broken.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Gilles
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> > Gary
>>>>>>>>> >
>>>>>>>>> >>
>>>>>>>>> >> -Nitin
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> On Tue, Aug 15, 2017 at 9:01 AM Gilles
>>>>>>>>> >> <gilles@harfang.homelinux.org>
>>>>>>>>> >> wrote:
>>>>>>>>> >>
>>>>>>>>> >> > On Tue, 15 Aug 2017 09:49:04 -0600, Gary
Gregory wrote:
>>>>>>>>> >> > > That looks odd to me. What comes up
for me is the use case
>>>>>>>>> where
>>>>>>>>> >> I
>>>>>>>>> >> > > want to
>>>>>>>>> >> > > ETL a file of 10,000,000 records and
update, say, one
>>>>>>>>> column.
>>>>>>>>> If
>>>>>>>>> >> am
>>>>>>>>> >> > > forced
>>>>>>>>> >> > > to create a brand new record for every
record read, that
>>>>>>>>> would
>>>>>>>>> >> be a
>>>>>>>>> >> > > shame.
>>>>>>>>> >> >
>>>>>>>>> >> > Why?
>>>>>>>>> >> >
>>>>>>>>> >> > > If I had a mutable record, I could
just keep on updating it
>>>>>>>>> and
>>>>>>>>> >> using
>>>>>>>>> >> > > it to
>>>>>>>>> >> > > write each row. Read record, update
it, write record. No
>>>>>>>>> extra
>>>>>>>>> >> memory
>>>>>>>>> >> > > needed.
>>>>>>>>> >> >
>>>>>>>>> >> > How is the size of 1 additional record
going to matter
>>>>>>>>> compared
>>>>>>>>> to
>>>>>>>>> >> the
>>>>>>>>> >> > size of the whole program?
>>>>>>>>> >> >
>>>>>>>>> >> > > Either we can make the current record
mutable (what's the
>>>>>>>>> harm?)
>>>>>>>>> >> or
>>>>>>>>> >> > > we can
>>>>>>>>> >> > > make the parser serve out mutable
records based on a config
>>>>>>>>> >> setting.
>>>>>>>>> >> > > This
>>>>>>>>> >> > > could be a subclass of CSVRecord with
the extra method I
>>>>>>>>> >> proposed.
>>>>>>>>> >> >
>>>>>>>>> >> > The harm is that you loose all the promises
of immutability.
>>>>>>>>> >> >
>>>>>>>>> >> > Regards,
>>>>>>>>> >> > Gilles
>>>>>>>>> >> >
>>>>>>>>> >> > >
>>>>>>>>> >> > > Thoughts?
>>>>>>>>> >> > >
>>>>>>>>> >> > > Gary
>>>>>>>>> >> > >
>>>>>>>>> >> > > On Tue, Aug 15, 2017 at 8:33 AM, Gilles
>>>>>>>>> >> > > <gilles@harfang.homelinux.org>
>>>>>>>>> >> > > wrote:
>>>>>>>>> >> > >
>>>>>>>>> >> > >> On Tue, 15 Aug 2017 08:01:53 -0600,
Gary Gregory wrote:
>>>>>>>>> >> > >>
>>>>>>>>> >> > >>> How does that work when you
want to change more than one
>>>>>>>>> >> value?
>>>>>>>>> >> > >>>
>>>>>>>>> >> > >>
>>>>>>>>> >> > >> How about a "vararg" argument:
>>>>>>>>> >> > >>
>>>>>>>>> >> > >> /**
>>>>>>>>> >> > >>  * @param orig Original to be
copied.
>>>>>>>>> >> > >>  * @param replace Fields to be
replaced.
>>>>>>>>> >> > >>  */
>>>>>>>>> >> > >> public static CSVRecord createRecord(CSVRecord
orig,
>>>>>>>>> >> > >>                              
       Pair<Integer, String>
>>>>>>>>> ...
>>>>>>>>> >> > >> replace) {
>>>>>>>>> >> > >>     // ...
>>>>>>>>> >> > >> }
>>>>>>>>> >> > >>
>>>>>>>>> >> > >>
>>>>>>>>> >> > >> Gilles
>>>>>>>>> >> > >>
>>>>>>>>> >> > >>
>>>>>>>>> >> > >>
>>>>>>>>> >> > >>> Gary
>>>>>>>>> >> > >>>
>>>>>>>>> >> > >>> On Aug 15, 2017 00:17, "Benedikt
Ritter" <
>>>>>>>>> britter@apache.org>
>>>>>>>>> >> > >>> wrote:
>>>>>>>>> >> > >>>
>>>>>>>>> >> > >>> Hi,
>>>>>>>>> >> > >>>>
>>>>>>>>> >> > >>>> I very much like that
CSVRecord is unmodifiable. So I’d
>>>>>>>>> >> suggest an
>>>>>>>>> >> > >>>> API,
>>>>>>>>> >> > >>>> that creates a new record
instead of mutating the
>>>>>>>>> existing
>>>>>>>>> >> one:
>>>>>>>>> >> > >>>>
>>>>>>>>> >> > >>>> CSVRecord newRecord =
myRecord.put(1, „value")
>>>>>>>>> >> > >>>>
>>>>>>>>> >> > >>>> I’m not sure about „put“
as a method name since it
>>>>>>>>> clashes
>>>>>>>>> >> with
>>>>>>>>> >> > >>>> java.util.Map#put, which
is mutation based...
>>>>>>>>> >> > >>>>
>>>>>>>>> >> > >>>> Regards,
>>>>>>>>> >> > >>>> Benedikt
>>>>>>>>> >> > >>>>
>>>>>>>>> >> > >>>> > Am 15.08.2017 um
02:54 schrieb Gary Gregory
>>>>>>>>> >> > >>>> <garydgregory@gmail.com>:
>>>>>>>>> >> > >>>> >
>>>>>>>>> >> > >>>> > Feel free to provide
a PR on GitHub :-)
>>>>>>>>> >> > >>>> >
>>>>>>>>> >> > >>>> > Gary
>>>>>>>>> >> > >>>> >
>>>>>>>>> >> > >>>> > On Aug 14, 2017 15:29,
"Gary Gregory"
>>>>>>>>> >> <garydgregory@gmail.com>
>>>>>>>>> >> > >>>> wrote:
>>>>>>>>> >> > >>>> >
>>>>>>>>> >> > >>>> >> I think we've
kept the design as YAGNI as possible...
>>>>>>>>> :-)
>>>>>>>>> >> > >>>> >>
>>>>>>>>> >> > >>>> >> Gary
>>>>>>>>> >> > >>>> >>
>>>>>>>>> >> > >>>> >> On Mon, Aug 14,
2017 at 3:25 PM, nitin mahendru <
>>>>>>>>> >> > >>>> >> nitin.mahendru88@gmail.com>
wrote:
>>>>>>>>> >> > >>>> >>
>>>>>>>>> >> > >>>> >>> Yeah that
also is OK. I though there is a reason to
>>>>>>>>> keep
>>>>>>>>> >> the
>>>>>>>>> >> > >>>> CSVRecord
>>>>>>>>> >> > >>>> >>> without setters.
But maybe not!
>>>>>>>>> >> > >>>> >>>
>>>>>>>>> >> > >>>> >>> Nitin
>>>>>>>>> >> > >>>> >>>
>>>>>>>>> >> > >>>> >>>
>>>>>>>>> >> > >>>> >>>
>>>>>>>>> >> > >>>> >>>
>>>>>>>>> >> > >>>> >>> On Mon, Aug
14, 2017 at 2:22 PM Gary Gregory
>>>>>>>>> >> > >>>> <garydgregory@gmail.com
>>>>>>>>> >> > >>>> >
>>>>>>>>> >> > >>>> >>> wrote:
>>>>>>>>> >> > >>>> >>>
>>>>>>>>> >> > >>>> >>>> Hi All:
>>>>>>>>> >> > >>>> >>>>
>>>>>>>>> >> > >>>> >>>> Should
we consider adding put(int,Object) and
>>>>>>>>> >> put(String,
>>>>>>>>> >> > >>>> Object) to
>>>>>>>>> >> > >>>> the
>>>>>>>>> >> > >>>> >>>> current
CSVRecord class?
>>>>>>>>> >> > >>>> >>>>
>>>>>>>>> >> > >>>> >>>> Gary
>>>>>>>>> >> > >>>> >>>>
>>>>>>>>> >> > >>>> >>>> On Mon,
Aug 14, 2017 at 2:54 PM, nitin mahendru <
>>>>>>>>> >> > >>>> >>>> nitin.mahendru88@gmail.com>
>>>>>>>>> >> > >>>> >>>> wrote:
>>>>>>>>> >> > >>>> >>>>
>>>>>>>>> >> > >>>> >>>>> Hi
Everyone,
>>>>>>>>> >> > >>>> >>>>>
>>>>>>>>> >> > >>>> >>>>> I
recently pushed a change(pull request 20) to get
>>>>>>>>> the
>>>>>>>>> >> line
>>>>>>>>> >> > >>>> ending
>>>>>>>>> >> > >>>> >>> from
>>>>>>>>> >> > >>>> >>>> the
>>>>>>>>> >> > >>>> >>>>> parser.
>>>>>>>>> >> > >>>> >>>>>
>>>>>>>>> >> > >>>> >>>>> Now
I want to push another change which I feel will
>>>>>>>>> >> also be
>>>>>>>>> >> > >>>> useful
>>>>>>>>> >> > >>>> for
>>>>>>>>> >> > >>>> >>>> the
>>>>>>>>> >> > >>>> >>>>> community.
I want to add a CSVRecordMutable class
>>>>>>>>> which
>>>>>>>>> >> had
>>>>>>>>> >> > >>>> a
>>>>>>>>> >> > >>>> >>> constructor
>>>>>>>>> >> > >>>> >>>>> which
accepts a CSVRecord object. So when we have a
>>>>>>>>> >> > >>>> CSVRecordMutable
>>>>>>>>> >> > >>>> >>>> object
>>>>>>>>> >> > >>>> >>>>> from
it then we can edit individual columns using
>>>>>>>>> it.
>>>>>>>>> >> > >>>> >>>>>
>>>>>>>>> >> > >>>> >>>>> I
would be using this to write back my edited CSV
>>>>>>>>> file.
>>>>>>>>> >> My
>>>>>>>>> >> > >>>> use case
>>>>>>>>> >> > >>>> >>> is to
>>>>>>>>> >> > >>>> >>>>> read
a csv, mangle some columns, write back a new
>>>>>>>>> csv.
>>>>>>>>> >> > >>>> >>>>>
>>>>>>>>> >> > >>>> >>>>> I
could have directly raised a pull request but I
>>>>>>>>> just
>>>>>>>>> >> > >>>> wanted to
>>>>>>>>> >> > >>>> float
>>>>>>>>> >> > >>>> >>>> the
>>>>>>>>> >> > >>>> >>>>> idea
before and see the reaction.
>>>>>>>>> >> > >>>> >>>>>
>>>>>>>>> >> > >>>> >>>>> Thanks
>>>>>>>>> >> > >>>> >>>>>
>>>>>>>>> >> > >>>> >>>>> Nitin
>>>>>>>>> >> > >>>> >>>>>
>>>>>>>>> >> > >>>> >>>>
>>>>>>>>> >> > >>>> >>>
>>>>>>>>> >> > >>>> >>
>>>>>>>>> >> > >>>> >>
>>>>>>>>> >> > >>>>
>>>>>>>>> >> > >>>>
>>>>>>>>> >> > >>>>
>>>>>>>>> >> > >>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>
>>>
>>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message