commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gary Gregory <garydgreg...@gmail.com>
Subject Re: [CSV] CSVMutableRecord
Date Fri, 18 Aug 2017 18:41:01 GMT
On Wed, Aug 16, 2017 at 6:28 PM, Gilles <gilles@harfang.homelinux.org>
wrote:

> On Wed, 16 Aug 2017 11:27:53 -0600, Gary Gregory wrote:
>
>> Let's summarize the options:
>>
>> 0) do nothing
>> 1) Add two put methods to CVSRecord making the class mutable
>> 2) Add a "mutableRecord" boolean option to CVSRecord and CSVFormat such
>> that a new boolean in CVSRecord allow method from 1) above to either work
>> or throw an exception.
>> 3) Add a "mutableRecord" boolean option to CVSRecord and CSVFormat such
>> that subclass of CVSRecord called CVSMutableRecord is created which
>> contains two new put methods
>>
>> What else?
>>
>
> 4) The factory method:
>  /**
>   * @param orig Original to be copied.
>   * @param replace Fields to be replaced.
>   * @return a copy of "orig", except for the fields in "replace".
>   */
>  public static CSVRecord createRecord(CSVRecord orig,
>                                       Pair<Integer, String> ... replace)
>
> Could also be:
>  public static CSVRecord createRecord(CSVRecord orig,
>                                       int[] replaceIndices,
>                                       String[] replaceValues)


To me, that feels like bending over backwards and hard to very ugly to use,
building an array of ints, building an array of Strings. Yikes. But, hey
that's just me.

Gary

>
>
> Gilles
>
>
>
>> I like the simplest option: 1.
>>
>> Gary
>>
>> On Tue, Aug 15, 2017 at 6:01 PM, Gilles <gilles@harfang.homelinux.org>
>> wrote:
>>
>> On Tue, 15 Aug 2017 17:43:26 -0600, Gary Gregory wrote:
>>>
>>> On Tue, Aug 15, 2017 at 5:32 PM, Gilles <gilles@harfang.homelinux.org>
>>>> wrote:
>>>>
>>>> On Tue, 15 Aug 2017 22:52:32 +0000, nitin mahendru wrote:
>>>>
>>>>>
>>>>> On Tue, 15 Aug 2017 12:02:20 -0600, Gary Gregory wrote:
>>>>>
>>>>>>
>>>>>> On Tue, Aug 15, 2017 at 10:38 AM, nitin mahendru
>>>>>>
>>>>>>> <nitin.mahendru88@gmail.com
>>>>>>>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> How about having a state in the class itself which says that
it's
>>>>>>>
>>>>>>> mutable
>>>>>>>> or not.
>>>>>>>> If we call a setter on an immutable then it throws an exception.
>>>>>>>> By default the records are immutable and you need to make
them
>>>>>>>> mutable
>>>>>>>> using a new API.
>>>>>>>>
>>>>>>>>
>>>>>>>> A code example would be useful...
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Below is the pull request I added.
>>>>>> https://github.com/apache/commons-csv/pull/21
>>>>>>
>>>>>>
>>>>>> As I indicated in the previous message, this is functionally
>>>>> breaking. [I'm diverting this discussion over to the "dev"
>>>>> mailing list.]
>>>>>
>>>>>
>>>>> Saying that making record mutable is "breaking" is a bit unfair when
>>>> we do
>>>> NOT document the mutability of the class in the first place.
>>>>
>>>>
>>> I'm stating a fact: class is currently immutable, change
>>> would make it mutable; it is functionally breaking.
>>> I didn't say that you are forbidden to do it; just that
>>> it would be unwise, particularly if it would be to save
>>> a few bytes.
>>>
>>> Gilles
>>>
>>>
>>>
>>> Gary
>>>>
>>>>
>>>>
>>>> The following should be an interesting read:
>>>>>   http://markmail.org/message/6ytvmxvy2ndsfp7h
>>>>>
>>>>>
>>>>> Regards,
>>>>> Gilles
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Aug 15, 2017 at 11:17 AM Gilles <gilles@harfang.homelinux.org>
>>>>>
>>>>>> wrote:
>>>>>>
>>>>>> On Tue, 15 Aug 2017 12:02:20 -0600, Gary Gregory wrote:
>>>>>>
>>>>>> > On Tue, Aug 15, 2017 at 10:38 AM, nitin mahendru
>>>>>>> > <nitin.mahendru88@gmail.com
>>>>>>> >> wrote:
>>>>>>> >
>>>>>>> >> How about having a state in the class itself which says
that it's
>>>>>>> >> mutable
>>>>>>> >> or not.
>>>>>>> >> If we call a setter on an immutable then it throws an
exception.
>>>>>>> >> By default the records are immutable and you need to
make them
>>>>>>> >> mutable
>>>>>>> >> using a new API.
>>>>>>>
>>>>>>> A code example would be useful...
>>>>>>>
>>>>>>> >> pros: Saves memory, Keeps the immutability benefits
>>>>>>>
>>>>>>> What kind of usage are you considering that a single transient
>>>>>>> record matters (as compared to the ~300 MB of the JVM itself...)?
>>>>>>>
>>>>>>> >> cons: people using "mutable" records need to be careful.(While
>>>>>>> >> threading
>>>>>>> >> maybe)
>>>>>>> >>
>>>>>>> >
>>>>>>> > Interesting idea!
>>>>>>> >
>>>>>>> > But I think I like the idea of a subclass better if we are
going to
>>>>>>> > split
>>>>>>> > the behavior b/w mutable and immutable.
>>>>>>>
>>>>>>> Once you have a subclass that is able to modify the state of
>>>>>>> its parent, it's a mutable object. Period.
>>>>>>> There is no such thing as a "split".
>>>>>>>
>>>>>>> >
>>>>>>> > For my money and the KISS principle, I would just add the
put
>>>>>>> method
>>>>>>> > in
>>>>>>> > CSVRecord.
>>>>>>>
>>>>>>> Then, any use that assumes immutability will be broken.
>>>>>>>
>>>>>>>
>>>>>>> Gilles
>>>>>>>
>>>>>>>
>>>>>>> > Gary
>>>>>>> >
>>>>>>> >>
>>>>>>> >> -Nitin
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> On Tue, Aug 15, 2017 at 9:01 AM Gilles
>>>>>>> >> <gilles@harfang.homelinux.org>
>>>>>>> >> wrote:
>>>>>>> >>
>>>>>>> >> > On Tue, 15 Aug 2017 09:49:04 -0600, Gary Gregory
wrote:
>>>>>>> >> > > That looks odd to me. What comes up for me
is the use case
>>>>>>> where
>>>>>>> >> I
>>>>>>> >> > > want to
>>>>>>> >> > > ETL a file of 10,000,000 records and update,
say, one column.
>>>>>>> If
>>>>>>> >> am
>>>>>>> >> > > forced
>>>>>>> >> > > to create a brand new record for every record
read, that would
>>>>>>> >> be a
>>>>>>> >> > > shame.
>>>>>>> >> >
>>>>>>> >> > Why?
>>>>>>> >> >
>>>>>>> >> > > If I had a mutable record, I could just keep
on updating it
>>>>>>> and
>>>>>>> >> using
>>>>>>> >> > > it to
>>>>>>> >> > > write each row. Read record, update it, write
record. No extra
>>>>>>> >> memory
>>>>>>> >> > > needed.
>>>>>>> >> >
>>>>>>> >> > How is the size of 1 additional record going to
matter compared
>>>>>>> to
>>>>>>> >> the
>>>>>>> >> > size of the whole program?
>>>>>>> >> >
>>>>>>> >> > > Either we can make the current record mutable
(what's the
>>>>>>> harm?)
>>>>>>> >> or
>>>>>>> >> > > we can
>>>>>>> >> > > make the parser serve out mutable records
based on a config
>>>>>>> >> setting.
>>>>>>> >> > > This
>>>>>>> >> > > could be a subclass of CSVRecord with the
extra method I
>>>>>>> >> proposed.
>>>>>>> >> >
>>>>>>> >> > The harm is that you loose all the promises of
immutability.
>>>>>>> >> >
>>>>>>> >> > Regards,
>>>>>>> >> > Gilles
>>>>>>> >> >
>>>>>>> >> > >
>>>>>>> >> > > Thoughts?
>>>>>>> >> > >
>>>>>>> >> > > Gary
>>>>>>> >> > >
>>>>>>> >> > > On Tue, Aug 15, 2017 at 8:33 AM, Gilles
>>>>>>> >> > > <gilles@harfang.homelinux.org>
>>>>>>> >> > > wrote:
>>>>>>> >> > >
>>>>>>> >> > >> On Tue, 15 Aug 2017 08:01:53 -0600, Gary
Gregory wrote:
>>>>>>> >> > >>
>>>>>>> >> > >>> How does that work when you want to
change more than one
>>>>>>> >> value?
>>>>>>> >> > >>>
>>>>>>> >> > >>
>>>>>>> >> > >> How about a "vararg" argument:
>>>>>>> >> > >>
>>>>>>> >> > >> /**
>>>>>>> >> > >>  * @param orig Original to be copied.
>>>>>>> >> > >>  * @param replace Fields to be replaced.
>>>>>>> >> > >>  */
>>>>>>> >> > >> public static CSVRecord createRecord(CSVRecord
orig,
>>>>>>> >> > >>                                      Pair<Integer,
String>
>>>>>>> ...
>>>>>>> >> > >> replace) {
>>>>>>> >> > >>     // ...
>>>>>>> >> > >> }
>>>>>>> >> > >>
>>>>>>> >> > >>
>>>>>>> >> > >> Gilles
>>>>>>> >> > >>
>>>>>>> >> > >>
>>>>>>> >> > >>
>>>>>>> >> > >>> Gary
>>>>>>> >> > >>>
>>>>>>> >> > >>> On Aug 15, 2017 00:17, "Benedikt Ritter"
<
>>>>>>> britter@apache.org>
>>>>>>> >> > >>> wrote:
>>>>>>> >> > >>>
>>>>>>> >> > >>> Hi,
>>>>>>> >> > >>>>
>>>>>>> >> > >>>> I very much like that CSVRecord
is unmodifiable. So I’d
>>>>>>> >> suggest an
>>>>>>> >> > >>>> API,
>>>>>>> >> > >>>> that creates a new record instead
of mutating the existing
>>>>>>> >> one:
>>>>>>> >> > >>>>
>>>>>>> >> > >>>> CSVRecord newRecord = myRecord.put(1,
„value")
>>>>>>> >> > >>>>
>>>>>>> >> > >>>> I’m not sure about „put“
as a method name since it clashes
>>>>>>> >> with
>>>>>>> >> > >>>> java.util.Map#put, which is mutation
based...
>>>>>>> >> > >>>>
>>>>>>> >> > >>>> Regards,
>>>>>>> >> > >>>> Benedikt
>>>>>>> >> > >>>>
>>>>>>> >> > >>>> > Am 15.08.2017 um 02:54 schrieb
Gary Gregory
>>>>>>> >> > >>>> <garydgregory@gmail.com>:
>>>>>>> >> > >>>> >
>>>>>>> >> > >>>> > Feel free to provide a PR
on GitHub :-)
>>>>>>> >> > >>>> >
>>>>>>> >> > >>>> > Gary
>>>>>>> >> > >>>> >
>>>>>>> >> > >>>> > On Aug 14, 2017 15:29, "Gary
Gregory"
>>>>>>> >> <garydgregory@gmail.com>
>>>>>>> >> > >>>> wrote:
>>>>>>> >> > >>>> >
>>>>>>> >> > >>>> >> I think we've kept the
design as YAGNI as possible...
>>>>>>> :-)
>>>>>>> >> > >>>> >>
>>>>>>> >> > >>>> >> Gary
>>>>>>> >> > >>>> >>
>>>>>>> >> > >>>> >> On Mon, Aug 14, 2017
at 3:25 PM, nitin mahendru <
>>>>>>> >> > >>>> >> nitin.mahendru88@gmail.com>
wrote:
>>>>>>> >> > >>>> >>
>>>>>>> >> > >>>> >>> Yeah that also is
OK. I though there is a reason to
>>>>>>> keep
>>>>>>> >> the
>>>>>>> >> > >>>> CSVRecord
>>>>>>> >> > >>>> >>> without setters.
But maybe not!
>>>>>>> >> > >>>> >>>
>>>>>>> >> > >>>> >>> Nitin
>>>>>>> >> > >>>> >>>
>>>>>>> >> > >>>> >>>
>>>>>>> >> > >>>> >>>
>>>>>>> >> > >>>> >>>
>>>>>>> >> > >>>> >>> On Mon, Aug 14, 2017
at 2:22 PM Gary Gregory
>>>>>>> >> > >>>> <garydgregory@gmail.com
>>>>>>> >> > >>>> >
>>>>>>> >> > >>>> >>> wrote:
>>>>>>> >> > >>>> >>>
>>>>>>> >> > >>>> >>>> Hi All:
>>>>>>> >> > >>>> >>>>
>>>>>>> >> > >>>> >>>> Should we consider
adding put(int,Object) and
>>>>>>> >> put(String,
>>>>>>> >> > >>>> Object) to
>>>>>>> >> > >>>> the
>>>>>>> >> > >>>> >>>> current CSVRecord
class?
>>>>>>> >> > >>>> >>>>
>>>>>>> >> > >>>> >>>> Gary
>>>>>>> >> > >>>> >>>>
>>>>>>> >> > >>>> >>>> On Mon, Aug 14,
2017 at 2:54 PM, nitin mahendru <
>>>>>>> >> > >>>> >>>> nitin.mahendru88@gmail.com>
>>>>>>> >> > >>>> >>>> wrote:
>>>>>>> >> > >>>> >>>>
>>>>>>> >> > >>>> >>>>> Hi Everyone,
>>>>>>> >> > >>>> >>>>>
>>>>>>> >> > >>>> >>>>> I recently
pushed a change(pull request 20) to get
>>>>>>> the
>>>>>>> >> line
>>>>>>> >> > >>>> ending
>>>>>>> >> > >>>> >>> from
>>>>>>> >> > >>>> >>>> the
>>>>>>> >> > >>>> >>>>> parser.
>>>>>>> >> > >>>> >>>>>
>>>>>>> >> > >>>> >>>>> Now I want
to push another change which I feel will
>>>>>>> >> also be
>>>>>>> >> > >>>> useful
>>>>>>> >> > >>>> for
>>>>>>> >> > >>>> >>>> the
>>>>>>> >> > >>>> >>>>> community.
I want to add a CSVRecordMutable class
>>>>>>> which
>>>>>>> >> had
>>>>>>> >> > >>>> a
>>>>>>> >> > >>>> >>> constructor
>>>>>>> >> > >>>> >>>>> which accepts
a CSVRecord object. So when we have a
>>>>>>> >> > >>>> CSVRecordMutable
>>>>>>> >> > >>>> >>>> object
>>>>>>> >> > >>>> >>>>> from it then
we can edit individual columns using it.
>>>>>>> >> > >>>> >>>>>
>>>>>>> >> > >>>> >>>>> I would be
using this to write back my edited CSV
>>>>>>> file.
>>>>>>> >> My
>>>>>>> >> > >>>> use case
>>>>>>> >> > >>>> >>> is to
>>>>>>> >> > >>>> >>>>> read a csv,
mangle some columns, write back a new
>>>>>>> csv.
>>>>>>> >> > >>>> >>>>>
>>>>>>> >> > >>>> >>>>> I could have
directly raised a pull request but I
>>>>>>> just
>>>>>>> >> > >>>> wanted to
>>>>>>> >> > >>>> float
>>>>>>> >> > >>>> >>>> the
>>>>>>> >> > >>>> >>>>> idea before
and see the reaction.
>>>>>>> >> > >>>> >>>>>
>>>>>>> >> > >>>> >>>>> Thanks
>>>>>>> >> > >>>> >>>>>
>>>>>>> >> > >>>> >>>>> Nitin
>>>>>>> >> > >>>> >>>>>
>>>>>>> >> > >>>> >>>>
>>>>>>> >> > >>>> >>>
>>>>>>> >> > >>>> >>
>>>>>>> >> > >>>> >>
>>>>>>> >> > >>>>
>>>>>>> >> > >>>>
>>>>>>> >> > >>>>
>>>>>>> >> > >>
>>>>>>>
>>>>>>>
>>>>>>>
>>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message