mahout-dev mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: persistence function naming conventions
Date Mon, 29 Sep 2014 00:02:06 GMT
Sounds reasonable.

Sent from my phone.
On Sep 28, 2014 9:39 AM, "Pat Ferrel" <pat@occamsmachete.com> wrote:

> OK, this is a reasonable train of thought and your names seem fine. However,
> text is actually the persistent representation of what I was calling an
> extended-DRM, which should probably be called an IndexedDataset. I don’t
> see the difference between import and persistence, since there are never
> user-visible intermediate files. Also, simple CSV is the only currently
> supported format for IndexedDataset; there will be others as needs present
> themselves.
>
> Therefore, following your train of thought, and to fit the changes you
> suggest for DRM naming, I’d change the IndexedDataset names to:
>
> Package level
> indexedDatasetDfsRead(src: String, schema: Schema = DefaultSchema):
> IndexedDataset
>
> Method level
> indexedDataset.dfsWrite(dest: String, schema: Schema = DefaultSchema)
>
> Once read, the DRM is a CheckpointedDrm contained in the IndexedDataset. So
> whether you call it import/export or persistence, a user can use either the
> sequence file or text to read/write DRMs.
>
> Seem reasonable?
>
> On Sep 26, 2014, at 11:30 AM, Dmitriy Lyubimov <notifications@github.com>
> wrote:
>
> To be a bit more concrete, there's indeed a slight discrepancy between the
> write and read names, but semantically they are what they say they are, i.e.
> they persist a drm to hdfs.
>
> To be even more concrete, I am probably in favor of simply using
> package-level drmDfsRead() and method-level dfsWrite() names.
>
> The convention here is that all drm-related package-level routines start
> with drm prefix so we don't easily mix these things with other things in
> global scope.
>
> Now, everything else, including reading/writing CSV formats, is an
> import/export operation (as opposed to persistence). Consequently, proper
> names are perhaps along the lines of drmImportCSV and exportCSV respectively.
> Import and export emphasize the fact that the format is not native, loses a
> lot of coherency enforcement, and requires a lot of validation while parsing
> back.
>
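The naming convention discussed in this thread can be illustrated with a minimal Scala sketch. Only the function names and the two IndexedDataset signatures come from the messages above. Everything else is an assumption made for illustration: the Drm, IndexedDataset and Schema types here are simplified stand-ins rather than Mahout's actual classes, and two in-memory maps take the place of a distributed file system.

```scala
// Hypothetical stand-ins only: none of these types are Mahout's real classes.
object NamingSketch {
  import scala.collection.mutable
  import java.util.regex.Pattern

  // In-memory maps standing in for a distributed file system (assumption).
  val nativeStore: mutable.Map[String, Vector[Vector[Double]]] = mutable.Map.empty
  val textStore: mutable.Map[String, String] = mutable.Map.empty

  // Stand-in for a DRM: just rows of doubles.
  final case class Drm(rows: Vector[Vector[Double]]) {
    // Method-level persistence: native representation, nothing to re-validate.
    def dfsWrite(dest: String): Unit = nativeStore.update(dest, rows)

    // Method-level export: non-native text, so the structure is flattened.
    def exportCSV: String = rows.map(_.mkString(",")).mkString("\n")
  }

  // Package-level routines carry the drm prefix to avoid clashes in global scope.
  def drmDfsRead(src: String): Drm = Drm(nativeStore(src))

  // Import has to validate, because CSV does not enforce a consistent shape.
  def drmImportCSV(text: String): Drm = {
    val rows = text.split("\n").toVector.map { line =>
      line.split(",").toVector.map(cell => cell.trim.toDouble)
    }
    require(rows.map(_.length).distinct.size <= 1, "ragged CSV rows")
    Drm(rows)
  }

  // Hypothetical schema: only a field delimiter in this sketch.
  final case class Schema(delimiter: String)
  val DefaultSchema: Schema = Schema(",")

  // Stand-in for an IndexedDataset: row IDs plus the wrapped DRM.
  final case class IndexedDataset(rowIds: Vector[String], matrix: Drm) {
    // Text is the persistent representation here, so this is named dfsWrite too.
    def dfsWrite(dest: String, schema: Schema = DefaultSchema): Unit = {
      val lines = rowIds.zip(matrix.rows).map { case (id, row) =>
        (id +: row.map(_.toString)).mkString(schema.delimiter)
      }
      textStore.update(dest, lines.mkString("\n"))
    }
  }

  // Package-level reader, prefixed with the type name like drmDfsRead.
  def indexedDatasetDfsRead(src: String, schema: Schema = DefaultSchema): IndexedDataset = {
    val fields = textStore(src).split("\n").toVector.map { line =>
      line.split(Pattern.quote(schema.delimiter)).toVector
    }
    IndexedDataset(fields.map(_.head), Drm(fields.map(_.tail.map(_.toDouble))))
  }
}
```

In this sketch, readers are package-level functions prefixed with the type they produce, while writers are methods on the value being written. The import/export pair differs from the read/write pair only in that the import path must validate what it parses.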
