metamodel-dev mailing list archives

From Hans Drexler <Hans.Drex...@HumanInference.com>
Subject RE: [DISCUSS] use folder name as schema name for file based DataContexts
Date Fri, 23 Aug 2013 07:42:42 GMT
Hi Kasper,

Sorry for not voting on this. I am not yet up to speed on the rules regarding voting.

Hans

-----Original Message-----
From: Kasper Sørensen [mailto:i.am.kasper.sorensen@gmail.com] 
Sent: Friday, August 23, 2013 9:37 AM
To: dev@metamodel.incubator.apache.org
Subject: Re: [DISCUSS] use folder name as schema name for file based DataContexts

OK, I'm going to commit this on the basis of lazy consensus.

But as a small side note, I'd like to also invite more people to vote :-)

2013/8/21 Ankit Kumar <ak.ankitkumar@gmail.com>:
> +1
>
> Regards
> Ankit
>
>
> On Tue, Aug 20, 2013 at 4:26 PM, Kasper Sørensen < 
> i.am.kasper.sorensen@gmail.com> wrote:
>
>> I've updated my gist/patch [1] to also support quotes in
>> the table/column paths. Let's have a vote on this patch, to see if we
>> can get this in.
>>
>> [1] https://gist.github.com/kaspersorensen/6210970
>>
>> 2013/8/20 Kasper Sørensen <i.am.kasper.sorensen@gmail.com>:
>> > Agreed on all. Except why should dots in column names be any 
>> > different than schema and table names?
>> >
>> > 2013/8/16 Hans Drexler <Hans.Drexler@humaninference.com>:
>> >> I believe that probably *every* convention will have its drawbacks.
>> >> Using a factory can help on one hand, but it can also cause great
>> >> confusion if things get mixed. It also makes things more complex. If
>> >> we clearly document the choice made, I will live with that.
>> >>
>> >> My main point is that we should try to write and document the
>> >> software in such a way that MetaModel users will not get confused. I
>> >> like the quotes idea, since that will allow the user to explicitly
>> >> express what is intended. But then, let's extend it to something like
>> >> this:
>> >>
>> >> "schema_name"."table_name"."column_name"
>> >>
>> >> Where schema_name and table_name can contain dots ("."). (I guess
>> >> column names cannot...)
>> >>
>> >> I hope you don't mind me rambling about this...
>> >>
>> >> kind regards,
>> >>
>> >> Hans
>> >>
>> >> -----Original Message-----
>> >> From: Kasper Sørensen [mailto:i.am.kasper.sorensen@gmail.com]
>> >> Sent: Wednesday, August 14, 2013 2:59 PM
>> >> To: dev@metamodel.incubator.apache.org
>> >> Subject: Re: [DISCUSS] use folder name as schema name for file based DataContexts
>> >>
>> >> With those different preferences, we could even consider making
>> >> something like a "TableNameFactory" which converts filenames into
>> >> table names. But I guess the crucial point is which default
>> >> convention to use.
>> >>
>> >> Underscoring makes it a bit cleaner to look at the column or table
>> >> paths, but it also makes the representation less direct. A user could
>> >> start wondering if there are other characters than dots that will be
>> >> replaced by underscores etc.
>> >>
>> >> It should be noted that MM's parser does support dots in both
>> >> table and schema names, so this is probably mostly a question of
>> >> aesthetics.
>> >>
>> >> The ambiguity that you point out is also interesting. So far I
>> >> haven't seen it appear in real life, but technically it could occur
>> >> that you had two pairs of schemas and tables that would generate an
>> >> ambiguous table path. For instance:
>> >>
>> >> Schema: foo.bar
>> >> Table: baz
>> >>
>> >> and
>> >>
>> >> Schema: foo
>> >> Table: bar.baz
>> >>
>> >> The parser would currently favor the second schema ("foo") since
>> >> it incrementally tries for schema/table/column matches with every
>> >> dot-separated token. An improvement to the parser would be to allow
>> >> quote characters, so that you could express your table path like
>> >> this:
>> >>
>> >> "foo.bar".baz
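
The quoting idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: `QuotedPathSketch` and `splitPath` are hypothetical names, not MetaModel's actual parser or the code in the gist. The sketch splits a path on dots, but treats dots inside double quotes as part of the token.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedPathSketch {

    // Hypothetical helper (not part of MetaModel): splits a path on dots,
    // except for dots that appear between double quotes.
    public static List<String> splitPath(String path) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : path.toCharArray()) {
            if (c == '"') {
                // Quote characters only toggle the state; they are not kept.
                inQuotes = !inQuotes;
            } else if (c == '.' && !inQuotes) {
                // An unquoted dot ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        tokens.add(current.toString());
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(splitPath("\"foo.bar\".baz")); // schema "foo.bar", table "baz"
        System.out.println(splitPath("foo.\"bar.baz\"")); // schema "foo", table "bar.baz"
    }
}
```

With quotes, both of the schema/table pairs from the example can be addressed unambiguously; an unquoted path would still need the incremental matching described above.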
>> >>
>> >> Also I want to note that some databases do support dots in
>> >> schema/table/column names, so this ambiguity can (although rarely)
>> >> also occur in an RDBMS or other data sources. It would also be quite
>> >> common to have some separator (not necessarily a dot) in NoSQL
>> >> database column names, to indicate a nested field. In HBase for
>> >> instance they are referred to using a colon, like this:
>> >> "columnFamily:column".
>> >>
>> >> All in all I mostly feel like preserving the dots from the
>> >> filenames, but I am also very curious what other people think!
>> >>
>> >> 2013/8/14 Hans Drexler <Hans.Drexler@humaninference.com>:
>> >>> Hi,
>> >>>
>> >>> First I agree with bumping this issue. When at the customer, this
>> >>> thing caused a lot of time spent in figuring out what was going on. I
>> >>> am not sure if I like the extension as part of the table name, because:
>> >>> - I would never create a table in a relational database with a
>> >>> dot in the name
>> >>> - It creates an ambiguity. If you have a "full" path name to a
>> >>> column, like "documents.people.csv.name", then it is not clear if the
>> >>> schema name is "documents.people" and the table name is "csv", or
>> >>> whether the schema name is "documents" and the table name is
>> >>> "people.csv". It seems natural to me that schema names contain dots,
>> >>> but not table names.
>> >>>
>> >>> Alternatives:
>> >>> - Leave the extension out of the name (probably not acceptable,
>> >>> because then you can no longer have two "tables" differing only in
>> >>> extension). Although I must say that personally I think this would be
>> >>> the best solution.
>> >>>
>> >>> - Use a conventional name, like:
>> >>> Schema name: Folder name
>> >>> Table name: The filename, including extension (all dots replaced
>> >>> by underscores).
>> >>> Resulting in e.g. a column path like this:
>> >>> documents.people_csv.name
>> >>>
>> >>> At the customer site, the file I needed to use was actually
>> >>> named following this pattern: "bar/FOO.PEOPLE.IN.FILE". Using the
>> >>> convention, this would become:
>> >>> bar.FOO_PEOPLE_IN_FILE
>> >>>
>> >>> IMHO this is preferable to "bar.foo.people.in.file"
>> >>>
>> >>> The problem is of course that it would now be impossible to have 
>> >>> another file "bar/FOO_PEOPLE_IN_FILE" :-(
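
To make the underscore convention and its collision concrete, here is a minimal sketch. The class `UnderscoreNamingSketch` and the method `toTableName` are hypothetical names used only for illustration; they are not taken from MetaModel or the proposed patch.

```java
public class UnderscoreNamingSketch {

    // Hypothetical helper: derives a table name from a filename by
    // replacing every dot with an underscore, as proposed above.
    public static String toTableName(String fileName) {
        return fileName.replace('.', '_');
    }

    public static void main(String[] args) {
        // Column path for documents/people.csv under this convention.
        System.out.println("documents." + toTableName("people.csv") + ".name");

        // Two different files in the same folder map to the same table name.
        String a = toTableName("FOO.PEOPLE.IN.FILE");
        String b = toTableName("FOO_PEOPLE_IN_FILE");
        System.out.println(a.equals(b));
    }
}
```

Because the mapping is not reversible, any implementation along these lines would have to decide what happens when two files in one folder produce the same table name.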
>> >>>
>> >>> I am happy to hear other people's thoughts.
>> >>>
>> >>>
>> >>> Hans
>> >>>
>> >>>
>> >>> -----Original Message-----
>> >>> From: Kasper Sørensen [mailto:i.am.kasper.sorensen@gmail.com]
>> >>> Sent: Wednesday, August 14, 2013 10:18 AM
>> >>> To: dev@metamodel.incubator.apache.org
>> >>> Subject: Re: [DISCUSS] use folder name as schema name for file 
>> >>> based DataContexts
>> >>>
>> >>> Rats, made a mistake in that diff. The Gist has been updated [1]
>> >>> and now contains the ResourceUtils class which was missing before.
>> >>> [1] https://gist.github.com/kaspersorensen/6210970
>> >>>
>> >>> 2013/8/12 Kasper Sørensen <i.am.kasper.sorensen@gmail.com>:
>> >>>> Here's a proposed patch (implemented for CSV and fixedwidth
>> >>>> files which are the modules that implemented the old schema naming
>> >>>> pattern):
>> >>>> https://gist.github.com/kaspersorensen/6210970
>> >>>>
>> >>>> 2013/8/10 Kasper Sørensen <i.am.kasper.sorensen@gmail.com>:
>> >>>>> https://issues.apache.org/jira/browse/METAMODEL-4
>> >>>>>
>> >>>>> 2013/8/10 Henry Saputra <henry.saputra@gmail.com>:
>> >>>>>> What is the JIRA for this one?
>> >>>>>>
>> >>>>>>
>> >>>>>> On Fri, Aug 9, 2013 at 2:26 AM, Manuel van den Berg <
>> >>>>>> Manuel.vandenBerg@humaninference.com> wrote:
>> >>>>>>
>> >>>>>>> +1
>> >>>>>>>
>> >>>>>>> (shouldn't I just vote on the Jira for this?)
>> >>>>>>>
>> >>>>>>> manuel
>> >>>>>>>
>> >>>>>>> > -----Original Message-----
>> >>>>>>> > From: Kasper Sørensen 
>> >>>>>>> > [mailto:i.am.kasper.sorensen@gmail.com]
>> >>>>>>> > Sent: Friday, August 09, 2013 9:03
>> >>>>>>> > To: dev@metamodel.incubator.apache.org
>> >>>>>>> > Subject: Re: [DISCUSS] use folder name as schema name for
>> >>>>>>> > file based DataContexts
>> >>>>>>> >
>> >>>>>>> > Allow me to bump this issue (it's my impression that more
>> >>>>>>> > people have joined in a bit late, after this topic was posted).
>> >>>>>>> >
>> >>>>>>> > I think this is one of the more important issues that I
>> >>>>>>> > would want to fix before we make our first release at Apache.
>> >>>>>>> >
>> >>>>>>> > 2013/7/24 Kasper Sørensen <i.am.kasper.sorensen@gmail.com>:
>> >>>>>>> > > Right now we have this slightly odd naming convention for
>> >>>>>>> > > schema and table names when building metadata for e.g. a
>> >>>>>>> > > CSV file or a fixed width value file.
>> >>>>>>> > >
>> >>>>>>> > > Schema name: The filename, including file extension.
>> >>>>>>> > > Table name: The filename without extension.
>> >>>>>>> > > Resulting in e.g. a column path like this:
>> >>>>>>> > > people.csv.people.name
>> >>>>>>> > >
>> >>>>>>> > > I suggest we change it to this convention:
>> >>>>>>> > >
>> >>>>>>> > > Schema name: Folder name
>> >>>>>>> > > Table name: The filename, including file extension.
>> >>>>>>> > > Resulting in e.g. a column path like this:
>> >>>>>>> > > documents.people.csv.name
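
The difference between the two conventions can be sketched like this. The class and method names are hypothetical and the code is not taken from MetaModel; it only uses `java.io.File` to derive the names described above, and it assumes the file has a parent folder.

```java
import java.io.File;

public class NamingConventionSketch {

    // Old convention: schema = filename with extension,
    // table = filename without extension.
    public static String oldColumnPath(File file, String column) {
        String fileName = file.getName();
        int dot = fileName.lastIndexOf('.');
        String tableName = dot > 0 ? fileName.substring(0, dot) : fileName;
        return fileName + "." + tableName + "." + column;
    }

    // Proposed convention: schema = folder name,
    // table = filename with extension.
    // Assumes the file has a parent folder; other cases are not handled here.
    public static String newColumnPath(File file, String column) {
        String schemaName = file.getParentFile().getName();
        return schemaName + "." + file.getName() + "." + column;
    }

    public static void main(String[] args) {
        File file = new File("documents/people.csv");
        System.out.println(oldColumnPath(file, "name")); // people.csv.people.name
        System.out.println(newColumnPath(file, "name")); // documents.people.csv.name
    }
}
```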
>> >>>>>>> > >
>> >>>>>>> > > Why do I think this would be an improvement?
>> >>>>>>> > >
>> >>>>>>> > > 1) Because this would first of all make a kind of sense
>> >>>>>>> > > to the user to see the file system's hierarchy reflected
>> >>>>>>> > > in the schema model.
>> >>>>>>> > > 2) Because it allows us to make these DataContexts
>> >>>>>>> > > operate not on a single file, but on a directory of
>> >>>>>>> > > files. I have seen quite a number of times by now
>> >>>>>>> > > that users of MetaModel, or users of e.g. DataCleaner,
>> >>>>>>> > > which uses MetaModel quite heavily, want to do this sort
>> >>>>>>> > > of stuff.
>> >>>>>>> > > 3) The removing of the file extension stuff is kind of
>> >>>>>>> > > broken and a strange convention in the first place.
>> >>>>>>> > >
>> >>>>>>> > > While this doesn't really break backwards compatibility
>> >>>>>>> > > in terms of Java code, it would break configuration files
>> >>>>>>> > > and other stuff of applications that use MetaModel. But I
>> >>>>>>> > > do believe that can be communicated and handled through
>> >>>>>>> > > carefully explaining the new convention on the migration
>> >>>>>>> > > page (that I recently started writing [1]).
>> >>>>>>> > >
>> >>>>>>> > > What do you think?
>> >>>>>>> > >
>> >>>>>>> > > [1]
>> >>>>>>> > > http://wiki.apache.org/metamodel/MigratingFromEobjectsMetaModel
>> >>>>>>>
>>
