asterixdb-dev mailing list archives

From Mike Carey <dtab...@gmail.com>
Subject Re: Yet another external data change proposal
Date Sat, 29 Apr 2017 15:23:47 GMT
Sounds like a good normalization of the cross-product of parsers and 
data sources.  One nit:  Why "generic" as the name?  (Silly detail, I 
know; just seems a little too, um, generic.)  Could you list the full
set of feeds that will be supported, including the semi-supported ones
(Condor, Twitter push, Twitter pull), and show the CREATE FEED statement
for each of them? I'm not sure the community has a clear picture of what
we currently have.  I vote to take 
this opportunity to really clean this up and then advertise improved 
feed support on the bill of materials for the next possible release!


On 4/28/17 10:17 AM, abdullah alamoudi wrote:
> Hi Devs,
> Here is a bit of history. When external data access was introduced to AsterixDB, we had
> many adapters. Each adapter was a self-contained piece in charge of both fetching and parsing
> data, and each had an alias (hdfs, localfs, twitter, socket, etc.).
> This led to a lot of duplicate code. To remove the duplication, we created a generic
> adapter that consists of a pluggable data source and a pluggable data parser, and we replaced
> all of the old adapters with data sources that can be plugged into the generic adapter.
>
> We lost the adapters and their aliases, so a statement like using hdfs(...) would fail
> because the hdfs adapter no longer exists. We didn't want to change the syntax; we wanted
> it to keep working. So, in such a case, if the adapter is not found, we use the generic
> adapter and treat hdfs as the data source parameter. In that sense, the adapter name became
> a parameter that lives outside the list of key-value parameters.
>
> This was fine for a while, but as external data support evolves and as we try to make the
> codebase cleaner and more maintainable, we increasingly have to work around the nuances of
> this compatibility behavior.
> We would like to propose a change that moves the datasource parameter into the key-value
> parameter list. For example:
>
> using hdfs(...) would become using generic("datasource"="hdfs")
> using localfs(...) would become using generic("datasource"="localfs")
>
> This would give us cleaner code under the hood. We would update the test cases
> and the documentation. If anybody has an objection or a thought, please let us know.
>
> Cheers,
> Abdullah.
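To make the two resolution rules in the proposal concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not AsterixDB code: `GenericAdapter`, `resolve_legacy`, `resolve_proposed`, `to_proposed`, the registries, and the canned records are all hypothetical, and the `"format"="delimited-text"` parameter is assumed only to pick a parser. It contrasts the current fallback (an unknown adapter name is reused as the data source) with the proposed form (the data source is an ordinary `"datasource"` key-value parameter of the generic adapter).

```python
# Hypothetical sketch: none of these names are AsterixDB classes or APIs.
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, Mapping, Tuple

# A data source yields raw records; a parser turns one raw record into a dict.
DataSource = Callable[[Mapping[str, str]], Iterable[str]]
Parser = Callable[[str], dict]

# Stand-in registries with canned data; real sources would read HDFS, local files, sockets, ...
DATA_SOURCES: Dict[str, DataSource] = {
    "localfs": lambda params: ["a,1", "b,2"],
    "hdfs": lambda params: ["c,3"],
}
PARSERS: Dict[str, Parser] = {
    "delimited-text": lambda raw: dict(zip(("name", "value"), raw.split(","))),
}


@dataclass
class GenericAdapter:
    """The single adapter: a pluggable data source plus a pluggable parser."""
    source: DataSource
    parser: Parser
    params: Mapping[str, str]

    def records(self) -> Iterator[dict]:
        # Fetch raw records from the source and parse each one.
        for raw in self.source(self.params):
            yield self.parser(raw)


def _build(source_name: str, params: Mapping[str, str]) -> GenericAdapter:
    return GenericAdapter(DATA_SOURCES[source_name], PARSERS[params["format"]], params)


def resolve_legacy(adapter_name: str, params: Mapping[str, str]) -> GenericAdapter:
    # Fallback path described in the email: no adapter named e.g. "hdfs" exists,
    # so the name itself is reused as the data source, outside the key-value list.
    return _build(adapter_name, params)


def resolve_proposed(adapter_name: str, params: Mapping[str, str]) -> GenericAdapter:
    # Proposed path: only "generic" is an adapter; the data source is a normal parameter.
    if adapter_name != "generic":
        raise ValueError(f"unknown adapter: {adapter_name}")
    return _build(params["datasource"], params)


def to_proposed(adapter_name: str, params: Mapping[str, str]) -> Tuple[str, Dict[str, str]]:
    # Rewrite `using hdfs(...)` as `using generic("datasource"="hdfs", ...)`.
    return "generic", {**params, "datasource": adapter_name}


params = {"format": "delimited-text"}
old = resolve_legacy("localfs", params)
new = resolve_proposed(*to_proposed("localfs", params))
print(list(old.records()) == list(new.records()))  # True
```

The point of the sketch is that, in the proposed form, the resolver needs no special case for adapter names: the data source travels in the same map as every other parameter, and an unknown adapter name becomes an explicit error instead of a silent reinterpretation.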

