asterixdb-dev mailing list archives

From abdullah alamoudi <bamou...@gmail.com>
Subject Yet another external data change proposal
Date Fri, 28 Apr 2017 17:17:54 GMT
Hi Devs,
Here is a bit of history. When external data access was introduced to AsterixDB, we had many
adapters. Each adapter was a self-contained piece in charge of fetching and parsing data.
Each adapter had an alias (hdfs, localfs, twitter, socket, etc.).
This led to a lot of duplicate code. To remove the duplication, we created a generic adapter
that consists of a pluggable data source and a pluggable data parser. We replaced each of
those old adapters with a data source that can be plugged into the generic adapter.

We lost the adapters and their aliases, so a statement like using hdfs(....) would fail because
the hdfs adapter is not there anymore. We didn't want to change the syntax and wanted it to
keep working. So in such a case, if the adapter is not found, we use the generic adapter
and assume that hdfs is the data source parameter. In that sense, the adapter name became a
parameter that sits outside the list of key-value parameter pairs.
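
To make the fallback concrete, here is a rough sketch of the lookup described above. It is not the actual AsterixDB code: the class, the method names, the registry contents and the "datasource" key used internally are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; none of these names come from the AsterixDB codebase.
public class AdapterResolutionSketch {

    // Hypothetical registry: only the generic adapter still exists by name.
    static final Set<String> REGISTERED_ADAPTERS = Collections.singleton("generic");

    // Outcome of resolving a "using <name>(...)" clause.
    static final class Resolved {
        final String adapter;
        final Map<String, String> params;

        Resolved(String adapter, Map<String, String> params) {
            this.adapter = adapter;
            this.params = params;
        }
    }

    // If <name> is a registered adapter, use it as is. Otherwise fall back to
    // the generic adapter and treat <name> as the data source parameter.
    static Resolved resolve(String name, Map<String, String> params) {
        if (REGISTERED_ADAPTERS.contains(name)) {
            return new Resolved(name, params);
        }
        Map<String, String> merged = new HashMap<>(params);
        merged.put("datasource", name); // assumed key name, for illustration
        return new Resolved("generic", merged);
    }
}
```

With this sketch, resolving the name hdfs yields the generic adapter with hdfs recorded as its data source, which is the implicit behavior the compatibility path relies on.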

This was fine for a while, but as external data evolves and as we attempt to make the codebase
cleaner and more maintainable, we have to deal with more and more nuances while working around
this compatibility issue.
We would like to propose a change that moves the datasource parameter inside the list of
key-value pairs. For example:

using hdfs(...) would become using generic("datasource"="hdfs")
using localfs(...) would become using generic("datasource"="localfs")

This would allow us to have cleaner code under the hood. We would update the test cases
and the documentation. If anybody has an objection or a thought, please let us know.

Cheers,
Abdullah.