drill-dev mailing list archives

From Parth Chandra <par...@apache.org>
Subject Re: dot drill file
Date Fri, 06 Nov 2015 21:13:09 GMT
Hi Julien,

  In an earlier discussion about 'insert into', we talked about keeping a
merged schema (a common schema that applies to all the files in the
directory) in a .drill file.  The metadata cache file has the same
information and, in addition, has stats.  We never did specify what a
merged schema contains.

  My understanding was that the .drill file, when available, becomes the
source of schema information. I can see both the metadata cache and the
insert into functionality using a common format. For these two sets of
functionality, I don't see a need for the file to be human-readable, and if
a more efficient format is available, I think we should use that. This is
particularly true if we need to keep per-file information.

  Is that how we are thinking of the .drill file? Or are we talking about a
.drill.format (?) file? I guess this is similar to Ted's question.

  BTW, I'm not convinced that record-level error handling directives belong
in this file. I know Jacques had some thoughts about that, but I wouldn't
mind if someone explained it to me again :) . To me, record-level error
handling is really a query-level directive, not something that applies to
all the data (in a directory) all the time. Keeping an open mind on this
though.

  On the inheritance rules, here is how the metadata cache file handles
similar questions: the metadata cache file is built from all the files in
the hierarchy under the current directory. So if you have a hierarchy
A
   -- B
      -- C
   -- D
there is a metadata cache file in A, B, C and D. The cache file in A
contains info on all the files in B, C and D. If you update the directory C
and refresh metadata for C, then _only_ C will get updated and the changes
are not propagated upwards. If you refresh metadata for A, all the changes
are seen by A, B, and C. For the use case you're outlining, I would think
looking only at the directory the files are in should suffice.
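
To make the refresh behaviour above concrete, here is a toy model in
Python. It is only a sketch of the semantics as I described them, not
Drill's implementation: the Dir class, the refresh function and the file
names are all hypothetical, and an in-memory list stands in for the
metadata cache file.

```python
from dataclasses import dataclass, field


@dataclass
class Dir:
    name: str
    files: list = field(default_factory=list)
    children: list = field(default_factory=list)
    # Hypothetical stand-in for the metadata cache file kept in this directory.
    cache: list = field(default_factory=list)


def files_under(d):
    """Return all data files in d and in every directory below it."""
    found = list(d.files)
    for child in d.children:
        found.extend(files_under(child))
    return found


def refresh(d):
    """Rebuild the cache of d and of every directory below d.

    The walk only goes downward, so ancestors of d keep their old cache.
    """
    for child in d.children:
        refresh(child)
    d.cache = sorted(files_under(d))


# The hierarchy from the example: A contains B and D, B contains C.
c = Dir("C", files=["c1.parquet"])
b = Dir("B", files=["b1.parquet"], children=[c])
d = Dir("D", files=["d1.parquet"])
a = Dir("A", children=[b, d])

refresh(a)                     # every directory now has a cache
c.files.append("c2.parquet")   # update directory C
refresh(c)                     # only C's cache sees the new file
stale_a = list(a.cache)        # A's cache as it stands before refreshing A
refresh(a)                     # A, B and C now all see the change
fresh_a = list(a.cache)
```

In this model, stale_a does not contain the new file in C while fresh_a
does, which is the "not propagated upwards" behaviour.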


On Sun, Nov 1, 2015 at 10:18 PM, Julien Le Dem <julien@dremio.com> wrote:

> Hello,
> I'd like to capture the requirement for dot drill files.
> Here is my understanding:
> A ".drill" file is in JSON format and is a mechanism provided by the
> FileSystemPlugin to define the format plugin to use collocated with the
> files containing the data in a file system. It will override any extension
> or magic number header mapping.
> It will enable configuring the format plugin and record level error
> handling mechanism (bad record skipping, etc). It could be extended to
> support more in the future.
> Is this correct? Are there inheritance rules if more than one file is found
> in the hierarchy? Does drill look only at the dir containing the files or
> also all parent directories?
> --
> Julien
