drill-dev mailing list archives

From Neeraja Rentachintala <nrentachint...@maprtech.com>
Subject Re: [DISCUSS] Drop table support
Date Wed, 05 Aug 2015 20:49:28 GMT
Another question/comment.

Does Drill need to manage concurrency for DROP TABLE, i.e. how do you
deal with users trying to read the data while somebody is dropping it? Does it
need to implement some kind of locking?

I have some thoughts on that but would like to know what others think. Drill is
not (yet) a transactional system but rather an interactive query layer on a
variety of stores. The two most common use cases I can think of in
this context are a user doing analytics/exploration who, as part of it,
creates some intermediate tables, inserts data into them and drops the
tables, or BI tools generating these intermediate tables for processing
queries. Neither of these has the concurrency issue.
Additionally, given that the data is externally managed, there could always
be other processes adding and deleting files, and Drill doesn't even have
control over them.
Overall, I think the first phase of the DROP implementation might be OK without
these locking/concurrency checks.

Thoughts?

-Neeraja





On Wed, Aug 5, 2015 at 11:54 AM, Mehant Baid <baid.mehant@gmail.com> wrote:

> What you are suggesting makes sense in the case when security is enabled.
> So when Drill is accessing the file system it will impersonate the user who
> issued the command and drop will happen if the user has sufficient
> permissions.
>
> However when security isn't enabled, Drill will be accessing the file
> system as the Drill user itself which is most likely to be a super user who
> has permissions to delete most files. To prevent any catastrophic drops
> checking for homogenous file formats makes sure that at least the directory
> being dropped is something that can be read by Drill. This will prevent any
> accidental drops (like dropping the home directory etc., because it's likely
> to have file formats that cannot be read by Drill). This will not protect
> against malicious behavior (for handling this, security should be enabled).
>
> Thanks
> Mehant
>
> On 8/5/15 11:43 AM, Ted Dunning wrote:
>
>> Is any check really necessary?
>>
>> Can't we just say that for data sources that are file-like that drop is a
>> rough synonym for rm? If you have permission to remove files and
>> directories, you can do it.  If you don't, it will fail, possibly half
>> done. I have never seen a bug filed against rm to add more elaborate
>> semantics, so why is it so necessary for Drill to have elaborate semantics
>> here?
>>
>>
>>
>> On Wed, Aug 5, 2015 at 11:09 AM, Ramana I N <inramana@gmail.com> wrote:
>>
>>> The homogenous check - will it be just checking that types are homogenous or
>>> if they are actually types that can be read by Drill?
>>> Also, is there a good way to determine if a file can be read by Drill? And
>>> will there be a perf hit if there are a large number of files?
>>>
>>> Regards
>>> Ramana
>>>
>>>
>>> On Wed, Aug 5, 2015 at 11:03 AM, Mehant Baid <baid.mehant@gmail.com>
>>> wrote:
>>>
>>>> I agree, it is definitely restrictive. We can lift the restriction for
>>>> being able to drop a table (when security is off) only if the Drill user
>>>> owns it. I think the check for homogenous files should give us enough
>>>> confidence that we are not deleting a non Drill directory.
>>>>
>>>> Thanks
>>>> Mehant
>>>>
>>>>
>>>> On 8/4/15 10:00 PM, Neeraja Rentachintala wrote:
>>>>
>>>>> Ted, that's a fair point on the recovery part.
>>>>>
>>>>> Regarding the other point by Mehant (copied below), there is an
>>>>> implication that a user can drop only Drill managed tables (i.e. created
>>>>> as the Drill user) when security is not enabled. I think this check is too
>>>>> restrictive (also unintuitive). Drill doesn't have the concept of
>>>>> external/managed tables and a user (impersonated user if security is
>>>>> enabled or Drillbit service user if no security is enabled) should be able
>>>>> to drop the table if they have permissions to do so. The above design
>>>>> proposes a check to verify if the files that need to be deleted are
>>>>> readable by Drill and I believe it is a good validation to have.
>>>>>
>>>>> /The above check is in the case when security is not enabled. Meaning we
>>>>> are executing as the Drill user. If we are running as the Drill user
>>>>> (which might be root or a super user) it's likely that this user has
>>>>> permissions to delete most files and checking for permissions might not
>>>>> suffice. So when security isn't enabled the proposal is to delete only
>>>>> those files that are owned (created) by the Drill user./
>>>>>
>>>>>
>>>>> On Fri, Jul 31, 2015 at 12:09 AM, Ted Dunning <ted.dunning@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> On Thu, Jul 30, 2015 at 4:56 PM, Neeraja Rentachintala <
>>>>>> nrentachintala@maprtech.com> wrote:
>>>>>>
>>>>>>> Also will there any mechanism to recover once you accidentally drop?
>>>>>>
>>>>>> yes.  Snapshots <https://www.mapr.com/resources/videos/mapr-snapshots>.
>>>>>>
>>>>>> Seriously, recovery of data due to user error is a platform thing.  How
>>>>>> can we recover from turning off the cluster?  From removing a disk on an
>>>>>> Oracle node?
>>>>>>
>>>>>> I don't think that this is Drill's business.
>>>>>>
>>>>>>
>>>>>>
>
