accumulo-user mailing list archives

From David Medinets <david.medin...@gmail.com>
Subject Re: "NOT" operator in visibility string
Date Wed, 19 Mar 2014 18:09:37 GMT
Is data shared between sandboxes? Could namespaces serve as a proxy for sandboxes?


On Wed, Mar 19, 2014 at 1:46 PM, Mike Drob <madrob@cloudera.com> wrote:

> Thanks, that's really helpful. Couple more questions.
>
> Is a sandbox the same thing as a workspace? Can the terms be used
> interchangeably? Just want to make sure I'm not misinterpreting your
> answers.
>
> Is it fair to describe each sandbox as a separate index table for the
> global data set? And then when users do deletes, it is only reflected in
> the index fields, right?
> But you can't just delete values from the index because you need to keep
> track of the changes in case the user decides to delete globally (after
> appropriate authorization checks, etc...)
>
> Because the visibility is part of the key, changing it involves re-writing
> the data. Which might be just an index record in your case. However, this
> is generally an expensive operation.
>
> I think I need to think on this use case some more, it's definitely
> interesting and not something I had considered before.
>
>
>
> On Wed, Mar 19, 2014 at 1:24 PM, Jeff Kunkle <kunklejr@gmail.com> wrote:
>
>> You have a large amount of data, that is generally readable by all users.
>>
>> Not necessarily. All data has some visibility constraint that a user's
>> authorizations may or may not satisfy.
>>
>> Users create their own sandbox, from which they can later exclude
>> portions of the global data set.
>>
>> Yes, users create their own sandboxes which are populated with global
>> data. They may decide to delete some of that data and the change needs to
>> be scoped to their sandbox until the change is published globally.
>>
>>
>> Users can share their sandbox with others, so really we are talking about
>> sandbox permissions and not so much user permissions.
>>
>> Yes, users can share their sandbox with others, but a sandbox is just a
>> collection of pointers to data. Users sharing a workspace may not
>> necessarily see all of the same data depending on their authorizations.
>>
>> Sandboxes are created often. Or, at least much more often than the data
>> changes.
>>
>> Yes, sandboxes are created often. The data is likely to be ingested more
>> frequently than sandboxes will be created.
>>
>> Do users typically remove large amounts of data from their sandbox? 1%?
>> 10%? 99%?
>>
>> I don't have good numbers to share here.
>>
>> Assuming data is removed via rules, are the rules applied automatically
>> to new data under ingest?
>>
>> I would say no, although I'm not positive I understand the question.
>> Users are not removing data from their sandbox per se, but they may delete
>> data that should then be hidden from their workspace. The data is not
>> really deleted though and is still visible to other users in other
>> sandboxes. Only when the deletion is published does it get deleted for
>> everyone.
>>
>> On Mar 19, 2014, at 1:03 PM, Mike Drob <madrob@cloudera.com> wrote:
>>
>> Wait, I'm really confused by what you are describing, Jeff. Sorry if
>> these are obvious questions, but can you help me get a better grasp of your
>> use case?
>>
>> You have a large amount of data, that is generally readable by all users.
>> Users create their own sandbox, from which they can later exclude
>> portions of the global data set.
>> Users can share their sandbox with others, so really we are talking about
>> sandbox permissions and not so much user permissions.
>> Sandboxes are created often. Or, at least much more often than the data
>> changes.
>>
>> Are those all accurate statements? If so, can you clarify the following
>> points:
>>
>> Do users typically remove large amounts of data from their sandbox? 1%?
>> 10%? 99%?
>> Assuming data is removed via rules, are the rules applied automatically
>> to new data under ingest?
>>
>> Thanks,
>> Mike
>>
>>
>> On Wed, Mar 19, 2014 at 12:54 PM, Jeff Kunkle <kunklejr@gmail.com> wrote:
>>
>>> Hi John,
>>>
>>> Yes it's accurate that the system controls the label and who is
>>> associated with it; there are no Accumulo-internal user accounts. But I
>>> don't think it's feasible to remove a sandbox label from something that
>>> should be hidden. Such a scenario would imply that all data is "tagged"
>>> with the labels of every sandbox that is allowed to see the data, which
>>> would be most. It would also imply that the creation of a new sandbox would
>>> necessitate changing the visibility of everything in Accumulo to include
>>> the new sandbox label, effectively rewriting the entire database. Sandboxes
>>> are created and deleted all the time in our application, so it doesn't seem
>>> like a feasible solution to me.
>>>
>>> -Jeff
>>>
>>> On Mar 19, 2014, at 12:16 PM, Josh Elser <josh.elser@gmail.com> wrote:
>>>
>>> > It kind of sounds like you could manage this much easier by
>>> > controlling the authorizations a user gets (notably the workspace name) and
>>> > the grant/revoke above the Accumulo level.
>>> >
>>> > A sandbox has a unique label and the external system controls which
>>> > users are granted that label. This way, each sandbox can be modified
>>> > individually (using authorizations that contain the data visibility and the
>>> > sandbox label) or the original data set could be modified (by omitting a
>>> > sandbox label in the authorizations used).
>>> >
>>> > Is that accurate?
>>> >
>>> > On 3/19/14, 12:05 PM, Jeff Kunkle wrote:
>>> >> I attempted to simplify the scenario to facilitate discussion, which on
>>> >> second thought may have been a mistake. Here's the whole scenario:
>>> >>
>>> >> Different users have access to different subsets of the data depending
>>> >> on their authorizations and the visibility of the data. Users "work
>>> >> with" the data in what we call a sandbox. Sandboxes can be shared with
>>> >> other users (this is the group creation I was talking about earlier).
>>> >> Deletes to the data would be "scoped" to the sandbox by changing the
>>> >> visibility to add "& !workspace_name" so that people viewing the
>>> >> workspace wouldn't see the data but everyone else would.
>>> >>
>>> >> On Mar 19, 2014, at 11:48 AM, Sean Busbey <busbey+lists@cloudera.com> wrote:
>>> >>
>>> >>> On Wed, Mar 19, 2014 at 10:43 AM, Jeff Kunkle <kunklejr@gmail.com> wrote:
>>> >>>
>>> >>>    New groups are created on the fly by our application when needed.
>>> >>>    Under the scenario you describe we'd have to go through all the
>>> >>>    data in Accumulo whenever a group is created so that users in the
>>> >>>    group can see the existing data.
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> Ah! So your use case is that all data defaults to world readable and
>>> >>> then users have the option of opting out of seeing subsets. Right?
>>> >>>
>>> >>> In your scenario user groups also get to opt-out of seeing data on the
>>> >>> fly, yes? Both require rewriting the data. Does the group creation
>>> >>> happen more often?
>>> >>
>>>
>>>
>>
>>
>
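
For anyone reading the thread later, here is a rough sketch of the semantics Jeff describes: a sandbox-scoped delete that appends "& !workspace_name" to an entry's visibility. This is only a toy model in plain Python, not Accumulo code. The names Entry, visible, and sandbox_delete are made up for illustration, visibilities are simplified to an AND of labels, and the "!" term is the hypothetical operator under discussion, which as far as I know Accumulo's visibility expressions do not provide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Toy stand-in for a key/value pair; not an Accumulo type."""
    row: str
    required: frozenset                   # labels that must all be present (AND-only visibility)
    hidden_from: frozenset = frozenset()  # hypothetical "!sandbox" terms


def visible(entry, auths):
    """True if auths satisfy every required label and include no excluded sandbox label."""
    auths = frozenset(auths)
    return entry.required <= auths and not (entry.hidden_from & auths)


def sandbox_delete(entry, sandbox):
    """Model a sandbox-scoped delete as '<visibility> & !<sandbox>'.

    A new Entry is returned rather than mutating the old one, mirroring the
    point made in the thread: the visibility is part of the key, so changing
    it means rewriting the entry.
    """
    return Entry(entry.row, entry.required, entry.hidden_from | {sandbox})


original = Entry("doc1", frozenset({"public"}))
rewritten = sandbox_delete(original, "ws1")

print(visible(original, {"public", "ws1"}))   # before the sandbox-scoped delete
print(visible(rewritten, {"public", "ws1"}))  # scanning with the ws1 sandbox label
print(visible(rewritten, {"public", "ws2"}))  # scanning from a different sandbox
```

Under this model, only scans that carry the ws1 label stop seeing doc1; scans with other sandbox labels, or with no sandbox label at all, still see it.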
