incubator-blur-dev mailing list archives

From "Tim Williams (JIRA)" <>
Subject [jira] [Commented] (BLUR-397) Improve data loading from M/R
Date Mon, 15 Dec 2014 20:53:13 GMT


Tim Williams commented on BLUR-397:

That change would break existing deployments that use multiple Blur users (one for each
cluster) and rely on group permissions so that it doesn't matter which controller they talk to.
I suspect that's not a typical deployment, though.

If we do care about that, we could keep the permissions you've proposed by altering the controller's
createTable to delegate to one shard server in the cluster (so that paths get set up with
the proper ownership for its cluster).

Out of curiosity, what's the rationale for closing off the shards, but leaving open the types?

> Improve data loading from M/R
> -----------------------------
>                 Key: BLUR-397
>                 URL:
>             Project: Apache Blur
>          Issue Type: Improvement
>          Components: Blur, Blur MapReduce
>            Reporter: Tim Williams
> There's an awkward permissions dilemma when writing data into Blur from Map/Reduce. 

> A job would typically create a table, then load the data.  The challenge is that the
> table itself is created through the controller, which means it's written to DFS as the user
> actually running the controller daemon - typically 'blur'.  The Map/Reduce job may be run
> as an entirely different user, and that may be a user you don't want to have write access
> inside blur's directory paths. In other words, you'd like arbitrary user(s) to be able to
> create/populate table data without necessarily having write access to blur's internals.
> One approach is to have the user's job write to any location they have access to, then
> "tell" Blur to 'import' it - at which time, Blur would literally move the data under its control.

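The 'import' approach described in the issue can be sketched roughly as follows. This is only an illustration under loud assumptions: a local filesystem stands in for DFS, ownership and group permissions are not modeled, and the names import_into_table, staging_dir and table_dir are hypothetical, not part of Blur's API.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def import_into_table(staging_dir: Path, table_dir: Path) -> List[Path]:
    """Move every file from a user-writable staging directory into a
    service-owned table directory and return the new paths.

    Hypothetical helper: in a real deployment this step would run as the
    service user (e.g. 'blur') against DFS, not the local filesystem.
    """
    table_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for src in sorted(staging_dir.iterdir()):
        dest = table_dir / src.name
        if dest.exists():
            # Never clobber data the service already controls.
            raise FileExistsError(f"refusing to overwrite {dest}")
        shutil.move(str(src), str(dest))
        moved.append(dest)
    return moved


def demo() -> List[str]:
    """Stage one file as the 'job', then import it as the 'service'."""
    with tempfile.TemporaryDirectory() as root:
        staging = Path(root) / "user" / "job-output"
        table = Path(root) / "blur" / "tables" / "example"
        staging.mkdir(parents=True)
        (staging / "part-00000").write_text("row data")
        moved = import_into_table(staging, table)
        # After the import, the staging directory is empty.
        assert not any(staging.iterdir())
        return [p.name for p in moved]
```

The point of the design is that the job only ever needs write access to its own staging location; the move into the table path is performed by whoever owns that path.
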