lucene-solr-user mailing list archives

From Jamie Johnson <jej2...@gmail.com>
Subject Re: Using payloads and user provided data in score
Date Thu, 23 Jul 2015 14:30:16 GMT
Sorry for being vague, I'll try to explain more.  In my use case the
security control applies to the data in a field, not to the field itself.
For instance, in a schema with a field called name, that field could hold
data secured at A, B, A&B, A|B, etc.  My thought, based on your suggestion,
was to dynamically generate fields based on the authorizations.  The user
would only ever see name, but it would get translated to the fields in the
index that they can see.  At index time, if a field was added to the Solr
document that said name:foo with authorizations A&B, I would translate that
to name_A&B_txt:foo.  Then at search time I would check which fields in the
index the user is allowed to see and rewrite queries that said name:foo to
name_A&B_txt:foo (assuming the user can see A&B).
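
To make the translation concrete, here is a minimal sketch in plain Python
(not Solr code).  The function names and the flat marking grammar, where
'|' separates alternatives and '&' joins required labels with no
parentheses, are assumptions for illustration only.

```python
from typing import List, Optional, Set


def visible(marking: str, auths: Set[str]) -> bool:
    """True if the user's authorizations satisfy a marking like 'A&B' or 'A|B'.

    Assumed grammar: '|' separates alternatives, '&' joins the labels that
    are all required within one alternative. No parentheses are handled.
    """
    return any(
        all(label in auths for label in alternative.split("&"))
        for alternative in marking.split("|")
    )


def secured_field(field: str, marking: str) -> str:
    """Index-time name for `field` carrying `marking`, e.g. name_A&B_txt."""
    return f"{field}_{marking}_txt"


def rewrite(field: str, value: str, markings: List[str],
            auths: Set[str]) -> Optional[str]:
    """Rewrite field:value into an OR group over the secured fields the user
    may see. Returns None when the user can see none of them, leaving it to
    the caller to decide how to handle that case."""
    clauses = [
        f"{secured_field(field, m)}:{value}"
        for m in markings
        if visible(m, auths)
    ]
    if not clauses:
        return None
    return "(" + " ".join(clauses) + ")"
```

For a user holding A and B, rewrite("name", "foo", ["A", "B", "A&B", "C"],
{"A", "B"}) expands name:foo over name_A_txt, name_B_txt and name_A&B_txt
and leaves out name_C_txt.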

We do not explicitly control the fields the user or calling application has
access to, because I don't want to expose fields like name_A&B_txt to
calling applications.  They only know that a field "name" exists, and based
on that I need to translate a name:foo query into the appropriately
controlled version.  Does that make sense?

My biggest concern with this (beyond the query rewrite) is how it will
impact scoring, especially when the same information is available under
multiple markings (e.g. name_A_txt has a value of foo, name_B_txt has a
value of foo, and the user has authorizations A and B).  I am also worried
about bumping up against the maximum clause limit as we expand the query.
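
A rough way to estimate the expansion, again as a plain Python sketch: every
fielded term becomes one clause per secured field the user can see.  The
helper names are hypothetical, and 1024 is assumed here as the limit because
it is the usual default for Solr's maxBooleanClauses setting, which can be
changed in solrconfig.xml.

```python
from typing import Dict, List, Tuple

# Assumed limit: the usual default for Solr's maxBooleanClauses setting.
MAX_BOOLEAN_CLAUSES = 1024


def expanded_clause_count(query: List[Tuple[str, str]],
                          visible_markings: Dict[str, List[str]]) -> int:
    """Count the clauses after each (field, value) pair is expanded into one
    clause per marking of that field that the user is allowed to see."""
    return sum(len(visible_markings.get(field, [])) for field, _ in query)


def fits(query: List[Tuple[str, str]],
         visible_markings: Dict[str, List[str]],
         limit: int = MAX_BOOLEAN_CLAUSES) -> bool:
    """True if the expanded query stays within the clause limit."""
    return expanded_clause_count(query, visible_markings) <= limit
```

With four visible markings on name and two on title, a three-term query over
name, name and title expands to 4 + 4 + 2 clauses, so the count grows with
both the number of terms and the number of visible markings per field.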

These are the reasons I thought it best to use payloads, so that terms
carrying authorizations the user does not hold contribute nothing to the
score, and then to resolve the actual object the user can see using a store
that already supports this type of access pattern (specifically Accumulo in
this case).
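
The payload idea, reduced to a plain Python stand-in (this is not Lucene's
Similarity API, and the names and the flat '|' / '&' marking grammar are
assumptions): each matching term carries its marking as a payload, and the
term's contribution is multiplied by 0 when the user cannot see it.

```python
from typing import List, Set, Tuple


def visible(marking: str, auths: Set[str]) -> bool:
    """True if the user's authorizations satisfy a marking like 'A&B' or
    'A|B' (assumed grammar: '|' for alternatives, '&' for required labels,
    no parentheses)."""
    return any(
        all(label in auths for label in alternative.split("&"))
        for alternative in marking.split("|")
    )


def payload_factor(marking: str, auths: Set[str]) -> float:
    """1.0 if the term may influence the score for this user, else 0.0."""
    return 1.0 if visible(marking, auths) else 0.0


def gated_score(matches: List[Tuple[float, str]], auths: Set[str]) -> float:
    """Sum the raw per-term scores, zeroing out every term whose payload
    marking the user is not authorized to see."""
    return sum(raw * payload_factor(marking, auths) for raw, marking in matches)
```

A user holding only A who matches terms marked A, A&B and C gets a score
from the A term alone, so the hidden terms neither raise the score nor
reveal that they exist.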

Your "ugly problem" is my situation I think ;)

On Thu, Jul 23, 2015 at 12:06 AM, Erick Erickson <erickerickson@gmail.com>
wrote:

> I'm not quite getting it here. I'm guessing that you do not
> allow fielded queries or you strictly control the fields a user
> sees to pick from. Otherwise your security stuff goes out the
> window, say you have a drop-down list of fields to choose from
> or something.
>
> Assuming you do NOT have such a thing, the user is just typing
> words in a box, then you have to figure out, once at the
> app layer, what fields they have access to and just append a
> qf=field_secure1,field_secure2.....
> parameter to the query.
>
> That's it. You do not have to rewrite the user query at all, the q
> parameter is just passed through as is.
>
> bq:  I guess in a search component I could look up all of the fields
> that are in the index and only run queries against fields they should be
> able to see once I know what is in the index (this is what you're
> suggesting right?).
>
> Kind of, except not in a search component. You have to have modeled
> the access rights somewhere, so I'm not getting why you can't just use
> that model to generate the list of restricted fields the user has access
> to.
> You haven't explained that model other than to say it's "complex". So I
> have no clue whether you're talking about not _knowing_ what fields are
> in the docs in the first place (quite possible with dynamic fields) or
> whether you do know the complete field list but calculating the user's
> access rights to which fields is complex.
>
> But I should emphasize again that my assumption is that once calculated,
> this list is invariant so it does not need to be done for every request.
> Indeed, what I'm envisioning is not writing any Solr code at all, all
> done in the app layer.
>
> As far as extra work, there isn't any as far as Solr is concerned.
> It's exactly as though you were specifying this in, say, the request
> handler. So I don't get your concern about lots and lots of fields.
> Now, I'm assuming a simple document model with some number
> of fields. The access rights to which of those fields a user can
> see may be a complex calculation, but again you only need to do it
> once. For that matter, you could pre-calculate that set of fields
> or otherwise cache it.
>
> Now, this breaks down if the document model isn't that simple,
> say the same field in doc1 can be seen by userX, but userX
> can't see the _same_ field in doc2. That's an ugly problem...
>
> And let's further say there are a number of fields that _everyone_
> can see. They can be placed in an <appends> section of the request
> handler so you don't have to specify them for each request.
>
> Best,
> Erick
>
> On Wed, Jul 22, 2015 at 4:12 PM, Jamie Johnson <jej2003@gmail.com> wrote:
> > Looks like this may be what I'm looking for
> >
> > *SolrRequestInfo*
> >
> > I have not tried this yet but looks promising.
> >
> > Assuming this works, thinking about your suggestion I would need to
> > rewrite the user's query with the appropriate fields. Are there any
> > utilities for doing this?  I'd be looking to rewrite a fielded query
> > like +field:value possibly to something like
> > +(field.secure:value field.secure2:value)
> >
> > Again thanks for suggestions
> > On Jul 22, 2015 5:20 PM, "Jamie Johnson" <jej2003@gmail.com> wrote:
> >
> >> I answered my own question, looks like the field infos are always read
> >> within the IndexSearcher so that cost is already being paid.
> >>
> >> I would potentially have to duplicate information in multiple fields
> >> if it was present at multiple authorization levels, is there a limit
> >> to the number of fields within a document?  I'm also concerned this
> >> might skew my search results as terms that had more authorizations
> >> would appear in more fields and would result in more matches on query.
> >> I'll play with this a little but I am still wondering about my
> >> original question.
> >>
> >> On Wed, Jul 22, 2015 at 4:45 PM, Jamie Johnson <jej2003@gmail.com> wrote:
> >>
> >>> I had thought about this in the past, but thought it might be too
> >>> expensive.  I guess in a search component I could look up all of the
> >>> fields that are in the index and only run queries against fields they
> >>> should be able to see once I know what is in the index (this is what
> >>> you're suggesting right?).
> >>>
> >>> My concern would be that the number of fields per document would grow
> >>> too large to support this.  Our controls aren't simple like user or
> >>> admin, they are complex combinations of authorizations so I would
> >>> think there might be a large number of fields that are generated
> >>> using this approach.  Would retrieving all field infos from Solr be
> >>> expensive on each request to see what they should be able to query?
> >>>
> >>> On Wed, Jul 22, 2015 at 4:19 PM, Erick Erickson
> >>> <erickerickson@gmail.com> wrote:
> >>>
> >>>> Why don't you handle it all at the app level? Here's what I mean:
> >>>>
> >>>> I'm assuming that you're using edismax here, but the same principle
> >>>> applies if not.
> >>>>
> >>>> Your handler (say the "/select" handler) has a "qf" parameter which
> >>>> defines the fields that are searched over in the absence of a field
> >>>> qualifier, e.g.
> >>>> q=whatever&qf=title,description
> >>>>
> >>>> causes the search term to be looked for in the two fields "title" and
> >>>> "description". You can also set up the qf fields in the "/select"
> >>>> handler as one of the items in the <defaults> section....
> >>>>
> >>>> But, the qf param in the <defaults> section is just that... a default.
> >>>> So individual queries can override it. What I have in mind is that
> >>>> you'd look up the user's field-access list and append that list as
> >>>> necessary to the query and just pass it on through.
> >>>>
> >>>> Things to watch out for:
> >>>> 1> if the user specifies a field, you'll have to strip that off if
> >>>> they don't have rights, i.e. q=field1:whatever whenever
> >>>> ignores the qf parameter for "whatever" but does respect the qf param
> >>>> for "whenever".
> >>>> 2> If you have some kind of date field, say, that you want to facet
> >>>> over, you'd have to control that.
> >>>> 3> if you have a "bag of words" where you use copyField to add a bunch
> >>>> of fields' data to an uber-field then the user can infer some things
> >>>> from that info, so you probably want to be careful about what
> >>>> copyFields you use.
> >>>>
> >>>> Best,
> >>>> Erick
> >>>>
> >>>> On Wed, Jul 22, 2015 at 12:21 PM, Jamie Johnson <jej2003@gmail.com>
> >>>> wrote:
> >>>> > I am looking for a way to prevent fields that users shouldn't be able
> >>>> > to know exist from contributing to the score.  The goal is to provide
> >>>> > a way to essentially hide certain fields from requests based on an
> >>>> > access level provided on the query.  I have managed to make terms that
> >>>> > users shouldn't be able to see not impact the score by implementing a
> >>>> > custom Similarity class that looks at the terms' payloads and returns
> >>>> > 0 for the score if they shouldn't know the field exists.  The issue
> >>>> > however is that I don't have access to the request at this point so
> >>>> > getting the user's access level is proving problematic.  Is there a
> >>>> > way to get the current request that is being processed via some thread
> >>>> > local variable or something similar that Solr maintains?  If not is
> >>>> > there another approach that I could be using to access information
> >>>> > from the request within my Similarity implementation?  Any thoughts on
> >>>> > this would be greatly appreciated.
> >>>> >
> >>>> > -Jamie
> >>>>
> >>>
> >>>
> >>
>
