lucene-solr-user mailing list archives

From Susheel Kumar <>
Subject Re: Spell Check and Privacy
Date Mon, 12 Oct 2015 14:36:28 GMT
Hi Arnon,

I couldn't fully understand your use case regarding privacy. Are you
concerned that SpellCheck may reveal, as part of its suggestions, user names
that belong to different organizations / ACLs? Or are you concerned that,
after receiving suggestions, a user may be able to click through and view
users from other organizations?

Please provide some details on your concern for Privacy with Spell Checker.


On Mon, Oct 12, 2015 at 9:45 AM, Dyer, James <>

> Arnon,
> Use "spellcheck.collate=true" with "spellcheck.maxCollationTries" set to a
> non-zero value.  This will give you re-written queries that are guaranteed
> to return hits, given the original query and filters.  If you are using an
> "mm" value other than 100%, you will also want to specify "
>". (or if using "q.op=OR", then use
> "spellcheck.collateParam.q.op=AND")
> Of course, the first section of the spellcheck result will still show
> every possible suggestion, so your client needs to discard these and not
> divulge them to the user.  If you need to know word-by-word how the
> collations were constructed, then specify
> "spellcheck.collateExtendedResults=true".  Use the extended collation
> results for this information and not the first section of the spellcheck
> results.
> This is all fairly well-documented on the old solr wiki:
> James Dyer
> Ingram Content Group
> -----Original Message-----
> From: Arnon Yogev []
> Sent: Monday, October 12, 2015 2:33 AM
> To:
> Subject: Spell Check and Privacy
> Hi,
> Our system supports many users from different organizations and with
> different ACLs.
> We are considering adding a spell check ("did you mean") functionality using
> DirectSolrSpellChecker. However, a privacy concern was raised, as this
> might lead to private information being revealed between users via the
> suggested terms. Using the FileBasedSpellChecker is another option, but
> naturally a static list of terms is not optimal.
> Is there a best practice or a suggested method for this kind of case?
> Thanks,
> Arnon
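
For illustration, the approach James describes might be sketched in Python as below. This is a minimal sketch, not a definitive client: the function names, the "acl:org42" filter, and the default of 5 collation tries are hypothetical, and the response shape (a flat "collations" list that alternates the label "collation" with its value) is an assumption that can differ by Solr version and "json.nl" setting. Only the spellcheck parameter names come from the thread or from standard Solr usage.

```python
from urllib.parse import urlencode


def build_spellcheck_query(user_query, acl_filter, max_tries=5):
    """Build a query string asking Solr for collations checked against q and fq."""
    params = [
        ("q", user_query),
        ("fq", acl_filter),  # hypothetical ACL filter; the field name is an assumption
        ("wt", "json"),
        ("spellcheck", "true"),
        ("spellcheck.collate", "true"),
        ("spellcheck.maxCollationTries", str(max_tries)),  # must be non-zero
        ("spellcheck.collateExtendedResults", "true"),
        ("spellcheck.collateParam.q.op", "AND"),  # only needed when q.op=OR is in use
    ]
    return urlencode(params)


def collations_only(response):
    """Return collation query strings and ignore the raw per-word suggestions."""
    raw = response.get("spellcheck", {}).get("collations", [])
    results = []
    # Assumed shape: a flat list alternating the label "collation" and its value.
    for value in raw[1::2]:
        if isinstance(value, dict):
            # Extended results: the value is a map holding the rewritten query.
            results.append(value["collationQuery"])
        else:
            # Plain results: the value is the rewritten query itself.
            results.append(value)
    return results
```

The client would show users only what collations_only returns and never read the "suggestions" section of the response, since that section can list terms from documents the user is not allowed to see.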
