lucene-solr-user mailing list archives

From Rafael <rafael.man...@gmail.com>
Subject Re: Suggester duplicating values
Date Thu, 02 Jul 2015 15:32:47 GMT
Just double-checking:

In my Ruby backend I ask (using the given example) for all suggested terms
that start with "J.", then I (probably) add all the terms to a Set, and
then return the Set to the view. Right?
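A minimal Ruby sketch of that plan follows. It assumes the client has already parsed the /suggest response into a Hash and that the Hash follows the usual SuggestComponent layout (suggest → dictionary name → query → suggestions). The helper name `unique_suggestions` and the sample response are hypothetical; "mySuggester" is the dictionary name from the config further down the thread.

```ruby
require 'set'

# Minimal sketch: collect the suggested terms from a parsed /suggest response
# and drop duplicates. Ruby's Set iterates in insertion order, so the first
# (highest-ranked) occurrence of each term is the one that is kept.
def unique_suggestions(response, dictionary, query)
  entries = response.dig('suggest', dictionary, query, 'suggestions') || []
  Set.new(entries.map { |entry| entry['term'] }).to_a
end

# Hypothetical parsed response for suggest.q=J. against "mySuggester".
response = {
  'suggest' => {
    'mySuggester' => {
      'J.' => {
        'numFound' => 4,
        'suggestions' => [
          { 'term' => 'J. R. R. Tolkien', 'weight' => 0, 'payload' => '' },
          { 'term' => 'J. R. R. Tolkien', 'weight' => 0, 'payload' => '' },
          { 'term' => 'J. R. R. Tolkien', 'weight' => 0, 'payload' => '' },
          { 'term' => 'J. R. R. Tolkien', 'weight' => 0, 'payload' => '' }
        ]
      }
    }
  }
}

puts unique_suggestions(response, 'mySuggester', 'J.').inspect
# => ["J. R. R. Tolkien"]
```

`Array#uniq` would do the same job; the Set version simply mirrors the plan described above. Since duplicates are removed after Solr has applied `suggest.count`, fewer than 20 distinct terms may come back.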

[]'s
Rafael

On Thu, Jul 2, 2015 at 12:12 PM, Alessandro Benedetti <benedetti.alex85@gmail.com> wrote:

> No, I was referring to the fact that the Suggester's unit of information
> is a plain term, and a term is identified only by its own text.
>
> What you need to do is use a Ruby data structure that prevents
> duplicates from being inserted, and then serve the suggestions from there.
>
> Cheers
>
> 2015-07-02 15:42 GMT+01:00 Rafael <rafael.manoel@gmail.com>:
>
> > Thanks, Alessandro!
> >
> > Well, I'm using Ruby and RSolr as a client library. I didn't get what
> > you said about the term id. Do I have to create this field? Or is it a
> > "hidden field" that Solr uses under the hood?
> >
> > []'s
> > Rafael
> >
> > On Thu, Jul 2, 2015 at 6:41 AM, Alessandro Benedetti <benedetti.alex85@gmail.com> wrote:
> >
> > > Hi Rafael,
> > > Your problem is clear, and it has actually been explored a few times in
> > > the past. At first glance, I agree with you.
> > >
> > > A Suggester's basic unit of information is a term, not a document.
> > > This means it does not make much sense to return duplicate terms
> > > just because they come from different documents.
> > > A term's id should be the term itself, as there is no way for a human
> > > to perceive any difference between two identical terms returned by the
> > > Suggester.
> > >
> > > That consideration apart, are you using an intermediate API to query
> > > Solr? (You definitely should.)
> > > If you are using a client, your client language should provide a data
> > > structure implementation you can use to avoid duplicates.
> > > Java, for example, gives you HashSet, TreeSet and all the related
> > > classes.
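For a Ruby client, rough counterparts of those Java classes are sketched below, assuming the suggested terms are already in an array in Solr's ranking order. The term list (including the second author) is hypothetical sample data.

```ruby
require 'set'

# Hypothetical suggested terms, in the order Solr ranked them.
terms = ['J. R. R. Tolkien', 'J. K. Rowling', 'J. R. R. Tolkien', 'J. K. Rowling']

# Roughly the role of Java's LinkedHashSet: unique values, and because Ruby's
# Set keeps insertion order, Solr's ranking is preserved.
ranked_unique = terms.to_set.to_a

# Roughly the role of Java's TreeSet: unique values in sorted order.
sorted_unique = terms.uniq.sort

puts ranked_unique.inspect # => ["J. R. R. Tolkien", "J. K. Rowling"]
puts sorted_unique.inspect # => ["J. K. Rowling", "J. R. R. Tolkien"]
```

For autocomplete, keeping the ranked order is usually the better choice, since alphabetical order discards the weights the suggester computed.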
> > >
> > > Hope this helps,
> > >
> > > Cheers
> > >
> > > 2015-07-01 18:40 GMT+01:00 Rafael <rafael.manoel@gmail.com>:
> > >
> > > > Hi, I'm building an autocomplete solution on top of Solr for an ebook
> > > > seller, but my database is completely denormalized. For example, I
> > > > have records like these:
> > > >
> > > > author           | title                       | price
> > > > -----------------+-----------------------------+---------
> > > > J. R. R. Tolkien | Lord of the Rings           | $10.0
> > > > J. R. R. Tolkien | Lord of the Rings Vol. 3    | $12.0
> > > > J. R. R. Tolkien | Lord of the Rings           | $11.0
> > > > J. R. R. Tolkien | Lord of the Rings Vol. 3    | $7.5
> > > > J. R. R. Tolkien | Lord of the Rings Hardcover | $30.5
> > > >
> > > > (We are already working on normalizing the database, but it will
> > > > take a while.)
> > > >
> > > >
> > > > So when I implement a suggester on the author field, for example,
> > > > and type "J.", I get "J. R. R. Tolkien" 4 times.
> > > >
> > > > My suggester configuration is pretty standard:
> > > >
> > > > <!-- schema -->
> > > > <fieldType name="textSuggest" class="solr.TextField" positionIncrementGap="100">
> > > >   <analyzer type="index">
> > > >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> > > >     <filter class="solr.LowerCaseFilterFactory"/>
> > > >   </analyzer>
> > > >   <analyzer type="query">
> > > >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> > > >     <filter class="solr.LowerCaseFilterFactory"/>
> > > >   </analyzer>
> > > > </fieldType>
> > > >
> > > >
> > > > <!-- solrconfig.xml -->
> > > > <searchComponent name="suggest" class="solr.SuggestComponent">
> > > >   <lst name="suggester">
> > > >     <str name="name">mySuggester</str>
> > > >     <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
> > > >     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
> > > >     <str name="field">author</str>
> > > >     <str name="suggestAnalyzerFieldType">textSuggest</str>
> > > >   </lst>
> > > > </searchComponent>
> > > >
> > > > <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
> > > >   <lst name="defaults">
> > > >     <str name="suggest">true</str>
> > > >     <str name="suggest.count">20</str>
> > > >     <str name="suggest.dictionary">mySuggester</str>
> > > >   </lst>
> > > >   <arr name="components">
> > > >     <str>suggest</str>
> > > >   </arr>
> > > > </requestHandler>
> > > >
> > > >
> > > > And I'm using Solr 5.2.1.
> > > >
> > > > Question: Is there a way to get only unique values for suggestions?
> > > > Or would it be simpler to export a file (or even a new table in the
> > > > database) without duplicate values?
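If you take the export route, one option (an assumption, not something confirmed in this thread) is Solr's FileDictionaryFactory, which reads entries from a plain-text file, one term per line, optionally followed by a weight separated by the `fieldDelimiter` (a tab by default). You would then set `dictionaryImpl` to FileDictionaryFactory and point `sourceLocation` at the file instead of using DocumentDictionaryFactory on the author field. The Ruby sketch below builds such a file from denormalized rows; the helper name `suggestion_lines`, the output path and the "rows per author" weighting are all hypothetical choices.

```ruby
require 'tmpdir'

# Minimal sketch: turn denormalized rows into duplicate-free suggestion lines
# of the form "<term>\t<weight>". Using the number of rows per author as the
# weight is a hypothetical choice (authors with more listings rank higher).
def suggestion_lines(rows)
  counts = Hash.new(0)
  rows.each { |row| counts[row[:author]] += 1 }
  counts.map { |author, count| "#{author}\t#{count}" }
end

# Rows mirroring the table above (in practice they would come from the database).
rows = [
  { author: 'J. R. R. Tolkien', title: 'Lord of the Rings',           price: 10.0 },
  { author: 'J. R. R. Tolkien', title: 'Lord of the Rings Vol. 3',    price: 12.0 },
  { author: 'J. R. R. Tolkien', title: 'Lord of the Rings',           price: 11.0 },
  { author: 'J. R. R. Tolkien', title: 'Lord of the Rings Vol. 3',    price: 7.5 },
  { author: 'J. R. R. Tolkien', title: 'Lord of the Rings Hardcover', price: 30.5 }
]

# Hypothetical output path; the real file must live where sourceLocation points.
path = File.join(Dir.tmpdir, 'author_suggestions.txt')
File.write(path, suggestion_lines(rows).join("\n") + "\n")
puts File.read(path)
```

Deduplicating at export time keeps the Ruby backend simple, at the cost of rebuilding the file (and the suggester) whenever the catalogue changes.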
> > > >
> > > > Thanks.
> > > >
> > >
> > >
> > >
> > > --
> > > --------------------------
> > >
> > > Benedetti Alessandro
> > > Visiting card : http://about.me/alessandro_benedetti
> > >
> > > "Tyger, tyger burning bright
> > > In the forests of the night,
> > > What immortal hand or eye
> > > Could frame thy fearful symmetry?"
> > >
> > > William Blake - Songs of Experience -1794 England
> > >
> >
>
>
