From: Rafael
To: solr-user@lucene.apache.org
Subject: Re: Suggester duplicating values
Date: Thu, 2 Jul 2015 12:32:47 -0300

Just double checking:

In my Ruby backend I ask for (using the given example) all suggested terms
that start with "J.", then I (probably) add all the terms to a Set, and
then return the Set to the view. Right?

[]'s
Rafael

On Thu, Jul 2, 2015 at 12:12 PM, Alessandro Benedetti <benedetti.alex85@gmail.com> wrote:

> No, I was referring to the fact that a Suggester's unit of information
> is a simple term, which is identified simply by itself.
>
> What you need to do is to use some Ruby data structure that prevents
> duplicates from being inserted, and then offer the suggestions from there.
>
> Cheers
>
> 2015-07-02 15:42 GMT+01:00 Rafael:
>
> > Thanks, Alessandro!
> >
> > Well, I'm using Ruby and rsolr as the client library. I didn't get what
> > you said about the term id. Do I have to create this field? Or is it a
> > "hidden field" used by Solr under the hood?
> >
> > []'s
> > Rafael
> >
> > On Thu, Jul 2, 2015 at 6:41 AM, Alessandro Benedetti <benedetti.alex85@gmail.com> wrote:
> >
> > > Hi Rafael,
> > > Your problem is clear and it has actually been explored a few times
> > > in the past.
> > > I agree with you in the first instance.
> > >
> > > A Suggester's basic unit of information is a term, not a document.
> > > This means that it does not make a lot of sense to return duplicate
> > > terms (just because they come from different docs).
> > > The term id should be the term itself, as there is no way for a human
> > > to perceive any difference between two identical terms returned by
> > > the Suggester.
> > >
> > > So, this consideration apart, are you using an intermediate API to
> > > query Solr? (You definitely should.)
> > > If you are using any client, your client language should provide a
> > > data structure implementation you can use to avoid duplicates.
> > > Java, for example, gives you HashSet, TreeSet and all the related
> > > classes.
> > >
> > > Hope this helps,
> > >
> > > Cheers
> > >
> > > 2015-07-01 18:40 GMT+01:00 Rafael:
> > >
> > > > Hi, I'm building an autocomplete solution on top of Solr for an
> > > > ebook seller, but my database is completely denormalized. For
> > > > example, I have this kind of records:
> > > >
> > > > author           | title                       | price
> > > > -----------------+-----------------------------+-------
> > > > J. R. R. Tolkien | Lord of the Rings           | $10.0
> > > > J. R. R. Tolkien | Lord of the Rings Vol. 3    | $12.0
> > > > J. R. R. Tolkien | Lord of the Rings           | $11.0
> > > > J. R. R. Tolkien | Lord of the Rings Vol. 3    | $7.5
> > > > J. R. R. Tolkien | Lord of the Rings Hardcover | $30.5
> > > >
> > > > *** We are already spending effort to normalize the database, but
> > > > it will take a while.
> > > >
> > > > Thus, when I try to implement a suggester on the author field, for
> > > > example, if I type "J." I'd get "J. R. R. Tolkien" 4 times.
> > > >
> > > > My Suggester configuration is pretty standard:
> > > >
> > > > positionIncrementGap="100">
> > > >
> > > > mySuggester
> > > > AnalyzingInfixLookupFactory
> > > > DocumentDictionaryFactory
> > > > author
> > > > textSuggest
> > > >
> > > > startup="lazy">
> > > >
> > > > true
> > > > 20
> > > > mySuggester
> > > >
> > > > suggest
> > > >
> > > > And I'm using Solr 5.2.1.
> > > >
> > > > Question: Is there a way to get only unique values for suggestions?
> > > > Or would it be simpler to export a file (or even a new table in the
> > > > database) without duplicated values?
> > > >
> > > > Thanks.
> > >
> > > --
> > > --------------------------
> > >
> > > Benedetti Alessandro
> > > Visiting card : http://about.me/alessandro_benedetti
> > >
> > > "Tyger, tyger burning bright
> > > In the forests of the night,
> > > What immortal hand or eye
> > > Could frame thy fearful symmetry?"
> > >
> > > William Blake - Songs of Experience - 1794 England
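
The client-side approach discussed in this thread (collect the suggested
terms into a Set, then hand the unique values to the view) might look
roughly like the sketch below. It is only an illustration under some
assumptions: the helper name unique_suggestions is hypothetical, the
response is taken to be the already-parsed Hash a client library such as
rsolr would return, and the sample data (including the second author) is
made up to mimic the usual suggest -> dictionary -> query -> suggestions
layout rather than captured from a real server.

```ruby
require 'set'

# Hypothetical helper: returns the unique suggestion terms from a parsed
# suggester response, in the order Solr returned them.
def unique_suggestions(response, dictionary, query)
  # Walk the nested response; fall back to an empty list if any key is missing.
  entries = response.dig('suggest', dictionary, query, 'suggestions') || []
  # A Set silently ignores repeated terms and keeps insertion order.
  entries.each_with_object(Set.new) { |entry, seen| seen << entry['term'] }.to_a
end

# Made-up sample response (assumed shape, not real server output).
sample_response = {
  'suggest' => {
    'mySuggester' => {
      'J.' => {
        'numFound' => 4,
        'suggestions' => [
          { 'term' => 'J. R. R. Tolkien', 'weight' => 0, 'payload' => '' },
          { 'term' => 'J. R. R. Tolkien', 'weight' => 0, 'payload' => '' },
          { 'term' => 'J. K. Rowling',    'weight' => 0, 'payload' => '' },
          { 'term' => 'J. R. R. Tolkien', 'weight' => 0, 'payload' => '' }
        ]
      }
    }
  }
}

puts unique_suggestions(sample_response, 'mySuggester', 'J.')
```

Note that deduplicating after the fact can leave fewer than the requested
number of suggestions (20 in the configuration quoted above), so it may be
worth asking Solr for more suggestions than the view needs and truncating
the unique list afterwards.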