From: Michael McCandless
Date: Sun, 28 Jun 2015 06:48:52 -0400
Subject: Re: FreeText Auto-suggest
To: "Lucene/Solr dev"

Which documentation are you reading?

The analyzer you send to FreeTextSuggester should not make shingles
itself: the suggester does this internally, based on the grams parameter.

Maybe look at the TestFreeTextSuggester unit test as an example?

Mike McCandless

http://blog.mikemccandless.com

On Sat, Jun 27, 2015 at 6:52 PM, Alessandro Benedetti wrote:
> Hi guys,
> after reading the documentation for the FreeTextSuggester I have some
> doubts:
>
> Actually, the documentation is not clear enough.
> Let's try to understand this suggester.
>
> Building
> This suggester builds an FST that it uses to provide the autocomplete
> feature, running prefix searches on it.
> The terms it uses to generate the FST are the tokens produced by the
> "suggestFreeTextAnalyzerFieldType".
>
> And this should be correct.
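Mike's point, that the suggester forms the n-grams itself from whatever tokens the analyzer emits, can be modelled in a few lines. This is a conceptual Python sketch, not Lucene's Java code: `tokenize` (a lowercase/whitespace split standing in for a light, non-shingling analyzer) and `build_ngram_counts` are hypothetical names, and the real suggester keeps its grams in an FST rather than a dictionary. It counts every n-gram of 1 to `grams` tokens, using the four field values from Alessandro's example.

```python
from collections import Counter


def tokenize(text):
    # Hypothetical stand-in for a plain, non-shingling analyzer:
    # lowercase the text and split it on whitespace.
    return text.lower().split()


def build_ngram_counts(docs, grams=2):
    """Count every n-gram of length 1..grams in the analyzed docs.

    Conceptual model only: the n-grams are formed here, from unigram
    tokens, with the maximum length set by `grams`; the analyzer
    itself never shingles.
    """
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        for n in range(1, grams + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


docs = ["mp3 ipod", "mp3 player", "mp3 player ipod", "player of Real"]
counts = build_ngram_counts(docs, grams=3)
```

With `grams=3` this model yields 12 distinct n-grams, for example `("mp3",)` with count 3 and `("mp3", "player")` with count 2. If the analyzer already emitted shingles, a token such as "mp3 player" would enter this model as if it were a single word, and the n-grams would be built on top of shingles, which is the situation Mike's reply warns against.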
>
> So, if we have a shingle token filter [1-3] (we produce unigrams as well)
> in our analysis, to keep it simple, from these original field values:
> "mp3 ipod"
> "mp3 player"
> "mp3 player ipod"
> "player of Real"
>
> -> we produce this list of possible suggestions in our FST:
>
> From the documentation I read:
>>
>> " ngrams: The max number of tokens out of which singles will be make the
>> dictionary. The default value is 2. Increasing this would mean you want more
>> than the previous 2 tokens to be taken into consideration when making the
>> suggestions. "
>
> This confuses me, as I was not expecting this param to affect the
> suggestion dictionary.
> So I would like a clarification here from our masters :)
> At this point, let's see what happens at query time.
>
> Query Time
> As I understand it, the ngrams param will consider the last N-1 tokens the
> user typed, separated by the space separator.
>
>> "Builds an ngram model from the text sent to {@link
>> * #build} and predicts based on the last grams-1 tokens in
>> * the request sent to {@link #lookup}. This tries to
>> * handle the "long tail" of suggestions for when the
>> * incoming query is a never before seen query string."
>
> Example: grams=3 should consider only the last 2 tokens
>
> special mp3 p -> mp3 p
>
> Then this query is analysed using the "suggestFreeTextAnalyzerFieldType".
> We produce 3 tokens:
>

>
> And we run the prefix matching on the FST.
>
> Conclusion
> My understanding must be wrong somewhere, as the behaviour I get is
> different.
> Can we discuss this, clarify it, and eventually put it in the official
> documentation?
>
> Cheers
>
> --
> --------------------------
>
> Benedetti Alessandro
> Visiting card: http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
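The query-time behaviour in the quoted Javadoc (predict from the last grams-1 tokens, completing the final partial token) can be modelled the same way. Again this is a conceptual Python sketch, not Lucene's implementation: `tokenize`, `build_ngram_counts` and `suggest` are hypothetical names, candidates are ranked by raw n-gram count, and the sketch only backs off to a shorter context when the longer one has no match. The class Javadoc describes a "stupid backoff" model that smooths scores across n-gram orders instead, so real scores and ordering will differ. A trailing space is treated here as "the last token is complete, predict the next word", which is this sketch's own choice.

```python
from collections import Counter


def tokenize(text):
    # Hypothetical stand-in for the (non-shingling) query analyzer.
    return text.lower().split()


def build_ngram_counts(docs, grams):
    # Count every n-gram of length 1..grams (same model as at build time).
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        for n in range(1, grams + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def suggest(counts, query, grams, num=5):
    """Complete the query using at most grams-1 preceding tokens as context.

    Conceptual model: try the longest context first, back off to shorter
    contexts, and rank candidates by raw count (ties broken alphabetically).
    """
    tokens = tokenize(query)
    if query.endswith(" "):
        prefix, preceding = "", tokens  # last token complete: predict next word
    elif tokens:
        prefix, preceding = tokens[-1], tokens[:-1]
    else:
        return []
    # Keep only the last grams-1 tokens, e.g. "special mp3 p" -> context [mp3]
    # plus prefix "p" when grams=3 ("special" is dropped).
    context = preceding[-(grams - 1):] if grams > 1 else []
    for k in range(len(context), -1, -1):
        ctx = tuple(context[len(context) - k:])
        matches = [(gram[-1], c) for gram, c in counts.items()
                   if len(gram) == k + 1 and gram[:-1] == ctx
                   and gram[-1].startswith(prefix)]
        if matches:
            matches.sort(key=lambda m: (-m[1], m[0]))
            return [" ".join(ctx + (word,)) for word, _ in matches[:num]]
    return []


docs = ["mp3 ipod", "mp3 player", "mp3 player ipod", "player of Real"]
counts = build_ngram_counts(docs, grams=3)
print(suggest(counts, "special mp3 p", grams=3))  # ['mp3 player']
```

In this model the context ("special", "mp3") has no trigram, so it backs off to ("mp3",) and completes "p" to "player". The n-gram structure is therefore driven entirely by `grams` on the suggester side, not by the analyzer.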