lucene-solr-user mailing list archives

From Andrea Roggerone <andrearoggerone.o...@gmail.com>
Subject Re: Reverse query?
Date Fri, 02 Oct 2015 20:07:11 GMT
Hi, the phrase query format would be:
"Mad Max"~2
The * characters were added by the mail aggregator around the text in bold
for some reason. They were not wildcards.
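
For reference, a minimal sketch of how that phrase query could be sent to
Solr from Python. The host, the core name "mycore" and the field name "text"
are assumptions for illustration and do not come from this thread. Only the
standard library is used, and the phrase is assumed not to contain double
quotes.

```python
from urllib.parse import urlencode


def phrase_query_url(base_url, field, phrase, slop):
    """Build a Solr select URL for a sloppy phrase query such as "Mad Max"~2."""
    # Double quotes make it a phrase query; ~N is the slop.
    q = f'{field}:"{phrase}"~{slop}'
    return f"{base_url}/select?{urlencode({'q': q, 'wt': 'json'})}"


# Hypothetical host, core and field names.
print(phrase_query_url("http://localhost:8983/solr/mycore", "text", "Mad Max", 2))
```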

On Friday, October 2, 2015, Roman Chyla <roman.chyla@gmail.com> wrote:

> I'd like to offer another option:
>
> you say you want to match a long query against a document - but maybe you
> won't know whether to pick "Mad Max" or "Max is" (not to mention the
> performance hit of a "*mad max*" search - or is that not the case
> anymore?). Take a look at the NGram tokenizer (say size of 2; or
> bigger). What it does, it splits the input into overlapping segments
> of 'X' words (words, not characters - however, characters work too -
> just pick bigger N)
>
> mad max
> max 1979
> 1979 australian
>
> I'd recommend placing a stop filter before the ngram
>
>  - then for the long query string of "Hey Mad Max is 1979...." you
> would search "hey mad" OR "mad max" OR "max 1979"... (perhaps the query
> tokenizer could be convinced to do the search for you automatically). And
> voila, the more overlapping segments there are, the higher the search
> result scores.
>
> hth,
>
> roman
>
>
>
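
A minimal sketch of the overlapping word-segment idea above, in plain Python
rather than Solr configuration. The stopword list, the helper names and the
sample names are assumptions for illustration. In Solr itself, word-level
n-grams are usually produced by a shingle filter (solr.ShingleFilterFactory),
while the n-gram tokenizer works on characters.

```python
STOPWORDS = {"is", "a", "the", "by", "of", "and"}  # tiny illustrative list


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    words = (w.strip('.,;:!?"()[]') for w in text.lower().split())
    return [w for w in words if w and w not in STOPWORDS]


def shingles(tokens, size=2):
    """Return overlapping word n-grams, e.g. ['mad max', 'max 1979', ...]."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def find_names(document, names, size=2):
    """Return the names that share at least one shingle with the document."""
    doc_shingles = set(shingles(tokenize(document), size))
    return [n for n in names if set(shingles(tokenize(n), size)) & doc_shingles]


doc = "Mad Max is a 1979 Australian dystopian action film directed by George Miller."
print(find_names(doc, ["Mad Max", "George Miller", "Mad", "Max", "Mel Gibson"]))
```

Single words such as "Mad" produce no two-word shingle, so they cannot match
on their own, which is the behaviour asked for in the original question.
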
> On Fri, Oct 2, 2015 at 12:03 PM, Erick Erickson <erickerickson@gmail.com> wrote:
> > The admin/analysis page is your friend here, find it and use it ;)
> > Note you have to select a core on the admin UI screen before you can
> > see the choice.
> >
> > Because apart from the other comments, KeywordTokenizer is a red flag.
> > It does NOT break anything up into tokens, so if your doc contains:
> > Mad Max is a 1979 Australian
> > as the whole field, the _only_ match you'll ever get is if you search exactly
> > "Mad Max is a 1979 Australian"
> > Not Mad, not mad, not Max, exactly all 6 words separated by exactly one space.
> >
> > Andrea's suggestion is the one you want, but be sure you use one of
> > the tokenizing analysis chains, perhaps start with text_en (in the
> > stock distro). Be sure to completely remove your node/data directory
> > (as in rm -rf data) after you make the change.
> >
> > And really, explore the admin/analysis page; it's where a LOT of these
> > kinds of problems find solutions ;)
> >
> > Best,
> > Erick
> >
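
A toy illustration of the KeywordTokenizer point above, in plain Python rather
than Solr code. The function names are hypothetical and the matching rule
(every query token must be present) is a simplification of what a real
analysis chain and query parser do.

```python
def keyword_tokenize(text):
    """Mimic a keyword-style tokenizer: the whole field value is one token."""
    return [text]


def whitespace_tokenize(text):
    """Mimic a simple tokenizing chain: split on whitespace and lowercase."""
    return text.lower().split()


def matches(query, indexed_tokens, tokenizer):
    """A query matches if every one of its tokens is among the indexed tokens."""
    return all(t in indexed_tokens for t in tokenizer(query))


field = "Mad Max is a 1979 Australian"

kw_index = keyword_tokenize(field)
print(matches("Mad Max", kw_index, keyword_tokenize))   # only the full string can match
print(matches(field, kw_index, keyword_tokenize))

ws_index = whitespace_tokenize(field)
print(matches("mad max", ws_index, whitespace_tokenize))
```
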
> > On Fri, Oct 2, 2015 at 7:57 AM, Ravi Solr <ravisolr@gmail.com> wrote:
> >> Hello Remi,
> >> I am assuming the field where you store the data is analyzed.
> >> The field definition might help us answer your question better. If you are
> >> using the edismax handler for your search requests, I believe you can achieve
> >> your goal by setting "mm" to 100% and the phrase slop "ps" and query slop
> >> "qs" parameters to zero. I think that will force exact matches.
> >>
> >> Thanks
> >>
> >> Ravi Kiran Bhaskar
> >>
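
A minimal sketch of the edismax parameters mentioned above, encoded as a
request query string. The query text "Mad Max" and the field name "text" in
"qf" are assumptions for illustration. Whether these settings give the
matching behaviour you want depends on how the field is analyzed.

```python
from urllib.parse import urlencode

params = {
    "defType": "edismax",
    "q": "Mad Max",   # hypothetical user query
    "qf": "text",     # hypothetical field to search
    "mm": "100%",     # all query terms must match
    "ps": 0,          # phrase slop
    "qs": 0,          # query phrase slop
}

query_string = urlencode(params)
print(query_string)
```
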
> >> On Fri, Oct 2, 2015 at 9:48 AM, Andrea Roggerone <andrearoggerone.osrc@gmail.com> wrote:
> >>
> >>> Hi Remy,
> >>> The question is not really clear, could you explain a little bit better
> >>> what you need? Reading your email I understand that you want to get
> >>> documents containing all the search terms typed. For instance if you search
> >>> for "Mad Max", you wanna get documents containing both Mad and Max. If
> >>> that's your need, you can use a phrase query like:
> >>>
> >>> *"*Mad Max*"~2*
> >>>
> >>> where enclosing your keywords between double quotes means that you want to
> >>> get both Mad and Max and the optional parameter ~2 is an example of *slop*.
> >>> If you need more info you can look for *Phrase Query* in
> >>> https://wiki.apache.org/solr/SolrRelevancyFAQ
> >>>
> >>> On Fri, Oct 2, 2015 at 2:33 PM, remi tassing <tassingremi@gmail.com> wrote:
> >>>
> >>> > Hi,
> >>> > I have medium-low experience with Solr and I have a question I couldn't
> >>> > quite solve yet.
> >>> >
> >>> > Typically we have quite short query strings (a couple of words) and the
> >>> > search is done through a set of bigger documents. What if the logic is
> >>> > turned a little bit around? I have a document and I need to find out what
> >>> > strings appear in the document. A string here could be a person name
> >>> > (including space for example) or a location...which are indexed in Solr.
> >>> >
> >>> > A concrete example, we take this text from wikipedia (Mad Max):
> >>> > "*Mad Max is a 1979 Australian dystopian action film directed by George
> >>> > Miller <https://en.wikipedia.org/wiki/George_Miller_%28director%29>.
> >>> > Written by Miller and James McCausland from a story by Miller and producer
> >>> > Byron Kennedy <https://en.wikipedia.org/wiki/Byron_Kennedy>, it tells a
> >>> > story of societal breakdown
> >>> > <https://en.wikipedia.org/wiki/Societal_collapse>, murder, and vengeance
> >>> > <https://en.wikipedia.org/wiki/Revenge>. The film, starring the
> >>> > then-little-known Mel Gibson <https://en.wikipedia.org/wiki/Mel_Gibson>,
> >>> > was released internationally in 1980. It became a top-grossing Australian
> >>> > film, while holding the record in the Guinness Book of Records
> >>> > <https://en.wikipedia.org/wiki/Guinness_Book_of_Records> for decades as the
> >>> > most profitable film ever created,[1]
> >>> > <https://en.wikipedia.org/wiki/Mad_Max_%28franchise%29#cite_note-1> and has
> >>> > been credited for further opening the global market to Australian New Wave
> >>> > <https://en.wikipedia.org/wiki/Australian_New_Wave> films.*
> >>> > <https://en.wikipedia.org/wiki/Mad_Max_%28franchise%29#cite_note-2>
> >>> > <https://en.wikipedia.org/wiki/Mad_Max_%28franchise%29#cite_note-3>"
> >>> >
> >>> > I would like it to match "Mad Max" but not "Mad" or "Max" separately, and
> >>> > "George Miller", "global market" ...
> >>> >
> >>> > I've tried the keywordTokenizer but it didn't work. I suppose it's ok for
> >>> > the index time but not query time (in this specific case)
> >>> >
> >>> > I had a look at Luwak but it's not what I'm looking for
> >>> > (http://www.flax.co.uk/blog/2013/12/06/introducing-luwak-a-library-for-high-performance-stored-queries/)
> >>> >
> >>> > The typical name search doesn't seem to work either,
> >>> > https://dzone.com/articles/tips-name-search-solr
> >>> >
> >>> > I was thinking this problem must have already been solved...or?
> >>> >
> >>> > Remi
> >>> >
> >>>
>
