lucene-solr-user mailing list archives

From Jason Toy <jason...@gmail.com>
Subject Re: bug in termfreq? was Re: is it possible to do a sort without query?
Date Mon, 08 Aug 2011 22:14:45 GMT
I am trying to test and compare different sorts and scoring.

When I use dismax to search for "indie music"
with: qf=all_lists_text&q="indie+music"&defType=dismax&rows=100
some of the top results seem "irrelevant": they contain only
1 or 2 mentions of "indie music", while docs further down the list
have more occurrences of "indie music".
So I want to test by comparing the different queries against a
list of docs ranked specifically by the count of occurrences of the phrase
"indie music".

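For reference, here is a minimal Python sketch of the two requests being compared. The field name and phrase come from this thread; the helper names (dismax_params, termfreq_sort_params) are made up for illustration, and no request is actually sent to Solr. It also shows that a literal '+' in a query string decodes to a space, so the termfreq argument arrives as the single string 'indie music', which, as far as I understand, is then looked up as one term rather than as a phrase.

```python
from urllib.parse import urlencode, parse_qs

# Field name and phrase are taken from the thread; the helpers are illustrative.
FIELD = "all_lists_text"
PHRASE = "indie music"

def dismax_params(phrase, field=FIELD, rows=100):
    """Parameters for the dismax phrase query."""
    return {"defType": "dismax", "qf": field, "q": f'"{phrase}"', "rows": rows}

def termfreq_sort_params(term, field=FIELD, rows=100):
    """Parameters for a match-all query sorted by termfreq of one term."""
    return {"q": "*:*", "sort": f"termfreq({field},'{term}') desc", "rows": rows}

# urlencode writes spaces as '+', matching the URLs quoted in this thread.
dismax_qs = urlencode(dismax_params(PHRASE))
sort_qs = urlencode(termfreq_sort_params(PHRASE))

# Decoding turns '+' back into a space: the function argument is the
# single string 'indie music', not two separate terms.
decoded_sort = parse_qs(sort_qs)["sort"][0]
print(decoded_sort)
```

Comparing the top rows returned for dismax_qs and sort_qs side by side is the test I have in mind.
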
On Mon, Aug 8, 2011 at 2:19 PM, Markus Jelsma <markus.jelsma@openindex.io> wrote:

>
> > Dismax queries can. But
> >
> > sort=termfreq(all_lists_text,'indie+music')
> >
> > is not using dismax.  Apparently the termfreq function cannot? I am not
> > familiar with the termfreq function.
>
> It simply returns the TF of the given _term_, as it is indexed, for the current
> document.
>
> Sorting on TF like this seems strange, as by default results are already sorted
> by score, and TF plays a big role in the final score.
>
> >
> > To understand why you'd need to reindex, you might want to read up on how
> > lucene actually works, to get a basic understanding of how different
> > indexing choices affect what is possible at query time. Lucene In Action
> > is a pretty good book.
> >
> > On 8/8/2011 5:02 PM, Jason Toy wrote:
> > > > Are Dismax queries not able to search for phrases using the default
> > > > index (which is what I am using)? If I can already do phrase searches, I
> > > > don't understand why I would need to reindex to be able to access phrases
> > > > from a function.
> > >
> > > > On Mon, Aug 8, 2011 at 1:49 PM, Markus Jelsma <markus.jelsma@openindex.io> wrote:
> > > >>> Alexei, thank you, that does seem to work.
> > > >>>
> > > >>> My sort results seem to be totally wrong though, I'm not sure if it's
> > > >>> because of my sort function or something else.
> > >>>
> > >>> My query consists of:
> > >>> sort=termfreq(all_lists_text,'indie+music')+desc&q=*:*&rows=100
> > >>> And I get back 4571232 hits.
> > >>
> > >> That's normal, you issue a catch all query. Sorting should work but..
> > >>
> > > >>> None of the results have the phrase "indie music" anywhere in their data.
> > >>
> > >>>   Does termfreq not support phrases?
> > >>
> > > >> No, it is TERM frequency and "indie music" is not one term. I don't know
> > > >> how this function parses your input, but it might not understand your +
> > > >> escape and think it's one term consisting of exactly that.
> > >>
> > >>> If not, how can I sort specifically by termfreq of a phrase?
> > >>
> > > >> You cannot. What you can do is index multiple terms as one term using
> > > >> the shingle filter. Take care, it can significantly increase your index
> > > >> size and number of unique terms.
> > >>
> > > >>> On Mon, Aug 8, 2011 at 1:08 PM, Alexei Martchenko <alexei@superdownloads.com.br> wrote:
> > >>>> You can use the standard query parser and pass q=*:*
> > >>>>
> > >>>> 2011/8/8 Jason Toy<jasontoy@gmail.com>
> > >>>>
> > > >>>>> I am trying to list some data based on a function I run,
> > > >>>>> specifically termfreq(post_text,'indie music'), and I am unable to do
> > > >>>>> it without passing in data to the q parameter. Is it possible to get a
> > > >>>>> sorted list without searching for any terms?
> > >>>>
> > >>>> --
> > >>>>
> > >>>> *Alexei Martchenko* | *CEO* | Superdownloads
> > >>>> alexei@superdownloads.com.br | alexei@martchenko.com.br | (11)
> > >>>> 5083.1018/5080.3535/5080.3533
>
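To check my understanding of the shingle suggestion quoted above, here is a rough Python sketch of the idea. It is not Solr's shingle filter itself: the whitespace tokenization, the helper names (shingles, phrase_freq), and the sample documents are all assumptions made up for illustration. The point is that once adjacent words are indexed together as one term, a plain term frequency lookup can count a two-word phrase.

```python
from collections import Counter

def shingles(tokens, size=2):
    """Return word n-grams ("shingles") of the given size, joined by a space."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]

def phrase_freq(text, phrase):
    """Count how often the phrase occurs among the shingled, lowercased tokens."""
    tokens = text.lower().split()
    size = len(phrase.split())
    return Counter(shingles(tokens, size))[phrase.lower()]

# Hypothetical documents, invented for illustration only.
docs = {
    "a": "indie music blog about indie music and more indie music",
    "b": "music for indie films",
    "c": "I like indie music",
}

# Rank document ids by phrase frequency, highest first.
ranked = sorted(docs, key=lambda d: phrase_freq(docs[d], "indie music"), reverse=True)
print(ranked)
```

Each shingle becomes an extra term in the index, which matches the warning above about index size and the number of unique terms.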



-- 
- sent from my mobile
6176064373
