lucene-solr-user mailing list archives

From Flavio Pompermaier <pomperma...@okkam.it>
Subject Re: multiValued field score and count
Date Wed, 26 Jun 2013 12:38:19 GMT
I tried to play a little with the tools you suggested. However, I am probably
missing something, because the term frequency is not what I expected.
My itemid field is defined (in schema.xml) as:

 <field name="itemid" type="string" indexed="true" stored="true"
multiValued="true"/>

I assumed that, after indexing the XML from my first mail via post.sh, the
term frequency of itemid 1000 would be 3 in the first doc and 1 in the
second. Instead, I get that result only if I change my settings to:

 <field name="itemid" type="text_ws" indexed="true" stored="true"
        multiValued="true"/>
 <fieldType name="text_ws" class="solr.TextField"
            positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
   </analyzer>
 </fieldType>

and change the XML I post to:

<doc>
   <id>1</id>
   <authorid>11</authorid>
   <authorid>9</authorid>
   <itemid>1000 1000 1000</itemid>
   <itemid>5000</itemid>
</doc>
<doc>
   <id>2</id>
   <authorid>3</authorid>
   <itemid>1000</itemid>
</doc>
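
For reference, the snippets in this thread use a shorthand with one tag per field name. In Solr's XML update format, as consumed by post.sh, each value is normally a <field name="..."> element inside <add><doc>. A sketch of the documents above in that form, using only the field names and values from this message:

```xml
<add>
  <doc>
    <field name="id">1</field>
    <field name="authorid">11</field>
    <field name="authorid">9</field>
    <!-- a single value; the whitespace tokenizer of text_ws splits it into three "1000" tokens -->
    <field name="itemid">1000 1000 1000</field>
    <field name="itemid">5000</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="authorid">3</field>
    <field name="itemid">1000</field>
  </doc>
</add>
```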

Is there a way to get a term frequency of 3 for doc 1 while keeping my initial
settings (itemid as a string, with just one value per itemid tag)?
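
One setting that may be relevant here, offered as an assumption rather than something confirmed in this thread: Solr's omitTermFreqAndPositions attribute defaults to true for non-text field types such as solr.StrField, in which case termfreq() would report 1 however many times a value is repeated. A minimal sketch that keeps itemid as a string with one value per tag but asks Solr to keep frequencies (a full reindex would be needed after the change):

```xml
<!-- Assumption: the "string" type is solr.StrField, as in the stock example schema. -->
<field name="itemid" type="string" indexed="true" stored="true"
       multiValued="true" omitTermFreqAndPositions="false"/>
```

With this in place, the original documents (three separate itemid tags holding 1000) should be the ones to re-post and check with termfreq(itemid,'1000').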

Best,
Flavio

On Wed, Jun 26, 2013 at 12:38 PM, Upayavira <uv@odoko.co.uk> wrote:

> I mentioned two features, [explain] and termfreq(field, 'value').
> Neither of these requires anything special, as they use things that are
> central to Lucene's scoring mechanisms. I think you can turn off the
> storage of term frequencies, which obviously would spoil things, but
> that's certainly not on by default.
>
> I typed the syntax below from memory, so I might not have got it exactly
> right.
>
> Upayavira
>
> On Wed, Jun 26, 2013, at 10:22 AM, Flavio Pompermaier wrote:
> > So, in order to achieve that, do I have to declare my fields
> > (authorid and itemid) with termVectors="true" termPositions="true"
> > termOffsets="false"?
> > Would that be enough?
> >
> >
> > On Wed, Jun 26, 2013 at 10:42 AM, Upayavira <uv@odoko.co.uk> wrote:
> >
> > > Add fl=[explain],* to your query, and review the output in the new
> > > field. It will tell you how the score was calculated. Look at the TF or
> > > termfreq values, as this is the number of times the term appears.
> > >
> > > Also, you could add this to your fl= param: count:termfreq(authorid,
> > > '1000') which would give you a new field telling you how many times the
> > > term 1000 appears in the authorid field for each document.
> > >
> > > Upayavira
> > >
> > > On Wed, Jun 26, 2013, at 09:34 AM, Flavio Pompermaier wrote:
> > > > Hi everybody,
> > > > I have some multiValued (single-token) fields, for example authorid
> > > > and itemid, and I'd like to know whether it is possible to find out
> > > > how many times a match was found in a document for a given field, and
> > > > whether the score is higher when multiple matches are found. For
> > > > example, my docs are:
> > > >
> > > > <doc>
> > > >    <id>1</id>
> > > >    <authorid>11</authorid>
> > > >    <authorid>9</authorid>
> > > >    <itemid>1000</itemid>
> > > >    <itemid>1000</itemid>
> > > >    <itemid>1000</itemid>
> > > >    <itemid>5000</itemid>
> > > > </doc>
> > > > <doc>
> > > >    <id>2</id>
> > > >    <authorid>3</authorid>
> > > >    <itemid>1000</itemid>
> > > > </doc>
> > > >
> > > > Would the first document have a higher score than the second if I
> > > > search for itemid=1000? Is it possible to know how many times the
> > > > match was found (3 for doc1 and 1 for doc2)?
> > > >
> > > > Otherwise, how could I achieve that result?
> > > >
> > > > Best,
> > > > Flavio
> > > > --
> > > >
> > > > Flavio Pompermaier
> > > > *Development Department
> > > > *_______________________________________________
> > > > *OKKAM**Srl **- www.okkam.it*
> > > >
> > > > *Phone:* +(39) 0461 283 702
> > > > *Fax:* + (39) 0461 186 6433
> > > > *Email:* f.pompermaier@okkam.it
> > > > *Headquarters:* Trento (Italy), fraz. Villazzano, Salita dei Molini 2
> > > > *Registered office:* Trento (Italy), via Segantini 23
> > > >
> > > > Confidentially notice. This e-mail transmission may contain legally
> > > > privileged and/or confidential information. Please do not read it if
> > > > you are not the intended recipient(S). Any use, distribution,
> > > > reproduction or disclosure by any other person is strictly
> > > > prohibited. If you have received this e-mail in error, please notify
> > > > the sender and destroy the original transmission and its attachments
> > > > without reading or saving it in any manner.
> > >
>
