lucene-solr-user mailing list archives

From Aki Balogh <...@marketmuse.com>
Subject Re: Does docValues impact termfreq ?
Date Sat, 24 Oct 2015 15:24:26 GMT
Hi Jack,

I'm just using Solr to get word counts across a large number of documents.

It's somewhat non-standard, because we're ignoring relevance, but it seems
to work well for this use case otherwise.

My understanding then is:
1) since termfreq is pre-calculated at index time and simply fetched at
query time, there's no good way to speed it up (except by caching earlier
calculations)

2) there's no way to have Solr sum up the termfreqs across all documents
matching a search and return just one number, the total termfreq


Are these correct?

Thanks,
Aki
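
For reference, point 2 can be made concrete with a small sketch of the kind of request that would ask Solr for a single summed value instead of one termfreq per document. It assumes a Solr version whose StatsComponent accepts function queries through `{!func}` local params on `stats.field`; that should be checked against the documentation for the version in use. The core URL, field name, and term below are hypothetical placeholders, and the code only builds the request URL without contacting a server.

```python
from urllib.parse import urlencode, parse_qs


def termfreq_sum_params(field, term, query="*:*"):
    """Build parameters asking the StatsComponent to aggregate
    termfreq(field, term) over every document matching `query`.

    Assumption: stats.field accepts a function via {!func} local params.
    The term is wrapped in single quotes as-is, so terms containing a
    single quote would need escaping that this sketch does not do.
    """
    func = "termfreq({},'{}')".format(field, term)
    return {
        "q": query,
        "rows": 0,  # no per-document rows, only the aggregate is wanted
        "stats": "true",
        "stats.field": "{!func}" + func,
        "wt": "json",
    }


def build_url(base, params):
    """Append URL-encoded parameters to the core's /select handler."""
    return base + "/select?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical core URL, field name, and term.
    params = termfreq_sum_params("text", "solr")
    print(build_url("http://localhost:8983/solr/mycore", params))
```

If the assumption holds, the `sum` entry in the stats section of the response would be the total across the matching documents. Separately, the `totaltermfreq` function reports a term's count across the whole index, which is only useful when the count does not need to be restricted to a search result.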


On Sat, Oct 24, 2015 at 11:20 AM, Jack Krupansky <jack.krupansky@gmail.com>
wrote:

> That's what a normal query does - Lucene takes all the terms used in the
> query and sums them up for each document in the response, producing a
> single number, the score, for each document. That's the way Solr is
> designed to be used. You still haven't elaborated why you are trying to use
> Solr in a way other than it was intended.
>
> -- Jack Krupansky
>
> On Sat, Oct 24, 2015 at 11:13 AM, Aki Balogh <aki@marketmuse.com> wrote:
>
> > Gotcha - that's disheartening.
> >
> > One idea: when I run termfreq, I get all of the termfreqs for each
> > document one-by-one.
> >
> > Is there a way to have solr sum it up before creating the response, so I
> > only receive one number?
> >
> >
> > On Sat, Oct 24, 2015 at 11:05 AM, Upayavira <uv@odoko.co.uk> wrote:
> >
> > > If you mean using the term frequency function query, then I'm not sure
> > > there's a huge amount you can do to improve performance.
> > >
> > > The term frequency is a number that is used often, so it is stored in
> > > the index pre-calculated. Perhaps, if your data is not changing,
> > > optimising your index would reduce it to one segment, and thus might
> > > ever so slightly speed the aggregation of term frequencies, but I doubt
> > > it'd make enough difference to make it worth doing.
> > >
> > > Upayavira
> > >
> > > On Sat, Oct 24, 2015, at 03:37 PM, Aki Balogh wrote:
> > > > Thanks, Jack. I did some more research and found similar results.
> > > >
> > > > In our application, we are making multiple (think: 50) concurrent
> > > > requests to calculate term frequency on a set of documents in
> > > > "real-time". The faster that results return, the better.
> > > >
> > > > Most of these requests are unique, so cache only helps slightly.
> > > >
> > > > This analysis is happening on a single solr instance.
> > > >
> > > > Other than moving to solr cloud and splitting out the processing onto
> > > > multiple servers, do you have any suggestions for what might speed up
> > > > termfreq at query time?
> > > >
> > > > Thanks,
> > > > Aki
> > > >
> > > >
> > > > On Fri, Oct 23, 2015 at 7:21 PM, Jack Krupansky
> > > > <jack.krupansky@gmail.com>
> > > > wrote:
> > > >
> > > > > Term frequency applies only to the indexed terms of a tokenized
> > > > > field. DocValues is really just a copy of the original source text
> > > > > and is not tokenized into terms.
> > > > >
> > > > > Maybe you could explain how exactly you are using term frequency in
> > > > > function queries. More importantly, what is so "heavy" about your
> > > > > usage? Generally, moderate use of a feature is much more advisable
> > > > > than heavy usage, unless you don't care about performance.
> > > > >
> > > > > -- Jack Krupansky
> > > > >
> > > > > On Fri, Oct 23, 2015 at 8:19 AM, Aki Balogh <aki@marketmuse.com> wrote:
> > > > >
> > > > > > Hello,
> > > > > >
> > > > > > In our solr application, we use a Function Query (termfreq) very
> > > > > > heavily.
> > > > > >
> > > > > > Index time and disk space are not important, but we're looking to
> > > > > > improve performance on termfreq at query time.
> > > > > > I've been reading up on docValues. Would this be a way to improve
> > > > > > performance?
> > > > > >
> > > > > > I had read that Lucene uses Field Cache for Function Queries, so
> > > > > > performance may not be affected.
> > > > > >
> > > > > >
> > > > > > And, any general suggestions for improving query performance on
> > > > > > Function Queries?
> > > > > >
> > > > > > Thanks,
> > > > > > Aki
> > > > > >
> > > > >
> > >
> >
>
