lucene-solr-user mailing list archives

From Joel Bernstein <joels...@gmail.com>
Subject Re: statistics in hitlist
Date Fri, 23 Feb 2018 22:59:01 GMT
This is going to be a complex answer because Solr actually now has multiple
ways of doing regression analysis as part of the Streaming Expression
statistical programming library. The basic documentation is here:

https://lucene.apache.org/solr/guide/7_2/statistical-programming.html

Here is a sample expression that performs a simple linear regression in
Solr 7.2:

let(a=random(collection1, q="any query", rows="15000", fl="fieldA, fieldB"),
    b=col(a, fieldA),
    c=col(a, fieldB),
    d=regress(b, c))


The expression above takes a random sample of 15000 results from
collection1. The result set will include fieldA and fieldB in each record.
The result set is stored in variable "a".

Then the "col" function creates arrays of numbers from the results stored
in variable a. The values in fieldA are stored in the variable "b". The
values in fieldB are stored in variable "c".
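One way to picture col is as pulling a single named field out of every tuple in a result set and collecting the values into a numeric array. The Python below is a hypothetical analog for illustration only, not Solr code. The sample records are made up, and the sketch assumes every record carries the requested field.

```python
from typing import Any, Dict, List

# A result tuple modeled as a mapping of field name to value (illustrative).
Record = Dict[str, Any]


def col(records: List[Record], field: str) -> List[float]:
    """Collect one field from every record into a numeric array (analog of col)."""
    # Assumption: every record has the field; a missing value raises KeyError here.
    return [float(r[field]) for r in records]


# Made-up records standing in for the tuples stored in variable "a".
sample = [
    {"fieldA": 1, "fieldB": 2.0},
    {"fieldA": 2, "fieldB": 4.1},
    {"fieldA": 3, "fieldB": 5.9},
]
b = col(sample, "fieldA")  # values of fieldA
c = col(sample, "fieldB")  # values of fieldB
```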

Then the regress function performs a simple linear regression on arrays
stored in variables "b" and "c".
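For checking results locally, here is a minimal Python sketch of what a simple (ordinary least squares) linear regression computes: slope, intercept, the correlation R, and R-squared, plus the SSE and SST that appear in R^2 = 1 - SSE/SST. It treats the first array as the independent variable; the dictionary keys are this sketch's own choice and are not guaranteed to match the keys in Solr's regress output.

```python
import math
from typing import Dict, Sequence


def simple_regression(x: Sequence[float], y: Sequence[float]) -> Dict[str, float]:
    """Ordinary least-squares fit of y = intercept + slope * x (illustrative sketch)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)  # SST: squared deviations of y from its mean
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    # SSE: sum of squared residuals of the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return {
        "slope": slope,
        "intercept": intercept,
        "R": r,
        "RSquared": 1 - sse / syy,  # equals R**2 for a simple fit with an intercept
        "SSE": sse,
        "SST": syy,
    }
```

For example, simple_regression([1, 2, 3], [2.0, 4.1, 5.9]) gives a slope of 1.95, an intercept of 0.1, and an SSE of 0.015 (up to floating-point rounding).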

The output of the regress function is a map containing the regression
result. This result includes RSquared and other attributes of the
regression model, such as R (the correlation), the slope, and the y-intercept.
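To run an expression like the one above, one option is to send it in the expr parameter to the collection's /stream request handler. The Python sketch below assumes a Solr node at localhost:8983 hosting collection1 with the /stream handler available; the host, port, and collection name are placeholders. It only builds and sends the request and leaves interpreting the JSON response to the caller.

```python
import json
import urllib.parse
import urllib.request

# Placeholder endpoint: adjust host, port, and collection for your setup.
SOLR_STREAM_URL = "http://localhost:8983/solr/collection1/stream"

EXPR = """let(a=random(collection1, q="any query", rows="15000", fl="fieldA, fieldB"),
    b=col(a, fieldA),
    c=col(a, fieldB),
    d=regress(b, c))"""


def build_stream_request(url: str, expr: str) -> urllib.request.Request:
    """Build a form-encoded POST request carrying the expression in 'expr'."""
    body = urllib.parse.urlencode({"expr": expr}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def run_expression(url: str, expr: str) -> dict:
    """Send the request and decode the JSON response (needs a running Solr)."""
    with urllib.request.urlopen(build_stream_request(url, expr)) as resp:
        return json.load(resp)
```

Calling run_expression(SOLR_STREAM_URL, EXPR) against a live node returns the decoded response; POST is used so the multi-line expression does not have to fit in a URL.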

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Feb 23, 2018 at 3:10 PM, John Smith <localdevjs@gmail.com> wrote:

> Hi Joel, thanks for the answer. I'm not really a stats guy, but the end
> result of all this is supposed to be obtaining R^2. Is there no way of
> obtaining this value, then (short of iterating over all the results in the
> hitlist and calculating it myself)?
>
> On Fri, Feb 23, 2018 at 12:26 PM, Joel Bernstein <joelsolr@gmail.com>
> wrote:
>
> > Typically SSE is the sum of the squared errors of the prediction in a
> > regression analysis. The stats component doesn't perform regression,
> > although it might be a nice feature.
> >
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Fri, Feb 23, 2018 at 12:17 PM, John Smith <localdevjs@gmail.com>
> > wrote:
> >
> > > I'm using solr, and enabling stats as per this page:
> > > https://lucene.apache.org/solr/guide/6_6/the-stats-component.html
> > >
> > > I want to get more stat values though. Specifically I'm looking for
> > > r-squared (coefficient of determination). This value is not present in
> > > solr, however some of the pieces used to calculate r^2 are in the stats
> > > element, for example:
> > >
> > > <double name="min">0.0</double>
> > > <double name="max">10.0</double>
> > > <long name="count">15</long>
> > > <long name="missing">17</long>
> > > <double name="sum">85.0</double>
> > > <double name="sumOfSquares">603.0</double>
> > > <double name="mean">5.666666666666667</double>
> > > <double name="stddev">2.943920288775949</double>
> > >
> > >
> > > So I have the sumOfSquares available (SST), and using this calculation, I
> > > can get R^2:
> > >
> > > R^2 = 1 - SSE/SST
> > >
> > > All I need then is SSE. Is there any way I can get SSE from those other
> > > stats in solr?
> > >
> > > Thanks in advance!
> > >
> >
>
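A note on the original question: in the stats output quoted above, sumOfSquares is the sum of the squared values themselves, not SST (the sum of squared deviations from the mean). SST can be recovered as sumOfSquares - sum^2/count, and the quoted numbers bear this out, because the result reproduces the reported stddev. A minimal Python check follows; the function name is hypothetical.

```python
import math


def sst_from_stats(count: int, total: float, sum_of_squares: float) -> float:
    """Total sum of squares about the mean, from Solr stats component fields.

    sumOfSquares is the raw sum of x_i**2, so subtract sum**2 / count
    (equivalently count * mean**2) to center it.
    """
    return sum_of_squares - total * total / count


# Values from the stats output quoted above.
count, total, sum_sq = 15, 85.0, 603.0
sst = sst_from_stats(count, total, sum_sq)  # about 121.33
# Sanity check: the sample standard deviation is sqrt(SST / (count - 1)).
stddev = math.sqrt(sst / (count - 1))       # matches the reported 2.9439...
```

SSE, by contrast, depends on each document's residual from a fitted model, so it cannot be derived from these single-field aggregates alone; that is why a regression, such as the regress function described above, is needed.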
