From: John Smith
Date: Mon, 5 Mar 2018 16:11:37 -0500
Subject: Re: statistics in hitlist
To: solr-user@lucene.apache.org

Thanks Joel for your help on this.

What I've done so far:
- unzipped the downloaded solr-7.2
- modified the _default "managed-schema" to add the random field type and the dynamic random field
- started Solr 7 using "solr start -c"
- indexed my data using pint/pdouble/boolean field types, etc.

I can now run the random function by itself, and it returns random results as expected. So far so good!

However... now I'm trying to get the regression stuff working:

let(a=random(tx_prod_production, q="*:*", fq="isParent:true", rows="15000", fl="oil_first_90_days_production,oil_last_30_days_production"),
    b=col(a, oil_first_90_days_production),
    c=col(a, oil_last_30_days_production),
    d=regress(b, c))

I posted this directly into the Solr admin UI. When I run the streaming expression, I get this error message:
"EXCEPTION": "Failed to evaluate expression regress(b,c) - Numeric value expected but found type java.lang.String for value oil_first_90_days_production"

It thinks my numeric field is defined as a string? But when I view the schema, those two fields are defined as ints:


When I run a normal query and choose XML as the output format, it also puts "int" elements into the hitlist, so the schema appears to be correct; it's only when using this regress function that something goes wrong and Solr thinks the field is a string.

Any suggestions?
Thanks!
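The expression above was run from the admin UI; it can also be posted to a collection's /stream request handler as the "expr" parameter, which makes it easier to script and to inspect the raw response. A minimal Python sketch using only the standard library; the base URL http://localhost:8983/solr is an assumption, while the collection and field names are the ones from this message.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL; adjust host and port for your SolrCloud node.
SOLR_BASE = "http://localhost:8983/solr"

EXPR = (
    'let(a=random(tx_prod_production, q="*:*", fq="isParent:true", rows="15000", '
    'fl="oil_first_90_days_production,oil_last_30_days_production"),'
    " b=col(a, oil_first_90_days_production),"
    " c=col(a, oil_last_30_days_production),"
    " d=regress(b, c))"
)


def build_stream_request(collection: str, expr: str) -> urllib.request.Request:
    """Build a POST to /solr/<collection>/stream with the expression form-encoded as expr."""
    url = f"{SOLR_BASE}/{collection}/stream"
    body = urllib.parse.urlencode({"expr": expr}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def run_stream(collection: str, expr: str) -> dict:
    """Send the request and decode the JSON response (needs a running Solr node)."""
    with urllib.request.urlopen(build_stream_request(collection, expr)) as resp:
        return json.load(resp)
```

Calling run_stream("tx_prod_production", EXPR) returns the parsed JSON; streaming responses carry their tuples under "result-set" -> "docs".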

On Thu, Mar 1, 2018 at 9:12 PM, Joel Bernstein <joelsolr@gmail.com> wrote:
The field type will also need to be in the schema:

 <!-- The "RandomSortField" is not used to store or search any
         data.  You can declare fields of this type in your schema
         to generate pseudo-random orderings of your docs for sorting
         or function purposes.  The ordering is generated based on the field
         name and the version of the index. As long as the index version
         remains unchanged, and the same field name is reused,
         the ordering of the docs will be consistent.
         If you want different pseudo-random orderings of documents,
         for the same version of the index, use a dynamicField and
         change the field name in the request.
     -->

<fieldType name="random" class="solr.RandomSortField" indexed="true" />
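The comment above describes the behaviour; the Python below is a conceptual model of it, not Solr's implementation (RandomSortField uses its own hashing). It assumes only what the comment states: the ordering is a deterministic function of the field name and the index version. The random_* match is modelled with fnmatch as a rough stand-in for dynamicField matching. Under this model, a request-time sort field such as random_-255009774 (the name that appears in the logs quoted in this thread) only resolves when a random_* dynamic field of type "random" exists in the schema.

```python
import fnmatch
import hashlib
import random


def matches_dynamic_field(field_name: str, pattern: str = "random_*") -> bool:
    """Rough stand-in for dynamicField matching of a trailing-wildcard pattern."""
    return fnmatch.fnmatchcase(field_name, pattern)


def pseudo_random_order(doc_ids, field_name: str, index_version: int):
    """Shuffle doc_ids with a seed derived only from the field name and index version.

    The same (field_name, index_version) pair always yields the same order;
    a different field name gives a different seed and, in general, a different order.
    """
    digest = hashlib.sha256(f"{field_name}:{index_version}".encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    order = list(doc_ids)
    rng.shuffle(order)
    return order
```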


Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Mar 1, 2018 at 8:00 PM, Joel Bernstein <joelsolr@gmail.com> wrote:

> You'll need to have this field in your schema:
>
> <dynamicField name="random_*" type="random" />
>
> I'll check to see if the default schema used with solr start -c has this
> field; if not, I'll add it. Thanks for pointing this out.
>
> I checked, and right now the random expression only accepts one fq,
> but I consider this a bug. It should accept multiple. I'll create a ticket
> to get this fixed.
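One possible workaround while random accepts only one fq, assuming the filters are meant to be intersected (which is how Solr applies multiple fq parameters), is to AND them together inside a single fq. The matching documents are the same; the filter cache then stores the combination as one entry instead of one entry per clause. A Python sketch that builds such an expression; the helper names are hypothetical and the clauses must not contain double quotes, since they are embedded unescaped.

```python
def combine_filters(*clauses: str) -> str:
    """AND filter clauses into one fq string.

    Each clause is parenthesized so that OR/NOT inside it stays scoped to that clause.
    With no clauses, fall back to matching all documents.
    """
    parts = [c.strip() for c in clauses if c and c.strip()]
    if not parts:
        return "*:*"
    return " AND ".join(f"({c})" for c in parts)


def random_expression(collection: str, q: str, fq_clauses, rows: int, fields) -> str:
    """Build a random(...) streaming expression that carries a single combined fq.

    Assumes no clause contains a double quote (values are not escaped).
    """
    fq = combine_filters(*fq_clauses)
    fl = ",".join(fields)
    return f'random({collection}, q="{q}", fq="{fq}", rows="{rows}", fl="{fl}")'
```

For example, random_expression("tx_prod_production", "*:*", ["isParent:true", "oil_first_90_days_production:[1 TO *]"], 15000, ["oil_first_90_days_production", "oil_last_30_days_production"]) folds both filters into one fq; the range clause is a hypothetical second filter.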
>
>
> On Thu, Mar 1, 2018 at 4:55 PM, John Smith <localdevjs@gmail.com> wrote:
>
>> Joel, thanks for the pointers to the streaming feature. I had no idea Solr
>> had that (and I also just discovered the very interesting SQL feature! I will
>> be sure to investigate that in more detail in the future).
>>
>> However, I'm having some trouble getting basic streaming functions working.
>> I've already figured out that I had to move to "solr cloud" instead of
>> "solr standalone", because I was getting errors about "cannot find zk
>> instance" or whatever, which went away when using "solr start -c" instead.
>>
>> But now I'm trying to use the random function, since that was one of the
>> functions used in your example.
>>
>> random(tx_header, q="*:*", rows="100", fl="countyname")
>>
>> I posted that directly in the "stream" section of the Solr admin UI. This
>> is all on Linux, with Solr 7.1.0 and 7.2.1 (I tried several versions in case
>> it was a bug in one).
>>
>> I get back an error message:
>> *sort param could not be parsed as a query, and is not a field that exists
>> in the index: random_-255009774*
>>
>> I'm not passing in any sort field anywhere. But the Solr logs show these
>> three log entries:
>>
>> 2018-03-01 21:41:18.954 INFO  (qtp257513673-21) [c:tx_header s:shard1
>> r:core_node2 x:tx_header_shard1_replica_n1] o.a.s.c.S.Request
>> [tx_header_shard1_replica_n1]  webapp=/solr path=/select
>> params={q=*:*&_stateVer_=tx_header:6&fl=countyname
>> *&sort=random_-255009774+asc*&rows=100&wt=javabin&version=2} status=400
>> QTime=19
>>
>> 2018-03-01 21:41:18.966 ERROR (qtp257513673-17) [c:tx_header s:shard1
>> r:core_node2 x:tx_header_shard1_replica_n1] o.a.s.c.s.i.CloudSolrClient
>> Request to collection [tx_header] failed due to (400)
>> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
>> Error from server at http://192.168.13.31:8983/solr/tx_header: sort param could
>> not be parsed as a query, and is not a field that exists in the index:
>> random_-255009774, retry? 0
>>
>> 2018-03-01 21:41:18.968 ERROR (qtp257513673-17) [c:tx_header s:shard1
>> r:core_node2 x:tx_header_shard1_replica_n1] o.a.s.c.s.i.s.ExceptionStream
>> java.io.IOException:
>> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
>> Error from server at http://192.168.13.31:8983/solr/tx_header: sort param could
>> not be parsed as a query, and is not a field that exists in the index:
>> random_-255009774
>>
>>
>> So basically it looks like Solr is injecting the "sort=random_" stuff into
>> my query, and of course that fails on the search since that
>> field/column doesn't exist in my schema. Every time I run the random
>> function, I get a slightly different field name that it injects, but they
>> all start with "random_".
>>
>> I have tried adding my own sort field instead, hoping Solr wouldn't inject
>> one for me, but it still injected a random sort field name:
>> random(tx_header, q="*:*", rows="100", fl="countyname", sort="countyname asc")
>>
>>
>> Assuming I can fix that whole problem, my second question is: can I add
>> multiple "fq=" parameters to the random function? I build a pretty
>> complicated query using many fq= fields, and then want to run some stats on
>> that hitlist, so somehow I have to pass in the query that made up the exact
>> hitlist to these various functions. But when I used multiple "fq=" values,
>> it only seemed to use the last one I specified and just ignored all the
>> previous fq's?
>>
>> Thanks in advance for any comments/suggestions...!
>>
>>
>>
>>
>> On Fri, Feb 23, 2018 at 5:59 PM, Joel Bernstein <joelsolr@gmail.com>
>> wrote:
>>
>> > This is going to be a complex answer because Solr actually now has multiple
>> > ways of doing regression analysis as part of the Streaming Expression
>> > statistical programming library. The basic documentation is here:
>> >
>> > https://lucene.apache.org/solr/guide/7_2/statistical-programming.html
>> >
>> > Here is a sample expression that performs a simple linear regression in
>> > Solr 7.2:
>> >
>> > let(a=random(collection1, q="any query", rows="15000", fl="fieldA, fieldB"),
>> >     b=col(a, fieldA),
>> >     c=col(a, fieldB),
>> >     d=regress(b, c))
>> >
>> >
>> > The expression above takes a random sample of 15000 results from
>> > collection1. The result set will include fieldA and fieldB in each record.
>> > The result set is stored in variable "a".
>> >
>> > Then the "col" function creates arrays of numbers from the results stored
>> > in variable "a". The values in fieldA are stored in variable "b". The
>> > values in fieldB are stored in variable "c".
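A conceptual Python model of the col step and of the numeric check that regress applies to its inputs; it is not Solr's implementation, and the function names are illustrative. It reproduces the failure mode in the error at the top of this thread: when an array handed to regress contains a string (there, the text oil_first_90_days_production itself), evaluation stops with a "Numeric value expected" error.

```python
from numbers import Real


def col(records, field):
    """Collect one field from every record, in order (None where the field is missing)."""
    return [record.get(field) for record in records]


def as_numeric(values):
    """Reject any non-numeric value before regression, the way regress requires numbers."""
    numbers = []
    for value in values:
        # bool is a subclass of int in Python, so rule it out explicitly.
        if isinstance(value, bool) or not isinstance(value, Real):
            raise TypeError(
                f"Numeric value expected but found type {type(value).__name__} for value {value}"
            )
        numbers.append(float(value))
    return numbers
```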
>> >
>> > Then the regress function performs a simple linear regression on the arrays
>> > stored in variables "b" and "c".
>> >
>> > The output of the regress function is a map containing the regression
>> > result. This result includes RSquared and other attributes of the
>> > regression model, such as R (correlation), slope, y intercept, etc.
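A minimal ordinary least-squares sketch in Python of what a simple linear regression over the two arrays computes. The keys RSquared, R, slope and intercept follow the attributes named above; SSE and SST are this sketch's own additions, and the map Solr actually returns may contain other keys.

```python
import math


def regress(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns slope, intercept, R and RSquared, plus the SSE and SST
    used to compute RSquared = 1 - SSE / SST.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)  # total sum of squares (SST)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if s_xx == 0 or s_yy == 0:
        raise ValueError("x and y must both vary")
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    # Sum of squared errors of the fitted line's predictions.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return {
        "slope": slope,
        "intercept": intercept,
        "R": s_xy / math.sqrt(s_xx * s_yy),
        "RSquared": 1.0 - sse / s_yy,
        "SSE": sse,
        "SST": s_yy,
    }
```

For simple linear regression RSquared equals R squared, which gives a quick sanity check on any implementation.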
>> >
>> >
>> > On Fri, Feb 23, 2018 at 3:10 PM, John Smith <localdevjs@gmail.com> wrote:
>> >
>> > > Hi Joel, thanks for the answer. I'm not really a stats guy, but the end
>> > > result of all this is supposed to be obtaining R^2. Is there no way of
>> > > obtaining this value, then (short of iterating over all the results in the
>> > > hitlist and calculating it myself)?
>> > >
>> > > On Fri, Feb 23, 2018 at 12:26 PM, Joel Bernstein <joelsolr@gmail.com>
>> > > wrote:
>> > >
>> > > > Typically SSE is the sum of the squared errors of the prediction in a
>> > > > regression analysis. The stats component doesn't perform regression,
>> > > > although it might be a nice feature.
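For a simple linear regression of y on x, SSE can be computed in one pass from six running sums: n, Σx, Σy, Σx², Σy² and Σxy. The stats component reports count, sum and sumOfSquares separately for each field, which covers the first five when both fields are present on every document, but it has no cross-product term Σxy, so SSE cannot be derived from the stats output alone. A Python sketch under those assumptions; the class name is illustrative, and for large values the raw-sum formulas can lose precision to cancellation, so a two-pass or centered update is safer.

```python
from dataclasses import dataclass


@dataclass
class RunningSums:
    """Accumulates the six sums needed for a simple linear regression of y on x."""

    n: int = 0
    sum_x: float = 0.0
    sum_y: float = 0.0
    sum_xx: float = 0.0  # same quantity as the per-field sumOfSquares for x
    sum_yy: float = 0.0  # same quantity as the per-field sumOfSquares for y
    sum_xy: float = 0.0  # cross term; no per-field statistic provides this

    def add(self, x: float, y: float) -> None:
        self.n += 1
        self.sum_x += x
        self.sum_y += y
        self.sum_xx += x * x
        self.sum_yy += y * y
        self.sum_xy += x * y

    def sse_sst(self):
        """Return (SSE, SST) for the least-squares line through the accumulated pairs."""
        ss_x = self.sum_xx - self.sum_x ** 2 / self.n  # centered sum of squares of x
        sst = self.sum_yy - self.sum_y ** 2 / self.n  # centered sum of squares of y
        sp = self.sum_xy - self.sum_x * self.sum_y / self.n  # centered cross-product
        sse = sst - sp ** 2 / ss_x  # SST minus the regression sum of squares
        return sse, sst

    def r_squared(self) -> float:
        sse, sst = self.sse_sst()
        return 1.0 - sse / sst
```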
>> > > >
>> > > >
>> > > > On Fri, Feb 23, 2018 at 12:17 PM, John Smith <localdevjs@gmail.com> wrote:
>> > > >
>> > > > > I'm using Solr, and enabling stats as per this page:
>> > > > > https://lucene.apache.org/solr/guide/6_6/the-stats-component.html
>> > > > >
>> > > > > I want to get more stat values though. Specifically I'm looking for
>> > > > > r-squared (coefficient of determination). This value is not present in
>> > > > > Solr; however, some of the pieces used to calculate r^2 are in the stats
>> > > > > element, for example:
>> > > > >
>> > > > > <double name="min">0.0</double>
>> > > > > <double name="max">10.0</double>
>> > > > > <long name="count">15</long>
>> > > > > <long name="missing">17</long>
>> > > > > <double name="sum">85.0</double>
>> > > > > <double name="sumOfSquares">603.0</double>
>> > > > > <double name="mean">5.666666666666667</double>
>> > > > > <double name="stddev">2.943920288775949</double>
>> > > > >
>> > > > >
>> > > > > So I have the sumOfSquares available (SST), and using this calculation,
>> > > > > I can get R^2:
>> > > > >
>> > > > > R^2 = 1 - SSE/SST
>> > > > >
>> > > > > All I need then is SSE. Is there any way I can get SSE from those
>> > > > > other stats in Solr?
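In the stats component output, sumOfSquares is the sum of the squared raw values (Σy²), not the total sum of squares about the mean that R^2 = 1 - SSE/SST uses. SST can still be recovered from count, sum and sumOfSquares, as this short Python check with the values above shows; the stddev it reproduces matches the sample (n - 1) standard deviation in the stats output.

```python
import math

# Values copied from the stats output quoted above.
count = 15
total = 85.0             # "sum"
sum_of_squares = 603.0   # "sumOfSquares": sum of y^2, not centered
reported_mean = 5.666666666666667
reported_stddev = 2.943920288775949

# Total sum of squares about the mean: SST = sum(y^2) - (sum(y))^2 / n
sst = sum_of_squares - total ** 2 / count

# Sample standard deviation rebuilt from SST, for comparison with "stddev".
sample_stddev = math.sqrt(sst / (count - 1))

print(f"SST = {sst:.4f}, stddev = {sample_stddev}")
```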
>> > > > >
>> > > > > Thanks in advance!
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
