From: Boris Chazalet
Date: Thu, 14 Nov 2019 14:09:16 +0100
Subject: Re: 8.3.0: Invalid UUID String while indexing document with a UUID field
To: solr-user@lucene.apache.org
Reply-To: solr-user@lucene.apache.org
Mailing-List: contact solr-user-help@lucene.apache.org; run by ezmlm
MIME-Version: 1.0
Content-Type: multipart/related; boundary="000000000000f905d105974e3097"

--000000000000f905d105974e3097
Content-Type: multipart/alternative; boundary="000000000000f905cf05974e3096"

--000000000000f905cf05974e3096
Content-Type: text/plain; charset="UTF-8"

I dug a little into the dataimport code, and there's a special case for
BigDecimal in the JdbcDataSource class, here exactly:
https://github.com/apache/lucene-solr/blob/faaee86efb01fa6e431fcb129cfb956c7d62d514/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/JdbcDataSource.java#L403

I believe we might need the same kind of logic for a UUID object coming
directly from the JDBC driver.

On Thu, 14 Nov 2019 at 13:46, Boris Chazalet wrote:

> Thanks for your response Jörn. Yes, I saw the prefix and I suspect this is
> the problem. But I'm not doing anything special in the DIH config, this is
> a minimalized version of it:
>
> <dataConfig>
>     <dataSource type="JdbcDataSource"
>                 driver="org.postgresql.Driver"
>                 url="jdbc:postgresql://redacted"
>                 readOnly="false" autoCommit="false"
>                 transactionIsolation="TRANSACTION_READ_COMMITTED"
>                 holdability="CLOSE_CURSORS_AT_COMMIT" application_name="solr"
>                 prepareThreshold="0" />
>     <document>
>         <entity name="index" pk="cinumber" transformer="RegexTransformer"
>                 query="
>                 SELECT myuuidfield, mypk
>                 FROM {{solr__datasource}}
>                 WHERE
>                     '${dataimporter.request.clean}' != 'false' OR
>                     lastmodified > '${dataimporter.last_index_time}'::timestamp - '3 hours'::interval
>                 ">
>             <field column="myuuidfield" name="myuuidfield" />
>             <field column="mypk" name="mypk" />
>         </entity>
>     </document>
> </dataConfig>
>
> The field is a UUID in the database, so it's definitely valid and without
> prefix. Where in the code can I double-check for myself how the
> DataImportHandler serialises a UUID?
>
> On Thu, 14 Nov 2019 at 13:38, Jörn Franke wrote:
>
>> It seems there is a prefix java.util.UUID: in front of your UUID. Any
>> idea where it comes from? Is it also like this in the database? Is your
>> import handler maybe receiving a java object java.util.UUID and it is not
>> converted correctly to string?
>>
>> > On 14.11.2019 at 11:52, Boris Chazalet <bchazalet@companywatch.net> wrote:
>> >
>> > Hi,
>> >
>> > I'm running into an issue with Solr 8.3.0: it fails at indexing a
>> > schema with a UUID field.
>> >
>> > I'm using a SolrCloud setup with 3 instances, and I'm using the DIH to
>> > fetch and index the data from a postgres database.
>> >
>> > In schema.xml I have:
>> >
>> >     <field name="myuuidfield" type="uuid" uninvertible="false"
>> >            indexed="true" stored="true" multiValued="false" required="false"/>
>> >     <fieldType name="uuid" class="solr.UUIDField"/>
>> >
>> > The data-config is a simple select, the uuid field is of UUID type in
>> > postgres.
>> >
>> > I was running 7.7.2 until now, and first noticed the problem there. But
>> > given the number of things around UUIDs fixed in the latest version, I
>> > thought I'd try 8.3.0 first. The same problem arises while running the DIH.
>> >
>> > Note that I have another core with a uuid field, which I am indexing
>> > externally (i.e. not from the DIH) and I haven't had a problem there, so
>> > I'm suspecting the problem might be in the DIH logic, but with no certainty.
>> >
>> > Below is a truncated version of the exception's stack trace I see in the
>> > logs. I can provide the full one if necessary. Is this a legitimate bug?
>> > What can I do to help track down the problem?
>> >
>> > 2019-11-13 17:29:55.430 ERROR (qtp1990098664-15) [c:db_c s:shard1 r:core_node5 x:db_c_shard1_replica_n2] o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: ERROR: [doc=1] Error adding field 'myuuidfield'='java.util.UUID:afa9cf35-0b2d-e811-89a7-0025900429ba' msg=Error while creating field 'myuuidfield{type=uuid,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,useDocValuesAsStored}' from value 'java.util.UUID:afa9cf35-0b2d-e811-89a7-0025900429ba'
>> >     at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:215)
>> >     at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:109)
>> >     at org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:969)
>> >     ...
>> > Caused by: org.apache.solr.common.SolrException: Error while creating field 'myuuidfield{type=uuid,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,useDocValuesAsStored}' from value 'java.util.UUID:4ee3992e-0b2d-e811-89a7-0025900429ba'
>> >     at org.apache.solr.schema.FieldType.createField(FieldType.java:291)
>> >     at org.apache.solr.schema.StrField.createFields(StrField.java:48)
>> >     at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:65)
>> >     at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:171)
>> >     ... 69 more
>> > Caused by: org.apache.solr.common.SolrException: Invalid UUID String: 'java.util.UUID:4ee3992e-0b2d-e811-89a7-0025900429ba'
>> >     at org.apache.solr.schema.UUIDField.toInternal(UUIDField.java:88)
>> >     at org.apache.solr.schema.FieldType.createField(FieldType.java:289)
>> >     ... 72 more
>> >
>> > Kind Regards,
>> > Boris
>> >
>> > This message is intended only for the addressee and unless otherwise
>> > stated is commercial in confidence and may contain information that is
>> > privileged. Where all recipients are in the companywatch.net domain,
>> > this communication is classified as Confidential. Unauthorised use is
>> > strictly prohibited and may be unlawful. If you are not the addressee, you
>> > should not read, copy, disclose or otherwise use this message, except for
>> > the purpose of delivery to the addressee. If you have received this in
>> > error, please delete and advise us immediately. Although Company Watch
>> > makes every reasonable effort to keep its network and systems free from
>> > viruses, the company accepts no responsibility for computer viruses
>> > transmitted through this mail or in any attachments. It is your
>> > responsibility to virus scan any attachments we send to you.
>> >
>> > Company Watch Limited is a company registered in England & Wales with
>> > company number 3597613
>> > Centurion House, 37 Jewry Street, London, EC3N 2ER
>> >
>> > Please consider the environment before printing this email
>
> --
> *Boris Chazalet*
> Senior developer and problem solver
> T: +44 (0)20 3740 9402
> E: bchazalet@companywatch.net

-- 
*Boris Chazalet*
Senior developer and problem solver
T: +44 (0)20 3740 9402
E: bchazalet@companywatch.net

--000000000000f905cf05974e3096
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Thanks for your response Jörn. Yes, I saw the prefix and I suspect this is the problem. But I'm not doing anything special in the DIH config, this is a minimalized version of it:

<dataConfig>
    <dataSource type="JdbcDataSource"
                driver="org.postgresql.Driver"
                url="jdbc:postgresql://redacted"
                readOnly="false" autoCommit="false" transactionIsolation="TRANSACTION_READ_COMMITTED" holdability="CLOSE_CURSORS_AT_COMMIT" application_name="solr" prepareThreshold="0" />
    <document>
        <entity name="index" pk="cinumber" transformer="RegexTransformer"
                query="
                SELECT myuuidfield, mypk
                FROM {{solr__datasource}}
                WHERE
                    '${dataimporter.request.clean}' != 'false' OR lastmodified > '${dataimporter.last_index_time}'::timestamp - '3 hours'::interval
                ">
            <field column="myuuidfield" name="myuuidfield" />
            <field column="mypk" name="mypk" />
        </entity>
    </document>
</dataConfig>

The field is a UUID in the database, so it's definitely valid and without prefix. Where in the code can I double-check for myself how the DataImportHandler serialises a UUID?


On Thu, 14 Nov 2019 at 13:38, Jörn Franke <jornfranke@gmail.com> wrote:
It seems there is a prefix java.util.UUID: in front of your UUID. Any idea where it comes from? Is it also like this in the database? Is your import handler maybe receiving a java object java.util.UUID and it is not converted correctly to string?
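
To illustrate the hypothesis above, here is a minimal Java sketch. It assumes the JDBC driver hands back a java.util.UUID object for a uuid column. The class JdbcValueNormalizer and its normalize method are hypothetical names made up for this example; they are not part of Solr or the DataImportHandler. The idea is only to turn the object into its canonical string before it reaches the schema field:

```java
import java.util.UUID;

/** Hypothetical helper for illustration; not Solr or DataImportHandler code. */
final class JdbcValueNormalizer {

    private JdbcValueNormalizer() {}

    /**
     * Converts a value read from a JDBC result set into something a
     * string-based field can accept. Assumption: the driver may return a
     * java.util.UUID for a uuid column; it is replaced by its canonical
     * 36-character string. All other values pass through unchanged.
     */
    static Object normalize(Object value) {
        if (value instanceof UUID) {
            return value.toString();
        }
        return value;
    }

    public static void main(String[] args) {
        // Simulates what a driver might return for a uuid column.
        Object fromDriver = UUID.fromString("afa9cf35-0b2d-e811-89a7-0025900429ba");
        System.out.println(normalize(fromDriver));
        // Non-UUID values are left alone.
        System.out.println(normalize(42));
    }
}
```

Under the same assumption, a config-only alternative would be to let PostgreSQL do the conversion by selecting myuuidfield::text in the DIH query, so the driver returns a plain string in the first place.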

> On 14.11.2019 at 11:52, Boris Chazalet <bchazalet@companywatch.net> wrote:
>
> Hi,
>
> I'm running into an issue with Solr 8.3.0: it fails at indexing a schema with a UUID field.
>
> I'm using a SolrCloud setup with 3 instances, and I'm using the DIH to fetch and index the data from a postgres database.
>
> In schema.xml I have:
>
>     <field name="myuuidfield" type="uuid" uninvertible="false" indexed="true" stored="true" multiValued="false" required="false"/>
>     <fieldType name="uuid" class="solr.UUIDField"/>
>
> The data-config is a simple select, the uuid field is of UUID type in postgres.
>
> I was running 7.7.2 until now, and first noticed the problem there. But given the number of things around UUIDs fixed in the latest version, I thought I'd try 8.3.0 first. The same problem arises while running the DIH.
>
> Note that I have another core with a uuid field, which I am indexing externally (i.e. not from the DIH) and I haven't had a problem there, so I'm suspecting the problem might be in the DIH logic, but with no certainty.
>
> Below is a truncated version of the exception's stack trace I see in the logs. I can provide the full one if necessary. Is this a legitimate bug? What can I do to help track down the problem?
>
> 2019-11-13 17:29:55.430 ERROR (qtp1990098664-15) [c:db_c s:shard1 r:core_node5 x:db_c_shard1_replica_n2] o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: ERROR: [doc=1] Error adding field 'myuuidfield'='java.util.UUID:afa9cf35-0b2d-e811-89a7-0025900429ba' msg=Error while creating field 'myuuidfield{type=uuid,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,useDocValuesAsStored}' from value 'java.util.UUID:afa9cf35-0b2d-e811-89a7-0025900429ba'
>     at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:215)
>     at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:109)
>     at org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:969)
>     ...
> Caused by: org.apache.solr.common.SolrException: Error while creating field 'myuuidfield{type=uuid,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,useDocValuesAsStored}' from value 'java.util.UUID:4ee3992e-0b2d-e811-89a7-0025900429ba'
>     at org.apache.solr.schema.FieldType.createField(FieldType.java:291)
>     at org.apache.solr.schema.StrField.createFields(StrField.java:48)
>     at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:65)
>     at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:171)
>     ... 69 more
> Caused by: org.apache.solr.common.SolrException: Invalid UUID String: 'java.util.UUID:4ee3992e-0b2d-e811-89a7-0025900429ba'
>     at org.apache.solr.schema.UUIDField.toInternal(UUIDField.java:88)
>     at org.apache.solr.schema.FieldType.createField(FieldType.java:289)
>     ... 72 more
>
> Kind Regards,
> Boris
>
> This message is intended only for the addressee and unless otherwise stated is commercial in confidence and may contain information that is privileged. Where all recipients are in the companywatch.net domain, this communication is classified as Confidential. Unauthorised use is strictly prohibited and may be unlawful. If you are not the addressee, you should not read, copy, disclose or otherwise use this message, except for the purpose of delivery to the addressee. If you have received this in error, please delete and advise us immediately. Although Company Watch makes every reasonable effort to keep its network and systems free from viruses, the company accepts no responsibility for computer viruses transmitted through this mail or in any attachments. It is your responsibility to virus scan any attachments we send to you.
>
> Company Watch Limited is a company registered in England & Wales with company number 3597613
> Centurion House, 37 Jewry Street, London, EC3N 2ER
>
> Please consider the environment before printing this email
>
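
The innermost "Invalid UUID String" cause in the quoted trace can be reproduced outside Solr. The sketch below assumes the field validates its input with something equivalent to java.util.UUID.fromString; that is an assumption and has not been checked against the UUIDField source. UuidPrefixDemo and its methods are hypothetical names used only for this illustration:

```java
import java.util.UUID;

/** Hypothetical demo class; it does not call any Solr code. */
final class UuidPrefixDemo {

    private UuidPrefixDemo() {}

    /** Returns true when the text is accepted by java.util.UUID.fromString. */
    static boolean parses(String text) {
        try {
            UUID.fromString(text);
            return true;
        } catch (IllegalArgumentException e) {
            // Malformed input is rejected with IllegalArgumentException
            // (NumberFormatException is a subclass of it).
            return false;
        }
    }

    /** Removes a leading "java.util.UUID:" marker if present (illustration only). */
    static String stripClassPrefix(String text) {
        String prefix = UUID.class.getName() + ":";
        return text.startsWith(prefix) ? text.substring(prefix.length()) : text;
    }

    public static void main(String[] args) {
        // Value exactly as it appears in the logged error message.
        String logged = "java.util.UUID:4ee3992e-0b2d-e811-89a7-0025900429ba";
        System.out.println(parses(logged));
        System.out.println(parses(stripClassPrefix(logged)));
    }
}
```

Stripping the prefix here only serves to show that the UUID itself is well-formed; it says nothing about where in the import path the prefix is added.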


--000000000000f905cf05974e3096-- --000000000000f905d105974e3097--