From: Noble Paul നോബിള്‍ नोब्ळ् <noble.paul@gmail.com>
Date: Fri, 5 Feb 2010 14:59:37 +0530
Subject: Re: DataImportHandler TikaEntityProcessor FieldReaderDataSource
To: solr-user@lucene.apache.org

Unfortunately, no.

On Fri, Feb 5, 2010 at 2:23 PM, Jorg Heymans wrote:
> D'oh, thanks for that, Paul :-|
>
> I suppose schema validation for data-config.xml is already in Jira
> somewhere?
>
> Jorg
>
> 2010/2/5 Noble Paul നോബിള്‍ नोब्ळ्
>
>> wrong
>> right
>>
>> On Thu, Feb 4, 2010 at 9:27 PM, Jorg Heymans wrote:
>> > Hi,
>> > I'm having some trouble getting this to work on a snapshot from 3rd Feb.
>> > My config looks as follows:
>> >             bytes from documents"
>> >                 url="bytes" dataField="meta.BYTES">
>> > and I get this stack trace:
>> > org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: bytes Processing Document # 1
>> >         at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
>> >         at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253)
>> >         at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
>> >         at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
>> >         at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:98)
>> > It seems to be trying to execute whatever is in the url attribute as a
>> > query. So I tried url="select bytes from documents where id = ${meta.ID}",
>> > but then I get a ClassCastException:
>> > Caused by: java.lang.ClassCastException: org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1
>> >         at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:98)
>> >         at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:233)
>> > Any ideas what is wrong with the config?
>> > Thanks,
>> > Jorg
>> > 2010/1/27 Noble Paul നോബിള്‍ नोब्ळ्
>> >>
>> >> There is no corresponding DataSource that can be used with
>> >> TikaEntityProcessor to read from a BLOB.
>> >> I have opened an issue: https://issues.apache.org/jira/browse/SOLR-1737
>> >>
>> >> On Mon, Jan 25, 2010 at 10:57 PM, Shah, Nirmal wrote:
>> >> > Hi,
>> >> >
>> >> > I am fairly new to Solr and would like to use the DIH to pull rich text
>> >> > files (PDFs, etc.) from BLOB fields in my database.
>> >> >
>> >> > There was a suggestion to use the FieldReaderDataSource with the
>> >> > recently committed TikaEntityProcessor. Has anyone accomplished this?
>> >> >
>> >> > This is my configuration, and the resulting error. I'm not sure if I'm
>> >> > using the FieldReaderDataSource correctly. If anyone could shed light
>> >> > on whether I am going in the right direction or not, it would be
>> >> > appreciated.
>> >> >
>> >> > ---------------Data-config.xml:
>> >> >
>> >> >   url="jdbc:oracle:thin:un/pw@host:1521:sid" />
>> >> >       name, attachment from testtable2">
>> >> >         dataField="attach.attachment" format="text">
>> >> >
>> >> > -------------Debug error:
>> >> >
>> >> > 0
>> >> > 203
>> >> > testdb-data-config.xml
>> >> > full-import
>> >> > debug
>> >> > select id as name, attachment from testtable2
>> >> > 0:0:0.32
>> >> > ----------- row #1-------------
>> >> > java.math.BigDecimal:2
>> >> > oracle.sql.BLOB:oracle.sql.BLOB@1c8e807
>> >> > ---------------------------------------------
>> >> >
>> >> > org.apache.solr.handler.dataimport.DataImportHandlerException: No dataSource :f1 available for entity :253433571801723 Processing Document # 1
>> >> >         at org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:279)
>> >> >         at org.apache.solr.handler.dataimport.ContextImpl.getDataSource(ContextImpl.java:93)
>> >> >         at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:97)
>> >> >         at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
>> >> >         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357)
>> >> >         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383)
>> >> >         at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
>> >> >         at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
>> >> >         at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
>> >> >         at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
>> >> >         at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:203)
>> >> >         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>> >> >         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>> >> >         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>> >> >         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>> >> >         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
>> >> >         at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
>> >> >         at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>> >> >         at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>> >> >         at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
>> >> >         at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
>> >> >         at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
>> >> >         at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>> >> >         at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
>> >> >         at org.mortbay.jetty.Server.handle(Server.java:285)
>> >> >         at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
>> >> >         at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
>> >> >         at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
>> >> >         at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
>> >> >         at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
>> >> >         at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
>> >> >         at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
>> >> >
>> >> > Thanks,
>> >> >
>> >> > Nirmal
>> >>
>> >> --
>> >> -----------------------------------------------------
>> >> Noble Paul | Systems Architect| AOL | http://aol.com
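Noble Paul's 27 Jan reply says DIH had no DataSource able to feed TikaEntityProcessor from a BLOB column. Nirmal's debug output shows the column arriving as an oracle.sql.BLOB, and Jorg's ClassCastException suggests the processor expected a stream but received JdbcDataSource's row iterator instead. The core job such a data source would have to do, opening one column value of a row as an InputStream, can be sketched with plain JDBC types. This is a minimal illustration, not Solr code: the class BlobFieldStream, the method openFieldStream and the ID/BYTES column names are hypothetical, and it assumes the driver returns either a java.sql.Blob or a byte[] for the column.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.sql.Blob;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;
import javax.sql.rowset.serial.SerialBlob;

/**
 * Hypothetical helper, not part of Solr: opens the binary value of one column
 * in a DIH-style row (column name -> value) as an InputStream, the kind of
 * input a Tika-based entity processor would parse.
 */
public class BlobFieldStream {

    public static InputStream openFieldStream(Map<String, Object> row, String column)
            throws SQLException {
        Object value = row.get(column);
        if (value instanceof Blob) {
            // Assumption: the JDBC driver hands back a java.sql.Blob for BLOB columns.
            return ((Blob) value).getBinaryStream();
        }
        if (value instanceof byte[]) {
            // Assumption: some drivers or queries return the raw bytes instead.
            return new ByteArrayInputStream((byte[]) value);
        }
        // Anything else (null, String, numbers) is not a readable document.
        throw new IllegalArgumentException(
                "Column " + column + " does not hold binary data: " + value);
    }

    public static void main(String[] args) throws Exception {
        // Simulated row; the column names are illustrative only.
        Map<String, Object> row = new HashMap<>();
        row.put("ID", 2);
        row.put("BYTES", new SerialBlob("hello tika".getBytes(StandardCharsets.UTF_8)));

        // The caller owns the stream and closes it when parsing is done.
        try (InputStream in = openFieldStream(row, "BYTES")) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```

Run as is, the sketch prints the simulated document text. A real DIH data source would presumably also have to resolve the column from the entity's dataField attribute (as in dataField="meta.BYTES" above), which this sketch leaves out.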