lucene-solr-user mailing list archives

From Raymond Wiker <rwi...@gmail.com>
Subject Re: Can not index raw binary data stored in Database in BLOB format.
Date Mon, 24 Feb 2014 12:17:50 GMT
Try running the query for the outer entity ("messages") in an SQL client,
and verify that your BLOB column is called MESSAGE.
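
For reference, here is a minimal sketch of how the pieces discussed in this thread fit together. It assumes the table ("table1"), primary-key column ("PK") and BLOB column ("MESSAGE") named in the quoted configuration; the connection details are placeholders. The key point is that "dataField" names the outer entity and its column ("messages.MESSAGE"), not the data source ("db.MESSAGE"). Note also that the quoted Query-screen output reports a Content-Type of application/octet-stream with an empty body. That is consistent either with the field value never reaching Tika (hence checking the column name) or with Tika not recognizing the format of the bytes, in which case fixing the wiring alone may not produce any indexed text.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <!-- Placeholder connection details (assumed); substitute your own. -->
  <dataSource name="db" driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@//HOST:PORT/SERVICE"
              user="USER" password="PASSWORD"/>
  <!-- Reads a field of the parent entity's current row as a stream. -->
  <dataSource name="dastream" type="FieldStreamDataSource"/>
  <document>
    <!-- Outer entity: one row per message; table1 and PK are assumed from the thread. -->
    <entity name="messages" pk="PK" dataSource="db"
            query="select * from table1">
      <field column="PK" name="id"/>
      <!-- Inner entity: dataField is "<outer entity name>.<column>",
           so "messages.MESSAGE", not the data source name "db.MESSAGE". -->
      <entity name="message" dataSource="dastream"
              processor="TikaEntityProcessor"
              dataField="messages.MESSAGE"
              format="xml">
        <!-- format="xml" returns XHTML, as in the output quoted below;
             format="text" would return plain text instead. -->
        <field column="text" name="mxMsg"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```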


On Mon, Feb 24, 2014 at 12:22 PM, Chandan khatua <chandank@nrifintech.com> wrote:

> I've tried as per your suggestion, but no data is being indexed.
> The output of the Query screen looks like this:
>
> <doc>
>     <str name="id">2158</str>
>     <arr name="mxMsg">
>       <str><?xml version="1.0" encoding="UTF-8"?><html
> xmlns="http://www.w3.org/1999/xhtml">
> <head>
> <meta name="Content-Type" content="application/octet-stream"/>
> <title/>
> </head>
> <body/></html></str>
>     </arr>
>     <long name="_version_">1460918369230258176</long></doc>
>
>
>
> But the indexed data should appear within the <body> tag. When XML
> messages are stored in the BLOB column, indexing works smoothly.
> However, I am trying to index binary data stored in the BLOB column.
>
> Need help.
>
> Thanking you,
> Chandan
>
>
>
> -----Original Message-----
> From: Raymond Wiker [mailto:rwiker@gmail.com]
> Sent: Monday, February 24, 2014 4:38 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Can not index raw binary data stored in Database in BLOB
> format.
>
> Try replacing the inner entity with something like:
>
> <entity name="message"
>            dataSource="dastream"
>            processor="TikaEntityProcessor"
>            dataField="messages.MESSAGE"
>            format="xml">
>     <field column="text" name="mxMsg"/>
>   </entity>
>
> --- this assumes that you get the blob from a column named "MESSAGE" in the
> outer entity ("messages").
>
>
> On Mon, Feb 24, 2014 at 11:51 AM, Chandan khatua
> <chandank@nrifintech.com> wrote:
>
> > Hi Raymond !
> >
> > I have a data-config.xml like the one below:
> >
> > <?xml version="1.0" encoding="UTF-8" ?>
> > <dataConfig>
> >   <dataSource name="db" driver="oracle.jdbc.driver.OracleDriver"
> >       url="jdbc:oracle:thin:@//x.x.x.x:x/d11gr21" user="x" password="x"/>
> >   <dataSource name="dastream" type="FieldStreamDataSource" />
> >   <document>
> >     <entity
> >         name="messages" pk=" PK" transformer='DateFormatTransformer'
> >         query="select * from table1"
> >         dataSource="db">
> >       <field column =" PK" name ="id" />
> >       <field column="last_modified" dateTimeFormat="YYYY-MM-DD HH24:MI:SS" locale="en" />
> >       <entity
> >           name="message"
> >           dataSource="dastream"
> >           processor="TikaEntityProcessor"
> >           url="message"
> >           dataField="db.MESSAGE"
> >           format="text">
> >         <field column="text" name="mxMsg" blob="true"/>
> >       </entity>
> >     </entity>
> >   </document>
> > </dataConfig>
> >
> >
> >
> > This looks similar to your configuration. When XML data is stored in
> > the BLOB column, indexing works, but when binary data is stored in the
> > BLOB column, nothing is indexed.
> > Please help.
> >
> > Thanking you,
> > -Chandan
> >
> >
> > -----Original Message-----
> > From: Raymond Wiker [mailto:rwiker@gmail.com]
> > Sent: Monday, February 24, 2014 4:06 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Can not index raw binary data stored in Database in BLOB
> > format.
> >
> > I've done something like this; the key was to use a
> > FieldStreamDataSource to read from the BLOB field.
> >
> > Something like:
> >
> > <dataSource name="main" .../>
> > <dataSource type="FieldStreamDataSource" name="fieldstream"/>
> >
> > then
> >
> >   <entity name="tika" processor="TikaEntityProcessor"
> >       dataField="main.BLOB" dataSource="fieldstream" format="xml">
> >     <field column="Author" meta="true" name="..."/>
> >     <field column="title" meta="true" name="title"/>
> >     <field column="text" name="content"/>
> >     <field column="content_type" name="content_type" meta="true"/>
> >     <field column="last_modified" name="last_modified" meta="true"/>
> >   </entity>
> >
> > ...
> >
> >
> >
> >
> > On Mon, Feb 24, 2014 at 11:04 AM, Chandan khatua
> > <chandank@nrifintech.com> wrote:
> >
> > > Hi Gora !
> > >
> > > You asked: "What is the type of the column used to store the
> > > binary data in Oracle?"
> > > The column type is BLOB. The column can also contain rich-text files.
> > >
> > > Regards,
> > > Chandan
> > >
> > >
> > > -----Original Message-----
> > > From: Gora Mohanty [mailto:gora@mimirtech.com]
> > > Sent: Monday, February 24, 2014 3:02 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Can not index raw binary data stored in Database in
> > > BLOB format.
> > >
> > > On 24 February 2014 12:51, Chandan khatua <chandank@nrifintech.com> wrote:
> > > > Hi,
> > > >
> > > > We have raw binary data (not Word, Excel, XML, etc. files) stored in
> > > > a BLOB column in the database.
> > > >
> > > > We are trying to index it using TikaEntityProcessor, but nothing seems
> > > > to get indexed.
> > > >
> > > > However, the same configuration works when XML/Word/Excel files are
> > > > stored in the BLOB field.
> > >
> > > Please start by reviewing
> > > http://wiki.apache.org/solr/DataImportHandler as the above seems
> > > quite confused. Why are you using TikaEntityProcessor if the data in
> > > the DB are not rich-text files?
> > >
> > > What is the type of the column used to store the binary data in
> > > Oracle? You might be able to convert it with a ClobTransformer.
> > > Please see
> > > http://wiki.apache.org/solr/DataImportHandler#ClobTransformer
> > >
> > > http://wiki.apache.org/solr/DataImportHandlerFaq#Blob_values_in_my_table_are_added_to_the_Solr_document_as_object_strings_like_B.401f23c5
> > >
> > > Regards,
> > > Gora
> > >
> > >
> >
> >
>
>
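
Gora's ClobTransformer suggestion would look roughly like the sketch below; the entity, table and column names are assumptions carried over from the quoted configuration. One caveat: ClobTransformer only converts values that the JDBC driver returns as java.sql.Clob, so it does not apply to a BLOB column as-is. The raw bytes would first have to be exposed as character data, which only makes sense if they actually represent text.

```xml
<!-- Hypothetical outer entity; table1 and MESSAGE are taken from the thread. -->
<entity name="messages" dataSource="db"
        transformer="ClobTransformer"
        query="select * from table1">
  <!-- clob="true" asks ClobTransformer to read a java.sql.Clob value as a String;
       it has no effect on BLOB (binary) values. -->
  <field column="MESSAGE" name="mxMsg" clob="true"/>
</entity>
```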
