lucene-solr-user mailing list archives

From "Chandan khatua" <chand...@nrifintech.com>
Subject RE: Can not index raw binary data stored in Database in BLOB format.
Date Mon, 24 Feb 2014 10:51:50 GMT
Hi Raymond!

My data-config.xml looks like this:

<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <dataSource name="db" driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@//x.x.x.x:x/d11gr21" user="x" password="x"/>
  <dataSource name="dastream" type="FieldStreamDataSource" />
  <document>
    <entity
        name="messages" pk=" PK" transformer='DateFormatTransformer'
        query="select * from table1"
        dataSource="db">
      <field column=" PK" name="id" />
      <field column="last_modified" dateTimeFormat="YYYY-MM-DD HH24:MI:SS" locale="en" />
      <entity
          name="message"
          dataSource="dastream"
          processor="TikaEntityProcessor"
          url="message"
          dataField="db.MESSAGE"
          format="text">
        <field column="text" name="mxMsg" blob="true"/>
      </entity>
    </entity>
  </document>
</dataConfig>



This looks similar to your configuration. When the BLOB column holds XML
data, indexing works. When it holds raw binary data, nothing is indexed.
Please help.

Thanking you,
-Chandan


-----Original Message-----
From: Raymond Wiker [mailto:rwiker@gmail.com] 
Sent: Monday, February 24, 2014 4:06 PM
To: solr-user@lucene.apache.org
Subject: Re: Can not index raw binary data stored in Database in BLOB
format.

I've done something like this; the key was to use a FieldStreamDataSource to
read from the BLOB field.

Something like

<dataSource name="main" ...>
<dataSource type="FieldStreamDataSource" name="fieldstream"/>

then

      <entity name="tika" processor="TikaEntityProcessor"
              dataField="main.BLOB" dataSource="fieldstream" format="xml">
        <field column="Author" meta="true" name="..."/>
        <field column="title" meta="true" name="title"/>
        <field column="text" name="content"/>
        <field column="content_type" name="content_type" meta="true"/>
        <field column="last_modified" name="last_modified" meta="true"/>
      </entity>

...




On Mon, Feb 24, 2014 at 11:04 AM, Chandan khatua
<chandank@nrifintech.com> wrote:

> Hi Gora!
>
> You asked: "What is the type of the column used to store the
> binary data in Oracle?"
> The column type is BLOB. The column can also hold rich-text files.
>
> Regards,
> Chandan
>
>
> -----Original Message-----
> From: Gora Mohanty [mailto:gora@mimirtech.com]
> Sent: Monday, February 24, 2014 3:02 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Can not index raw binary data stored in Database in BLOB 
> format.
>
> On 24 February 2014 12:51, Chandan khatua <chandank@nrifintech.com> wrote:
> > Hi,
> >
> > We have raw binary data (not Word, Excel, XML, etc. files) stored in a
> > BLOB column in the database.
> >
> > We are trying to index it using TikaEntityProcessor, but nothing seems
> > to get indexed.
> >
> > The same configuration works when XML/Word/Excel files are
> > stored in the BLOB field.
>
> Please start by reviewing 
> http://wiki.apache.org/solr/DataImportHandler as the above seems quite 
> confused. Why are you using TikaEntityProcessor if the data in the DB 
> are not richtext files?
>
> What is the type of the column used to store the binary data in 
> Oracle? You might be able to convert it with a ClobTransformer. Please 
> see http://wiki.apache.org/solr/DataImportHandler#ClobTransformer
>
> http://wiki.apache.org/solr/DataImportHandlerFaq#Blob_values_in_my_table_are_added_to_the_Solr_document_as_object_strings_like_B.401f23c5
>
> Regards,
> Gora
>
>
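
For reference, Gora's ClobTransformer suggestion might look roughly like the sketch below. It assumes the text lives in a CLOB column. The column names ID and BODY and the Solr field name "content" are hypothetical, and the connection details are the same placeholders used earlier in the thread. ClobTransformer is meant for character data, so it may not help with a BLOB that holds raw binary content; for that case the thread points to FieldStreamDataSource instead.

```xml
<dataConfig>
  <!-- Connection details are placeholders, as in the thread. -->
  <dataSource name="db" driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@//x.x.x.x:x/d11gr21" user="x" password="x"/>
  <document>
    <!-- ID and BODY are hypothetical column names; BODY is assumed to be a CLOB. -->
    <entity name="messages" dataSource="db" transformer="ClobTransformer"
            query="select ID, BODY from table1">
      <field column="ID" name="id"/>
      <!-- clob="true" asks ClobTransformer to read the CLOB value into a String. -->
      <field column="BODY" name="content" clob="true"/>
    </entity>
  </document>
</dataConfig>
```

The transformer is declared on the entity, and only fields marked clob="true" are converted; other columns pass through unchanged.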

