lucene-solr-user mailing list archives

From alessio crisantemi <alessio.crisant...@gmail.com>
Subject Re: indexing with DIH (and with problems)
Date Sun, 12 Feb 2012 18:08:28 GMT
Here is the latest version.

With this data-config:

<dataConfig>
 <dataSource type="BinFileDataSource" />
 <document>
  <entity
    name="tika-test"
    processor="FileListEntityProcessor"
    baseDir="D:\gioconews_archivio\marzo2011"
    fileName=".*pdf"
    recursive="true"
    rootEntity="false"
    dataSource="null"/>
  <entity processor="FileListEntityProcessor"
url="D:\gioconews_archivio\marzo2011" format="text" >
   <field column="author"  name="author" meta="true"/>
   <field column="title" name="title" meta="true"/>
     <field column="description" name="description" />
     <field column="comments" name="comments" />
     <field column="content_type" name="content_type" />
     <field column="last_modified" name="last_modified" />
  </entity>
 </document>
</dataConfig>

I obtain this result:

 <str name="command">full-import</str>
  <str name="status">idle</str>
  <str name="importResponse" />
 -<http://pc-alessio:8983/solr/select?clean=false&commit=true&qt=%2Fdataimport&command=full-import#>
<lst name="statusMessages">
  <str name="Time Elapsed">0:0:2.44</str>
  <str name="Total Requests made to DataSource">0</str>
  <str name="Total Rows Fetched">43</str>
  <str name="Total Documents Skipped">0</str>
  <str name="Full Dump Started">2012-02-12 19:06:00</str>
  <str name="">Indexing failed. Rolled back all changes.</str>
  <str name="Rolledback">2012-02-12 19:06:00</str>
 </lst>

Any suggestions?
Thank you,
a.
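
One possible reading of the config above, offered as a sketch rather than a confirmed diagnosis: the first entity is self-closed, so the second entity is its sibling instead of its child, and the second entity uses FileListEntityProcessor where the earlier message in this thread used TikaEntityProcessor with url="${f.fileAbsolutePath}". The variant below nests a Tika entity inside the file-listing entity and points it at a binary data source, since Tika reads the file as a stream. The data source name "bin", the outer entity name "f" (borrowed from the earlier config), and the target field "text" are assumptions; every `name` value has to match a field in your schema.xml, and the actual cause of the rollback should be confirmed from the exception in the Solr log.

```xml
<dataConfig>
  <!-- Binary data source for Tika; the name "bin" is an assumption -->
  <dataSource type="BinFileDataSource" name="bin" />
  <document>
    <!-- Outer entity: lists the PDF files and creates no documents itself
         (rootEntity="false"), so it needs no data source -->
    <entity
      name="f"
      processor="FileListEntityProcessor"
      baseDir="D:\gioconews_archivio\marzo2011"
      fileName=".*pdf"
      recursive="true"
      rootEntity="false"
      dataSource="null">

      <!-- Inner entity: runs once per listed file and parses it with Tika -->
      <entity
        name="tika-test"
        processor="TikaEntityProcessor"
        url="${f.fileAbsolutePath}"
        format="text"
        dataSource="bin">
        <!-- meta="true" reads the value from the document metadata -->
        <field column="Author" name="author" meta="true" />
        <field column="title" name="title" meta="true" />
        <!-- "text" is the column holding the extracted body; the target
             field name is an assumption about the schema -->
        <field column="text" name="text" />
      </entity>
    </entity>
  </document>
</dataConfig>
```

A status showing 43 rows fetched but no documents processed is consistent with the files being listed by the outer entity while the inner entity fails or never runs.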
2012/2/12 alessio crisantemi <alessio.crisantemi@gmail.com>

> Sorry for the confusion:
>
> I forgot part of the config:
>  <entity name="tika-test" processor="TikaEntityProcessor"
> url="${f.fileAbsolutePath}" format="text">
>
> Without this part, the result is the same as in my previous mail.
>
> If I add this line, the result is:
>
> -<http://pc-alessio:8983/solr/select?clean=false&commit=true&qt=%2Fdataimport&command=full-import#>
> <lst name="statusMessages">
>   <str name="Time Elapsed">0:0:1.79</str>
>   <str name="Total Requests made to DataSource">0</str>
>   <str name="Total Rows Fetched">1</str>
>   <str name="Total Documents Processed">0</str>
>   <str name="Total Documents Skipped">0</str>
>   <str name="Full Dump Started">2012-02-12 18:20:49</str>
>   <str name="">Indexing failed. Rolled back all changes.</str>
>   <str name="Rolledback">2012-02-12 18:20:49</str>
>  </lst>
>
> Help!
> Thank you,
> alessio
>
>
> 2012/2/12 alessio crisantemi <alessio.crisantemi@gmail.com>
>
>> Hi,
>> My DIH now runs, but perhaps only partly.
>>
>> I am indexing a directory containing 43 PDF files.
>> Below is the response to my full-import command:
>>
>>  <str name="command">full-import</str>
>>   <str name="status">idle</str>
>>   <str name="importResponse" />
>>  -<http://pc-alessio:8983/solr/select?clean=false&commit=true&qt=%2Fdataimport&command=full-import#>
>> <lst name="statusMessages">
>>   <str name="Total Requests made to DataSource">0</str>
>>   <str name="Total Rows Fetched">43</str>
>>   <str name="Total Documents Skipped">0</str>
>>   <str name="Full Dump Started">2012-02-12 17:39:10</str>
>>   <str name="Total Documents Processed">0</str>
>>   <str name="Time taken">0:0:0.78</str>
>>   </lst>
>>   <str name="WARNING">This response format is experimental. It is
>> likely to change in the future.</str>
>>   </response>
>>
>>
>> It looks as if my handler treats my directory as a 'list of titles', I
>> suppose, and not as a series of documents.
>>
>> Is that right? And above all: why?
>> Please help me!
>> Thank you,
>> alessio
>>
>>
>>
>> PS: my data-config.xml file follows; maybe the problem is there.
>>
>> <dataConfig>
>>  <dataSource name="dsFiles"
>>   type="FileDataSource"
>>   encoding="UTF-8"/>
>>  <document>
>>   <entity
>>     name="f"
>>     processor="FileListEntityProcessor"
>>     baseDir="D:\gioconews_archivio\marzo2011"
>>     fileName=".*pdf"
>>     recursive="true"
>>     rootEntity="false"
>>     dataSource="null">
>>
>> <entity name="tika-test" processor="TikaEntityProcessor"
>> url="${f.fileAbsolutePath}" format="text">
>>     <field column="author"  name="author" />
>>    <field column="title" name="title" />
>>          <field column="subject" name="subject" />
>>      <field column="description" name="description" />
>>      <field column="comments" name="comments" />
>>      <field column="category" name="categoru" />
>>      <field column="content_type" name="content_type" />
>>      <field column="last_modified" name="last_modified" />
>>   </entity>
>>   </entity>
>>
>>  </document>
>> </dataConfig>
>>
>> 2012/2/12 alessio crisantemi <alessio.crisantemi@gmail.com>
>>
>>> Dear Shawn,
>>> thanks for your reply.
>>> But the contrib directory of my Solr 3.5 doesn't contain these .jar files
>>> (apache-solr-dataimporthandler-3.5-SNAPSHOT.jar and
>>> apache-solr-dataimporthandler-extras-3.5-SNAPSHOT.jar).
>>>
>>> I have only apache-solr-dataimporthandler-3.5.jar and
>>> apache-solr-dataimporthandler-extras-3.5.jar, that is, without 'SNAPSHOT'.
>>> Why? Where can I download these jar files?
>>> a.
>>>
>>> 2012/2/12 Shawn Heisey <solr@elyograg.org>
>>>
>>>> On 2/11/2012 4:33 AM, alessio crisantemi wrote:
>>>>
>>>>> Dear all,
>>>>> I updated my Solr to version 3.5, but now I have this problem:
>>>>>
>>>>> Grave: Full Import failed
>>>>> org.apache.solr.handler.dataimport.DataImportHandlerException:
>>>>> java.lang.NoSuchMethodError:
>>>>>
>>>>
>>>> The data import handler has always been a contrib module, but it used
>>>> to be included in the .war file.  That has changed; it is now shipped
>>>> in separate jar files.
>>>>
>>>> When you downloaded or compiled 3.5.0, the dist directory should have
>>>> contained dataimporthandler and dataimporthandler-extras jar files.  Mine,
>>>> which I have compiled myself from the 3.5 svn branch, are named the
>>>> following:
>>>>
>>>> apache-solr-dataimporthandler-3.5-SNAPSHOT.jar
>>>> apache-solr-dataimporthandler-extras-3.5-SNAPSHOT.jar
>>>>
>>>> At minimum, put the first jar file in a lib folder referenced in your
>>>> solrconfig.xml file.  I couldn't tell you whether you'll need the -extras
>>>> file as well; you'll have to experiment.
>>>>
>>>> Thanks,
>>>> Shawn
>>>>
>>>>
>>>
>>
>
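
For reference, Shawn's advice above about referencing the jars from solrconfig.xml can be sketched roughly as follows. The relative paths are assumptions that depend on where the core's instance directory sits relative to the Solr distribution. The regex is written so that it matches both the release jar names (without SNAPSHOT) and the self-compiled SNAPSHOT names mentioned in the thread, including the -extras jar. TikaEntityProcessor lives in the -extras jar and also needs the Tika libraries shipped under contrib/extraction/lib, so a second directive for those is included.

```xml
<!-- solrconfig.xml fragment: a sketch, not taken from the thread -->
<config>
  <!-- Paths are assumptions; adjust them to your installation layout -->
  <lib dir="../../dist/" regex="apache-solr-dataimporthandler-.*\.jar" />
  <lib dir="../../contrib/extraction/lib" regex=".*\.jar" />

  <!-- Handler name matches the qt=/dataimport parameter in the URLs above -->
  <requestHandler name="/dataimport"
                  class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <!-- File name as used in the thread -->
      <str name="config">data-config.xml</str>
    </lst>
  </requestHandler>
</config>
```

Since the regex does not depend on the version suffix, there should be no need to look for separate SNAPSHOT jars when the release jars are already present.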
