lucene-solr-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: ExtractRequestHandler - not properly indexing office docs?
Date Tue, 23 Jun 2009 11:00:11 GMT
Can you change the text field to be stored and then point the  
LukeRequestHandler at that field (/admin/luke) and report back?  Also,  
can you post your full schema and config?
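
For anyone following along, here is a rough sketch of the two requests that would help narrow this down: one asking the LukeRequestHandler about a single field, and one searching with the field named explicitly instead of relying on the default search field. The host and port are taken from the curl command quoted below. The helper names are hypothetical, and `fl` and `numTerms` are the Luke parameters as I understand them, so check them against your Solr version.

```python
from urllib.parse import urlencode

# Assumption: base URL copied from the curl command quoted in this thread.
SOLR_BASE = "http://localhost:8983/solr"


def luke_url(field, num_terms=20, base=SOLR_BASE):
    """Build a LukeRequestHandler URL for one field and its top terms.

    Hypothetical helper; 'fl' and 'numTerms' are assumed to be the
    handler's parameter names in your Solr version.
    """
    return f"{base}/admin/luke?" + urlencode({"fl": field, "numTerms": num_terms})


def field_query_url(field, term, base=SOLR_BASE):
    """Build a /select URL that names the field in the query itself,
    so the result does not depend on the default search field."""
    return f"{base}/select?" + urlencode({"q": f"{field}:{term}"})


if __name__ == "__main__":
    # Print the URLs; paste them into a browser or pass them to curl.
    print(luke_url("text"))
    print(field_query_url("text", "kondel"))
    print(field_query_url("doc_content", "kondel"))
```

If the Luke output shows no terms for the field, the content never reached it at index time; if terms are there but the plain `q=kondel` search finds nothing, the default search field is the more likely culprit.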

Finally, can you get the example to work?


On Jun 23, 2009, at 1:41 AM, cloax wrote:

>
> I've tried 'text' (taken from the example config) and then tried creating a
> new field called doc_content and using that. Neither has worked.
>
>
> Grant Ingersoll-6 wrote:
>>
>> What's your default search field?
>>
>> On Jun 22, 2009, at 12:29 PM, cloax wrote:
>>
>>>
>>> Yep, I've tried both of those and still no joy. Here are both my curl
>>> statement and the resulting Solr log output.
>>>
>>> curl http://localhost:8983/solr/update/extract?ext.def.fl=text\&ext.literal.id=1\&ext.map.div=text\&ext.capture=div -F "myfile=@dj_character.doc"
>>>
>>> Curl's output:
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <response>
>>> <lst name="responseHeader"><int name="status">0</int><int name="QTime">317</int></lst>
>>> </response>
>>>
>>> Solr log:
>>> Jun 22, 2009 12:21:42 PM org.apache.solr.core.SolrCore execute
>>> INFO: [] webapp=/solr path=/update/extract params={ext.map.div=text&ext.def.fl=text&ext.capture=div&ext.literal.id=1} status=0 QTime=544
>>> Jun 22, 2009 12:22:26 PM org.apache.solr.update.processor.LogUpdateProcessor finish
>>> INFO: {add=[1]} 0 317
>>> Jun 22, 2009 12:22:26 PM org.apache.solr.core.SolrCore execute
>>> INFO: [] webapp=/solr path=/update/extract params={ext.map.div=text&ext.def.fl=text&ext.capture=div&ext.literal.id=1} status=0 QTime=317
>>> Jun 22, 2009 12:22:37 PM org.apache.solr.core.SolrCore execute
>>> INFO: [] webapp=/solr path=/select params={wt=standard&rows=10&start=0&explainOther=&hl.fl=&indent=on&q=kondel&fl=*,score&qt=standard&version=2.2} hits=0 status=0 QTime=2
>>>
>>> The submitted document has "kondel" in it numerous times, so Solr should
>>> have a hit. Yet it returns nothing. I also made sure I committed, but that
>>> didn't seem to help either.
>>>
>>>
>>> Grant Ingersoll-6 wrote:
>>>>
>>>> Do you have a default field declared?  &ext.default.fl=<FIELD NAME>
>>>> Either that, or you need to explicitly capture the fields you are
>>>> interested in using &ext.capture=<FIELD NAME>
>>>>
>>>> You could add this to your curl statement to try out.
>>>>
>>>> -Grant
>>>>
>>>
>>>
>>> -- 
>>> View this message in context:
>>> http://www.nabble.com/ExtractRequestHandler---not-properly-indexing-office-docs--tp24120125p24150763.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
>> using Solr/Lucene:
>> http://www.lucidimagination.com/search
>>
>>
>>
>
>

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/