lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Bernd Fehling <bernd.fehl...@uni-bielefeld.de>
Subject Re: not indexing analyzed field
Date Mon, 29 Nov 2010 07:02:46 GMT
Hi Uwe,

Erick explained it pretty well and i got it now, but generally you're right RTFM ;-)

Nevertheless I'm in the need of the functionality to change the stored
value during analysis or tokenization or filtering (what ever works).

Thats how it can be done in FAST FDS/ESP (full processing) compared
to Lucene/Solr (sparse processing).
Sure I can make a branch of the trunk and enhance the
"DocInverterPerField.processFields" class/method and change the line
"boolean hasMoreTokens = stream.incrementToken();"
but my hope is still that it is possible without touching the basic
Lucene code.

Do you have any idea, how to change the stored value during analysis or
tokenization or filtering?

Best regards,
Bernd


Am 27.11.2010 11:34, schrieb Uwe Schindler:
> You have to first understand the difference between "stored" and "indexed":
> 
> - For stored fields no analysis is done, as they are only stored (e.g. for
> display of retrieval results). These are simply copied unchanged to the
> index - and you cannot search on them.
> - Analysis is done on the "index" side, so the text is split up into tokens.
> So you won't see analysis occurring on the stored field contents (e.g. when
> you display the results using Lucene's IndexReader.document(int) or Solr's
> result structure). The indexed fields are used when you query the index. For
> queries to work, the search query is also analyzed and split up into tokens.
> These tokens are searched in the "index". This also implies, that the query
> and index analyzer needs to be compatible. And that’s another problem in
> your schema, you don't have an query analyzer! So your searches would never
> hit any result!
> 
> I'd suggest to read a book about Lucene/Solr first :-)
> 
> Uwe
> 
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
> 
>> -----Original Message-----
>> From: Bernd Fehling [mailto:bernd.fehling@uni-bielefeld.de]
>> Sent: Friday, November 26, 2010 8:23 AM
>> To: java-user@lucene.apache.org
>> Subject: Re: not indexing analyzed field
>>
>> Hi Uwe,
>>
>> my fieldType and fields are as follows:
>>
>> <fieldType name="text_md" class="solr.TextField" omitNorms="true" >
>>   <analyzer type="index"
>> class="de.ubbielefeld.solr.analysis.TextMessageDigestAnalyzer" />
> </fieldType>
>>
>> <!-- UNIQUE ID -->
>> <field name="id" type="string" indexed="true" stored="true"
> required="true" />
>> <field name="dcdocid" type="text_md" indexed="true" stored="true" />
>> <copyField source="id" dest="dcdocid" />
>>
>> So the field dcdocid has the attribute *stored* which I can also see in
> the
>> debugger.
>> Why should I analyze a stored field?
>> I don't know if I need to analyze it, I also tried a filter but also no
> success.
>>
>> My understanding is to send something to a field and the field has a
> processing
>> chain. The processing chain analyzes, filters, ... is doing something to
> the
>> content and then stores the content to that field in the index.
>>
>> May be it is a misunderstanding on my side about the field based
> processing
>> because I'm normally working with FAST search engines which is document
>> based.
>>
>> Best regards
>> Bernd
>>
>>
>>
>> Am 25.11.2010 18:33, schrieb Uwe Schindler:
>>> field.fieldsData is used for the stored field contents and so only
>>> *stored* in index, of course not analyzed (why should I analyze a
>>> stored field). The indexed tokens go of course through your analyzer
>>> and the returned tokens are indexed as terms. Where is the problem?
>>>
>>> -----
>>> Uwe Schindler
>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>> http://www.thetaphi.de
>>> eMail: uwe@thetaphi.de
>>>
>>>
>>>> -----Original Message-----
>>>> From: Bernd Fehling [mailto:bernd.fehling@uni-bielefeld.de]
>>>> Sent: Thursday, November 25, 2010 2:08 PM
>>>> To: java-user@lucene.apache.org
>>>> Subject: not indexing analyzed field
>>>>
>>>> I used KeywordAnalyzer and KeywordTokenizer as templates for a new
>>>> analyzer.
>>>> The analyzer works fine but the result never reaches the index.
>>>>
>>>> My analyzer is called in "DocInverterPerField.processFields"
>>>> with "stream.incrementToken()".
>>>> ...
>>>> try {
>>>>     boolean hasMoreTokens = stream.incrementToken();
>>>>
>>>>     fieldState.attributeSource = stream;
>>>>
>>>>     OffsetAttribute offsetAttribute =
>>>> fieldState.attributeSource.addAttribute(OffsetAttribute.class);
>>>>     PositionIncrementAttribute posIncrAttribute =
>>>> fieldState.attributeSource.addAttribute(PositionIncrementAttribute.cl
>>>> ass);
>>>>
>>>>     consumer.start(field);
>>>> ...
>>>>
>>>> The result goes to "fieldState.attributeSource" but is not in "field".
>>>> So "field.fieldsData" still has the old content before calling my
>>> analyzer. And
>>>> when calling "consumer.start(field)" the old content is going to the
>>>> index
>>> and
>>>> not the new analyzed one.
>>>> Does the analyzer has to care about "Fieldable field.fieldsData"
>>>> or who is responsible for it?
>>>>
>>>> Regards
>>>> Bernd
>>>>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message