lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Uwe Schindler" <...@thetaphi.de>
Subject RE: not indexing analyzed field
Date Sat, 27 Nov 2010 10:34:36 GMT
You have to first understand the difference between "stored" and "indexed":

- For stored fields no analysis is done, as they are only stored (e.g. for
display of retrieval results). These are simply copied unchanged to the
index - and you cannot search on them.
- Analysis is done on the "index" side, so the text is split up into tokens.
So you won't see analysis occurring on the stored field contents (e.g. when
you display the results using Lucene's IndexReader.document(int) or Solr's
result structure). The indexed fields are used when you query the index. For
queries to work, the search query is also analyzed and split up into tokens.
These tokens are searched in the "index". This also implies, that the query
and index analyzer needs to be compatible. And that’s another problem in
your schema, you don't have an query analyzer! So your searches would never
hit any result!

I'd suggest to read a book about Lucene/Solr first :-)

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Bernd Fehling [mailto:bernd.fehling@uni-bielefeld.de]
> Sent: Friday, November 26, 2010 8:23 AM
> To: java-user@lucene.apache.org
> Subject: Re: not indexing analyzed field
> 
> Hi Uwe,
> 
> my fieldType and fields are as follows:
> 
> <fieldType name="text_md" class="solr.TextField" omitNorms="true" >
>   <analyzer type="index"
> class="de.ubbielefeld.solr.analysis.TextMessageDigestAnalyzer" />
</fieldType>
> 
> <!-- UNIQUE ID -->
> <field name="id" type="string" indexed="true" stored="true"
required="true" />
> <field name="dcdocid" type="text_md" indexed="true" stored="true" />
> <copyField source="id" dest="dcdocid" />
> 
> So the field dcdocid has the attribute *stored* which I can also see in
the
> debugger.
> Why should I analyze a stored field?
> I don't know if I need to analyze it, I also tried a filter but also no
success.
> 
> My understanding is to send something to a field and the field has a
processing
> chain. The processing chain analyzes, filters, ... is doing something to
the
> content and then stores the content to that field in the index.
> 
> May be it is a misunderstanding on my side about the field based
processing
> because I'm normally working with FAST search engines which is document
> based.
> 
> Best regards
> Bernd
> 
> 
> 
> Am 25.11.2010 18:33, schrieb Uwe Schindler:
> > field.fieldsData is used for the stored field contents and so only
> > *stored* in index, of course not analyzed (why should I analyze a
> > stored field). The indexed tokens go of course through your analyzer
> > and the returned tokens are indexed as terms. Where is the problem?
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe@thetaphi.de
> >
> >
> >> -----Original Message-----
> >> From: Bernd Fehling [mailto:bernd.fehling@uni-bielefeld.de]
> >> Sent: Thursday, November 25, 2010 2:08 PM
> >> To: java-user@lucene.apache.org
> >> Subject: not indexing analyzed field
> >>
> >> I used KeywordAnalyzer and KeywordTokenizer as templates for a new
> >> analyzer.
> >> The analyzer works fine but the result never reaches the index.
> >>
> >> My analyzer is called in "DocInverterPerField.processFields"
> >> with "stream.incrementToken()".
> >> ...
> >> try {
> >>     boolean hasMoreTokens = stream.incrementToken();
> >>
> >>     fieldState.attributeSource = stream;
> >>
> >>     OffsetAttribute offsetAttribute =
> >> fieldState.attributeSource.addAttribute(OffsetAttribute.class);
> >>     PositionIncrementAttribute posIncrAttribute =
> >> fieldState.attributeSource.addAttribute(PositionIncrementAttribute.cl
> >> ass);
> >>
> >>     consumer.start(field);
> >> ...
> >>
> >> The result goes to "fieldState.attributeSource" but is not in "field".
> >> So "field.fieldsData" still has the old content before calling my
> > analyzer. And
> >> when calling "consumer.start(field)" the old content is going to the
> >> index
> > and
> >> not the new analyzed one.
> >> Does the analyzer has to care about "Fieldable field.fieldsData"
> >> or who is responsible for it?
> >>
> >> Regards
> >> Bernd
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> 
> --
> *************************************************************
> Bernd Fehling                Universitätsbibliothek Bielefeld
> Dipl.-Inform. (FH)                        Universitätsstr. 25
> Tel. +49 521 106-4060                   Fax. +49 521 106-4052
> bernd.fehling@uni-bielefeld.de                33615 Bielefeld
> 
> BASE - Bielefeld Academic Search Engine - www.base-search.net
> *************************************************************
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message