lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Get Analyzed/Tokenized Field List
Date Fri, 24 Dec 2010 01:59:37 GMT
I guess I'm missing the point. The fact that it is stored is irrelevant for
searching. Stored fields really only govern whether
Document.getField("fieldname") returns anything *after* the search. You can
find out if a field is stored-only by calling
IndexReader.getFieldNames(IndexReader.FieldOption.UNINDEXED), and you can
search on anything that is INDEXED.

So if, say, you're creating a drop-down with a selection of fields to choose
from, you should be able to get the list by looking for INDEXED.
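To make the stored/indexed split concrete, here is a minimal, Lucene-free sketch that models each field with two independent flags and selects names the way the INDEXED and UNINDEXED options are described above. The `FieldCatalog`, `FieldFlags`, and `FieldSelector` types are hypothetical; against a real Lucene 3.x index the list would come from `IndexReader.getFieldNames(IndexReader.FieldOption.INDEXED)` instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of per-field flags; not a Lucene class.
final class FieldFlags {
    final boolean indexed; // searchable: terms go into the inverted index
    final boolean stored;  // retrievable: the value comes back after a search

    FieldFlags(boolean indexed, boolean stored) {
        this.indexed = indexed;
        this.stored = stored;
    }
}

// Hypothetical analogue of the INDEXED / UNINDEXED options discussed in the thread.
enum FieldSelector { INDEXED, UNINDEXED }

final class FieldCatalog {
    // Insertion-ordered so a drop-down lists fields in a stable order.
    private final Map<String, FieldFlags> fields = new LinkedHashMap<>();

    void add(String name, boolean indexed, boolean stored) {
        fields.put(name, new FieldFlags(indexed, stored));
    }

    // Returns the names matching the selector. The "stored" flag plays no part:
    // storing a field does not change whether it can be searched.
    List<String> fieldNames(FieldSelector selector) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, FieldFlags> e : fields.entrySet()) {
            boolean wantIndexed = (selector == FieldSelector.INDEXED);
            if (e.getValue().indexed == wantIndexed) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

For example, after adding `text` (indexed, unstored), `title` (indexed, stored), and `url` (stored only), `fieldNames(FieldSelector.INDEXED)` returns `[text, title]`: the unstored `text` field still belongs in the drop-down.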

But somewhere you've got to ensure that the analyzers used at index time are
identical to, or compatible with, those used at query time. If all you're
concerned with is building up a string like "+text:stuff +title:nonsense" and
handing that off to the app that knows how the index was built (so it can use
the right analyzers for the text and title fields when parsing the input),
looking for INDEXED should be fine.
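A minimal sketch of the string-building side, assuming the caller already has field/term pairs chosen from the INDEXED list. The leading `+` marks each clause as required in Lucene query syntax. The `RequiredClauseQuery` class is hypothetical and does no escaping; a real caller would likely want to escape special characters in the terms first (Lucene's classic `QueryParser` has a static `escape(String)` method for this).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: turns field -> term pairs into a Lucene-syntax query
// string in which every clause is required ("+field:term").
// No escaping is performed, and one term is kept per field: a later
// require() call for the same field replaces the earlier term.
final class RequiredClauseQuery {
    private final Map<String, String> clauses = new LinkedHashMap<>();

    RequiredClauseQuery require(String field, String term) {
        clauses.put(field, term);
        return this;
    }

    String build() {
        StringJoiner joiner = new StringJoiner(" ");
        for (Map.Entry<String, String> e : clauses.entrySet()) {
            joiner.add("+" + e.getKey() + ":" + e.getValue());
        }
        return joiner.toString();
    }
}
```

`new RequiredClauseQuery().require("text", "stuff").require("title", "nonsense").build()` returns `+text:stuff +title:nonsense`. Analysis is deliberately left to the receiving app, since it is the side that knows which analyzer built each field.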

If you're *only* using your custom analyzer for searchable fields, you're
fine, because any INDEXED field can use your custom analyzer.
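In the single-analyzer case, membership in the INDEXED list is all the query side needs. A minimal sketch, assuming that list has already been fetched: `SingleAnalyzerRouter` is hypothetical, and `customAnalyze` (lowercase, split on whitespace) is only a stand-in for the real custom analyzer and filter chain, which the thread does not show.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical query-side router for the single-analyzer case: every INDEXED
// field was built with the same custom analyzer, so the INDEXED set decides.
final class SingleAnalyzerRouter {
    private final Set<String> indexedFields;

    SingleAnalyzerRouter(Collection<String> indexedFields) {
        this.indexedFields = new HashSet<>(indexedFields);
    }

    // Stand-in for the custom analyzer: lowercases and splits on whitespace.
    static List<String> customAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Returns the tokens to search for, or throws if the field is not indexed
    // (a stored-only field cannot be searched at all).
    List<String> tokensFor(String field, String userText) {
        if (!indexedFields.contains(field)) {
            throw new IllegalArgumentException("field is not indexed: " + field);
        }
        return customAnalyze(userText);
    }
}
```

With `text` and `title` in the INDEXED list, `tokensFor("title", "Some Nonsense")` yields `[some, nonsense]`, while a stored-only field such as `url` is rejected.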

But if you use different analyzers for different searchable fields, there's
no way I know of to analyze an index and answer the question "what analyzer
was this field created with?"; as far as I know, that knowledge is built
a priori into the app.
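With several analyzers, the application has to carry the field-to-analyzer mapping itself and apply the same mapping at index and query time; in the Lucene 3.x API, `PerFieldAnalyzerWrapper` (a default analyzer plus per-field overrides added with `addAnalyzer`) fills this role for real `Analyzer` objects. The stdlib-only sketch below shows just the lookup rule, with hypothetical string identifiers standing in for analyzers.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical application-side registry: the field -> analyzer knowledge
// that, per the thread, lives in the application rather than in the index.
// Analyzer identifiers are plain strings standing in for Analyzer instances.
final class FieldAnalyzerRegistry {
    private final String defaultAnalyzer;
    private final Map<String, String> perField = new HashMap<>();

    FieldAnalyzerRegistry(String defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    FieldAnalyzerRegistry put(String field, String analyzerId) {
        perField.put(field, analyzerId);
        return this;
    }

    // Fields without an explicit entry fall back to the default, the same
    // lookup rule PerFieldAnalyzerWrapper applies.
    String analyzerFor(String field) {
        return perField.getOrDefault(field, defaultAnalyzer);
    }
}
```

`new FieldAnalyzerRegistry("standard").put("text", "custom")` maps `text` to `custom` and every other field, such as `title`, to `standard`. Sharing one such registry between the indexer and the query builder is what keeps the two sides compatible.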


Best
Erick


On Thu, Dec 23, 2010 at 6:32 PM, Jordon Saardchit <jsaardchit@go2.com> wrote:

> The basic use case is determining the rules for building a query. I've got
> an application that programmatically builds queries (without any
> pre-existing knowledge of the contents of the index it is searching). We
> have a custom-designed analyzer and filter chain. However, it is applied
> only to certain fields at index time. The fields it is applied to are
> unstored.
>
> On the search side, I want to be able to determine at runtime which fields
> the analyzer should be applied to and which it should not. I could be
> approaching the solution incorrectly, but I figured this would be a pretty
> common or natural use case.
>
> Jordon
>
> On Dec 23, 2010, at 2:51 PM, Erick Erickson wrote:
>
> > Ah, you didn't mention indexed but unstored in your original message,
> > just indexed/analyzed...
> >
> > I don't think you can (someone jump in here if I'm wrong, please). The
> > problem is that Lucene doesn't require any sort of schema, so you are
> > perfectly free to store a field in one document and NOT store it in
> > another. All the variants specified in IndexReader.FieldOption can be
> > determined quickly just by looking at the various index files, but you'd
> > have to spin through all the *documents* in order to answer the question
> > "is this field ever stored?". Sounds like a table scan in the DB world.
> >
> > I don't think Lucene keeps metadata for this, and spinning through all
> > the documents would be expensive...
> >
> > Why do you want to know? Perhaps there's another way to satisfy the
> > use case.
> >
> > I could be way off base here; I'm speaking from general principles, not
> > knowledge of the code...
> >
> > Best
> > Erick
> >
> > On Thu, Dec 23, 2010 at 4:43 PM, Jordon Saardchit <jsaardchit@go2.com> wrote:
> >
> >> Yes I have, and after testing each of the various options listed in
> >> IndexReader.FieldOption, I cannot retrieve field names that are indexed
> >> (analyzed) and unstored. I figured this would be relatively easy to do
> >> and I was simply overlooking something. Is it perhaps not possible to do
> >> this?
> >>
> >> Jordon
> >>
> >> On Dec 23, 2010, at 1:30 PM, Erick Erickson wrote:
> >>
> >>> Have you looked at IndexReader.getFieldNames()?
> >>>
> >>> Best
> >>> Erick
> >>>
> >>> On Thu, Dec 23, 2010 at 3:23 PM, Jordon Saardchit <jsaardchit@go2.com> wrote:
> >>>
> >>>> Is there an easy way to retrieve a collection of fields (or field
> >>>> names) that are analyzed/tokenized from any given index?
> >>>>
> >>>> Jordon
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>>
> >>>>
> >>
> >>
> >>
>
>
>
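The earlier reply quoted above notes that, with no schema, "is this field ever stored?" can only be answered by visiting every document. A minimal stdlib-only sketch of that scan, assuming each document is represented by the set of field names it stored (`StoredFieldScan` is hypothetical). Against a Lucene 3.x index the analogous loop would call `IndexReader.document(i)` for each `i` below `maxDoc()`, skipping documents where `isDeleted(i)` is true; a loaded `Document` carries only stored fields.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: each element of `docs` is the set of field names one
// document stored. With no schema, the answer needs a pass over the documents.
final class StoredFieldScan {
    // True as soon as any document stored the field; when the answer is
    // "no", every document is visited.
    static boolean everStored(List<Set<String>> docs, String field) {
        for (Set<String> storedFields : docs) {
            if (storedFields.contains(field)) {
                return true;
            }
        }
        return false;
    }

    // Union of stored field names across all documents (always a full pass).
    static Set<String> allStoredFields(List<Set<String>> docs) {
        Set<String> names = new TreeSet<>();
        for (Set<String> storedFields : docs) {
            names.addAll(storedFields);
        }
        return names;
    }
}
```

`everStored` can stop at the first hit but must visit every document when the answer is no, and `allStoredFields` always makes a full pass: the cost grows linearly with the number of documents, which is the table-scan cost described in that reply.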
