lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Get Analyzed/Tokenized Field List
Date Fri, 24 Dec 2010 20:17:48 GMT
Well, not to my knowledge. In fact there's no guarantee that the #same#
index has the #same# analyzer used on the #same# field in different
documents, so I don't see how there could be a robust implementation of
what you want.

You could populate a field with a particular analyzer (or none at all),
close your writer and open another with any other random analyzer (or
none at all) for the same field and Lucene wouldn't complain.

Solr handles this with the schema file. I guess you could abstract the
field definitions into a library and use the library in both apps, but
otherwise the apps have to "just know".
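
A minimal sketch of that shared-library idea, in plain Java with no Lucene
dependency. The class name FieldRegistry, the Handling enum, and the "id"
field are hypothetical; "text" and "title" are borrowed from the example
query string quoted below. The assumption is that both the indexing app and
the searching app depend on this one class, so neither has to guess how a
field was indexed.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical shared field registry (not part of Lucene).
 * The indexer consults it when choosing how to index a field; the searcher
 * consults it when deciding whether to analyze the user's input.
 */
final class FieldRegistry {

    /** How a field's text is treated at index time and at query time. */
    enum Handling { ANALYZED, NOT_ANALYZED }

    private static final Map<String, Handling> FIELDS;
    static {
        Map<String, Handling> m = new LinkedHashMap<String, Handling>();
        m.put("text", Handling.ANALYZED);      // run through the custom analyzer
        m.put("title", Handling.ANALYZED);     // run through the custom analyzer
        m.put("id", Handling.NOT_ANALYZED);    // hypothetical exact-match field
        FIELDS = Collections.unmodifiableMap(m);
    }

    private FieldRegistry() {}

    /** Returns true if input for this field should go through the analyzer. */
    static boolean isAnalyzed(String field) {
        Handling h = FIELDS.get(field);
        if (h == null) {
            throw new IllegalArgumentException("Unknown field: " + field);
        }
        return h == Handling.ANALYZED;
    }

    /** Read-only view of every known field, e.g. for building a drop-down. */
    static Map<String, Handling> fields() {
        return FIELDS;
    }
}
```

The search side would then call FieldRegistry.isAnalyzed(fieldName) before
building each clause. Failing loudly on an unknown field is a design choice
here, so that a field added by the indexer but missing from the registry is
noticed right away.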

Best
Erick

On Fri, Dec 24, 2010 at 1:16 PM, Jordon Saardchit <jsaardchit@go2.com> wrote:

> Heh, yes, all stuff I know.  My question was whether an index contained any
> metadata that revealed whether a certain indexed field had been analyzed,
> which I think you are saying it does not.
>
> Our searching and indexing are isolated into 2 completely separate packages
> which can be deployed independently of each other.  The only common
> dependency (obviously) is the index itself.  That being said, I was trying
> to determine from the search runtime if the given fieldname/input pair
> should be analyzed or not when building the query without having any
> knowledge of how the index was created.
>
> Jordon
>
> On Dec 23, 2010, at 5:59 PM, Erick Erickson wrote:
>
> > I guess I'm missing the point. The fact that it is stored is irrelevant
> > for searching. Stored fields really only govern whether
> > Document.getField("fieldname") returns anything #after# the search. You
> > can find out if a field is stored-only by asking
> > IndexReader.getFieldNames for UNINDEXED, and you can search on anything
> > that is INDEXED.
> >
> > So if, say, you're creating a drop-down with a selection of fields to
> > choose from, you should be able to get the list by looking for INDEXED.
> >
> > But somewhere you've got to ensure that the analyzers used at index time
> > are identical to or compatible with those used at query time. If all
> > you're concerned with is building up a string like
> > "+text:stuff +title:nonsense" and handing that off to the app that knows
> > how the index was built (so it can use the right analyzers for the text
> > and title fields when parsing the input), looking for INDEXED should be
> > fine.
> >
> > If you're #only# using your custom analyzer for searchable fields, it's
> > fine because any INDEXED field can use your custom analyzer.
> >
> > But if you use different analyzers for different searchable fields,
> > there's no way I know of to analyze an index and answer the question
> > "what analyzer was this field created with"; that knowledge is built
> > a priori into the app as far as I know.
> >
> >
> > Best
> > Erick
> >
> >
> > On Thu, Dec 23, 2010 at 6:32 PM, Jordon Saardchit <jsaardchit@go2.com> wrote:
> >
> >> The basic use case is determination of rules in regards to building a
> >> query.  I've got an application that programmatically builds queries
> >> (without any pre-existing knowledge of the contents of the index it is
> >> searching).  We have a custom designed analyzer and filter chain.
> >> However, it is applied only to certain fields at index time.  The fields
> >> it is applied to are unstored.
> >>
> >> On the search side, I want to be able to determine at runtime which
> >> field the analyzer should be applied to, and which it should not.  I
> >> could be approaching the solution incorrectly, but I figured this would
> >> be a pretty common or natural use case.
> >>
> >> Jordon
> >>
> >> On Dec 23, 2010, at 2:51 PM, Erick Erickson wrote:
> >>
> >>> Ah, you didn't mention indexed but unstored in your original message,
> >>> just indexed/analyzed....
> >>>
> >>> I don't think you can (someone jump in here if I'm wrong, please). The
> >>> problem is that Lucene doesn't require any sort of schema. So you are
> >>> perfectly free to store a field in one document and NOT store it in
> >>> another. All the variants specified in IndexReader.FieldOption can
> >>> quickly be determined by just looking at the various index files. But
> >>> you'd have to spin through all the #documents# in order to answer the
> >>> question "is this field ever stored?". Sounds like a table scan in the
> >>> DB world.
> >>>
> >>> I don't think Lucene keeps meta-data for this, and spinning through all
> >>> the documents would be expensive...
> >>>
> >>> Why do you want to know? Perhaps there's another way to satisfy the
> >>> use-case.
> >>>
> >>> I could be way off base here, I'm speaking from general principles not
> >>> knowledge of the code...
> >>>
> >>> Best
> >>> Erick
> >>>
> >>> On Thu, Dec 23, 2010 at 4:43 PM, Jordon Saardchit <jsaardchit@go2.com> wrote:
> >>>
> >>>> Yes I have, and after testing each of the various options denoted in
> >>>> IndexReader.FieldOption, I cannot retrieve field names that are indexed
> >>>> (analyzed) and unstored.  I figured this would be relatively easy to do
> >>>> and I was simply overlooking something.  Is it perhaps not possible to
> >>>> do this?
> >>>>
> >>>> Jordon
> >>>>
> >>>> On Dec 23, 2010, at 1:30 PM, Erick Erickson wrote:
> >>>>
> >>>>> Have you looked at IndexReader.getFieldNames()?
> >>>>>
> >>>>> Best
> >>>>> Erick
> >>>>>
> >>>>> On Thu, Dec 23, 2010 at 3:23 PM, Jordon Saardchit <jsaardchit@go2.com> wrote:
> >>>>>
> >>>>>> Is there an easy way to retrieve a collection of fields (or field
> >>>>>> names) that are analyzed/tokenized from any given index?
> >>>>>>
> >>>>>> Jordon
> >>>>>>
> >>>>>> ---------------------------------------------------------------------
> >>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>>>>
