lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: multi-field index and search (Not MultiFieldQuery). Help setting up index and search
Date Mon, 04 May 2009 20:32:52 GMT
For queries like these, built from keywords rather than analyzed tokens, I
would create the appropriate TermQuery objects directly and combine them with
a BooleanQuery. QueryParser is normally meant for queries the user has
entered, not for program-internal ones. For your use case, it seems better to
instantiate the correct Query objects directly in your Java code.
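
A minimal sketch of that approach, assuming a recent Lucene release (5.3 or
later) where a BooleanQuery is assembled through BooleanQuery.Builder; on the
older 2.x line the same add() calls go on a BooleanQuery instance directly.
The class name KeywordQueryExample and the helper build() are hypothetical;
the field names "material" and "gender" are taken from the thread.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// Hypothetical example class; not part of Lucene.
public class KeywordQueryExample {

    // Builds a query requiring an exact keyword in each field.
    // The values are used verbatim: no analyzer runs here, so they must
    // match the indexed terms exactly (including case).
    static Query build(String material, String gender) {
        BooleanQuery.Builder builder = new BooleanQuery.Builder();
        // Occur.MUST makes each clause required, like a leading '+'
        // in QueryParser syntax.
        builder.add(new TermQuery(new Term("material", material)),
                    BooleanClause.Occur.MUST);
        builder.add(new TermQuery(new Term("gender", gender)),
                    BooleanClause.Occur.MUST);
        return builder.build();
    }

    public static void main(String[] args) {
        // Prints the query in QueryParser-style notation.
        System.out.println(build("leather", "women"));
    }
}
```

Printing the result should show the same kind of clauses as the hand-written
+material:leather +gender:female query quoted further down, without going
through QueryParser. A free-text field such as the description could be added
as an optional clause (Occur.SHOULD) so that it influences ranking without
being required.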

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: Monday, May 04, 2009 9:51 PM
> To: java-user@lucene.apache.org
> Subject: Re: multi-field index and search (Not MultiFieldQuery). Help
> setting up index and search
> 
> MultiFieldQuery essentially (if I have this right) forms a "cross product".
> I.e., it is NOT required to specify specific values for discrete fields. MFQ
> helps form queries expressing something like "does any term appear in any
> field in a hit" or "does every term appear in some field of a hit, regardless
> of which field and not necessarily the same field" (depending upon whether
> the default operator is OR or AND). You can get something of the same effect
> by creating a special field that is the concatenation of all the other fields
> and searching that concatenated field with and/or (except that MFQ does
> interesting things with boosting).
> 
> But if you know exactly what terms you require in which field, the
> standard query parser is fine. i.e. +material:leather +gender:female
> will look for "leather" ONLY in material and "female" ONLY in gender.
> 
> HTH
> Erick.
> 
> P.S. A tip for you: If anything I say contradicts something Paul says,
> listen to Paul <G>...
> 
> 
> On Mon, May 4, 2009 at 3:33 PM, Christian Bongiorno
> <christian@bongiorno.org> wrote:
> 
> > Yeah, you definitely got the idea. You're the second person to recommend
> > putting each item in its own document and just storing the HTS code (which
> > is easy for me). The HTS code actually comes with no extra info. I mean,
> > there is info, but we don't store any of it.
> >
> > I will try as you and Paul have recommended. Once done, would I then need a
> > MultiFieldQuery? Forgive me, but the queries confuse me.
> >
> > Rebuilding my index will take some time, but I appreciate everyone's help.
> >
> > Christian
> >
> > On Mon, May 4, 2009 at 11:40 AM, Erick Erickson
> > <erickerickson@gmail.com> wrote:
> >
> > > Hmmmm, tricky. Let's see if I understand your problem.
> > >
> > > Basically, you have a bunch of HTS codes that have had
> > > some number of items arbitrarily assigned to them, and
> > > you want to see if you can make Lucene behave as a kind
> > > of expert system to help you classify the next item.
> > >
> > > I *think* you'd get better results by indexing each item
> > > along with its HTS code as a separate document. Because
> > > what you really want to ask is "given the attributes of my
> > > new item, what other items are most similar to it?" and then
> > > present the HTS codes from those items to the classifier
> > > (perhaps a person?).
> > >
> > > I'm going to assume further that the HTS code has
> > > some data associated with it that describes the
> > > class, and that this data needs to be available to
> > > the user to see if your suggestions are appropriate.
> > > You could either index the HTS codes in another index
> > > OR index them in the same index but simply store
> > > the data (don't index it), so the HTS documents won't
> > > interfere with your searches on "similar items".
> > >
> > > Mostly, this is just trying to see if I understand what
> > > you're trying to accomplish. This may be gibberish, but
> > > it's a start <G>.
> > >
> > > Best
> > > Erick
> > >
> > >
> > > On Mon, May 4, 2009 at 1:16 PM, Christian Bongiorno
> > > <christian@bongiorno.org> wrote:
> > >
> > > > I am trying to build a search (have been experimenting with using
> > > > Lucene) and someone suggested contacting your team.
> > > >
> > > > Background:
> > > > Currently the service I am working on applies taxes/duties to products
> > > > for international shipping by looking up something called an HTS code
> > > > (a universally recognized taxation code for duty/tariff). We already
> > > > have almost a million items classified by HTS code. As many as 50k
> > > > items fall into the same HTS code.
> > > >
> > > > For purposes of HTS classification, Description is only important if
> > > > no other field exists. But taxation is based on things like material
> > > > (leather, cloth, etc.) and product (shoes/bags/toys). Color is of fair
> > > > relevancy as well (to a customs official, black boots or brown make no
> > > > difference; it wasn't made here so it must be taxed).
> > > >
> > > > The idea is to turn our entire existing knowledge base into an index;
> > > > then, when we get a new item that needs classification, we "search" for
> > > > the "Document(hts)" that best matches, using the attributes of the item
> > > > to be classified as the search query.
> > > >
> > > > The document structure, as I see it, should be:
> > > >
> > > > Document(HTS) -> {{ASIN1: {Key,value},{Key,value},.}, {ASIN2:
> > > > {Key,value},{Key,value},.} .}
> > > >
> > > > There are 1788 documents. Up to 50k ASINs and their attributes may fall
> > > > into a single document.
> > > >
> > > > Some fields are straightforward and very good indicators of a match,
> > > > such as:
> > > >
> > > > Material -> "leather"
> > > > Gender -> "women"
> > > >
> > > > Others are fuzzier
> > > >
> > > > Description -> "Stylish full calf leather boots. Sleek Italian leather,
> > > > designer"
> > > >
> > > > So for a query of:
> > > > "Material" -> "Leather"
> > > > "Gender" -> "womAn"
> > > > "Description" -> "Short leather shoes, Made in Denmark"
> > > >
> > > > I would expect a very high match here since the first 2 fields, which
> > > > don't vary much, are good indicators for HTS.
> > > >
> > > > I have searched through the archives and I don't see anything like what
> > > > I am looking for.
> > > >
> > > > Basically, every item will have attributes which I am treating as
> > > > "Field(item.key, item.value)". I think that's the right approach, but a
> > > > multi-field query queries your terms across all fields in the search.
> > > > That isn't what I need. I very clearly know my fields and values, and
> > > > that should give me enormous leverage when querying if I could build a
> > > > query to do that.
> > > >
> > > >
> > > > Christian
> > > >
> > > > --
> > > > Christian Bongiorno
> > > >
> > >
> >
> >
> >
> > --
> > Christian Bongiorno
> >



