lucene-java-user mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: Lucene in the Humanities
Date Sat, 19 Feb 2005 13:37:12 GMT
On Saturday 19 February 2005 11:02, Erik Hatcher wrote:
> 
> On Feb 19, 2005, at 3:52 AM, Paul Elschot wrote:
> >>> By lowercasing the querytext and searching in title_lc ?
> >>
> >> Well sure, but how about this query:
> >>
> >> 	title:Something AND anotherField:someOtherValue
> >>
> >> QueryParser, as-is, won't be able to do field-name swapping.  I could
> >> certainly apply that technique on all the structured queries that I
> >> build up with the API, but with QueryParser it is trickier.   I'm
> >> definitely open for suggestions on improving how case is handled.  The
> >
> > Overriding this (1.4.3 QueryParser.jj, line 286) might work:
> >
> > protected Query getFieldQuery(String field, String queryText)
> > throws ParseException { ... }
> >
> > It will be called by the parser for both parts of the query above, so
> > one could change the field depending on the requested type of search
> > and the field name in the query.
> 
> But that wouldn't work for any other type of query.... 
> title:somethingFuzzy~

To cover that case, it would be necessary to override every query parser
method that takes a field argument, not only getFieldQuery.

> 
> Though now that I think more about it, a simple s/title:/title_orig:/ 
> before parsing would work, and of course make the default field 

In the overriding getFieldQuery method, something like:

if (caseSensitiveSearch(field) && originalFieldIndexed(field)) {
  field = field + "_orig";
} else { // the other 3 cases
  ...
}
return super.getFieldQuery(field, queryText);

The if statement could be factored out for the other overriding methods.
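
A minimal sketch of that factoring, independent of Lucene so it can be
run on its own. The class name FieldNameResolver and the two sets are
hypothetical, and returning the field unchanged for "the other 3 cases"
is only an assumption made to keep the sketch complete:

```java
import java.util.Set;

/** Hypothetical helper: decides which index field a query field maps to. */
class FieldNameResolver {
    // Fields for which the user requested a case sensitive search (assumed input).
    private final Set<String> caseSensitiveFields;
    // Fields that also have a case-preserving "<name>_orig" copy in the index (assumed input).
    private final Set<String> fieldsWithOriginal;

    FieldNameResolver(Set<String> caseSensitiveFields, Set<String> fieldsWithOriginal) {
        this.caseSensitiveFields = caseSensitiveFields;
        this.fieldsWithOriginal = fieldsWithOriginal;
    }

    /** Returns the field name to search, given the field name used in the query. */
    String resolve(String field) {
        if (caseSensitiveFields.contains(field) && fieldsWithOriginal.contains(field)) {
            return field + "_orig";
        }
        // Assumption: in every other case, search the field exactly as named.
        return field;
    }
}
```

Each overriding parser method could then start with
field = resolver.resolve(field); before delegating to its super method.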

> dynamic.   I need to evaluate how many fields would need to be done 
> this way - it'd be several.  Thanks for the food for thought!
> 
> >> only drawback now is that I'm duplicating indexes, but that is only an
> >> issue in how long it takes to rebuild the index from scratch
> >> (currently about 20 minutes or so on a good day - when the machine
> >> isn't swamped).
> >
> > Once the users get the hang of this, you might end up having to
> > quadruple the index, or more.
> 
> Why would that be?   They want a case sensitive/insensitive switch.  
> How would it expand beyond that?

It would expand if you end up with an index for every combination of
fields and case sensitivity for those fields.
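
One way to read that growth, purely as an assumption: if each field can
independently be searched case sensitively or not, and every such
combination gets its own index, the number of indexes doubles with each
field added. The class and method names below are hypothetical:

```java
/** Hypothetical illustration: index variants needed under the assumption above. */
class IndexVariantCount {
    /** Two choices (case sensitive or not) per field gives 2^fields combinations. */
    static long variants(int fields) {
        if (fields < 0 || fields > 62) {
            throw new IllegalArgumentException("fields must be between 0 and 62");
        }
        return 1L << fields;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 4; n++) {
            System.out.println(n + " field(s): " + variants(n) + " index variant(s)");
        }
    }
}
```

Under that reading, two such fields already give four variants, which
is where a quadrupled index would come from.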

Regards,
Paul Elschot

