lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Re-tokenized fields disappear
Date Tue, 07 Oct 2008 12:52:24 GMT
This is going to get really sticky given StandardAnalyzer. Let's say that
you have
codesearch:B05 1
codesearch:B05 2
codesearch:B05 3

When you index these, you'll index tokens B05, 1, 2, 3, along with
positional information. How to say "between 1 and 3" becomes a problem,
although it *might* work for you to search for
+codesearch:B05 +codesearch:[1 TO 3]...
(I've forgotten the between syntax, but you get the idea).
But I think that'll kinda work until you encounter case n + 1.......

But all is not lost. You might be well served by indexing these with
something like KeywordAnalyzer (note, you might want to roll your
own analyzer that, for instance, uppercases before passing to
KeywordAnalyzer).. Then, in the example above you'd index the following
tokens:

B05 1
B05 2
B05 3


Now, you can search for tokens between "B05 1" and "B05 3" using
the normal syntax syntax.

There are alternative schemes, but I think you would get some mileage
out of thinking about how to index these creatively.

A few points:
1> You may have to carefully massage the data for searching. For instance,
     assume that one of your tokens was B05 100. You might want to
     index "B05 001" rather than "B05 1" since Lucene normally
     searches lexically rather than numerically.
2> This could be a special search field that's a copy of your original. That
is,
     you would have two fields where before you had one.
3> PerFieldAnalyzerWrapper is your friend, both at index and search time
<G>.

Best
Erick

On Mon, Oct 6, 2008 at 11:38 PM, John Griffin <jgriffin@thebluezone.net>wrote:

> My previous question may be moot but as is it is still a problem. Here's a
> little more info on my problem. The same named fields contain two pieces of
> information, a code "B05" and a value "1" as follows. The value can be a
> range such as 1 to 5 or 1 to 100.
>
>
>
> "codesearch", "B05 1"
>
>
>
> This field and other identically names but differently valued fields in the
> same document are related to a specific person as identified by another
> field say SSN. So, one person can have multiple code searches. Both of the
> codesearch values are related to one another and must be searchable such as
>
>
>
> Return all persons with a codesearch value of B05 ranging from 1 to 3.
>
>
>
> How can I go about this? Do these codesearch fields need to be in a
> separate
> index related by SSN?
>
>
>
> Thanks in advance.
>
>
>
> John G.
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message