lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Searching by bit masks
Date Mon, 27 Nov 2006 20:00:50 GMT
Well, you really have the code already <G>. From the top...

1> there's no good way to support searching bitfields If you wanted, you
could probably store it as a small integer and then search on it, but that's
waaay too complicated than you want.

2> Add the fields like you have the snippet from, something like
Document doc = new Document.
if (bitsfromdb & 1) {
    doc.add("sport", "y");
}
if (bitsfromdb & 2) {
    doc.add("music", "y");
}
.
.
.
IndexWriter.add(doc);



Now, when searching, search on things like new Term("sport", "y"))..... and
you'll only get the documents that correspond to the 2s bit being set.

Watch out for capitalization. Y may not be equivalent to y. It depends on
the analyzer you use at index AND search time.

You can or as many of these together as you want. In your example, you could
have up to 4 sub-clauses just for the bitmask-equivalents.

NOTE: the documents won't all have the same fields. A document may not have,
for instance, the "sports" field. This is OK in Lucene, but not the first
thing folks with their DB hat on think of....

Get a copy of Luke (google lucene luke) and get familiar with it for
examining your index and the effects of various analyzers. Really, really,
really get a copy of Luke. Really.

Do you have a copy of "Lucene In Action"? If not, I highly recommend it. It
has tons of useful examples as well as a good introduction to many of the
concepts. It's written to the 1.4 codebase, so be warned that there are some
incompatibilities that are, for the most part, minor.


Best
Erick
On 11/27/06, Biggy <biggy97@web.de> wrote:
>
>
>
> i have the same problem here. I have an interest bit field, which i
> receive from the applciation backend. I have control over how the
> docuemtns
> are built.
> To be specific, the field looks like this:
>
> ID: interest
> 1 : sport
> 2 : music
> 4 : film
> 8 : clubs
>
> So someone interested in sports and music can be found by "interest & 3"
> =>
> e.g. when using SQL.
>
> I do not wish to Post-Filter the results
> On to Lucene, Is there a filter which supports this kind of query ?
>
> Someone suggested splitting the bits into fields:
> > Document doc = new Document();
> > doc.add("flag1", "Y");
> > doc.add("flag2", "Y");
> > IndexWriter.add(doc);
> Is this helpful at all ?
>
> Code would be helpful too as i am a newbie
>
>
>
> ltaylor.employon wrote:
> >
> > Hello,
> >
> > I am currently evaluating Lucene to see if it would be appropriate to
> > replace my company's current search software. So far everything has been
> > looking great, however there is one requirement that I am not too
> > certain about.
> >
> > What we need to do is to be able to store a bit mask specifying various
> > filter flags for a document in the index and then search this field by
> > specifying another bit mask with desired filters, returning documents
> > that have any of the specified flags set. In other words, we are doing a
> > bitwise OR on the stored filter bit mask and the specified filter bit
> > mask and if it is non-zero, we want to return the document.
> >
> > Before I started toying around with various options myself, I wanted to
> > see if any of you good folks in the Lucene community had some
> > suggestions for an efficient way to implement this.
> >
> > We currently need to index ~8,000,000 documents. We have several filter
> > flag fields, the most important of which currently has 7 possible flags
> > with any combination of the flags being valid. The number of flags is
> > expected to increase rather rapidly in the near future.
> >
> > My preemptive thanks for your suggestions,
> >
> >
> > Lawrence Taylor
> > Senior Software Engineer
> > Employon
> > Message was edited by: ltaylor.employon
> >
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/Searching-by-bit-masks-tf2603918.html#a7564237
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message