lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Search for null value or empty string in lucene
Date Wed, 06 Dec 2006 13:26:40 GMT
Perhaps the most straight-forward way would be to index a known unique value
for each document that would have had a null entry. Conceptually, when a
field would be null, index the value "nothinghere". Then you can just search
on documents where the value is equal to "nothinghere".

Alternatively, you could create a filter using TermEnum/TermDocs for each
document that had an entry in the field in question and then invert it to
get your real filter. The trick here is to build your term something like
new Term("field", ""); which enumerates all the terms in a field. Since a
filter is just a bitset, there are operations for inverting it. You can even
store these filters somewhere and reuse them (note, they will have to be
re-built when you change your index). For 2 million documents, they would be
quite small, somewhere around 250K bytes. Building these filters is quite
fast, so the first thing I would try is building them dynamically. If you
use a CachingWrapperFilter, they will be cached automatically for the
duration of your program.


But I'd go with the simple way first, that of indexing a unique value for
the null fields and searching on that....

Best
Erick

On 12/6/06, Supriya Kumar Shyamal <supriya.shyamal@artnology.com> wrote:
>
> Hi,
>
> I have some question regarding the search. In a document we can have
> several fields but not all fields have the value in all documents i.e.
> some fields in a document can have null or empty string. But how to
> search for a null field value in a document using the IndexSearcher? Any
> idea will be very greaful. Since I have index with 2 million documents
> adn Ic an't see thorgh all documents usign Luke.
>
> Thanks,
> supriya
>
> --
> Mit freundlichen Grüßen / Regards
>
> Supriya Kumar Shyamal
>
> Software Developer
> tel +49 (30) 443 50 99 -22
> fax +49 (30) 443 50 99 -99
> email supriya.shyamal@artnology.com
> ___________________________
> artnology GmbH
> Milastr. 4
> 10437 Berlin
> ___________________________
>
> http://www.artnology.com
> __________________________________________________________________________
>
> News / Aktuelle Projekte:
> * artnology gewinnt Ausschreibung des Bundesministeriums des Innern:
>    Softwarelösung für die Verwaltung der Sammlung zeitgenössischer
>    Kunstwerke zur kulturellen Repräsentation des Bundes.
>
> Projektreferenzen:
> * Globaler eShop und Corporate-Site für Springer: www.springeronline.com
> * E-Detailing-Portal für Novartis: www.interaktiv.novartis.de
> * Service-Center-Plattform für Biogen: www.ms-life.de
> * eCRM-System für Grünenthal: www.gruenenthal.com
>
>
> ___________________________________________________________________________
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message