lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "German Kondolf" <german.kond...@gmail.com>
Subject Re: Deleting from Index by URL field: is it safe?
Date Sat, 29 Nov 2008 00:32:56 GMT
It works exactly as it does when you search of that term.

Review in your index creation, if you store it without analyzing it
(Index.UN_TOKENIZED), it will only match that document when you have an
exact URL.

It's possible that the URL is not unique enought in your domain, there is no
other unique identifier that you could use?

I suggest you create a test and try it on a RAMDirectory and see exactly
what happens and what you want!

Regards,
German K.

On Fri, Nov 28, 2008 at 6:31 PM, Niels Ott <nott@sfs.uni-tuebingen.de>wrote:

> Hi all,
>
> I want to safely delete documents from my index. There is an URL field that
> specifies where the document came from.
>
> I'm using something like this:
>
>   indexwriter.deleteDocuments(new Term("URL", myURL));
>
> (inspired by the Lucene in Action Book, page 35.)
>
> I'm uncertain whether this is safe or not: is there a chance that I delete
> documents I would want to keep? How does the matching exactly work.
>
> During indexing, I'm using a KeywordAnalyzer for the URL field in order to
> avoid tokenization.
>
> Best,
>
>   Niels
>
> --
> Niels Ott
> Computational Linguist (B.A.)
> http://www.drni.de/niels/
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message