lucene-java-user mailing list archives

From Shai Erera <ser...@gmail.com>
Subject Re: Removing terms in the Index
Date Thu, 15 Apr 2010 06:36:39 GMT
I'm still not sure I understand ...

If the first document includes "Lucene in Action. Lucene" (two sentences,
the 2nd one with Lucene only) and the second "Lucene for Dummies", then what
exactly do you want to get for the queries "\"Lucene in Action\"" and
"\"Lucene\""?

If I understand correctly, you want only the first doc to match these two
queries? That would mean the second query, "\"Lucene\"", should not match the
second document, because there Lucene appears only as one of the words in a
sentence?

If that's what you want, then you need to create a Tokenizer which
recognizes end-of-sentence boundaries, and indexes each sentence as a single
token. So the tokens in the index will be:
Lucene in Action -> doc1
Lucene -> doc1
Lucene for Dummies -> doc2

Is that what you want? If not, then I apologize, but I still don't
understand...

Shai
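A rough sketch of the sentence-as-token idea above. It is written in plain Java rather than as a real Lucene `Tokenizer` subclass, since that code would depend on the Lucene version in use. Assumptions: a sentence ends at '.', '!' or '?', tokens are lower-cased, and the class and method names (`SentenceTokens`, `sentenceTokens`, `buildIndex`, `search`) are hypothetical. The token-to-document map it builds mirrors the token table in the message above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration (not Lucene API): every sentence becomes exactly one token. */
public class SentenceTokens {

    // Split on '.', '!' or '?' and keep each trimmed, lower-cased sentence as a single token.
    static List<String> sentenceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[.!?]+")) {
            String sentence = part.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
            if (!sentence.isEmpty()) {
                tokens.add(sentence);
            }
        }
        return tokens;
    }

    // Build a token -> document-id map, i.e. a tiny inverted index over sentence tokens.
    static Map<String, Set<String>> buildIndex(Map<String, String> docs) {
        Map<String, Set<String>> index = new LinkedHashMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            for (String token : sentenceTokens(doc.getValue())) {
                index.computeIfAbsent(token, k -> new LinkedHashSet<>()).add(doc.getKey());
            }
        }
        return index;
    }

    // The whole (normalized) query must equal a sentence token to match.
    static Set<String> search(Map<String, Set<String>> index, String query) {
        String key = query.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
        return index.getOrDefault(key, Set.of());
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("doc1", "Lucene in Action. Lucene");
        docs.put("doc2", "Lucene for Dummies");
        Map<String, Set<String>> index = buildIndex(docs);
        System.out.println(index);
        System.out.println(search(index, "Lucene"));
        System.out.println(search(index, "Lucene in Action"));
    }
}
```

With these two documents, a lookup for "Lucene" returns only doc1, because doc2's single token is the whole sentence "lucene for dummies". The trade-off of this design is that a query must equal a complete sentence to match anything.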

On Wed, Apr 14, 2010 at 7:10 PM, Railan Xisto <railan.xisto@gmail.com> wrote:

> Actually, doc1, the file with the terms to be searched, has two entries:
> "Lucene in Action" and "Lucene". When I pass "Lucene in Action", I want it
> to show the result and then remove those words, so that they are not found
> when I pass only the term "Lucene". In short, the term "Lucene" should not
> find the phrase "Lucene in Action", since the entire phrase was already
> searched. It is the idea of N-grams (complete phrases) versus unigrams
> (isolated words). Is that clearer?
>
>
> 2010/4/13 Shai Erera <serera@gmail.com>
>
> > I ran your code. Since I don't have the queries file (Docs/documento.txt),
> > I set this line instead:
> >
> > String termos = "\"Lucene in Action\"";
> >
> > When I set it to \"Lucene\", both documents are found. When I set it to
> > \"Lucene in Action\", only the first document is found. That seems correct
> > to me.
> >
> > Can you please explain this:
> > "I pass the word "Lucene in Action", it find and
> > remove that term of phrase in the Index"
> >
> > --> what do you mean "find and remove"?
> >
> > Shai
> >
> > On Mon, Apr 12, 2010 at 8:49 PM, Railan Xisto <railan.xisto@gmail.com> wrote:
> >
> > > And the main objective: when I pass the word "Lucene in Action", it
> > > should find and remove that phrase from the index, so that when I pass
> > > the 2nd term ("Lucene"), it no longer finds that phrase, since
> > > "Lucene in Action" has already been found.
> > >
> > >
> > > 2010/4/12 Railan Xisto <railan.xisto@gmail.com>
> > >
> > > > OK. A piece of code is attached. As I already said, I want a search
> > > > for the term "Lucene in Action" to find only the 1st sentence.
> > > >
> > > >
> > > >
> > > >
> > > > 2010/4/10 Shai Erera <serera@gmail.com>
> > > >
> > > >> Hi. I'm not sure I understand what you searched for. When you search
> > > >> for "Lucene in action", do you search for it with the quotes or not? If
> > > >> with the quotes, then I don't understand how the 2nd doc is found.
> > > >>
> > > >> Do you perhaps have some test code you can share w/ us? It can be a
> > > >> short and simple main which creates an index w/ some documents and then
> > > >> searches them.
> > > >>
> > > >> Shai
> > > >>
> > > >> On Saturday, April 10, 2010, Fotos fotos <railan.xisto@gmail.com> wrote:
> > > >> > Hello!
> > > >> > I am a beginner with Lucene. I need to do the following:
> > > >> >
> > > >> > I have a text file with the following terms:
> > > >> >
> > > >> > "Lucene in action"
> > > >> > "Lucene"
> > > >> >
> > > >> > and a file with the following sentences:
> > > >> >
> > > >> > 1 - "Lucene in action now."
> > > >> > 2 - "Lucene for Dummies"
> > > >> > 3 - "Managing Gigabytes"
> > > >> >
> > > >> > I need to search the sentences of doc2 for the terms of doc1.
> > > >> >
> > > >> > But when searching for the n-gram "Lucene in Action", it also finds
> > > >> > the 2nd sentence.
> > > >> >
> > > >> > In this case, I want term 1 ("Lucene in Action") to match only the
> > > >> > first sentence, and then remove that term from the index so that it
> > > >> > is not found when I pass term 2 ("Lucene").
> > > >> >
> > > >> > Railan Xisto
> > > >> > Web Developer
> > > >> >
> > > >>
> > > >>
> > > >
> > >
> >
>
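To illustrate why the quotes matter in the exchange above: a quoted query is a phrase query (all words at consecutive positions), while an unquoted multi-word query parsed by Lucene's `QueryParser` uses OR between the terms by default, so any document containing "lucene" matches. The following plain-Java sketch is not Lucene code; the class and method names (`PhraseVsTerm`, `phraseMatch`, `anyTermMatch`) are hypothetical, and tokenization is assumed to be a simple lower-case split on non-alphanumeric characters. It reproduces that behaviour on the three sentences from the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical illustration (not Lucene code) of phrase matching vs. OR-of-terms matching. */
public class PhraseVsTerm {

    // Lower-case and split on non-letters/digits; list order stands in for token positions.
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) out.add(w);
        }
        return out;
    }

    // Quoted query: every phrase word must appear, in order, at consecutive positions.
    static boolean phraseMatch(String doc, String phrase) {
        List<String> d = words(doc), p = words(phrase);
        if (p.isEmpty()) return false;
        for (int start = 0; start + p.size() <= d.size(); start++) {
            if (d.subList(start, start + p.size()).equals(p)) return true;
        }
        return false;
    }

    // Unquoted query with OR semantics: a single shared word is enough.
    static boolean anyTermMatch(String doc, String query) {
        List<String> d = words(doc);
        for (String w : words(query)) {
            if (d.contains(w)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("Lucene in action now.", "Lucene for Dummies", "Managing Gigabytes");
        for (String doc : docs) {
            System.out.printf("%-22s phrase=%b anyTerm=%b lucene=%b%n", doc,
                    phraseMatch(doc, "Lucene in action"),
                    anyTermMatch(doc, "Lucene in action"),
                    phraseMatch(doc, "Lucene"));
        }
    }
}
```

For "Lucene for Dummies", `phraseMatch(doc, "Lucene in action")` is false while `anyTermMatch(doc, "Lucene in action")` is true, which is consistent with Shai's test: unquoted, the 2nd sentence is found; quoted, only the first one is.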

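Railan's goal, as described in the thread, can be read as longest-match-first matching: try multi-word terms before single words, and once a term matches, mark those word positions as used so that a shorter term cannot match them again. Under that reading, nothing has to be deleted from the Lucene index; the sketch below does the masking in plain Java, outside the index. It is only one interpretation of the request, and the class and method names (`LongestTermFirst`, `match`, `free`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical longest-term-first matcher: words claimed by a longer term are masked. */
public class LongestTermFirst {

    // Lower-case and split on non-letters/digits, keeping word order.
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) out.add(w);
        }
        return out;
    }

    // Returns term -> 1-based numbers of the sentences in which that term claimed words.
    static Map<String, List<Integer>> match(List<String> terms, List<String> sentences) {
        List<String> ordered = new ArrayList<>(terms);
        // Terms with more words go first; List.sort is stable, so ties keep their input order.
        ordered.sort(Comparator.comparingInt((String t) -> words(t).size()).reversed());

        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String term : ordered) result.put(term, new ArrayList<>());

        for (int s = 0; s < sentences.size(); s++) {
            List<String> w = words(sentences.get(s));
            boolean[] used = new boolean[w.size()]; // positions already claimed in this sentence
            for (String term : ordered) {
                List<String> t = words(term);
                if (t.isEmpty()) continue;
                for (int i = 0; i + t.size() <= w.size(); i++) {
                    if (w.subList(i, i + t.size()).equals(t) && free(used, i, t.size())) {
                        for (int k = i; k < i + t.size(); k++) used[k] = true;
                        if (!result.get(term).contains(s + 1)) result.get(term).add(s + 1);
                        i += t.size() - 1; // jump past the claimed words (the loop adds 1)
                    }
                }
            }
        }
        return result;
    }

    // True if none of the positions [from, from + len) has been claimed yet.
    static boolean free(boolean[] used, int from, int len) {
        for (int k = from; k < from + len; k++) {
            if (used[k]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("Lucene", "Lucene in action");
        List<String> sentences = List.of("Lucene in action now.", "Lucene for Dummies", "Managing Gigabytes");
        System.out.println(match(terms, sentences));
    }
}
```

Run on the terms and sentences from the original post, it prints `{Lucene in action=[1], Lucene=[2]}`: "Lucene" no longer matches sentence 1, because those words were already claimed by "Lucene in action", but it still matches "Lucene for Dummies".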