lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Exact Phrase Query
Date Sun, 02 Nov 2008 00:06:27 GMT
ahhhh, finally. I'm almost completely sure you can't *write* to a
RAMDirectory
and expect the underlying FSDir to be updated. The intent of RAMDirectorys
is to *read* in an index from disk and keep it in memory. Essentially I
believe
that your RAMDirecotry constructor is taking a snapshot of the underlying
disk index, modifying that in-memory copy, and throwing it away without
ever writing it to disk. I wouldn't expect opening the FSDirectory after
writing
to the RAMDirectory to find anything. Ever.

If you really need the RAMDir, I suspect you'll have to open an FS-based
writer as well as a RAM-based writer, and write to both when necessary.
You'll probably also have to open/search your RAM-based index as the
faster alternative to re-opening the FS-based index. Either way, reopening
the index is probably expensive, are you sure you need to? Is there a way
to keep your information in an internal data structure for some period of
time?

Best
Erick



On Sat, Nov 1, 2008 at 6:31 PM, semelak ss <semelak_14@yahoo.com> wrote:

> I am not entirely sure if this can be the cause, but here is something I
> thought might be related:
> The idea is have an index containing documents where each document has a
> combination of two words : word1 and word2 and a score for these two words.
> The index would be searched first if the two words exist, and if not the
> score would be computed on the fly and then added to the index. This process
> would be repeated thousands of times for thousands of words.
>
> Hence, I have an indexwriter and a searcher
> --------------------
> RAMDirectory ramDir = new RAMDirectory(INDEX_DIR);
> IndexWriter  ramWriter = new IndexWriter(ramDir, new WhitespaceAnalyzer(),
> true,IndexWriter.MaxFieldLength.UNLIMITED);
> writer = new IndexWriter(INDEX_DIR,new WhitespaceAnalyzer(),true
> ,IndexWriter.MaxFieldLength.UNLIMITED);
>
> FSDirectory fsdir = FSDirectory.getDirectory(INDEX_DIR);
> IndexReader ir = IndexReader.open(fsdir);
> _searcher = new IndexSearcher(ir);
> --------------------
>
> The indexWriter is closed near the end of the program (it's open while
> searching for words combinations ).
>
> When using Luke,, I was able to search successfully for exact phrases. My
> guess is that the problem I am facing has something to do with the
> indexWriter, but I can not pinpoint the exact cause of the problem.
>
>
> --- On Sat, 11/1/08, semelak ss <semelak_14@yahoo.com> wrote:
>
> > From: semelak ss <semelak_14@yahoo.com>
> > Subject: Re: Exact Phrase Query
> > To: java-user@lucene.apache.org
> > Date: Saturday, November 1, 2008, 10:03 AM
> > When using Luke,, searching for the followings gives me hits
> > now:
> > "insurer storm"
> > The synatx of the query as parsed by Luke is :
> > word:"insurer storm"
> >
> > The code I am using is as follows:
> > ----------------------
> > _searcher = new IndexSearcher(INDEX_DIR);
> > _parser = new QueryParser("word", new
> > WhitespaceAnalyzer());
> > Query q = _parser.parse(query);
> > System.out.println(q.toString()); // this outputs ->
> > word:"insurer storm"
> > TopDocs vv= _searcher.search(q, 1);
> > Hits tmph = _searcher.search(q);
> > ---------------------------------
> >
> > both vv and tmph give no results (their size is 0)
> >
> >
> >
> > --- On Fri, 10/31/08, semelak ss
> > <semelak_14@yahoo.com> wrote:
> >
> > > From: semelak ss <semelak_14@yahoo.com>
> > > Subject: Re: Exact Phrase Query
> > > To: java-user@lucene.apache.org
> > > Date: Friday, October 31, 2008, 9:41 AM
> > > For indexing, I use the following:
> > > ===========
> > > writer = new IndexWriter(INDEX_DIR,new
> > > WhitespaceAnalyzer(),true
> > > ,IndexWriter.MaxFieldLength.UNLIMITED);
> > > Document doc = new Document();
> > > String tmpword = this.getProperForm(word1, word2);
> > > doc.add(new Field("WORDS", tmpword,
> > > Field.Store.YES, Field.Index.TOKENIZED));
> > > doc.add(new Field("score",
> > Double.toString(score)
> > > , Field.Store.YES, Field.Index.NO));
> > > writer.addDocument(adoc);
> > > ============
> > >
> > > For searching,, I use the following (query =
> > > "homeowner work" gives no hits ,,
> > > "homeowner" gives results):
> > > ============
> > > _searcher = new IndexSearcher(INDEX_DIR);
> > > _parser = new QueryParser("WORDS", new
> > > WhitespaceAnalyzer());
> > > q = _parser.parse(query);
> > > Hits tmph = _searcher.search(q);
> > >
> > > ============
> > >
> > > A sample document (contained in the index) is the
> > > following:
> > >
> > > filed: value
> > > -----: -----
> > > WORDS:"homeowners work"
> > > score: 0.1515417
> > >
> > >
> > > Also, please note that I tried using Luke to browse
> > the
> > > index and the fields seem to be filled out with words
> > just
> > > as expected. Searching, however, with exact phrases
> > yield no
> > > answer. Searching with single words gives hits.
> > >
> > >
> > >
> > >
> > > --- On Fri, 10/31/08, Erick Erickson
> > > <erickerickson@gmail.com> wrote:
> > >
> > > > From: Erick Erickson
> > <erickerickson@gmail.com>
> > > > Subject: Re: Exact Phrase Query
> > > > To: java-user@lucene.apache.org,
> > semelak_14@yahoo.com
> > > > Date: Friday, October 31, 2008, 5:57 AM
> > > > You need to give us more information for
> > meaningful
> > > replies,
> > > > like
> > > > the analyzers you use when indexing and
> > searching, the
> > > > exact
> > > > query you use, perhaps the snippets of the code,
> > etc.
> > > >
> > > > That said, things to check:
> > > > Get a copy of Luke and examine your index. You
> > can
> > > even
> > > > run queries through that tool and see what gets
> > sent
> > > to the
> > > > database and what responses you get with those
> > > analyzers.
> > > >
> > > > Make sure you're analyzers at query and index
> > time
> > > are
> > > > doing
> > > > what you expect. Query.toString() is your friend.
> > If
> > > you
> > > > don't
> > > > take the time to understand analyzers, you'll
> > > spend
> > > > lots of time
> > > > spinning your wheels.
> > > >
> > > > And you really should wait more than 9 minutes
> > before
> > > > pinging
> > > > the list....
> > > >
> > > > Best
> > > > Erick
> > > >
> > > >
> > > >
> > > > On Fri, Oct 31, 2008 at 8:44 AM, semelak ss
> > > > <semelak_14@yahoo.com> wrote:
> > > >
> > > > > I have documents containing multiple words
> > in the
> > > the
> > > > field "word"
> > > > > for example, one of the documents contain in
> > the
> > > field
> > > > "word" the
> > > > > following:
> > > > > homeowners work
> > > > >
> > > > > When searching for single words (i.e.
> > homewoners
> > > ) I
> > > > get hits.
> > > > >
> > > > > However, searching for the exact phrase
> > > > "homeowners work" gives me no
> > > > > hits!! I use the double quotes when
> > searching for
> > > > exact phrases.
> > > > >
> > > > > Any idea why ??
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail:
> > > > java-user-unsubscribe@lucene.apache.org
> > > > > For additional commands, e-mail:
> > > > java-user-help@lucene.apache.org
> > > > >
> > > > >
> > >
> > >
> > >
> > >
> > >
> > >
> > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail:
> > > java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail:
> > > java-user-help@lucene.apache.org
> >
> >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail:
> > java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail:
> > java-user-help@lucene.apache.org
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message