lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From semelak ss <semelak...@yahoo.com>
Subject Re: Exact Phrase Query
Date Sun, 02 Nov 2008 15:26:26 GMT
I was in a hurry when copying and pasting the code. What I've been using is only writer. RamWriter
was never used as it never really worked (thanks to you, I now understand the reason).

The above is not really related to the problem I was facing. I modified my code so that an
indexreader/indexwriter is opened right before the words comparison takes place and is closed
right after. (currently not using RamDir due to the problems faced earlier)

Considering that the program is basically a loop that does thousands and thousands of comparison,
this is definitely not the most efficient way of handling things.

I would appreciate any input in this regard on how to improve the efficiency. 



--- On Sat, 11/1/08, Erick Erickson <erickerickson@gmail.com> wrote:

> From: Erick Erickson <erickerickson@gmail.com>
> Subject: Re: Exact Phrase Query
> To: java-user@lucene.apache.org, semelak_14@yahoo.com
> Date: Saturday, November 1, 2008, 5:06 PM
> ahhhh, finally. I'm almost completely sure you can't
> *write* to a
> RAMDirectory
> and expect the underlying FSDir to be updated. The intent
> of RAMDirectorys
> is to *read* in an index from disk and keep it in memory.
> Essentially I
> believe
> that your RAMDirecotry constructor is taking a snapshot of
> the underlying
> disk index, modifying that in-memory copy, and throwing it
> away without
> ever writing it to disk. I wouldn't expect opening the
> FSDirectory after
> writing
> to the RAMDirectory to find anything. Ever.
> 
> If you really need the RAMDir, I suspect you'll have to
> open an FS-based
> writer as well as a RAM-based writer, and write to both
> when necessary.
> You'll probably also have to open/search your RAM-based
> index as the
> faster alternative to re-opening the FS-based index. Either
> way, reopening
> the index is probably expensive, are you sure you need to?
> Is there a way
> to keep your information in an internal data structure for
> some period of
> time?
> 
> Best
> Erick
> 
> 
> 
> On Sat, Nov 1, 2008 at 6:31 PM, semelak ss
> <semelak_14@yahoo.com> wrote:
> 
> > I am not entirely sure if this can be the cause, but
> here is something I
> > thought might be related:
> > The idea is have an index containing documents where
> each document has a
> > combination of two words : word1 and word2 and a score
> for these two words.
> > The index would be searched first if the two words
> exist, and if not the
> > score would be computed on the fly and then added to
> the index. This process
> > would be repeated thousands of times for thousands of
> words.
> >
> > Hence, I have an indexwriter and a searcher
> > --------------------
> > RAMDirectory ramDir = new RAMDirectory(INDEX_DIR);
> > IndexWriter  ramWriter = new IndexWriter(ramDir, new
> WhitespaceAnalyzer(),
> > true,IndexWriter.MaxFieldLength.UNLIMITED);
> > writer = new IndexWriter(INDEX_DIR,new
> WhitespaceAnalyzer(),true
> > ,IndexWriter.MaxFieldLength.UNLIMITED);
> >
> > FSDirectory fsdir =
> FSDirectory.getDirectory(INDEX_DIR);
> > IndexReader ir = IndexReader.open(fsdir);
> > _searcher = new IndexSearcher(ir);
> > --------------------
> >
> > The indexWriter is closed near the end of the program
> (it's open while
> > searching for words combinations ).
> >
> > When using Luke,, I was able to search successfully
> for exact phrases. My
> > guess is that the problem I am facing has something to
> do with the
> > indexWriter, but I can not pinpoint the exact cause of
> the problem.
> >
> >
> > --- On Sat, 11/1/08, semelak ss
> <semelak_14@yahoo.com> wrote:
> >
> > > From: semelak ss <semelak_14@yahoo.com>
> > > Subject: Re: Exact Phrase Query
> > > To: java-user@lucene.apache.org
> > > Date: Saturday, November 1, 2008, 10:03 AM
> > > When using Luke,, searching for the followings
> gives me hits
> > > now:
> > > "insurer storm"
> > > The synatx of the query as parsed by Luke is :
> > > word:"insurer storm"
> > >
> > > The code I am using is as follows:
> > > ----------------------
> > > _searcher = new IndexSearcher(INDEX_DIR);
> > > _parser = new QueryParser("word", new
> > > WhitespaceAnalyzer());
> > > Query q = _parser.parse(query);
> > > System.out.println(q.toString()); // this outputs
> ->
> > > word:"insurer storm"
> > > TopDocs vv= _searcher.search(q, 1);
> > > Hits tmph = _searcher.search(q);
> > > ---------------------------------
> > >
> > > both vv and tmph give no results (their size is
> 0)
> > >
> > >
> > >
> > > --- On Fri, 10/31/08, semelak ss
> > > <semelak_14@yahoo.com> wrote:
> > >
> > > > From: semelak ss
> <semelak_14@yahoo.com>
> > > > Subject: Re: Exact Phrase Query
> > > > To: java-user@lucene.apache.org
> > > > Date: Friday, October 31, 2008, 9:41 AM
> > > > For indexing, I use the following:
> > > > ===========
> > > > writer = new IndexWriter(INDEX_DIR,new
> > > > WhitespaceAnalyzer(),true
> > > > ,IndexWriter.MaxFieldLength.UNLIMITED);
> > > > Document doc = new Document();
> > > > String tmpword = this.getProperForm(word1,
> word2);
> > > > doc.add(new Field("WORDS",
> tmpword,
> > > > Field.Store.YES, Field.Index.TOKENIZED));
> > > > doc.add(new Field("score",
> > > Double.toString(score)
> > > > , Field.Store.YES, Field.Index.NO));
> > > > writer.addDocument(adoc);
> > > > ============
> > > >
> > > > For searching,, I use the following (query =
> > > > "homeowner work" gives no hits ,,
> > > > "homeowner" gives results):
> > > > ============
> > > > _searcher = new IndexSearcher(INDEX_DIR);
> > > > _parser = new QueryParser("WORDS",
> new
> > > > WhitespaceAnalyzer());
> > > > q = _parser.parse(query);
> > > > Hits tmph = _searcher.search(q);
> > > >
> > > > ============
> > > >
> > > > A sample document (contained in the index)
> is the
> > > > following:
> > > >
> > > > filed: value
> > > > -----: -----
> > > > WORDS:"homeowners work"
> > > > score: 0.1515417
> > > >
> > > >
> > > > Also, please note that I tried using Luke to
> browse
> > > the
> > > > index and the fields seem to be filled out
> with words
> > > just
> > > > as expected. Searching, however, with exact
> phrases
> > > yield no
> > > > answer. Searching with single words gives
> hits.
> > > >
> > > >
> > > >
> > > >
> > > > --- On Fri, 10/31/08, Erick Erickson
> > > > <erickerickson@gmail.com> wrote:
> > > >
> > > > > From: Erick Erickson
> > > <erickerickson@gmail.com>
> > > > > Subject: Re: Exact Phrase Query
> > > > > To: java-user@lucene.apache.org,
> > > semelak_14@yahoo.com
> > > > > Date: Friday, October 31, 2008, 5:57 AM
> > > > > You need to give us more information
> for
> > > meaningful
> > > > replies,
> > > > > like
> > > > > the analyzers you use when indexing and
> > > searching, the
> > > > > exact
> > > > > query you use, perhaps the snippets of
> the code,
> > > etc.
> > > > >
> > > > > That said, things to check:
> > > > > Get a copy of Luke and examine your
> index. You
> > > can
> > > > even
> > > > > run queries through that tool and see
> what gets
> > > sent
> > > > to the
> > > > > database and what responses you get
> with those
> > > > analyzers.
> > > > >
> > > > > Make sure you're analyzers at query
> and index
> > > time
> > > > are
> > > > > doing
> > > > > what you expect. Query.toString() is
> your friend.
> > > If
> > > > you
> > > > > don't
> > > > > take the time to understand analyzers,
> you'll
> > > > spend
> > > > > lots of time
> > > > > spinning your wheels.
> > > > >
> > > > > And you really should wait more than 9
> minutes
> > > before
> > > > > pinging
> > > > > the list....
> > > > >
> > > > > Best
> > > > > Erick
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Oct 31, 2008 at 8:44 AM,
> semelak ss
> > > > > <semelak_14@yahoo.com> wrote:
> > > > >
> > > > > > I have documents containing
> multiple words
> > > in the
> > > > the
> > > > > field "word"
> > > > > > for example, one of the documents
> contain in
> > > the
> > > > field
> > > > > "word" the
> > > > > > following:
> > > > > > homeowners work
> > > > > >
> > > > > > When searching for single words
> (i.e.
> > > homewoners
> > > > ) I
> > > > > get hits.
> > > > > >
> > > > > > However, searching for the exact
> phrase
> > > > > "homeowners work" gives me no
> > > > > > hits!! I use the double quotes
> when
> > > searching for
> > > > > exact phrases.
> > > > > >
> > > > > > Any idea why ??
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> ---------------------------------------------------------------------
> > > > > > To unsubscribe, e-mail:
> > > > > java-user-unsubscribe@lucene.apache.org
> > > > > > For additional commands, e-mail:
> > > > > java-user-help@lucene.apache.org
> > > > > >
> > > > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail:
> > > > java-user-unsubscribe@lucene.apache.org
> > > > For additional commands, e-mail:
> > > > java-user-help@lucene.apache.org
> > >
> > >
> > >
> > >
> > >
> > >
> ---------------------------------------------------------------------
> > > To unsubscribe, e-mail:
> > > java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail:
> > > java-user-help@lucene.apache.org
> >
> >
> >
> >
> >
> >
> ---------------------------------------------------------------------
> > To unsubscribe, e-mail:
> java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail:
> java-user-help@lucene.apache.org
> >
> >


      


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message