lucene-java-user mailing list archives

From seceval...@gmail.com
Subject Re: Problem querying Lucene after escaping
Date Tue, 26 Jun 2012 11:01:38 GMT
Thanks for the advice, Ian. As you suggested, I tried indexing alt_id as
Index.NOT_ANALYZED and stuck with TermQuery. It works now.
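
For the archives, here is a minimal sketch of why the change helps. It does
not use Lucene at all: TermLookupSketch, crudeAnalyze, add and matches are
hypothetical names, and the "analyzer" is a crude stand-in that lowercases
and splits on anything that is not a letter or digit. The real
StandardAnalyzer has different rules, so treat this only as a toy model of
the mismatch between analyzed index terms and a verbatim term lookup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy inverted index (NOT Lucene); it models only exact-term lookup. */
public class TermLookupSketch {

    // term -> ids of the documents that contain that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final boolean analyzed;

    public TermLookupSketch(boolean analyzed) {
        this.analyzed = analyzed;
    }

    /** Assumed stand-in for an analyzer: lowercase, split on non-alphanumerics. */
    static List<String> crudeAnalyze(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Index one field value, either tokenized or as a single verbatim term. */
    public void add(int docId, String value) {
        List<String> terms = analyzed
                ? crudeAnalyze(value)
                : Collections.singletonList(value);
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new HashSet<>()).add(docId);
        }
    }

    /** Like a term query: the text is looked up verbatim, with no analysis. */
    public boolean matches(String term) {
        return postings.containsKey(term);
    }

    public static void main(String[] args) {
        TermLookupSketch analyzedIndex = new TermLookupSketch(true);
        TermLookupSketch rawIndex = new TermLookupSketch(false);
        for (String altId : new String[] {"ncbi-geneid:379474", "unigene:Xl.24622"}) {
            analyzedIndex.add(1, altId);
            rawIndex.add(1, altId);
        }
        System.out.println(analyzedIndex.matches("379474"));             // true
        System.out.println(analyzedIndex.matches("ncbi-geneid:379474")); // false
        System.out.println(rawIndex.matches("ncbi-geneid:379474"));      // true
        System.out.println(rawIndex.matches("379474"));                  // false
    }
}
```

In the toy model the analyzed index only knows the pieces, so the bare number
matches and the full identifier does not, which is the behaviour described in
the original question. Storing each value as one term reverses that. In the
indexing code quoted below, the corresponding change is to use
Field.Index.NOT_ANALYZED instead of Field.Index.ANALYZED for alt_id.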

Thanks again,

José M. Villaveces


On 25 June 2012 17:27, Ian Lea <ian.lea@gmail.com> wrote:

> The key thing is to be consistent.  You can either replace your
> TermQuery code with the output from QueryParser.parse, with QP created
> with StandardAnalyzer, or index alt_id as Index.NOT_ANALYZED and stick
> with TermQuery.  I think the latter will work even with multiple
> terms/tokens stored for alt_id.  I'd try that first.
>
>
>
> --
> Ian.
>
>
> On Mon, Jun 25, 2012 at 3:51 PM,  <secevalliv@gmail.com> wrote:
> > Hi All,
> >
> > Thanks for the quick reply.
> >
> > It seems that my index is indeed not what I think it is, so maybe
> > I'm using the wrong analyzer. Here is the code I use to index the
> > multiple values of alt_id:
> >
> > indexWriter = new IndexWriter(FSDirectory.open(new File(path)),
> >     new StandardAnalyzer(Version.LUCENE_30), true,
> >     IndexWriter.MaxFieldLength.UNLIMITED);
> >
> > for (String geneId : genes.keySet()) {
> >     // Update Gene
> >     Document gene = new Document();
> >     gene.add(new Field("type", "gene", Field.Store.YES, Field.Index.ANALYZED));
> >     gene.add(new Field("id", geneId, Field.Store.YES, Field.Index.ANALYZED));
> >     // One alt_id field instance per alternative identifier
> >     for (String altId : genes.get(geneId)) {
> >         gene.add(new Field("alt_id", altId, Field.Store.YES, Field.Index.ANALYZED));
> >     }
> >     indexWriter.updateDocument(new Term("id", geneId), gene);
> > }
> >
> > I understand that the best approach is to have the values for alt_id as
> > single tokens. Isn't that what the current analyzer does? Which one
> > should I use instead?
> >
> > Cheers,
> >
> > José M. Villaveces
> >
> >
> > On 25 June 2012 15:58, Erick Erickson <erickerickson@gmail.com> wrote:
> >
> >> TermQuerys are assumed to be parsed already. So you're
> >> looking for a _single_ term "ncbi-geneid:379474 or Xl.24622".
> >>
> >>
> >> You'd construct something like
> >> Query query1 = new TermQuery(new Term("type", "gene"));
> >> Query query2 = new TermQuery(new Term("alt_id", "ncbi-geneid:379474"));
> >> Query query3 = new TermQuery(new Term("alt_id", "unigene:Xl.24622"));
> >>
> >> BooleanQuery query = new BooleanQuery();
> >> query.add(query1, BooleanClause.Occur.MUST);
> >>
> >> BooleanQuery queryB = new BooleanQuery();
> >> queryB.add(query2, BooleanClause.Occur.SHOULD);
> >> queryB.add(query3, BooleanClause.Occur.SHOULD);
> >>
> >> query.add(queryB, BooleanClause.Occur.MUST);
> >>
> >>
> >> But this _assumes_ that you have _single tokens_ of the
> >> form ncbi-geneid:379474. Given that you say that just the
> >> bare 379474 works, I'm guessing, as Ian says, that you don't
> >> have what you think you do in your index. You probably have
> >> individual tokens like "ncbi-geneid" (or "ncbi" and "geneid" even),
> >> BC054227, xla, etc. You need to look into your index with Luke
> >> and see what's actually in there.
> >>
> >> You might think about installing Solr, _not_ to power your app, but just
> >> to play with the admin/analysis page to understand how
> >> Analysis works with various combinations of tokenizers and filters....
> >>
> >> Best
> >> Erick
> >>
> >> On Mon, Jun 25, 2012 at 8:50 AM,  <secevalliv@gmail.com> wrote:
> >> > I'm quite new to Lucene and recently ran into a problem. I have a
> >> > Lucene document that looks like this:
> >> >
> >> > --- type ---
> >> > gene
> >> >
> >> > --- id ---
> >> > xla:379474
> >> >
> >> > --- alt_id ---
> >> > emb:BC054227
> >> > gb:BC054227
> >> > ncbi-geneid:379474
> >> > ncbi-gi:148230166
> >> > rs:NM_001086315
> >> > rs:NP_001079784
> >> > unigene:Xl.24622
> >> > xla:379474
> >> >
> >> >
> >> > I created the query below in order to retrieve that document. It works
> >> > fine for altId = 379474 but not for altId = ncbi-geneid:379474 or
> >> > Xl.24622. I guessed altId must be escaped and tried String altId =
> >> > QueryParser.escape(altId) with no luck. What am I missing?
> >> >
> >> > Query query1 = new TermQuery(new Term("type", "gene"));
> >> > Query query2 = new TermQuery(new Term("alt_id", altId));
> >> >
> >> > BooleanQuery query = new BooleanQuery();
> >> > query.add(query1, BooleanClause.Occur.MUST);
> >> > query.add(query2, BooleanClause.Occur.MUST);
> >> >
> >> > By the way I'm running lucene v3.0.
> >> >
> >> > Cheers,
> >> > José M. Villaveces
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
