lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Problem querying Lucene after escaping
Date Mon, 25 Jun 2012 13:58:08 GMT
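A TermQuery is not analyzed: the term text you pass in is taken as-is
and matched exactly against the terms in the index. So you're
looking for a _single_ term such as "ncbi-geneid:379474" or "Xl.24622".
Escaping with QueryParser.escape() doesn't help here, since escaping only
matters for strings that go through the QueryParser; in a TermQuery the
backslashes just become part of the term.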


You'd construct something like
Query query1 = new TermQuery(new Term("type", "gene"));
Query query2 = new TermQuery(new Term("alt_Id", "ncbi-geneid:379474"));
Query query3 = new TermQuery(new Term("alt_Id", "unigene:Xl.24622"));

BooleanQuery query = new BooleanQuery();
query.add(query1, BooleanClause.Occur.MUST);

BooleanQuery queryB = new BooleanQuery();
queryB.add(query2, BooleanClause.Occur.SHOULD);
queryB.add(query3, BooleanClause.Occur.SHOULD);

query.add(queryB, BooleanClause.Occur.MUST);


But this _assumes_ that your index contains _single tokens_ of the
form ncbi-geneid:379474. Given that you say the bare 379474 works,
I'm guessing, as Ian says, that you don't have what you think you do
in your index. You probably have individual tokens like "ncbi-geneid"
(or even "ncbi" and "geneid"), BC054227, xla, etc. You need to look
into your index with Luke and see what's actually in there.
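
To make that concrete, here is a rough sketch in plain Java with no Lucene
involved. The tokenize() method is a made-up stand-in for whatever analyzer
ran at index time (real analyzers split text differently, so check yours
with Luke), and termMatches() only mimics the exact, un-analyzed lookup a
TermQuery does. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustration only: a toy model of "analyzed at index time, exact match at
// query time". This is NOT Lucene code and NOT how StandardAnalyzer splits text.
class TermMatchSketch {

    // Hypothetical analyzer: lowercases, then splits on anything that is not
    // an ASCII letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String piece : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Builds the set of terms that would end up in the (toy) index.
    static Set<String> index(String... values) {
        Set<String> terms = new HashSet<String>();
        for (String value : values) {
            terms.addAll(tokenize(value));
        }
        return terms;
    }

    // Mimics a TermQuery: the query text is compared as-is, with no analysis.
    static boolean termMatches(Set<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        Set<String> terms = index("ncbi-geneid:379474", "unigene:Xl.24622");
        System.out.println(termMatches(terms, "379474"));             // true
        System.out.println(termMatches(terms, "ncbi-geneid:379474")); // false
        System.out.println(termMatches(terms, "Xl.24622"));           // false
    }
}
```

If your index looks like this toy one, the fix is either to index alt_Id
without tokenizing it, or to build the query terms the same way the analyzer
built the indexed ones.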

You might think about installing Solr, _not_ to power your app, but just
to play with the admin/analysis page to understand how
analysis works with various combinations of tokenizers and filters.

Best
Erick

On Mon, Jun 25, 2012 at 8:50 AM,  <secevalliv@gmail.com> wrote:
> I'm quite new to Lucene and recently, I ran into a problem. I have a lucene
> document that looks like this:
>
> --- type ---
> gene
>
> --- id ---
> xla:379474
>
> --- alt_id ---
> emb:BC054227
> gb:BC054227
> ncbi-geneid:379474
> ncbi-gi:148230166
> rs:NM_001086315
> rs:NP_001079784
> unigene:Xl.24622
> xla:379474
>
>
> I created the query below in order to retrieve that document. It works
> fine for altId = 379474 but not for altId = ncbi-geneid:379474 or Xl.24622.
> I guessed altId must be escaped and tried String altId =
> QueryParser.escape(altId) with no luck. What am I missing?
>
> Query query1 = new TermQuery(new Term("type", "gene"));
> Query query2 = new TermQuery(new Term("alt_Id", altId));
>
> BooleanQuery query = new BooleanQuery();
> query.add(query1, BooleanClause.Occur.MUST);
> query.add(query2, BooleanClause.Occur.MUST);
>
> By the way I'm running lucene v3.0.
>
> Cheers,
> José M. Villaveces

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

