lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: Problem querying Lucene after escaping
Date Mon, 25 Jun 2012 15:27:52 GMT
The key thing is to be consistent between index time and query time.
You can either replace your TermQuery code with the output from
QueryParser.parse, with the QueryParser created with the same
StandardAnalyzer you index with, or index alt_id as
Field.Index.NOT_ANALYZED and stick with TermQuery.  I think the latter
will work even with multiple terms/tokens stored for alt_id.  I'd try
that first.
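
To illustrate why consistency matters, here is a minimal sketch in plain
Java that does not use Lucene at all. The class ConsistencyDemo, its
methods, and the toy postings map are hypothetical names made up for this
illustration, and analyze() is only a crude stand-in for StandardAnalyzer
(it lower-cases and splits on every non-alphanumeric character, which is
not exactly how Lucene tokenizes). The point is that an exact-term lookup,
which is how TermQuery matches, only succeeds when the query term has the
same form as the terms that were written to the index.

```java
import java.util.*;

class ConsistencyDemo {
    // Crude stand-in for an analyzer (NOT Lucene's StandardAnalyzer):
    // lower-cases the value and splits on anything that is not a letter or digit.
    static List<String> analyze(String value) {
        List<String> tokens = new ArrayList<String>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Toy inverted index: term -> ids of the documents containing that term.
    // analyzed=true tokenizes each value; analyzed=false keeps it as one term.
    static Map<String, Set<String>> index(Map<String, List<String>> docs, boolean analyzed) {
        Map<String, Set<String>> postings = new HashMap<String, Set<String>>();
        for (Map.Entry<String, List<String>> doc : docs.entrySet()) {
            for (String value : doc.getValue()) {
                List<String> terms = analyzed ? analyze(value) : Collections.singletonList(value);
                for (String term : terms) {
                    Set<String> ids = postings.get(term);
                    if (ids == null) {
                        ids = new TreeSet<String>();
                        postings.put(term, ids);
                    }
                    ids.add(doc.getKey());
                }
            }
        }
        return postings;
    }

    // Exact-term lookup: the query text is used as-is, with no analysis.
    static boolean termMatches(Map<String, Set<String>> postings, String term) {
        return postings.containsKey(term);
    }

    // Simplified "parsed" lookup: analyze the query text the same way as the
    // index, then require every resulting token to be present as a term.
    static boolean analyzedQueryMatches(Map<String, Set<String>> postings, String text) {
        for (String token : analyze(text)) {
            if (!postings.containsKey(token)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, List<String>> docs = new HashMap<String, List<String>>();
        docs.put("xla:379474", Arrays.asList("ncbi-geneid:379474", "unigene:Xl.24622"));

        Map<String, Set<String>> analyzed = index(docs, true);
        Map<String, Set<String>> notAnalyzed = index(docs, false);

        // Analyzed index + raw term: only the bare token is found.
        System.out.println(termMatches(analyzed, "379474"));
        System.out.println(termMatches(analyzed, "ncbi-geneid:379474"));
        // Analyzed index + query analyzed the same way: consistent, so it matches.
        System.out.println(analyzedQueryMatches(analyzed, "ncbi-geneid:379474"));
        // Un-analyzed index + raw term: consistent, so the full value matches.
        System.out.println(termMatches(notAnalyzed, "ncbi-geneid:379474"));
        System.out.println(termMatches(notAnalyzed, "379474"));
    }
}
```

In Lucene 3.0 terms, the two consistent combinations in this sketch
correspond to the two options above: run the query text through the same
analyzer that was used for indexing (QueryParser does this), or index the
field with Field.Index.NOT_ANALYZED and query the exact value with
TermQuery.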



--
Ian.


On Mon, Jun 25, 2012 at 3:51 PM,  <secevalliv@gmail.com> wrote:
> Hi All,
>
> Thanks for the quick reply.
>
> It seems like indeed my index is not what I think it is so maybe
> I'm using the wrong analyzer. Here is the code I use to index the multiple
> values of alt_id:
>
> indexWriter = new IndexWriter(FSDirectory.open(new File(path)),
>         new StandardAnalyzer(Version.LUCENE_30), true,
>         IndexWriter.MaxFieldLength.UNLIMITED);
>
> for (String geneId : genes.keySet()) {
>     // Update Gene
>     Document gene = new Document();
>     gene.add(new Field("type", "gene", Field.Store.YES, Field.Index.ANALYZED));
>     gene.add(new Field("id", geneId, Field.Store.YES, Field.Index.ANALYZED));
>     for (String altId : genes.get(geneId)) {
>         gene.add(new Field("alt_id", altId, Field.Store.YES, Field.Index.ANALYZED));
>     }
>     indexWriter.updateDocument(new Term("id", geneId), gene);
> }
>
> I understand that the best approach is to have the values for alt_id as
> single tokens. Isn't that what the current analyzer does? Which one should
> I use instead?
>
> Cheers,
>
> José M. Villaveces
>
>
> On 25 June 2012 15:58, Erick Erickson <erickerickson@gmail.com> wrote:
>
>> TermQuerys are assumed to be parsed already. So you're
>> looking for a _single_ term "ncbi-geneid:379474 or Xl.24622".
>>
>>
>> You'd construct something like
>> Query query1 = new TermQuery(new Term("type", "gene"));
>> Query query2 = new TermQuery(new Term("alt_Id", "ncbi-geneid:379474"));
>> Query query3 = new TermQuery(new Term("alt_Id", "unigene:Xl.24622"));
>>
>> BooleanQuery query = new BooleanQuery();
>> query.add(query1, BooleanClause.Occur.MUST);
>>
>> BooleanQuery queryB = new BooleanQuery();
>> queryB.add(query2, ...SHOULD);
>> queryB.add(query3, ...SHOULD);
>>
>> query.add(queryB, BooleanClause.Occur.MUST);
>>
>>
>> But this _assumes_ that you have _single tokens_ of the
>> form ncbi-geneid:379474. Given that you say that just the
>> bare 379474 works, I'm guessing, as Ian says, that you don't
>> have what you think you do in your index. You probably have
>> individual tokens like "ncbi-geneid" (or even "ncbi" and "geneid"),
>> BC054227, xla, etc. You need to look into your index with Luke
>> and see what's actually in there.
>>
>> You might think about installing Solr, _not_ to power your app, but just
>> to play with the admin/analysis page to understand how
>> Analysis works with various combinations of tokenizers and filters....
>>
>> Best
>> Erick
>>
>> On Mon, Jun 25, 2012 at 8:50 AM,  <secevalliv@gmail.com> wrote:
>> > I'm quite new to Lucene and recently, I ran into a problem. I have a
>> > lucene document that looks like this:
>> >
>> > --- type ---
>> > gene
>> >
>> > --- id ---
>> > xla:379474
>> >
>> > --- alt_id ---
>> > emb:BC054227
>> > gb:BC054227
>> > ncbi-geneid:379474
>> > ncbi-gi:148230166
>> > rs:NM_001086315
>> > rs:NP_001079784
>> > unigene:Xl.24622
>> > xla:379474
>> >
>> >
>> > I created the query below in order to retrieve that document. It works
>> > fine for altId = 379474 but not for altId = ncbi-geneid:379474 or
>> > Xl.24622. I guessed altId must be escaped and tried String altId =
>> > QueryParser.escape(altId) with no luck. What am I missing?
>> >
>> > Query query1 = new TermQuery(new Term("type", "gene"));
>> > Query query2 = new TermQuery(new Term("alt_Id", altId));
>> >
>> > BooleanQuery query = new BooleanQuery();
>> > query.add(query1, BooleanClause.Occur.MUST);
>> > query.add(query2, BooleanClause.Occur.MUST);
>> >
>> > By the way I'm running lucene v3.0.
>> >
>> > Cheers,
>> > José M. Villaveces
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>


