lucene-java-user mailing list archives

From Wenbo Zhao <zha...@gmail.com>
Subject Re: how to match a term within digital strings?
Date Mon, 09 Nov 2009 06:14:03 GMT
Hi all,
I think I found an approach. It may not be the best, but it works.
My code is below; it behaves like the query "*19810919*":
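    // Open the index read-only and get its reader.
    IndexSearcher isearcher = new IndexSearcher(directory, true);
    IndexReader ir = isearcher.getIndexReader();
    // Walk the whole term dictionary and keep every term of the target
    // field whose text contains the wanted substring.
    TermEnum te = ir.terms();
    List<String> result = new ArrayList<String>();
    while (te.next()) {
        Term t = te.term();
        if (!field.equals(t.field())) continue; // skip terms of other fields
        String text = t.text();
        if (text.indexOf("19810919") >= 0) result.add(text);
    }
    te.close();
    // Join the matching terms into one query string; with the parser's
    // default OR operator, a document matches if it contains any of them.
    QueryParser parser = new QueryParser(field, analyzer);
    StringBuilder sb = new StringBuilder();
    for (String s : result) sb.append(s).append(' ');
    Query query = parser.parse(sb.toString());
    ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
    System.out.println(query + "=" + hits.length);
    for (int i = 0; i < hits.length; i++) { // iterate through the results
        Document hitDoc = isearcher.doc(hits[i].doc);
        System.out.println("ID=" + hitDoc.get("id"));
    }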
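My index is about 30 MB and contains 13k+ docs and 1978k+ terms, all digit strings.
The term enumeration loop took 2.33 seconds.
But my target is an index about 3000 times this size, and I'm afraid
enumerating every term will take far too long. If the scan time grows
roughly linearly with the number of terms, that is about
2.33 s x 3000 = 6990 s, nearly two hours per query.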
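
One possible way to avoid the full scan, sketched here in plain Java rather than against the Lucene API: index every suffix of each term, so that a substring search becomes a prefix lookup in a sorted structure. The class name `SuffixIndexSketch` and its methods are hypothetical, and the sorted map only stands in for a sorted term dictionary. In Lucene the equivalent would presumably be to emit each suffix as an extra token at index time and search with a prefix query; that part is an assumption and has not been tried here.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: not Lucene code, only the suffix-expansion idea.
class SuffixIndexSketch {
    // Sorted map from each suffix to the original terms that end with it.
    private final NavigableMap<String, Set<String>> suffixes = new TreeMap<>();

    // Register a term under every one of its suffixes,
    // e.g. "123" is stored under "123", "23" and "3".
    void add(String term) {
        for (int i = 0; i < term.length(); i++) {
            suffixes.computeIfAbsent(term.substring(i), k -> new TreeSet<>()).add(term);
        }
    }

    // A substring of a term is always a prefix of one of its suffixes,
    // so the search is a range scan that starts at the fragment itself.
    Set<String> search(String fragment) {
        Set<String> hits = new TreeSet<>();
        if (fragment.isEmpty()) return hits; // an empty fragment matches nothing here
        for (Map.Entry<String, Set<String>> e : suffixes.tailMap(fragment, true).entrySet()) {
            if (!e.getKey().startsWith(fragment)) break; // left the prefix range
            hits.addAll(e.getValue());
        }
        return hits;
    }

    public static void main(String[] args) {
        SuffixIndexSketch index = new SuffixIndexSketch();
        index.add("123456789");
        index.add("2009198109190001");
        index.add("555");
        System.out.println(index.search("56789"));    // prints [123456789]
        System.out.println(index.search("19810919")); // prints [2009198109190001]
    }
}
```

The trade-off is index size: a term of length n is stored under n suffixes, so the term dictionary grows by roughly the average term length, in exchange for lookups that touch only the matching range instead of every term.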

Does anybody have a better idea?



2009/11/9 Wenbo Zhao <wbzhao@travelsky.com>:
> Hi all,
> I want to query part of a digit string.
> Say the indexed token is "123456789";
> I want a query for 56789 to match this token.
> The "Query Parser Syntax" page says a wildcard cannot
> be the first character, so "*56789" is not allowed.
> How can I do that?
> Thanks.
>
> --
>
> Best Regards,
> ZHAO, Wenbo
>
> =======================
>



-- 

Best Regards,
ZHAO, Wenbo

=======================
