lucene-java-user mailing list archives

From Hasan Diwan <hasan.di...@gmail.com>
Subject Re: How to extract 15/20 words around the matched query after getting results from lucene searcher?
Date Mon, 25 May 2009 23:19:56 GMT
2009/5/24 KK <dioxide.software@gmail.com>:
> There is one more mail I found in the archive (3-4 days old) where someone
> asked about extracting 3 neighboring words around the match. I think once you
> have the position of the matching term/phrase, extracting 3 or 30 neighbors
> won't be different, right? You just have to move back/forward and get
> the words. This sounds logically simple, but I don't know how simple it is
> implementation-wise.
> Also, people are talking about something called spanQueries/termvectors etc. to
> use for this purpose. I have yet to get the exact idea of how to do this.
> As per your mail, you used Java to extract the neighbors. Is that using the
> standard techniques, i.e. using those spanqueries/termvectors, or something
> else?
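// query holds the query string; doc holds the document contents as a string.
// Needs: import java.util.Arrays;
public String resultPhrase() {
   int numberOfWords = 20; // words to keep on either side of the match
   String[] words = doc.split("\\s+");
   String[] queryWords = query.trim().split("\\s+");
   // Find the first word-level occurrence of the query phrase.
   int start = -1;
   for (int i = 0; start < 0 && i + queryWords.length <= words.length; i++) {
      if (Arrays.equals(Arrays.copyOfRange(words, i, i + queryWords.length), queryWords)) {
         start = i;
      }
   }
   if (start < 0) return ""; // query not found
   // Clamp the window so it never runs past either end of the document.
   int from = Math.max(0, start - numberOfWords);
   int to = Math.min(words.length, start + queryWords.length + numberOfWords);
   return String.join(" ", Arrays.copyOfRange(words, from, to));
}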
This is untested, so let me know how it works. It uses nothing beyond
what is in the JDK.
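
For reference, here is a self-contained sketch of the same idea that can be
compiled and run on its own. The class name ContextWindow, the method
snippets, and the radius parameter are made up for illustration and are not
part of Lucene or the JDK. It assumes whitespace tokenization and
case-insensitive matching, and it returns one snippet per occurrence instead
of only the first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, JDK only (Java 8+ for String.join).
final class ContextWindow {

    // Returns up to `radius` words on each side of every occurrence of the
    // phrase `query` in `doc`, one snippet per occurrence.
    static List<String> snippets(String doc, String query, int radius) {
        List<String> out = new ArrayList<>();
        if (doc.trim().isEmpty() || query.trim().isEmpty()) {
            return out; // nothing to search, or nothing to search for
        }
        String[] words = doc.trim().split("\\s+");
        String[] q = query.trim().toLowerCase(Locale.ROOT).split("\\s+");

        for (int i = 0; i + q.length <= words.length; i++) {
            // Compare the query words against the document words at offset i.
            boolean match = true;
            for (int j = 0; j < q.length; j++) {
                if (!words[i + j].toLowerCase(Locale.ROOT).equals(q[j])) {
                    match = false;
                    break;
                }
            }
            if (!match) {
                continue;
            }
            // Clamp the window to the bounds of the document.
            int from = Math.max(0, i - radius);
            int to = Math.min(words.length, i + q.length + radius);
            out.add(String.join(" ", Arrays.copyOfRange(words, from, to)));
        }
        return out;
    }

    public static void main(String[] args) {
        String doc = "the quick brown fox jumps over the lazy dog";
        // prints [the quick brown fox jumps over]
        System.out.println(snippets(doc, "brown fox", 2));
    }
}
```

Because the split is on whitespace only, punctuation stays attached to words,
so a query for "fox" will not match "fox," in the text. Stripping punctuation
before comparing would be the next thing to add if that matters for your
documents.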
