lucene-java-user mailing list archives

From Bhaskar <bhaskar1...@gmail.com>
Subject Re: Need help in alphanumeric search
Date Thu, 01 Oct 2015 12:52:39 GMT
Hi Uwe,
My search currently works like this: if I give the input "SD RAM Bhaskar",
every string that contains "SD", "RAM", or "Bhaskar" is returned, e.g.
      "SD lib"
      "RAM hello"
      "hi Bhaskar "
      "Bhaskar hai SD"


But I want the following output:
       "SD RAM Bhaskar"
       "SD RAM Bhaskar hello"
i.e. the string must begin with "SD RAM Bhaskar"; whatever follows can be
anything.


With my current application, however, I get every string in which it finds
"SD", "RAM", or "Bhaskar".
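With the classic QueryParser's default OR operator, a query of SD RAM Bhaskar becomes three optional term clauses, so any value containing any one of the words matches, which is the behaviour described above. What is wanted instead is "the value starts with these words, in this order". The plain-Java sketch below illustrates only that matching rule, outside Lucene; it assumes whitespace tokenization and case-insensitive comparison, and the class and method names are hypothetical. Inside Lucene, one possible way to express the same constraint is a SpanFirstQuery wrapping an in-order SpanNearQuery with slop 0 (package org.apache.lucene.search.spans); treat that as a pointer to verify against the 5.3.1 javadocs, not a tested recipe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Plain-Java sketch of "the value starts with these words, in order"; not Lucene code. */
final class StartsWithPhrase {

    /** Assumption: split on whitespace and compare case-insensitively. */
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        for (String t : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                result.add(t);
            }
        }
        return result;
    }

    /** True only if the first q.size() tokens of the value equal the query tokens. */
    static boolean startsWithPhrase(String value, String query) {
        List<String> v = tokens(value);
        List<String> q = tokens(query);
        return !q.isEmpty() && v.size() >= q.size() && v.subList(0, q.size()).equals(q);
    }

    public static void main(String[] args) {
        String query = "SD RAM Bhaskar";
        String[] values = {"SD lib", "RAM hello", "hi Bhaskar ", "Bhaskar hai SD",
                           "SD RAM Bhaskar", "SD RAM Bhaskar hello"};
        // Only the last two values print true.
        for (String value : values) {
            System.out.println("\"" + value + "\" -> " + startsWithPhrase(value, query));
        }
    }
}
```

A quoted phrase query ("SD RAM Bhaskar" in double quotes) is a related option, but it matches the three adjacent words anywhere in the value, not only at its start.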


Can you please advise?
Thanks a lot in advance.

Regards,
Bhaskar




On Wed, Sep 30, 2015 at 12:23 PM, Uwe Schindler <uwe@thetaphi.de> wrote:

> Hi Bhaskar,
>
> the answer is very simple: Your analysis is not useful for the type of
> queries and data you are using. You are using SimpleAnalyzer in your
> search/indexing code:
>
>
> https://lucene.apache.org/core/5_3_1/analyzers-common/org/apache/lucene/analysis/core/SimpleAnalyzer.html
> "An Analyzer that filters LetterTokenizer with LowerCaseFilter"
>
> And LetterTokenizer does the following:
>
> https://lucene.apache.org/core/5_3_1/analyzers-common/org/apache/lucene/analysis/core/LetterTokenizer.html
> "A LetterTokenizer is a tokenizer that divides text at non-letters. That's
> to say, it defines tokens as maximal strings of adjacent letters, as
> defined by java.lang.Character.isLetter() predicate."
>
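> So it creates a new token at every non-letter boundary. All non-letters,
> including digits, are discarded because they are treated as token
> boundaries: a value like 123-0049 produces no tokens at all, and ab23-090
> is indexed only as "ab". So your queries containing digits can never match.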
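To make the effect concrete, here is a plain-Java imitation of what SimpleAnalyzer is documented to do (split at every character for which Character.isLetter() is false, then lowercase), applied to the cpn values quoted in this thread. It is not Lucene's code and ignores details such as the maximum token length; prefixMatches is a hypothetical helper that assumes the query parser lowercases a prefix term such as ab23* and does not analyze it. Under those assumptions it reproduces the reported symptom: ab* and hed* find something, while 123* and ab23* find nothing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Plain-Java imitation of SimpleAnalyzer (LetterTokenizer + LowerCaseFilter); not Lucene code. */
final class LetterTokens {

    /** Splits at every non-letter code point and lowercases each token. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (Character.isLetter(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                // Non-letters (digits, '-', spaces) end the token and are discarded.
                tokens.add(current.toString().toLowerCase(Locale.ROOT));
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    /** Hypothetical stand-in for a prefix query like "ab23*": does any indexed token start with it? */
    static boolean prefixMatches(String value, String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT); // assumes the parser lowercases the prefix
        for (String token : tokenize(value)) {
            if (token.startsWith(p)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] cpnValues = {"123-0049", "342-043", "ab23-090", "hedwsdg"};
        for (String cpn : cpnValues) {
            System.out.println(cpn + " -> " + tokenize(cpn)); // e.g. 123-0049 -> []
        }
        for (String prefix : new String[] {"ab", "hed", "123", "ab23"}) {
            boolean found = false;
            for (String cpn : cpnValues) {
                found |= prefixMatches(cpn, prefix);
            }
            System.out.println(prefix + "* finds a value: " + found);
        }
    }
}
```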
>
> I'd suggest first reading up on analysis and choosing an analyzer that
> suits your underlying data and the queries you want to run. Maybe use
> WhitespaceAnalyzer or, better, StandardAnalyzer as a first step. Be sure
> to reindex your data before querying. The Analyzer used on the indexing
> side must be the same as the one on the query side. If you want to use
> wildcards, you have to take more care, because wildcards are not really
> natural for a full-text search engine and may cause inconsistent results.
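A quick way to see why the analyzer choice matters is to imitate whitespace-only tokenization, which is what WhitespaceAnalyzer is documented to do (split on whitespace, no lowercasing), and apply the same tokenizer on both the index side and the query side. This is a plain-Java sketch with hypothetical names, not Lucene code. Each cpn value survives as a single token, so a prefix such as 123 now matches 123-0049. Because nothing is lowercased, it also illustrates the wildcard caveat: an indexed SD would not match a prefix that the query parser has lowercased to sd (lowercasing of expanded terms is assumed here to be the 5.x QueryParser default; check setLowercaseExpandedTerms).

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java imitation of whitespace-only tokenization (like WhitespaceAnalyzer); not Lucene code. */
final class WhitespaceTokens {

    /** Splits where Character.isWhitespace() is true; case is left unchanged. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!Character.isWhitespace(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Hypothetical stand-in for a prefix query: does any indexed token start with the prefix? */
    static boolean prefixMatches(String value, String prefix) {
        for (String token : tokenize(value)) {
            if (token.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (String cpn : new String[] {"123-0049", "342-043", "ab23-090", "hedwsdg"}) {
            System.out.println(cpn + " -> " + tokenize(cpn)); // each value stays one token
        }
        System.out.println("123* finds 123-0049: " + prefixMatches("123-0049", "123"));   // true
        System.out.println("ab23* finds ab23-090: " + prefixMatches("ab23-090", "ab23")); // true
        // No lowercasing: a lowercased prefix does not match an upper-case token.
        System.out.println("sd* finds \"SD RAM\": " + prefixMatches("SD RAM", "sd"));      // false
    }
}
```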
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
> > -----Original Message-----
> > From: Bhaskar [mailto:bhaskar1484@gmail.com]
> > Sent: Wednesday, September 30, 2015 4:28 AM
> > To: java-user@lucene.apache.org
> > Subject: Re: Need help in alphanumeric search
> >
> > Hi Uwe,
> >
> > Below is my indexing code:
> >
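> > import java.nio.file.Path;
> > import java.nio.file.Paths;
> > import java.sql.Connection;
> > import java.sql.DriverManager;
> > import java.sql.ResultSet;
> > import java.sql.Statement;
> > import org.apache.lucene.analysis.core.SimpleAnalyzer;
> > import org.apache.lucene.document.Document;
> > import org.apache.lucene.document.Field;
> > import org.apache.lucene.document.TextField;
> > import org.apache.lucene.index.IndexWriter;
> > import org.apache.lucene.index.IndexWriterConfig;
> > import org.apache.lucene.store.Directory;
> > import org.apache.lucene.store.FSDirectory;
> >
> > public class SimpleDBIndexer {
> >
> >   public static final String INDEX_DIR = "c:/DBIndexAll/";
> >   // JDBC_DRIVER, CONNECTION_URL, DBNAME, USER_NAME, PASSWORD and QUERY1
> >   // are further constants of this class that are not shown here.
> >
> >   public static void main(String[] args) throws Exception {
> >     final Path indexDir = Paths.get(INDEX_DIR);
> >     SimpleDBIndexer indexer = new SimpleDBIndexer();
> >     Class.forName(JDBC_DRIVER); // load the JDBC driver class
> >     // SimpleAnalyzer = LetterTokenizer + LowerCaseFilter (letters only)
> >     SimpleAnalyzer analyzer = new SimpleAnalyzer();
> >     IndexWriterConfig indexWriterConfig = new IndexWriterConfig(analyzer);
> >     // try-with-resources closes writer, directory and connection, even on failure
> >     try (Connection conn = DriverManager.getConnection(
> >              CONNECTION_URL + DBNAME, USER_NAME, PASSWORD);
> >          Directory dir = FSDirectory.open(indexDir);
> >          IndexWriter indexWriter = new IndexWriter(dir, indexWriterConfig)) {
> >       System.out.println("Indexing to directory '" + indexDir + "'...");
> >       int indexedDocumentCount = indexer.indexDocs(indexWriter, conn);
> >       System.out.println(indexedDocumentCount
> >           + " records have been indexed successfully");
> >     } catch (Exception e) {
> >       e.printStackTrace();
> >     }
> >   }
> >
> >   int indexDocs(IndexWriter writer, Connection conn) throws Exception {
> >     String sql = QUERY1;
> >     int i = 0;
> >     try (Statement stmt = conn.createStatement();
> >          ResultSet rs = stmt.executeQuery(sql)) {
> >       while (rs.next()) {
> >         String cpn = rs.getString("cpn");
> >         if (cpn == null) {
> >           continue; // Lucene rejects null field values
> >         }
> >         Document d = new Document();
> >         // TextField: analyzed (tokenized) and stored for display
> >         d.add(new TextField("cpn", cpn, Field.Store.YES));
> >         writer.addDocument(d);
> >         i++;
> >       }
> >     }
> >     return i;
> >   }
> > }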
> >
> >
> > Searching code:
> >
> > import java.nio.file.Path;
> > import java.nio.file.Paths;
> > import org.apache.lucene.analysis.standard.StandardAnalyzer;
> > import org.apache.lucene.document.Document;
> > import org.apache.lucene.index.DirectoryReader;
> > import org.apache.lucene.index.IndexReader;
> > import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
> > import org.apache.lucene.search.IndexSearcher;
> > import org.apache.lucene.search.Query;
> > import org.apache.lucene.search.ScoreDoc;
> > import org.apache.lucene.search.TopDocs;
> > import org.apache.lucene.store.Directory;
> > import org.apache.lucene.store.FSDirectory;
> >
> > public class SimpleDBSearcher {
> >   // PLASTRON
> >   private static final String LUCENE_QUERY = "SD*";
> >   private static final int MAX_HITS = 500;
> >   private static final String INDEX_DIR = "C:/DBIndexAll/";
> >
> >   public static void main(String[] args) throws Exception {
> >     final Path indexDir = Paths.get(SimpleDBIndexer.INDEX_DIR);
> >     String query = LUCENE_QUERY;
> >     SimpleDBSearcher searcher = new SimpleDBSearcher();
> >     searcher.searchIndex(indexDir, query);
> >   }
> >
> >   private void searchIndex(Path indexDir, String queryStr) throws Exception {
> >     System.out.println("The query string is " + queryStr);
> >     // Query side uses StandardAnalyzer; the index was built with SimpleAnalyzer
> >     MultiFieldQueryParser queryParser =
> >         new MultiFieldQueryParser(new String[] { "cpn" }, new StandardAnalyzer());
> >     // getAllowLeadingWildcard() only reads the flag; the setter enables *0049-style queries
> >     queryParser.setAllowLeadingWildcard(true);
> >     try (Directory directory = FSDirectory.open(indexDir);
> >          IndexReader reader = DirectoryReader.open(directory)) {
> >       IndexSearcher searcher = new IndexSearcher(reader);
> >       Query query = queryParser.parse(queryStr);
> >       TopDocs topDocs = searcher.search(query, MAX_HITS);
> >       ScoreDoc[] hits = topDocs.scoreDocs;
> >       System.out.println(hits.length + " Record(s) Found");
> >       for (ScoreDoc hit : hits) {
> >         Document d = searcher.doc(hit.doc);
> >         System.out.println("\"cpn value is:\" " + d.get("cpn"));
> >       }
> >       if (hits.length == 0) {
> >         System.out.println("No Data Found");
> >       }
> >     }
> >   }
> > }
> >
> >
> > Please help here, thanks in advance.
> >
> > Regards,
> > Bhaskar
> >
> > On Tue, Sep 29, 2015 at 3:47 AM, Uwe Schindler <uwe@thetaphi.de> wrote:
> >
> > > Hi Erick,
> > >
> > > This mail was sent to Lucene's user mailing list. This is not about
> > > Solr, so the user cannot provide a Solr config! :-) In any case, it
> > > would be good to see the Analyzer and code you use while indexing, and
> > > also the code (and Analyzer) that creates the query while searching.
> > >
> > > Uwe
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Erick Erickson [mailto:erickerickson@gmail.com]
> > > > Sent: Monday, September 28, 2015 6:01 PM
> > > > To: java-user
> > > > Subject: Re: Need help in alphanumeric search
> > > >
> > > > You need to supply the definitions of this field from your schema.xml
> > > > file, both the <field> and the <fieldType>.
> > > >
> > > > Additionally, please provide the results of the query you're trying
> > > > with &debug=true appended.
> > > >
> > > > The admin UI analysis page is very helpful in these situations as well.
> > > > Select the appropriate core from the drop-down on the left and you'll
> > > > see an "analysis" section appear that shows you exactly what happens
> > > > when the field is analyzed.
> > > >
> > > > Best,
> > > > Erick
> > > >
> > > > On Mon, Sep 28, 2015 at 5:01 AM, Bhaskar <bhaskar1484@gmail.com> wrote:
> > > > > Thanks, Ian, for the reply.
> > > > >
> > > > > The cpn values look like 123-0049, 342-043, ab23-090, hedwsdg.
> > > > >
> > > > > My application works when I search for the inputs below:
> > > > > 1) ab*
> > > > > 2) hedwsdg
> > > > > 3) hed*
> > > > >
> > > > > but it does not work for:
> > > > > 1) 123*
> > > > > 2) 123-0049
> > > > > 3) ab23*
> > > > >
> > > > >
> > > > > Note: if the search input contains a number, it does not work.
> > > > >
> > > > > Thanks in advance.
> > > > >
> > > > >
> > > > > On Mon, Sep 28, 2015 at 3:49 PM, Ian Lea <ian.lea@gmail.com> wrote:
> > > > >
> > > > >> Hi
> > > > >>
> > > > >>
> > > > >> Can you provide a few examples of values of cpn that a) are and
> > > > >> b) are not being found, for indexing and searching.
> > > > >>
> > > > >> You may also find some of the tips at
> > > > >> http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2F_incorrect_hits.3F
> > > > >> useful.
> > > > >>
> > > > >> You haven't shown the code that created the IndexWriter so the
> > > > >> tip about using the same analyzer at index and search time might
> > > > >> be relevant.
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> Ian.
> > > > >>
> > > > >>
> > > > >> On Mon, Sep 28, 2015 at 10:49 AM, Bhaskar <bhaskar1484@gmail.com> wrote:
> > > > >> > Hi,
> > > > >> > I am a beginner with Apache Lucene; I am using 5.3.1.
> > > > >> > I have created the index from a database result set. The indexed
> > > > >> > values include both alphanumeric and purely alphabetic strings. I
> > > > >> > am able to search the alphabetic strings, but I am not able to
> > > > >> > search the alphanumeric values.
> > > > >> >
> > > > >> > Can someone help me here?
> > > > >> >
> > > > >> > Below is the indexing code:
> > > > >> >
> > > > >> > int indexDocs(IndexWriter writer, Connection conn) throws Exception {
> > > > >> >   int i = 0;
> > > > >> >   // sql is defined outside this method (not shown)
> > > > >> >   try (Statement stmt = conn.createStatement();
> > > > >> >        ResultSet rs = stmt.executeQuery(sql)) {
> > > > >> >     while (rs.next()) {
> > > > >> >       Document d = new Document();
> > > > >> >       // System.out.println("cpn is" + rs.getString("cpn"));
> > > > >> >       // System.out.println("mpn is" + rs.getString("mpn"));
> > > > >> >       d.add(new TextField("cpn", rs.getString("cpn"), Field.Store.YES));
> > > > >> >       writer.addDocument(d);
> > > > >> >       i++;
> > > > >> >     }
> > > > >> >   }
> > > > >> >   return i; // number of documents indexed
> > > > >> > }
> > > > >> >
> > > > >> > Searching code:
> > > > >> >
> > > > >> > private void searchIndex(Path indexDir, String queryStr) throws Exception {
> > > > >> >   System.out.println("The query string is " + queryStr);
> > > > >> >   // MultiFieldQueryParser queryParser = new MultiFieldQueryParser(
> > > > >> >   //     new String[] {"mpn"}, new StandardAnalyzer());
> > > > >> >   // IndexReader reader = IndexReader.open(directory);
> > > > >> >   try (Directory directory = FSDirectory.open(indexDir);
> > > > >> >        IndexReader reader = DirectoryReader.open(directory)) {
> > > > >> >     IndexSearcher searcher = new IndexSearcher(reader);
> > > > >> >     Analyzer analyzer = new StandardAnalyzer();
> > > > >> >     QueryParser parser = new QueryParser("cpn", analyzer);
> > > > >> >     parser.setDefaultOperator(QueryParser.Operator.OR);
> > > > >> >     // getAllowLeadingWildcard() only reads the flag; the setter enables it
> > > > >> >     parser.setAllowLeadingWildcard(true);
> > > > >> >     parser.setAutoGeneratePhraseQueries(true);
> > > > >> >     Query query = parser.parse(queryStr);
> > > > >> >     TopDocs topDocs = searcher.search(query, MAX_HITS);
> > > > >> >     ScoreDoc[] hits = topDocs.scoreDocs;
> > > > >> >     System.out.println(hits.length + " Record(s) Found");
> > > > >> >     for (ScoreDoc hit : hits) {
> > > > >> >       Document d = searcher.doc(hit.doc);
> > > > >> >       System.out.println("\"value is:\" " + d.get("cpn"));
> > > > >> >     }
> > > > >> >     if (hits.length == 0) {
> > > > >> >       System.out.println("No Data Found");
> > > > >> >     }
> > > > >> >   }
> > > > >> > }
> > > > >> >
> > > > >> >
> > > > >> > Thanks in advance.
> > > > >> >
> > > > >> > --
> > > > >> > Keep Smiling....
> > > > >> > Thanks & Regards
> > > > >> > Bhaskar.
> > > > >> > Mobile:9866724142
> > > > >>
> > > > >> ---------------------------------------------------------------------
> > > > >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > >> For additional commands, e-mail: java-user-help@lucene.apache.org
> > > > >>
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > >
> >
> >
>
>
>
>


