lucene-java-user mailing list archives

From Bhaskar <bhaskar1...@gmail.com>
Subject Re: Need help in alphanumeric search
Date Wed, 30 Sep 2015 08:09:00 GMT
Hi Uwe,

Wow!!! Thanks a lot. I changed to StandardAnalyzer and it is working now.
Thank you, thank you.

Regards,
Bhaskar

On Wed, Sep 30, 2015 at 12:23 PM, Uwe Schindler <uwe@thetaphi.de> wrote:

> Hi Bhaskar,
>
> the answer is very simple: Your analysis is not useful for the type of
> queries and data you are using. You are using SimpleAnalyzer in your
> search/indexing code:
>
>
> https://lucene.apache.org/core/5_3_1/analyzers-common/org/apache/lucene/analysis/core/SimpleAnalyzer.html
> "An Analyzer that filters LetterTokenizer with LowerCaseFilter"
>
> And LetterTokenizer does the following:
>
> https://lucene.apache.org/core/5_3_1/analyzers-common/org/apache/lucene/analysis/core/LetterTokenizer.html
> "A LetterTokenizer is a tokenizer that divides text at non-letters. That's
> to say, it defines tokens as maximal strings of adjacent letters, as
> defined by java.lang.Character.isLetter() predicate."
>
> So it creates a new token at every non-letter boundary. All non-letters
> are discarded (because they are treated as token boundaries), so digits
> and hyphens never reach the index and queries containing them can never
> match.
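To make the tokenization described above concrete, here is a minimal plain-Java sketch that approximates what LetterTokenizer followed by LowerCaseFilter does, applied to the cpn values from this thread. It does not use Lucene at all: the class name `LetterTokenSketch` and the method `tokenize` are hypothetical names chosen for illustration, and the only behavior assumed is the one quoted from the Javadoc (tokens are maximal runs of characters accepted by `Character.isLetter`).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: plain Java, no Lucene classes involved.
class LetterTokenSketch {

    // Splits text into maximal runs of letters and lowercases each run.
    // Every non-letter (digit, hyphen, space) ends the current token and is dropped.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetter(cp)) {
                current.appendCodePoint(Character.toLowerCase(cp));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Sample cpn values taken from the thread.
        for (String cpn : new String[] {"123-0049", "342-043", "ab23-090", "hedwsdg"}) {
            System.out.println(cpn + " -> " + tokenize(cpn));
        }
    }
}
```

Under this approximation the purely numeric values produce no tokens at all, and "ab23-090" is reduced to the single token "ab", which is consistent with the searches reported as failing further down the thread.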
>
> I'd suggest first reading up on analysis and then choosing an analyzer
> that suits your underlying data and the queries you want to run. Maybe
> use WhitespaceAnalyzer or, better, StandardAnalyzer as a first step. Be sure
> to reindex your data before querying. The Analyzer used on the indexing
> side must be the same as the one used on the query side. If you want to use
> wildcards, you have to take more care, because wildcards are not really
> natural for a "full text search engine" and may cause inconsistent results.
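The advice about wildcards and matching analysis can be illustrated with a small plain-Java sketch. It models a prefix query such as "ab23*" as a check against the terms that were written at index time, on the assumption (as far as I know true for the classic query parser) that wildcard terms are not run through the tokenizer. Everything here is hypothetical and Lucene-free: `PrefixMatchSketch`, `tokenize`, and `prefixMatches` are illustrative names, the letters-only split stands in for SimpleAnalyzer, and the letters-or-digits split is only a rough stand-in for how StandardAnalyzer treats these particular sample values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.IntPredicate;

// Hypothetical illustration only: plain Java, no Lucene classes involved.
class PrefixMatchSketch {

    // Splits text into lowercased maximal runs of code points accepted by keep.
    static List<String> tokenize(String text, IntPredicate keep) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (keep.test(cp)) {
                current.appendCodePoint(Character.toLowerCase(cp));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Models a prefix query: true if any indexed term starts with the prefix.
    static boolean prefixMatches(List<String> indexedTerms, String prefix) {
        String needle = prefix.toLowerCase(Locale.ROOT);
        for (String term : indexedTerms) {
            if (term.startsWith(needle)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String cpn = "ab23-090";
        // Stand-in for the SimpleAnalyzer index: letters only.
        List<String> lettersOnly = tokenize(cpn, Character::isLetter);
        // Rough stand-in for a StandardAnalyzer index on this value: letters and digits.
        List<String> lettersAndDigits = tokenize(cpn, Character::isLetterOrDigit);
        for (String prefix : new String[] {"ab", "ab23"}) {
            System.out.println(prefix + "* on letters-only terms: "
                    + prefixMatches(lettersOnly, prefix));
            System.out.println(prefix + "* on letters+digits terms: "
                    + prefixMatches(lettersAndDigits, prefix));
        }
    }
}
```

In this model "ab*" finds the document under either tokenization, while "ab23*" only finds it once digits are kept in the indexed terms, which is why reindexing with the new analyzer matters as much as changing the query side.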
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
> > -----Original Message-----
> > From: Bhaskar [mailto:bhaskar1484@gmail.com]
> > Sent: Wednesday, September 30, 2015 4:28 AM
> > To: java-user@lucene.apache.org
> > Subject: Re: Need help in alphanumeric search
> >
> > Hi Uwe,
> >
> > Below is my indexing code:
> >
> > public static final String INDEX_DIR = "c:/DBIndexAll/";
> >
> > public static void main(String[] args) throws Exception {
> >   // Path indexDir = new Path(INDEX_DIR);
> >   final Path indexDir = Paths.get(INDEX_DIR);
> >   SimpleDBIndexer indexer = new SimpleDBIndexer();
> >   try {
> >     Class.forName(JDBC_DRIVER).newInstance();
> >     Connection conn = DriverManager.getConnection(CONNECTION_URL + DBNAME,
> >         USER_NAME, PASSWORD);
> >     SimpleAnalyzer analyzer = new SimpleAnalyzer();
> >     IndexWriterConfig indexWriterConfig = new IndexWriterConfig(analyzer);
> >     IndexWriter indexWriter = new IndexWriter(FSDirectory.open(indexDir),
> >         indexWriterConfig);
> >     System.out.println("Indexing to directory '" + indexDir + "'...");
> >     int indexedDocumentCount = indexer.indexDocs(indexWriter, conn);
> >     indexWriter.close();
> >     System.out.println(indexedDocumentCount
> >         + " records have been indexed successfully");
> >   } catch (Exception e) {
> >     e.printStackTrace();
> >   }
> > }
> >
> > int indexDocs(IndexWriter writer, Connection conn) throws Exception {
> >   String sql = QUERY1;
> >   Statement stmt = conn.createStatement();
> >   ResultSet rs = stmt.executeQuery(sql);
> >   int i=0;
> >   while (rs.next()) {
> >      Document d = new Document();
> >      d.add(new TextField("cpn", rs.getString("cpn"), Field.Store.YES));
> >
> >      writer.addDocument(d);
> >      i++;
> >  }
> >   stmt.close();
> >   rs.close();
> >
> >   return i;
> > }
> >
> >
> > Searching code:
> >
> > public class SimpleDBSearcher {
> >   // PLASTRON
> >   private static final String LUCENE_QUERY = "SD*";
> >   private static final int MAX_HITS = 500;
> >   private static final String INDEX_DIR = "C:/DBIndexAll/";
> >
> >   public static void main(String[] args) throws Exception {
> >     // File indexDir = new File(SimpleDBIndexer.INDEX_DIR);
> >     final Path indexDir = Paths.get(SimpleDBIndexer.INDEX_DIR);
> >     String query = LUCENE_QUERY;
> >     SimpleDBSearcher searcher = new SimpleDBSearcher();
> >     searcher.searchIndex(indexDir, query);
> >   }
> >
> >   private void searchIndex(Path indexDir, String queryStr) throws Exception {
> >     Directory directory = FSDirectory.open(indexDir);
> >     System.out.println("The query string is " + queryStr);
> >     MultiFieldQueryParser queryParser = new MultiFieldQueryParser(
> >         new String[] { "cpn" }, new StandardAnalyzer());
> >     IndexReader reader = DirectoryReader.open(directory);
> >     IndexSearcher searcher = new IndexSearcher(reader);
> >     queryParser.getAllowLeadingWildcard();
> >
> >     Query query = queryParser.parse(queryStr);
> >     TopDocs topDocs = searcher.search(query, MAX_HITS);
> >
> >     ScoreDoc[] hits = topDocs.scoreDocs;
> >     System.out.println(hits.length + " Record(s) Found");
> >     for (int i = 0; i < hits.length; i++) {
> >       int docId = hits[i].doc;
> >       Document d = searcher.doc(docId);
> >       System.out.println("\"cpn value is:\" " + d.get("cpn"));
> >     }
> >     if (hits.length == 0) {
> >       System.out.println("No Data Founds ");
> >     }
> >   }
> > }
> >
> >
> > Please help here, thanks in advance.
> >
> > Regards,
> > Bhaskar
> >
> > On Tue, Sep 29, 2015 at 3:47 AM, Uwe Schindler <uwe@thetaphi.de> wrote:
> >
> > > Hi Erick,
> > >
> > > This mail was sent to Lucene's user mailing list. It is not about Solr,
> > > so the user cannot provide a Solr config! :-) In any case, it would be
> > > good to see the Analyzer + code used while indexing and also the
> > > code (+ Analyzer) that creates the query while searching.
> > >
> > > Uwe
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Erick Erickson [mailto:erickerickson@gmail.com]
> > > > Sent: Monday, September 28, 2015 6:01 PM
> > > > To: java-user
> > > > Subject: Re: Need help in alphanumeric search
> > > >
> > > > You need to supply the definitions of this field from your schema.xml
> > > > file, both the <field> and <fieldType>.
> > > >
> > > > Additionally, please provide the results of the query you're trying
> > > > with &debug=true appended.
> > > >
> > > > The adminUI/analysis page is very helpful in these situations as well.
> > > > Select the appropriate core from the drop-down on the left and you'll
> > > > see an "analysis" section appear that shows you exactly what happens
> > > > when the field is analyzed.
> > > >
> > > > Best,
> > > > Erick
> > > >
> > > > On Mon, Sep 28, 2015 at 5:01 AM, Bhaskar <bhaskar1484@gmail.com> wrote:
> > > > > Thanks, Ian, for the reply.
> > > > >
> > > > > cpn values are like 123-0049, 342-043, ab23-090, hedwsdg
> > > > >
> > > > > My application works when I search for the inputs below:
> > > > > 1) ab*
> > > > > 2) hedwsdg
> > > > > 3) hed*
> > > > >
> > > > > but it does not work for:
> > > > > 1) 123*
> > > > > 2) 123-0049
> > > > > 3) ab23*
> > > > >
> > > > >
> > > > > Note: if the search input contains a number, it does not work.
> > > > >
> > > > > Thanks in advance.
> > > > >
> > > > >
> > > > > On Mon, Sep 28, 2015 at 3:49 PM, Ian Lea <ian.lea@gmail.com> wrote:
> > > > >
> > > > >> Hi
> > > > >>
> > > > >>
> > > > >> Can you provide a few examples of values of cpn that a) are and
> > > > >> b) are not being found, for indexing and searching.
> > > > >>
> > > > >> You may also find some of the tips at
> > > > >>
> > > > >> http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2F_incorrect_hits.3F
> > > > >>
> > > > >> useful.
> > > > >>
> > > > >> You haven't shown the code that created the IndexWriter so the
> > > > >> tip about using the same analyzer at index and search time might
> > > > >> be relevant.
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> Ian.
> > > > >>
> > > > >>
> > > > >> On Mon, Sep 28, 2015 at 10:49 AM, Bhaskar <bhaskar1484@gmail.com> wrote:
> > > > >> > Hi,
> > > > >> > I am a beginner with Apache Lucene; I am using 5.3.1.
> > > > >> > I have created an index from a database query result. The indexed
> > > > >> > values include both alphanumeric values and plain strings. I am
> > > > >> > able to search for the plain strings, but I am not able to search
> > > > >> > for the alphanumeric values.
> > > > >> >
> > > > >> > Can someone help me here?
> > > > >> >
> > > > >> > Below is indexing code...
> > > > >> >
> > > > >> > int indexDocs(IndexWriter writer, Connection conn) throws Exception {
> > > > >> >   Statement stmt = conn.createStatement();
> > > > >> >   ResultSet rs = stmt.executeQuery(sql);
> > > > >> >   int i = 0;
> > > > >> >   while (rs.next()) {
> > > > >> >     Document d = new Document();
> > > > >> >     // System.out.println("cpn is" + rs.getString("cpn"));
> > > > >> >     // System.out.println("mpn is" + rs.getString("mpn"));
> > > > >> >
> > > > >> >     d.add(new TextField("cpn", rs.getString("cpn"), Field.Store.YES));
> > > > >> >
> > > > >> >     writer.addDocument(d);
> > > > >> >     i++;
> > > > >> >   }
> > > > >> >   return i;
> > > > >> > }
> > > > >> >
> > > > >> > Searching code:
> > > > >> >
> > > > >> >
> > > > >> > private void searchIndex(Path indexDir, String queryStr) throws Exception {
> > > > >> >   Directory directory = FSDirectory.open(indexDir);
> > > > >> >   System.out.println("The query string is " + queryStr);
> > > > >> >   // MultiFieldQueryParser queryParser = new MultiFieldQueryParser(
> > > > >> >   //     new String[] {"mpn"}, new StandardAnalyzer());
> > > > >> >   // IndexReader reader = IndexReader.open(directory);
> > > > >> >   IndexReader reader = DirectoryReader.open(directory);
> > > > >> >   IndexSearcher searcher = new IndexSearcher(reader);
> > > > >> >   Analyzer analyzer = new StandardAnalyzer();
> > > > >> >   analyzer.tokenStream("cpn", queryStr);
> > > > >> >   QueryParser parser = new QueryParser("cpn", analyzer);
> > > > >> >   parser.setDefaultOperator(Operator.OR);
> > > > >> >   parser.getAllowLeadingWildcard();
> > > > >> >   parser.setAutoGeneratePhraseQueries(true);
> > > > >> >   Query query = parser.parse(queryStr);
> > > > >> >   searcher.search(query, 100);
> > > > >> >   TopDocs topDocs = searcher.search(query, MAX_HITS);
> > > > >> >
> > > > >> >   ScoreDoc[] hits = topDocs.scoreDocs;
> > > > >> >   System.out.println(hits.length + " Record(s) Found");
> > > > >> >   for (int i = 0; i < hits.length; i++) {
> > > > >> >     int docId = hits[i].doc;
> > > > >> >     Document d = searcher.doc(docId);
> > > > >> >     System.out.println("\"value is:\" " + d.get("cpn"));
> > > > >> >   }
> > > > >> >   if (hits.length == 0) {
> > > > >> >     System.out.println("No Data Founds ");
> > > > >> >   }
> > > > >> > }
> > > > >> >
> > > > >> >
> > > > >> > Thanks in advance.
> > > > >> >
> > > > >> > --
> > > > >> > Keep Smiling....
> > > > >> > Thanks & Regards
> > > > >> > Bhaskar.
> > > > >> > Mobile:9866724142
> > > > >>
> > > > >> ---------------------------------------------------------------------
> > > > >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > >> For additional commands, e-mail: java-user-help@lucene.apache.org
> > > > >>
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > >
> >
> >
>
>
>
>


