lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: Need help in alphanumeric search
Date Thu, 01 Oct 2015 13:36:41 GMT
Take a look at http://lucene.apache.org/core/5_3_1/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package_description.
Sounds like you want an AND, or a +, or both. You may also want to
take a look at phrase queries and/or span queries.
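
To make the difference concrete, here is a minimal sketch that uses only the
JDK, not Lucene. The class and method names (QuerySemanticsSketch, matchesAny,
and so on) are made up for illustration, and the whitespace/lowercase
tokenization is an assumption standing in for a real analyzer. In the classic
query parser syntax the three behaviours correspond roughly to
SD RAM Bhaskar (default OR), +SD +RAM +Bhaskar or SD AND RAM AND Bhaskar, and
the quoted phrase "SD RAM Bhaskar". A phrase can match anywhere in the field;
anchoring it at the start of the field is the kind of thing span queries are
for (SpanFirstQuery is probably the place to look, but please check the docs).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Illustration only: imitates OR, AND and phrase matching on token lists.
// This is not how Lucene evaluates queries internally.
class QuerySemanticsSketch {

    // Assumed analysis: trim, lowercase, split on whitespace.
    static List<String> tokens(String text) {
        return Arrays.asList(text.trim().toLowerCase(Locale.ROOT).split("\\s+"));
    }

    // OR semantics: at least one query term occurs in the document.
    static boolean matchesAny(String doc, String query) {
        List<String> docTokens = tokens(doc);
        return tokens(query).stream().anyMatch(docTokens::contains);
    }

    // AND semantics: every query term occurs somewhere in the document.
    static boolean matchesAll(String doc, String query) {
        return tokens(doc).containsAll(tokens(query));
    }

    // Phrase semantics: the query terms occur adjacent and in order.
    static boolean matchesPhrase(String doc, String query) {
        return Collections.indexOfSubList(tokens(doc), tokens(query)) >= 0;
    }

    // Phrase anchored at the first position of the field.
    static boolean startsWithPhrase(String doc, String query) {
        return Collections.indexOfSubList(tokens(doc), tokens(query)) == 0;
    }

    public static void main(String[] args) {
        String query = "SD RAM Bhaskar";
        String[] docs = {
            "SD lib", "RAM hello", "hi Bhaskar ", "Bhaskar hai SD",
            "SD RAM Bhaskar", "SD RAM Bhaskar hello"
        };
        for (String doc : docs) {
            System.out.println("\"" + doc + "\""
                + " OR=" + matchesAny(doc, query)
                + " AND=" + matchesAll(doc, query)
                + " PHRASE=" + matchesPhrase(doc, query)
                + " START=" + startsWithPhrase(doc, query));
        }
    }
}
```

With the six strings from the question, the OR column is true for all of
them, while AND, PHRASE and START are true only for "SD RAM Bhaskar" and
"SD RAM Bhaskar hello".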


--
Ian.





On Thu, Oct 1, 2015 at 1:52 PM, Bhaskar <bhaskar1484@gmail.com> wrote:
> Hi Uwe,
> My search currently works like this:
> if I give the input "SD RAM Bhaskar", every string that contains
> "SD", "RAM", or "Bhaskar" is returned,
> i.e. "SD lib"
>       "RAM hello"
>       "hi Bhaskar "
>       "Bhaskar hai SD"
>
>
> But I want the output below:
>        "SD RAM Bhaskar"
>        "SD RAM Bhaskar hello"
> i.e. the string must begin with "SD RAM Bhaskar"; what follows can be
> anything.
>
>
> But my current application returns every string in which it finds "SD",
> "RAM", or "Bhaskar".
>
>
> Can you please advise?
> Thanks a lot in advance.
>
> Regards,
> Bhaskar
>
>
>
>
> On Wed, Sep 30, 2015 at 12:23 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>
>> Hi Bhaskar,
>>
>> the answer is very simple: Your analysis is not useful for the type of
>> queries and data you are using. You are using SimpleAnalyzer in your
>> search/indexing code:
>>
>>
>> https://lucene.apache.org/core/5_3_1/analyzers-common/org/apache/lucene/analysis/core/SimpleAnalyzer.html
>> "An Analyzer that filters LetterTokenizer with LowerCaseFilter"
>>
>> And LetterTokenizer does the following:
>>
>> https://lucene.apache.org/core/5_3_1/analyzers-common/org/apache/lucene/analysis/core/LetterTokenizer.html
>> "A LetterTokenizer is a tokenizer that divides text at non-letters. That's
>> to say, it defines tokens as maximal strings of adjacent letters, as
>> defined by java.lang.Character.isLetter() predicate."
>>
>> So it creates a new token at every non-letter boundary. All non-letters
>> are discarded (because they are treated as token boundaries). So your
>> queries containing digits can never match.
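
The behaviour quoted above can be reproduced without Lucene. The following is
a minimal sketch that assumes only the JDK: the names LetterTokenSketch and
letterTokens are made up for illustration, and the loop merely imitates what
the LetterTokenizer and LowerCaseFilter documentation describes; it is not
Lucene's implementation. Run against the cpn values reported earlier in the
thread, it suggests why ab* and hed* find something while 123* and ab23*
cannot.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: splits text into maximal runs of letters and lowercases
// them, as the SimpleAnalyzer documentation describes. Not Lucene code.
class LetterTokenSketch {

    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                // Letters extend the current token.
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // Any non-letter (digit, hyphen, space) ends the token
                // and is itself thrown away.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String[] cpnValues = {"123-0049", "342-043", "ab23-090", "hedwsdg"};
        for (String cpn : cpnValues) {
            System.out.println(cpn + " -> " + letterTokens(cpn));
        }
        // Prints:
        // 123-0049 -> []
        // 342-043 -> []
        // ab23-090 -> [ab]
        // hedwsdg -> [hedwsdg]
    }
}
```

Under this imitation, "123-0049" produces no tokens at all and "ab23-090" is
indexed only as "ab", so a query term that still contains digits has nothing
to match against.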
>>
>> I'd suggest first reading up on analysis and choosing a better analyzer
>> that suits your underlying data and the queries you want to run. Maybe
>> use WhitespaceAnalyzer or, better, StandardAnalyzer as a first step. Be sure
>> to reindex your data before querying. The Analyzer used on the indexing
>> side must be the same as the one used on the query side. If you want to use
>> wildcards, you have to take more care, because wildcards are not really
>> natural for a "full text search engine" and may cause inconsistent results.
>>
>> Uwe
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>>
>> > -----Original Message-----
>> > From: Bhaskar [mailto:bhaskar1484@gmail.com]
>> > Sent: Wednesday, September 30, 2015 4:28 AM
>> > To: java-user@lucene.apache.org
>> > Subject: Re: Need help in alphanumeric search
>> >
>> > Hi Uwe,
>> >
>> > Below is my indexing code:
>> >
>> > public static final String INDEX_DIR = "c:/DBIndexAll/";
>> >
>> > public static void main(String[] args) throws Exception {
>> >   // Path indexDir = new Path(INDEX_DIR);
>> >   final Path indexDir = Paths.get(INDEX_DIR);
>> >   SimpleDBIndexer indexer = new SimpleDBIndexer();
>> >   try {
>> >     Class.forName(JDBC_DRIVER).newInstance();
>> >     Connection conn = DriverManager.getConnection(CONNECTION_URL + DBNAME, USER_NAME, PASSWORD);
>> >     SimpleAnalyzer analyzer = new SimpleAnalyzer();
>> >     IndexWriterConfig indexWriterConfig = new IndexWriterConfig(analyzer);
>> >     IndexWriter indexWriter = new IndexWriter(FSDirectory.open(indexDir), indexWriterConfig);
>> >     System.out.println("Indexing to directory '" + indexDir + "'...");
>> >     int indexedDocumentCount = indexer.indexDocs(indexWriter, conn);
>> >     indexWriter.close();
>> >     System.out.println(indexedDocumentCount + " records have been indexed successfully");
>> >   } catch (Exception e) {
>> >     e.printStackTrace();
>> >   }
>> > }
>> >
>> > int indexDocs(IndexWriter writer, Connection conn) throws Exception {
>> >   String sql = QUERY1;
>> >   Statement stmt = conn.createStatement();
>> >   ResultSet rs = stmt.executeQuery(sql);
>> >   int i=0;
>> >   while (rs.next()) {
>> >      Document d = new Document();
>> >      d.add(new TextField("cpn", rs.getString("cpn"), Field.Store.YES));
>> >
>> >      writer.addDocument(d);
>> >      i++;
>> >  }
>> >   stmt.close();
>> >   rs.close();
>> >
>> >   return i;
>> > }
>> >
>> >
>> > Searching code:
>> >
>> > public class SimpleDBSearcher {
>> >   // PLASTRON
>> >   private static final String LUCENE_QUERY = "SD*";
>> >   private static final int MAX_HITS = 500;
>> >   private static final String INDEX_DIR = "C:/DBIndexAll/";
>> >
>> >   public static void main(String[] args) throws Exception {
>> >     // File indexDir = new File(SimpleDBIndexer.INDEX_DIR);
>> >     final Path indexDir = Paths.get(SimpleDBIndexer.INDEX_DIR);
>> >     String query = LUCENE_QUERY;
>> >     SimpleDBSearcher searcher = new SimpleDBSearcher();
>> >     searcher.searchIndex(indexDir, query);
>> >   }
>> >
>> >   private void searchIndex(Path indexDir, String queryStr) throws Exception {
>> >     Directory directory = FSDirectory.open(indexDir);
>> >     System.out.println("The query string is " + queryStr);
>> >     MultiFieldQueryParser queryParser =
>> >         new MultiFieldQueryParser(new String[] { "cpn" }, new StandardAnalyzer());
>> >     IndexReader reader = DirectoryReader.open(directory);
>> >     IndexSearcher searcher = new IndexSearcher(reader);
>> >     queryParser.getAllowLeadingWildcard();
>> >
>> >     Query query = queryParser.parse(queryStr);
>> >     TopDocs topDocs = searcher.search(query, MAX_HITS);
>> >
>> >     ScoreDoc[] hits = topDocs.scoreDocs;
>> >     System.out.println(hits.length + " Record(s) Found");
>> >     for (int i = 0; i < hits.length; i++) {
>> >       int docId = hits[i].doc;
>> >       Document d = searcher.doc(docId);
>> >       System.out.println("\"cpn value is:\" " + d.get("cpn"));
>> >     }
>> >     if (hits.length == 0) {
>> >       System.out.println("No Data Founds ");
>> >     }
>> >   }
>> > }
>> >
>> >
>> > Please help here, thanks in advance.
>> >
>> > Regards,
>> > Bhaskar
>> >
>> > On Tue, Sep 29, 2015 at 3:47 AM, Uwe Schindler <uwe@thetaphi.de> wrote:
>> >
>> > > Hi Erick,
>> > >
>> > > This mail was sent to Lucene's user mailing list. It is not about Solr,
>> > > so the user cannot provide a Solr config! :-) In any case, it would be
>> > > good to get the Analyzer + code you use while indexing and also the
>> > > code (+ Analyzer) that creates the query while searching.
>> > >
>> > > Uwe
>> > >
>> > > -----
>> > > Uwe Schindler
>> > > H.-H.-Meier-Allee 63, D-28213 Bremen
>> > > http://www.thetaphi.de
>> > > eMail: uwe@thetaphi.de
>> > >
>> > >
>> > > > -----Original Message-----
>> > > > From: Erick Erickson [mailto:erickerickson@gmail.com]
>> > > > Sent: Monday, September 28, 2015 6:01 PM
>> > > > To: java-user
>> > > > Subject: Re: Need help in alphanumeric search
>> > > >
>> > > > You need to supply the definitions of this field from your
>> > > > schema.xml file, both the <field> and <fieldType>.
>> > > >
>> > > > Additionally, please provide the results of the query you're trying
>> > > > with &debug=true appended.
>> > > >
>> > > > The adminUI/analysis page is very helpful in these situations as
>> > > > well. Select the appropriate core from the drop-down on the left
>> > > > and you'll see an "analysis" section appear that shows you exactly
>> > > > what happens when the field is analyzed.
>> > > >
>> > > > Best,
>> > > > Erick
>> > > >
>> > > > On Mon, Sep 28, 2015 at 5:01 AM, Bhaskar <bhaskar1484@gmail.com> wrote:
>> > > > > Thanks, Ian, for the reply.
>> > > > >
>> > > > > cpn values are like 123-0049, 342-043, ab23-090, hedwsdg
>> > > > >
>> > > > > My application works when I search with the inputs below:
>> > > > > 1) ab*
>> > > > > 2) hedwsdg
>> > > > > 3) hed*
>> > > > >
>> > > > > but it is not working for
>> > > > > 1) 123*
>> > > > > 2) 123-0049
>> > > > > 3) ab23*
>> > > > >
>> > > > >
>> > > > > Note: if the search input contains a digit, it does not work.
>> > > > >
>> > > > > Thanks in advance.
>> > > > >
>> > > > >
>> > > > > On Mon, Sep 28, 2015 at 3:49 PM, Ian Lea <ian.lea@gmail.com> wrote:
>> > > > >
>> > > > >> Hi
>> > > > >>
>> > > > >>
>> > > > >> Can you provide a few examples of values of cpn that a) are and
>> > > > >> b) are not being found, for indexing and searching?
>> > > > >>
>> > > > >> You may also find some of the tips at
>> > > > >>
>> > > > >> http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2F_incorrect_hits.3F
>> > > > >>
>> > > > >> useful.
>> > > > >>
>> > > > >> You haven't shown the code that created the IndexWriter so the
>> > > > >> tip about using the same analyzer at index and search time might
>> > > > >> be relevant.
>> > > > >>
>> > > > >>
>> > > > >>
>> > > > >> --
>> > > > >> Ian.
>> > > > >>
>> > > > >>
>> > > > >> On Mon, Sep 28, 2015 at 10:49 AM, Bhaskar <bhaskar1484@gmail.com> wrote:
>> > > > >> > Hi,
>> > > > >> > I am a beginner in Apache Lucene; I am using 5.3.1.
>> > > > >> > I have created the index on the database result. The index
>> > > > >> > values include alphanumeric and string values. I am able to
>> > > > >> > search the strings, but I am not able to search alphanumeric
>> > > > >> > values.
>> > > > >> >
>> > > > >> > Can someone help me here?
>> > > > >> >
>> > > > >> > Below is indexing code...
>> > > > >> >
>> > > > >> > int indexDocs(IndexWriter writer, Connection conn) throws Exception {
>> > > > >> >   Statement stmt = conn.createStatement();
>> > > > >> >   ResultSet rs = stmt.executeQuery(sql);
>> > > > >> >   int i = 0;
>> > > > >> >   while (rs.next()) {
>> > > > >> >     Document d = new Document();
>> > > > >> >     // System.out.println("cpn is" + rs.getString("cpn"));
>> > > > >> >     // System.out.println("mpn is" + rs.getString("mpn"));
>> > > > >> >     d.add(new TextField("cpn", rs.getString("cpn"), Field.Store.YES));
>> > > > >> >     writer.addDocument(d);
>> > > > >> >     i++;
>> > > > >> >   }
>> > > > >> >   return i;
>> > > > >> > }
>> > > > >> >
>> > > > >> > Searching code:
>> > > > >> >
>> > > > >> >
>> > > > >> > private void searchIndex(Path indexDir, String queryStr) throws Exception {
>> > > > >> >   Directory directory = FSDirectory.open(indexDir);
>> > > > >> >   System.out.println("The query string is " + queryStr);
>> > > > >> >   // MultiFieldQueryParser queryParser = new MultiFieldQueryParser(
>> > > > >> >   //     new String[] {"mpn"}, new StandardAnalyzer());
>> > > > >> >   // IndexReader reader = IndexReader.open(directory);
>> > > > >> >   IndexReader reader = DirectoryReader.open(directory);
>> > > > >> >   IndexSearcher searcher = new IndexSearcher(reader);
>> > > > >> >   Analyzer analyzer = new StandardAnalyzer();
>> > > > >> >   analyzer.tokenStream("cpn", queryStr);
>> > > > >> >   QueryParser parser = new QueryParser("cpn", analyzer);
>> > > > >> >   parser.setDefaultOperator(Operator.OR);
>> > > > >> >   parser.getAllowLeadingWildcard();
>> > > > >> >   parser.setAutoGeneratePhraseQueries(true);
>> > > > >> >   Query query = parser.parse(queryStr);
>> > > > >> >   searcher.search(query, 100);
>> > > > >> >   TopDocs topDocs = searcher.search(query, MAX_HITS);
>> > > > >> >
>> > > > >> >   ScoreDoc[] hits = topDocs.scoreDocs;
>> > > > >> >   System.out.println(hits.length + " Record(s) Found");
>> > > > >> >   for (int i = 0; i < hits.length; i++) {
>> > > > >> >     int docId = hits[i].doc;
>> > > > >> >     Document d = searcher.doc(docId);
>> > > > >> >     System.out.println("\"value is:\" " + d.get("cpn"));
>> > > > >> >   }
>> > > > >> >   if (hits.length == 0) {
>> > > > >> >     System.out.println("No Data Founds ");
>> > > > >> >   }
>> > > > >> > }
>> > > > >> >
>> > > > >> >
>> > > > >> > Thanks in advance.
>> > > > >> >
>> > > > >> > --
>> > > > >> > Keep Smiling....
>> > > > >> > Thanks & Regards
>> > > > >> > Bhaskar.
>> > > > >> > Mobile:9866724142
>> > > > >>
>> > > > >> ---------------------------------------------------------------------
>> > > > >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > > > >> For additional commands, e-mail: java-user-help@lucene.apache.org
>> > > > >>
>> > > > >>
>> > > > >
>> > > > >
>> > > >
>> > >
>> > >
>> > >
>> > >
>> >
>> >
>>
>>
>>
>>
>
>


