lucene-java-user mailing list archives

From Manjula Wijewickrema <manjul...@gmail.com>
Subject Re: ShingleAnalyzerWrapper question
Date Tue, 17 Jun 2014 04:22:04 GMT
Dear Steve,

It works. Thanks.

On Wed, Jun 11, 2014 at 6:18 PM, Steve Rowe <sarowe@gmail.com> wrote:

> You should pass sw rather than analyzer to the IndexWriter constructor.
>
> Steve
> www.lucidworks.com
>  On Jun 11, 2014 2:24 AM, "Manjula Wijewickrema" <manjula53@gmail.com>
> wrote:
>
> > Hi,
> >
> > In my programme, I can index and search a document based on unigrams. I
> > modified the code as follows to obtain the results based on bigrams.
> > However, it did not give me the desired output.
> >
> > *****************
> >
> > public static void createIndex() throws CorruptIndexException,
> >         LockObtainFailedException, IOException {
> >
> >     final String[] NEW_STOP_WORDS = {"a", "able", "about",
> >             "actually", "after", "allow", "almost", "already", "also",
> >             "although", "always", "am", "an", "and", "any",
> >             "anybody"};  //only a portion
> >
> >     SnowballAnalyzer analyzer = new SnowballAnalyzer("English",
> >             NEW_STOP_WORDS);
> >
> >     Directory directory = FSDirectory.getDirectory(INDEX_DIRECTORY);
> >
> >     ShingleAnalyzerWrapper sw = new ShingleAnalyzerWrapper(analyzer, 2);
> >     sw.setOutputUnigrams(false);
> >
> >     IndexWriter w = new IndexWriter(INDEX_DIRECTORY, analyzer, true,
> >             IndexWriter.MaxFieldLength.UNLIMITED);
> >
> >     File dir = new File(FILES_TO_INDEX_DIRECTORY);
> >     File[] files = dir.listFiles();
> >
> >     for (File file : files) {
> >         Document doc = new Document();
> >         String text = "";
> >         doc.add(new Field("contents", text, Field.Store.YES,
> >                 Field.Index.UN_TOKENIZED, Field.TermVector.YES));
> >
> >         Reader reader = new FileReader(file);
> >         doc.add(new Field(FIELD_CONTENTS, reader));
> >         w.addDocument(doc);
> >     }
> >
> >     w.optimize();
> >     w.close();
> > }
> >
> >
> > ****************
> >
> > The output is still:
> >
> > {contents: /1, assist/1, fine/1, librari/1, librarian/1, main/1, manjula/3,
> > name/1, sabaragamuwa/1, univers/1}
> >
> > *******************
> >
> >
> > If anybody can, please help me to obtain the correct output.
> >
> >
> > Thanks,
> >
> >
> > Manjula.
> >
>
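To see why the one-word fix matters, here is a self-contained toy model in plain Java with no Lucene dependency. ToyAnalyzer, StopWordAnalyzer, ShingleWrapper and index are hypothetical names invented for this sketch, not Lucene classes. It only illustrates that the writer tokenizes with whichever analyzer it is handed, so building the wrapper sw without passing it on leaves the index full of unigrams. The toy does no stemming and ignores token positions, so it does not reproduce SnowballAnalyzer's stems or how Lucene treats the gaps left by removed stop words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class ShingleToyDemo {

    /** Hypothetical stand-in for an analyzer: turns text into a list of terms. */
    interface ToyAnalyzer {
        List<String> analyze(String text);
    }

    /** Lowercases, splits on non-letters and drops stop words (no stemming). */
    static final class StopWordAnalyzer implements ToyAnalyzer {
        private final Set<String> stopWords;

        StopWordAnalyzer(String... stopWords) {
            this.stopWords = new HashSet<>(Arrays.asList(stopWords));
        }

        @Override
        public List<String> analyze(String text) {
            List<String> terms = new ArrayList<>();
            for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
                if (!t.isEmpty() && !stopWords.contains(t)) {
                    terms.add(t);
                }
            }
            return terms;
        }
    }

    /** Wraps another analyzer and joins adjacent terms into shingles of size 2..max. */
    static final class ShingleWrapper implements ToyAnalyzer {
        private final ToyAnalyzer inner;
        private final int maxShingleSize;
        private final boolean outputUnigrams;

        ShingleWrapper(ToyAnalyzer inner, int maxShingleSize, boolean outputUnigrams) {
            if (maxShingleSize < 2) {
                throw new IllegalArgumentException("maxShingleSize must be >= 2");
            }
            this.inner = inner;
            this.maxShingleSize = maxShingleSize;
            this.outputUnigrams = outputUnigrams;
        }

        @Override
        public List<String> analyze(String text) {
            List<String> terms = inner.analyze(text);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < terms.size(); i++) {
                if (outputUnigrams) {
                    out.add(terms.get(i)); // keep the single term as well
                }
                // Emit every shingle that starts at position i and fits in the list.
                for (int n = 2; n <= maxShingleSize && i + n <= terms.size(); n++) {
                    out.add(String.join(" ", terms.subList(i, i + n)));
                }
            }
            return out;
        }
    }

    /** Toy "index writer": counts the terms produced by whatever analyzer it is given. */
    static Map<String, Integer> index(ToyAnalyzer analyzer, String text) {
        Map<String, Integer> freqs = new TreeMap<>();
        for (String term : analyzer.analyze(text)) {
            freqs.merge(term, 1, Integer::sum);
        }
        return freqs;
    }

    public static void main(String[] args) {
        // A few entries from NEW_STOP_WORDS in the question.
        ToyAnalyzer analyzer = new StopWordAnalyzer("a", "an", "and");
        ToyAnalyzer sw = new ShingleWrapper(analyzer, 2, false); // bigrams only
        String text = "a librarian and an assistant librarian"; // made-up sample text

        // Bug: the wrapper is built, but the inner analyzer is handed to the "writer".
        System.out.println(index(analyzer, text)); // prints {assistant=1, librarian=2}
        // Fix: hand the wrapper itself to the "writer".
        System.out.println(index(sw, text)); // prints {assistant librarian=1, librarian assistant=1}
    }
}
```

In the question's createIndex(), the corresponding change is the one Steve describes: construct the writer as new IndexWriter(INDEX_DIRECTORY, sw, true, IndexWriter.MaxFieldLength.UNLIMITED), so that the shingle wrapper, not the bare SnowballAnalyzer, tokenizes the documents.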
