lucene-java-user mailing list archives

From Szymon Sutek <dagothvers...@gmail.com>
Subject Re: Unable to retrieve OffsetTermVector for given term using Apache Lucene 6
Date Fri, 02 Dec 2016 09:54:47 GMT
I made a mistake in the last part of the code. It should be:

while((byteRef = iterator.next()) != null) {
    String term = byteRef.utf8ToString();
    // Here I would like to retrieve all offset positions for the given term

}


2016-12-02 10:08 GMT+01:00 Szymon Sutek <dagothversur2@gmail.com>:

> Hello, I am trying to index a txt file and then retrieve its terms' offset
> positions. Unfortunately, I can get only one offset entry per term, not
> all of them (if the term occurred more than once while indexing). Here are
> the most important parts of the code:
>
> FieldType used while indexing.
>
> private FieldType getFieldType(){
>     FieldType fieldType = new FieldType();
>
>     fieldType.setTokenized(true);
>     fieldType.setStoreTermVectors(true);
>     fieldType.setStoreTermVectorPositions(true);
>     fieldType.setStoreTermVectorOffsets(true);
>     fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
>
>     return fieldType;
> }
>
> After successfully creating the index, I am using indexReader to read the terms
> and iterate through all of them, but I have no idea how to collect their offsets.
>
> In earlier versions I would cast the TermVector to the vector type I needed and get
> the offset list for a concrete term value. Now I am stuck on this part of the code:
>
>
> Terms terms = indexReader.getTermVector(0, "text");
> TermsEnum iterator = terms.iterator();
>
> BytesRef byteRef = null;
>
> while((byteRef = iterator.next()) != null) {
>     String term = byteRef.utf8ToString();
>     if (p.matcher(term).matches())
>         searchResult.put(1, term);
>
>     System.out.println("[S]:" + term);
> }
>
> I would be grateful for any help!
>
>
>
