lucene-solr-user mailing list archives

From Ahmet Arslan <iori...@yahoo.com.INVALID>
Subject Re: indexing dovecot mailbox
Date Sun, 22 May 2016 13:22:50 GMT
Hi Andreas,

Exactly, SimplePostTool does not recognize/support the file-ending.

If they are text files, you can change the file extension to .txt and the post tool will pick them up.
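
A minimal sketch of that workaround, assuming Python 3 is available. The function name stage_as_txt and the staging directory are made up for illustration. It copies the message files into a separate directory with ".txt" appended rather than renaming them in place, since dovecot keeps message flags in the Maildir file names and may be confused by renamed files.

```python
import shutil
from pathlib import Path


def stage_as_txt(maildir_folder, staging_dir):
    """Copy every regular file in maildir_folder into staging_dir,
    appending '.txt' to each name. The originals are left untouched.

    Returns the list of paths that were created.
    """
    src = Path(maildir_folder)
    dst = Path(staging_dir)
    dst.mkdir(parents=True, exist_ok=True)

    created = []
    for entry in sorted(src.iterdir()):
        if entry.is_file():  # skip subdirectories
            target = dst / (entry.name + ".txt")
            shutil.copyfile(entry, target)
            created.append(target)
    return created
```

For example, stage_as_txt("/home/a.meyer/Postfach/cur", "/tmp/mail-txt") followed by bin/post -c myfiles /tmp/mail-txt (the /tmp/mail-txt path is only a placeholder). The copies are posted as plain text, so headers and body end up in one blob.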

If you have some code to read those files, you can use SolrJ to roll your own indexer:
https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/
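
The linked post uses SolrJ (Java). As a rough sketch of the same idea using only Python's standard library instead: read the Maildir with the mailbox module, turn each message into a small document, and send the batch as JSON to the core's update URL. The field names (id, subject, from, date, body) and the function names are assumptions for illustration, not anything defined by Solr or dovecot; the schema of the target core would need matching fields.

```python
import json
import mailbox
import urllib.request


def extract_text(msg):
    """Concatenate the decoded text/plain parts of a message."""
    chunks = []
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)  # bytes, or None for containers
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            chunks.append(payload.decode(charset, errors="replace"))
        except LookupError:  # unknown charset name in the message
            chunks.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(chunks)


def maildir_to_docs(maildir_path):
    """Build one dict per message found in the Maildir (cur/ and new/)."""
    box = mailbox.Maildir(maildir_path, create=False)
    docs = []
    for key, msg in box.items():
        docs.append({
            "id": key,  # Maildir key: the file name without the ':2,...' flags
            "subject": str(msg.get("Subject", "")),
            "from": str(msg.get("From", "")),
            # Raw header text; a Solr date field would need ISO 8601 instead.
            "date": str(msg.get("Date", "")),
            "body": extract_text(msg),
        })
    return docs


def post_docs(docs, update_url):
    """POST the documents as a JSON array and commit. Returns the HTTP status."""
    req = urllib.request.Request(
        update_url + "?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

With the setup from this thread that would be roughly post_docs(maildir_to_docs("/home/a.meyer/Postfach"), "http://localhost:8983/solr/myfiles/update"). Encoded (RFC 2047) headers and attachments are not handled here; that is where Tika or more code would come in.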

Sorry, I am not familiar with this e-mail stuff; maybe Apache Tika can read/recognize these
mail files.

Ahmet



On Sunday, May 22, 2016 1:14 PM, Andreas Meyer <a.meyer@nimmini.de> wrote:
Hello!

The files I want to index are IMAP-folders of dovecot, Maildir.

bitmachine1:/home/a.meyer/Postfach/cur # file 1461583672.Vfe03I1000f4M981621.bitmachine1:2,S
1461583672.Vfe03I1000f4M981621.bitmachine1:2,S: SMTP mail, ASCII text

I can read them with Midnight Commander. Does it have something to do
with the file ending not being recognized?

Andreas


Ahmet Arslan <iorixxx@yahoo.com.INVALID> wrote on 22.05.16 at 00:46:32:

> Hi Meyer,
> 
> Not sure what "mailbox of dovecot" is, but SimplePostTool can recognize certain file
> types.
> They (xml,json,...,log) are actually listed in the log msg in your email.
> 
> Can you describe the format of the files that you want to index?
> Are they text files?
> 
> ahmet
> 
> 
> 
> On Sunday, May 22, 2016 1:16 AM, Andreas Meyer <a.meyer@nimmini.de> wrote:
> Hello!
> 
> Bear with me, I am new to Solr and everything is very
> complex. I don't yet understand how it all works.
> 
> I installed solr-5.5.1.tgz and got it running. I tried to
> index a dovecot mailbox with
> 
> # bin/post -c myfiles /home/a.meyer/Postfach
> 
> after I copied solr-schema.xml to /opt/solr/server/solr/myfiles/conf
> as schema.xml, but no files other than dovecot.index.log and dovecot.mailbox.log
> are indexed.
> 
> # bin/post -c myfiles /home/a.meyer/Postfach
> /usr/lib64/jvm/jre/bin/java -classpath /opt/solr/dist/solr-core-5.5.1.jar -Dauto=yes -Dc=myfiles -Ddata=files -Drecursive=yes org.apache.solr.util.SimplePostTool /home/a.meyer/Postfach
> SimplePostTool version 5.0.0
> Posting files to [base] url http://localhost:8983/solr/myfiles/update...
> Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
> Entering recursive mode, max depth=999, delay=0s
> Indexing directory /home/a.meyer/Postfach (2 files, depth=0)
> POSTing file dovecot.index.log (text/plain) to [base]/extract
> POSTing file dovecot.mailbox.log (text/plain) to [base]/extract
> Indexing directory /home/a.meyer/Postfach/cur (0 files, depth=1)
> Indexing directory /home/a.meyer/Postfach/new (0 files, depth=1)
> Indexing directory /home/a.meyer/Postfach/tmp (0 files, depth=1)
> 2 files indexed.
> COMMITting Solr index changes to http://localhost:8983/solr/myfiles/update...
> Time spent: 0:00:02.976
> 
> I was hoping the post command would index the email in /home/a.meyer/Postfach/cur,
> but it doesn't. The content of this folder looks like this:
> 
> -rw------- 1 a.meyer users   4764 25. Apr 13:27 1461583672.Vfe03I1000f4M981621.bitmachine1:2,S
> -rw------- 1 a.meyer users 276318 26. Apr 17:48 1461685694.Vfe03I1000f6M202284.bitmachine1:2,S
> -rw------- 1 a.meyer users   4578 27. Apr 17:16 1461770179.Vfe03I10010aM756286.bitmachine1:2,S
> -rw------- 1 a.meyer users  16981  3. Mai 10:12 1462263159.Vfe03I1000c5M811118.bitmachine1:2,RS
> 
> What did I miss? I could use some help with this one.
> 
> Kind regards
> 
>   Andreas
