Hi S Vinoth, 


Issue 1: severity is 1, 2, or 3, at the analyst's discretion.

1 => anomaly or threat.

2 => medium, or not sure.

3 => not a problem.

We take the records marked with severity 3 from the feedback file and inject them into the training data, so that they no longer look "suspicious" to the LDA training.
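To make the severity-3 step concrete, here is a minimal Python sketch of the idea. It is not the actual spot-ml code (that lives in DNSFeedback.scala). The `dns_sev` column name comes from this thread; the comma delimiter, the `copies` parameter, and the sample rows are assumptions made up purely for illustration.

```python
import csv
import io

SEVERITY_COLUMN = "dns_sev"  # column name as used in this thread
NOT_A_PROBLEM = "3"          # severity the analyst assigns to benign records


def benign_feedback_rows(feedback_file, copies=1, delimiter=","):
    """Return the severity-3 rows of a feedback file.

    `copies` is a hypothetical knob: each selected row is repeated that
    many times before being added to the training data.
    """
    reader = csv.DictReader(feedback_file, delimiter=delimiter)
    selected = []
    for row in reader:
        # Rows with a missing or empty severity are simply skipped.
        if (row.get(SEVERITY_COLUMN) or "").strip() == NOT_A_PROBLEM:
            selected.extend(dict(row) for _ in range(copies))
    return selected


if __name__ == "__main__":
    # Invented sample data, only to show the call.
    sample = io.StringIO(
        "dns_qry_name,dns_sev\n"
        "example.com,3\n"
        "bad.example,1\n"
        "maybe.example,2\n"
    )
    for row in benign_feedback_rows(sample, copies=2):
        print(row)
```

Rows with severity 1 or 2 are left out of this step; only the records the analyst marked as "not a problem" are fed back.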

The 18 fields are the results of the analyst's work in the Spot UI, so yes, that is correct.


Issue 2:

This should have been merged some time ago, but it hasn't been. In the meantime, please review this README file: https://github.com/mpereaji/incubator-spot/blob/Spot-Schemas/spot-setup/APACHE-SPOT-SCHEMA.md

If dns_a is empty and you get an error even though the field is marked as not required in the markdown file linked above, please raise a new issue describing that, so we can determine whether it's a bug or whether we should update the documentation.


Let me know if any questions remain open. I'll work on getting the .md file into the master branch.



On Tue, Jan 23, 2018 at 11:07 PM, Vinoth S <weknowth59@gmail.com> wrote:

Hi Team,

Please refer to the link below for my issue.

I am running spot-ml on its own for exploration, and I need help understanding the DNS table values.

Here are my queries/issues:

(Issue 1) I need to know which fields must be placed in ml_feedback.csv. Please share a sample file for dns-feedback.csv.
From https://github.com/apache/incubator-spot/blob/master/spot-ml/src/main/scala/org/apache/spot/dns/model/DNSFeedback.scala

I found that 18 parameters are required in ml_feedback.csv. Is that correct?
What value should go in the dns_sev field/column?


(Issue 2) Which fields can be empty in the DNS table?

(Issue 2.1) What happens if the dns_a column is left empty?
When I load data into the DNS table, dns_a is sometimes empty. If this field contains any null or empty values, my ML run fails.
So I used the tshark command below.

tshark.exe -r traffic_spot_00000_20180123100402.pcap -E separator=, -E header=y -E occurrence=f -T fields -e frame.time -e frame.time_epoch -e frame.len -e ip.src -e ip.dst -e dns.resp.name -e dns.resp.type -e dns.resp.class -e dns.flags.rcode -e dns.a "(dns.flags.response==1) and (dns.a)" > traffic_spot_windows.csv

The problem with the above command is that it was run on Windows.
Can anyone give me the equivalent tshark command for Linux/CentOS?

(Issue 2.2) What is the expected value in the frame_time column?
The actual value from my pcap file is 23-Jan 2018 15:34:16.242978980 India Standard Time. With this value, the ML run failed.
I then manually changed it from 23-Jan 2018 15:34:16.242978980 India Standard Time to Jan 23 2018 15:34:16.242978980 IST.
After that, the ML run succeeded. Is this a bug?
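
The manual frame_time edit described in Issue 2.2 could be scripted. Below is a minimal Python sketch that rewrites the Windows-style value into the form that worked here. The function name, the regular expression, and the timezone mapping are assumptions for illustration only, and whether this is the exact format spot-ml expects has not been confirmed in this thread.

```python
import re

# Hypothetical mapping from long Windows timezone names to abbreviations.
TZ_ABBREVIATIONS = {"India Standard Time": "IST"}

# Matches e.g. "23-Jan 2018 15:34:16.242978980 India Standard Time".
_WINDOWS_FRAME_TIME = re.compile(
    r"^(\d{1,2})-([A-Za-z]{3}) (\d{4}) (\d{2}:\d{2}:\d{2}(?:\.\d+)?) (.+)$"
)


def normalize_frame_time(value):
    """Rewrite 'DD-Mon YYYY time zone' as 'Mon DD YYYY time ZONE'.

    Values that do not match the Windows-style pattern are returned
    unchanged, so rows that are already in the other form pass through.
    """
    match = _WINDOWS_FRAME_TIME.match(value.strip())
    if match is None:
        return value
    day, month, year, clock, zone = match.groups()
    zone = TZ_ABBREVIATIONS.get(zone, zone)  # unknown zones are kept as-is
    return f"{month} {day} {year} {clock} {zone}"


if __name__ == "__main__":
    print(normalize_frame_time(
        "23-Jan 2018 15:34:16.242978980 India Standard Time"
    ))
```

The nanosecond part is handled as plain text rather than parsed, because Python's `datetime.strptime` accepts at most six fractional digits.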