Various detection solutions have been proposed, and many intrusion detection tools attempt to identify DDoS attacks mostly through anomaly detection, i.e. the identification of deviations from normal behavior. The Flowmon Anomaly Detection System (ADS) is a tool used by CISOs and security engineers to detect modern cyber threats. The solution uses algorithms and machine learning to automatically identify network anomalies and risks that bypass traditional solutions such as firewalls, IDS/IPS, or antivirus.
Is anyone aware of any open-source code for NetFlow anomaly detection for DDoS and tunneling? I am a newbie in this area. I found very few projects on GitHub, so if anyone has more experience with this, please advise.
I just want to try a few to understand how they work, so Python, R, or C++ are all fine.
venu
1 Answer
There are some great resources around for ingesting the various flow formats. The harder part is doing the anomaly detection. You could consider R; see for instance: http://www.ojscurity.com/2014/10/r-netflow-analytics-i.html
When trying to detect tunneling, you will need to establish one or more metrics that you can use to 'profile' the traffic. Typically this would be on a per-endpoint, per-protocol basis. For instance, HTTPS traffic to Amazon looks different from Netflix streaming traffic. The metrics you establish should enable you to detect a change in the typical pattern for a given type of traffic.
So it might be hard to detect HTTP traffic tunneled over HTTPS by using just flow data. However, tunneling HTTP traffic over DNS should be fairly easy to detect due to the different volumetric and session timing characteristics of each protocol.
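As a rough illustration of per-endpoint, per-protocol profiling, the sketch below groups flow records by (source, protocol) and compares the mean bytes per flow against a ceiling for that protocol. The record layout, the addresses, and the 512-byte ceiling for DNS are assumptions made up for this example, not values taken from any particular NetFlow collector. Session timing could be profiled the same way.

```python
from collections import defaultdict

# Hypothetical flow records: (src_ip, protocol, bytes_in_flow).
flows = [
    ("10.0.0.5", "dns", 120),
    ("10.0.0.5", "dns", 95),
    ("10.0.0.9", "dns", 4200),
    ("10.0.0.9", "dns", 3900),
    ("10.0.0.5", "https", 250000),
]

# Assumed ceilings on mean bytes per flow; these need tuning per network.
EXPECTED_MAX_MEAN_BYTES = {"dns": 512}


def profile(flows):
    """Return {(src, proto): mean bytes per flow}."""
    totals = defaultdict(lambda: [0, 0])  # [byte_sum, flow_count]
    for src, proto, nbytes in flows:
        totals[(src, proto)][0] += nbytes
        totals[(src, proto)][1] += 1
    return {key: byte_sum / count for key, (byte_sum, count) in totals.items()}


def suspicious(flows, limits=EXPECTED_MAX_MEAN_BYTES):
    """Return (src, proto) pairs whose mean flow size exceeds the ceiling."""
    return sorted(
        key
        for key, mean_bytes in profile(flows).items()
        if key[1] in limits and mean_bytes > limits[key[1]]
    )


print(suspicious(flows))
```

Protocols without a configured ceiling (HTTPS here) are simply not judged, which matches the point that some tunnels are hard to see from flow data alone.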
DDoS is more straightforward and can be detected against a volumetric 'baseline', since typical attacks are extremely loud in nature. However, the more specific you get in terms of protocol and packet type, the faster and more accurate your DDoS detection will be.
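A minimal sketch of a volumetric baseline, assuming you already have a per-minute packet count for one destination (the numbers, the window size, and the threshold below are invented for illustration). Each interval is compared against the mean and standard deviation of the preceding intervals.

```python
from statistics import mean, stdev


def find_spikes(counts, window=5, threshold=3.0):
    """Return indices whose value exceeds baseline mean + threshold * stdev.

    The baseline for index i is built from the previous `window` values only.
    """
    spikes = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        # Use a floor of 1.0 so a perfectly flat baseline does not
        # turn every tiny fluctuation into an alert.
        limit = mu + threshold * max(sigma, 1.0)
        if counts[i] > limit:
            spikes.append(i)
    return spikes


# Hypothetical packets-per-minute towards one destination.
packets_per_minute = [100, 104, 98, 101, 97, 103, 99, 5000, 102, 100]
print(find_spikes(packets_per_minute))
```

Note that the spike itself enters the baseline for the following intervals in this sketch; a real detector would probably exclude flagged intervals from the history, and would keep separate baselines per protocol and packet type.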
Finally, the more you 'know' about the network you are monitoring, the better you are able to pick up anomalies. There are some obvious first-principles here, as DDoS attacks are loud, and most protocols have fairly well-known volume/timing characteristics, but learning what is typical for your network is the best way to reduce false positives.
Vince Berk
In data mining, anomaly detection (also outlier detection[1]) is the identification of rare items, events or observations which raise suspicions by differing significantly from the majority of the data.[1] Typically the anomalous items will translate to some kind of problem such as bank fraud, a structural defect, medical problems or errors in a text. Anomalies are also referred to as outliers, novelties, noise, deviations and exceptions.[2]
In particular, in the context of abuse and network intrusion detection, the interesting objects are often not rare objects, but unexpected bursts in activity. This pattern does not adhere to the common statistical definition of an outlier as a rare object, and many outlier detection methods (in particular unsupervised methods) will fail on such data, unless it has been aggregated appropriately. Instead, a cluster analysis algorithm may be able to detect the micro clusters formed by these patterns.[3]
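The aggregation step can be sketched as follows, assuming events arrive as timestamps in seconds (the timestamps and the 60-second window are hypothetical). No single event in a burst is rare on its own, but the count for the window containing the burst stands out once events are bucketed.

```python
from collections import Counter


def counts_per_window(timestamps, window_seconds=60):
    """Aggregate event timestamps (in seconds) into counts per time window."""
    buckets = Counter(int(t // window_seconds) for t in timestamps)
    first, last = min(buckets), max(buckets)
    # Include empty windows so gaps show up as zero counts.
    return [buckets.get(b, 0) for b in range(first, last + 1)]


# Background: one event every 20 seconds for five minutes.
background = [float(t) for t in range(0, 300, 20)]
# Burst: 50 events within five seconds, starting at t = 190 s.
burst = [190 + 0.1 * i for i in range(50)]

print(counts_per_window(background + burst))
```

An outlier detection method can then be applied to the per-window counts rather than to the individual events.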
Three broad categories of anomaly detection techniques exist.[4] Unsupervised anomaly detection techniques detect anomalies in an unlabeled test data set, under the assumption that the majority of the instances in the data set are normal, by looking for instances that seem to fit least to the remainder of the data set. Supervised anomaly detection techniques require a data set that has been labeled as 'normal' and 'abnormal' and involve training a classifier (the key difference from many other statistical classification problems is the inherently unbalanced nature of outlier detection). Semi-supervised anomaly detection techniques construct a model representing normal behavior from a given normal training data set, and then test the likelihood that a test instance was generated by the learnt model.
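The semi-supervised case can be illustrated with a deliberately simple model. In the sketch below, a single Gaussian is fitted to training data assumed to contain only normal behavior, and a test instance is flagged when its density under that model is very low. The training values and the density cutoff are assumptions chosen for the example; practical systems generally use richer models.

```python
from statistics import NormalDist

# Training data assumed to contain only normal behavior (hypothetical values).
normal_training = [48.0, 52.0, 50.0, 49.0, 51.0, 50.5, 49.5]

# Model of normal behavior: a Gaussian estimated from the training set.
model = NormalDist.from_samples(normal_training)


def is_anomalous(x, model, min_density=1e-4):
    """Flag x when its density under the learnt model falls below min_density."""
    return model.pdf(x) < min_density


for x in (50.2, 60.0):
    print(x, is_anomalous(x, model))
```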
Applications
Anomaly detection is applicable in a variety of domains, such as intrusion detection, fraud detection, fault detection, system health monitoring, event detection in sensor networks, and detecting ecosystem disturbances. It is often used in preprocessing to remove anomalous data from the dataset. In supervised learning, removing the anomalous data from the dataset often results in a statistically significant increase in accuracy.[5][6]
Popular techniques
Several anomaly detection techniques have been proposed in the literature.[7] Some of the popular techniques are:
- Density-based techniques (k-nearest neighbor,[8][9][10] local outlier factor,[11] isolation forests,[12] and many more variations of this concept[13]).
- Subspace-,[14] correlation-based[15] and tensor-based[16] outlier detection for high-dimensional data.[17]
- One-class support vector machines.[18]
- Replicator neural networks[19] and autoencoders.
- Bayesian networks.[19]
- Hidden Markov models (HMMs).[19]
- Cluster analysis-based outlier detection.[20][21]
- Deviations from association rules and frequent itemsets.
- Fuzzy logic-based outlier detection.
- Ensemble techniques, using feature bagging,[22][23] score normalization[24][25] and different sources of diversity.[26][27]
The performance of different methods depends heavily on the data set and parameters, and no method has much systematic advantage over the others when compared across many data sets and parameters.[28][29]
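As an illustration of the simplest technique in the first family above, the following sketch scores each point by the distance to its k-th nearest neighbor, so that isolated points receive high scores. The sample points and the choice of k = 2 are assumptions for the example, and the brute-force search is only suitable for small data sets.

```python
import math


def knn_scores(points, k=2):
    """Score each point by the Euclidean distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Four points in a tight cluster and one far away (hypothetical data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = knn_scores(points, k=2)

# The point with the highest score is the strongest outlier candidate.
outlier = max(range(len(points)), key=scores.__getitem__)
print(points[outlier])
```

In line with the remark on parameters, the ranking produced by such a score can change with k, so the value should be checked against the data at hand.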
Application to data security
Anomaly detection was proposed for intrusion detection systems (IDS) by Dorothy Denning in 1986.[30] Anomaly detection for IDS is normally accomplished with thresholds and statistics, but can also be done with soft computing, and inductive learning.[31] Types of statistics proposed by 1999 included profiles of users, workstations, networks, remote hosts, groups of users, and programs based on frequencies, means, variances, covariances, and standard deviations.[32] The counterpart of anomaly detection in intrusion detection is misuse detection.
Software
- ELKI is an open-source Java data mining toolkit that contains several anomaly detection algorithms, as well as index acceleration for them.
Datasets
- Anomaly detection benchmark data repository of the Ludwig-Maximilians-Universität München; Mirror at University of São Paulo.
- ODDS: a large collection of publicly available outlier detection datasets with ground truth in different domains.
References
- Zimek, Arthur; Schubert, Erich (2017), 'Outlier Detection', Encyclopedia of Database Systems, Springer New York, pp. 1–5, doi:10.1007/978-1-4899-7993-3_80719-1, ISBN 9781489979933
- Hodge, V. J.; Austin, J. (2004). 'A Survey of Outlier Detection Methodologies'. Artificial Intelligence Review. 22 (2): 85–126. CiteSeerX 10.1.1.318.4023. doi:10.1007/s10462-004-4304-y.
- Dokas, Paul; Ertoz, Levent; Kumar, Vipin; Lazarevic, Aleksandar; Srivastava, Jaideep; Tan, Pang-Ning (2002). 'Data mining for network intrusion detection'. Proceedings NSF Workshop on Next Generation Data Mining.
- Chandola, V.; Banerjee, A.; Kumar, V. (2009). 'Anomaly detection: A survey'. ACM Computing Surveys. 41 (3): 1–58. doi:10.1145/1541880.1541882.
- Tomek, Ivan (1976). 'An Experiment with the Edited Nearest-Neighbor Rule'. IEEE Transactions on Systems, Man, and Cybernetics. 6 (6): 448–452. doi:10.1109/TSMC.1976.4309523.
- Smith, M. R.; Martinez, T. (2011). 'Improving classification accuracy by identifying and removing instances that should be misclassified'. The 2011 International Joint Conference on Neural Networks. p. 2690. CiteSeerX 10.1.1.221.1371. doi:10.1109/IJCNN.2011.6033571. ISBN 978-1-4244-9635-8.
- Zimek, Arthur; Filzmoser, Peter (2018). 'There and back again: Outlier detection between statistical reasoning and data mining algorithms'. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. 8 (6): e1280. doi:10.1002/widm.1280. ISSN 1942-4787.
- Knorr, E. M.; Ng, R. T.; Tucakov, V. (2000). 'Distance-based outliers: Algorithms and applications'. The VLDB Journal the International Journal on Very Large Data Bases. 8 (3–4): 237–253. CiteSeerX 10.1.1.43.1842. doi:10.1007/s007780050006.
- Ramaswamy, S.; Rastogi, R.; Shim, K. (2000). Efficient algorithms for mining outliers from large data sets. Proceedings of the 2000 ACM SIGMOD international conference on Management of data – SIGMOD '00. p. 427. doi:10.1145/342009.335437. ISBN 1-58113-217-4.
- Angiulli, F.; Pizzuti, C. (2002). Fast Outlier Detection in High Dimensional Spaces. Principles of Data Mining and Knowledge Discovery. Lecture Notes in Computer Science. 2431. p. 15. doi:10.1007/3-540-45681-3_2. ISBN 978-3-540-44037-6.
- Breunig, M. M.; Kriegel, H.-P.; Ng, R. T.; Sander, J. (2000). LOF: Identifying Density-based Local Outliers. Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. SIGMOD. pp. 93–104. doi:10.1145/335191.335388. ISBN 1-58113-217-4.
- Liu, Fei Tony; Ting, Kai Ming; Zhou, Zhi-Hua (December 2008). Isolation Forest. 2008 Eighth IEEE International Conference on Data Mining. pp. 413–422. doi:10.1109/ICDM.2008.17. ISBN 9780769535029.
- Schubert, E.; Zimek, A.; Kriegel, H.-P. (2012). 'Local outlier detection reconsidered: A generalized view on locality with applications to spatial, video, and network outlier detection'. Data Mining and Knowledge Discovery. 28: 190–237. doi:10.1007/s10618-012-0300-z.
- Kriegel, H. P.; Kröger, P.; Schubert, E.; Zimek, A. (2009). Outlier Detection in Axis-Parallel Subspaces of High Dimensional Data. Advances in Knowledge Discovery and Data Mining. Lecture Notes in Computer Science. 5476. p. 831. doi:10.1007/978-3-642-01307-2_86. ISBN 978-3-642-01306-5.
- Kriegel, H. P.; Kroger, P.; Schubert, E.; Zimek, A. (2012). Outlier Detection in Arbitrarily Oriented Subspaces. 2012 IEEE 12th International Conference on Data Mining. p. 379. doi:10.1109/ICDM.2012.21. ISBN 978-1-4673-4649-8.
- Fanaee-T, H.; Gama, J. (2016). 'Tensor-based anomaly detection: An interdisciplinary survey'. Knowledge-Based Systems. 98: 130–147. doi:10.1016/j.knosys.2016.01.027.
- Zimek, A.; Schubert, E.; Kriegel, H.-P. (2012). 'A survey on unsupervised outlier detection in high-dimensional numerical data'. Statistical Analysis and Data Mining. 5 (5): 363–387. doi:10.1002/sam.11161.
- Schölkopf, B.; Platt, J. C.; Shawe-Taylor, J.; Smola, A. J.; Williamson, R. C. (2001). 'Estimating the Support of a High-Dimensional Distribution'. Neural Computation. 13 (7): 1443–71. CiteSeerX 10.1.1.4.4106. doi:10.1162/089976601750264965. PMID 11440593.
- Hawkins, Simon; He, Hongxing; Williams, Graham; Baxter, Rohan (2002). 'Outlier Detection Using Replicator Neural Networks'. Data Warehousing and Knowledge Discovery. Lecture Notes in Computer Science. 2454. pp. 170–180. CiteSeerX 10.1.1.12.3366. doi:10.1007/3-540-46145-0_17. ISBN 978-3-540-44123-6.
- He, Z.; Xu, X.; Deng, S. (2003). 'Discovering cluster-based local outliers'. Pattern Recognition Letters. 24 (9–10): 1641–1650. CiteSeerX 10.1.1.20.4242. doi:10.1016/S0167-8655(03)00003-5.
- Campello, R. J. G. B.; Moulavi, D.; Zimek, A.; Sander, J. (2015). 'Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection'. ACM Transactions on Knowledge Discovery from Data. 10 (1): 5:1–51. doi:10.1145/2733381.
- Lazarevic, A.; Kumar, V. (2005). Feature bagging for outlier detection. Proc. 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. pp. 157–166. CiteSeerX 10.1.1.399.425. doi:10.1145/1081870.1081891. ISBN 978-1-59593-135-1.
- Nguyen, H. V.; Ang, H. H.; Gopalkrishnan, V. (2010). Mining Outliers with Ensemble of Heterogeneous Detectors on Random Subspaces. Database Systems for Advanced Applications. Lecture Notes in Computer Science. 5981. p. 368. doi:10.1007/978-3-642-12026-8_29. ISBN 978-3-642-12025-1.
- Kriegel, H. P.; Kröger, P.; Schubert, E.; Zimek, A. (2011). Interpreting and Unifying Outlier Scores. Proceedings of the 2011 SIAM International Conference on Data Mining. pp. 13–24. CiteSeerX 10.1.1.232.2719. doi:10.1137/1.9781611972818.2. ISBN 978-0-89871-992-5.
- Schubert, E.; Wojdanowski, R.; Zimek, A.; Kriegel, H. P. (2012). On Evaluation of Outlier Rankings and Outlier Scores. Proceedings of the 2012 SIAM International Conference on Data Mining. pp. 1047–1058. doi:10.1137/1.9781611972825.90. ISBN 978-1-61197-232-0.
- Zimek, A.; Campello, R. J. G. B.; Sander, J. R. (2014). 'Ensembles for unsupervised outlier detection'. ACM SIGKDD Explorations Newsletter. 15: 11–22. doi:10.1145/2594473.2594476.
- Zimek, A.; Campello, R. J. G. B.; Sander, J. R. (2014). Data perturbation for outlier detection ensembles. Proceedings of the 26th International Conference on Scientific and Statistical Database Management – SSDBM '14. p. 1. doi:10.1145/2618243.2618257. ISBN 978-1-4503-2722-0.
- Campos, Guilherme O.; Zimek, Arthur; Sander, Jörg; Campello, Ricardo J. G. B.; Micenková, Barbora; Schubert, Erich; Assent, Ira; Houle, Michael E. (2016). 'On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study'. Data Mining and Knowledge Discovery. 30 (4): 891. doi:10.1007/s10618-015-0444-8. ISSN 1384-5810.
- Anomaly detection benchmark data repository of the Ludwig-Maximilians-Universität München; Mirror at University of São Paulo.
- Denning, D. E. (1987). 'An Intrusion-Detection Model'. IEEE Transactions on Software Engineering. SE-13 (2): 222–232. CiteSeerX 10.1.1.102.5127. doi:10.1109/TSE.1987.232894.
- Teng, H. S.; Chen, K.; Lu, S. C. (1990). Adaptive real-time anomaly detection using inductively generated sequential patterns. Proceedings of the IEEE Computer Society Symposium on Research in Security and Privacy. pp. 278–284. doi:10.1109/RISP.1990.63857. ISBN 978-0-8186-2060-7.
- Jones, Anita K.; Sielken, Robert S. (1999). 'Computer System Intrusion Detection: A Survey'. Technical Report, Department of Computer Science, University of Virginia, Charlottesville, VA. CiteSeerX 10.1.1.24.7802.