Cyber Security & the Confusion Matrix
Today we will learn what a Confusion Matrix is, and how Machine Learning is helping the industry use this concept to reduce cybercrime. Machine Learning is an important part of the IT industry: it is used in nearly every domain and is being developed day by day to meet the industry’s needs, and one of its prominent domains is Cybersecurity. Let’s see how. But first, let’s understand some basic terminology.
Some facts & data related to Cybercrime & Cybersecurity:
Since the start of the COVID-19 pandemic, the US FBI has reported a 300% increase in reported cybercrimes.
9.7 million healthcare records were compromised in September 2020 alone.
Approximately $6 trillion is expected to be spent globally on cybersecurity by 2021.
64% of companies have experienced web-based attacks. 62% experienced phishing & social engineering attacks. 59% of companies experienced malicious code and botnets and 51% experienced denial of service attacks.
43% of cyber attacks target small businesses (those with fewer than 500 employees). Small organizations spend an average of $7.68 million per incident.
What is Cybercrime?
Cybercrime, which is also known as computer crime, is the use of a computer as an instrument to further illegal ends, such as committing fraud, trafficking in child pornography and intellectual property, stealing identities, or violating privacy. Cybercrime, especially through the Internet, has grown in importance as the computer has become central to commerce, entertainment, and government.
Some common types of Cyber Attacks:
- Denial-of-service (DoS) and distributed denial-of-service (DDoS) attacks
- Zero-Day Exploit
- SQL Injection
- Man in the Middle (MitM) attack
- Phishing & spear phishing attacks
- Drive-by attack
- Cross-site scripting
- Eavesdropping attack
- Business Email compromise
What is Cyber Security?
Cybersecurity is the practice of defending computers, servers, mobile devices, electronic systems, networks, and data from malicious attacks. It’s also known as information technology security or electronic information security.
Cybersecurity is important because government, military, corporate, financial, and medical organizations collect, process, and store unprecedented amounts of data on computers and other devices. A significant portion of that data can be sensitive information, whether that be intellectual property, financial data, personal information, or other types of data for which unauthorized access or exposure could have negative consequences.
What is a Confusion Matrix?
The confusion matrix visualizes the accuracy of a classifier by comparing the actual and predicted classes.
A confusion matrix is a table that is used to determine the performance of a classification model. We compare the predicted values for test data with the true values known to us. From this we learn how many predictions were right and wrong, and we can derive accuracy, precision, recall (also called sensitivity), and other measures for further analysis. We will discuss these terms further.
Let’s understand the Confusion Matrix with a cyberattack example:
An Intrusion Detection System (IDS) checks for malicious activity on a system. It monitors the packets coming in over the internet using an ML model and predicts whether each one is normal or an anomaly.
Say there are many hits on some servers. If any of those packets is malicious, someone is attacking, and the Cybersecurity team needs to be alerted. For this purpose, a Machine Learning model is created that examines all the network packets and predicts whether an attack has happened on the system or not.
For the sake of understanding, let’s take a hypothetical example. Suppose 165 packets have hit our servers, and our ML model predicts for each one whether it is an attack or not. The model then gives us its analysis in the form of a Confusion Matrix. Let’s understand what it means:
In total, the ML model in the IDS examined 165 network packets, which were classified as displayed in the Confusion Matrix above.
- True Positive (TP): The model predicted that 50 packets were safe, and they were actually safe. In this example, safe is treated as the positive class, so this is a correct positive prediction. (Many IDS write-ups instead treat attack as the positive class, which swaps the labels used here.)
- True Negative (TN): The model predicted that 100 packets were malicious, and they actually were. This is a correct prediction of the negative class, and it means the security team learned about the threat in time, which is a great thing.
- False Negative (FN): The model predicted that 5 packets were malicious, but they were actually safe. This is a wrong prediction: the security team spends time investigating a false alarm, but the servers were never under threat. This is also known as a Type 2 error.
- False Positive (FP): The model predicted that 10 packets were safe, but they were actually malicious. This is the most dangerous kind of error, because the security team gets no alert while the server is under threat. This is also known as a Type 1 error.
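To make the four cells concrete, here is a minimal Python sketch that rebuilds the 165-packet example and counts each cell. It treats safe as the positive class, as the list above does. The names `packets` and `confusion_counts` are made up for this illustration and are not part of any IDS product or library.

```python
from collections import Counter

SAFE, MALICIOUS = "safe", "malicious"

# (actual, predicted) pairs rebuilt from the counts in the example above.
packets = (
    [(SAFE, SAFE)] * 50                # true positives
    + [(MALICIOUS, MALICIOUS)] * 100   # true negatives
    + [(SAFE, MALICIOUS)] * 5          # false negatives
    + [(MALICIOUS, SAFE)] * 10         # false positives
)

def confusion_counts(pairs, positive=SAFE):
    """Count TP, TN, FP and FN for a list of (actual, predicted) pairs."""
    counts = Counter()
    for actual, predicted in pairs:
        if predicted == positive:
            counts["TP" if actual == positive else "FP"] += 1
        else:
            counts["FN" if actual == positive else "TN"] += 1
    return counts

counts = confusion_counts(packets)
print(len(packets), dict(counts))
# prints: 165 {'TP': 50, 'TN': 100, 'FN': 5, 'FP': 10}
```

Calling `confusion_counts(packets, positive=MALICIOUS)` flips the convention: the 100 detected attacks become the true positives and the 10 missed attacks become false negatives. The packets are the same; only the labels change, so it is worth stating which class is positive whenever these numbers are reported.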
It is important to create and train the ML model in such a way that Type 1 and Type 2 errors are reduced to a minimum; only then can foolproof Cybersecurity be approached. This is an overview of how Cybersecurity is maintained using machine learning in the real world.
What the Confusion Matrix provides us:
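In practice the two error types usually trade off against each other. The sketch below is one hedged way to see this, assuming the model outputs a suspicion score per packet and we flag anything at or above a threshold. The scores, the `scored_packets` list, and the `error_counts` helper are all invented for illustration; nothing here comes from a real IDS.

```python
# Hypothetical (score, actually_malicious) pairs. A higher score means the
# model considers the packet more suspicious. These values are made up.
scored_packets = [
    (0.05, False), (0.10, False), (0.20, False), (0.35, True),
    (0.40, False), (0.55, True), (0.60, False), (0.80, True),
    (0.90, True), (0.95, True),
]

def error_counts(scored, threshold):
    """Return (missed_attacks, false_alarms) when flagging scores >= threshold."""
    missed = sum(1 for score, bad in scored if bad and score < threshold)
    alarms = sum(1 for score, bad in scored if not bad and score >= threshold)
    return missed, alarms

for threshold in (0.3, 0.5, 0.7):
    missed, alarms = error_counts(scored_packets, threshold)
    print(f"threshold={threshold}: missed attacks={missed}, false alarms={alarms}")
# prints:
# threshold=0.3: missed attacks=0, false alarms=2
# threshold=0.5: missed attacks=1, false alarms=1
# threshold=0.7: missed attacks=2, false alarms=0
```

In this article’s terms, a missed attack is a packet predicted safe that was actually malicious (Type 1 error), and a false alarm is a packet predicted malicious that was actually safe (Type 2 error). On this toy data, lowering the threshold removes missed attacks at the cost of more false alarms, which is why both errors rarely reach zero together.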
We can obtain various things like accuracy, precision, recall, and many more significant values for further analysis through the Confusion matrix.
- Precision: Precision measures how many of the model’s positive predictions are correct. It is the true positives divided by the total number of predicted positive values: TP / (TP + FP).
- Accuracy: Accuracy is the proportion of correctly classified values. It tells us how often our classifier is right. It is the sum of the true positives and true negatives divided by the total number of values: (TP + TN) / total.
- Sensitivity: Also called recall, it measures the model’s ability to find the actual positive values. It is the true positives divided by the total number of actual positive values: TP / (TP + FN).
- Misclassification: Also called the error rate, it is the proportion of wrong predictions. It can be calculated as 1 minus the accuracy, or as the sum of the false positives and false negatives divided by the total number of values: (FP + FN) / total.
Similarly, other quantities can be derived, as specified in tabular format in the figure shown above.
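As a worked example, the four quantities above can be computed directly from the counts in the 165-packet example (50 true positives, 100 true negatives, 10 false positives, 5 false negatives). This is a small sketch, and `summarize` is a hypothetical helper name chosen for this post.

```python
def summarize(tp, tn, fp, fn):
    """Derive the metrics described above from the four confusion-matrix cells."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,           # correct predictions / all predictions
        "precision": tp / (tp + fp),             # correct positives / predicted positives
        "sensitivity": tp / (tp + fn),           # correct positives / actual positives
        "misclassification": (fp + fn) / total,  # wrong predictions / all predictions
    }

metrics = summarize(tp=50, tn=100, fp=10, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# prints:
# accuracy: 0.909
# precision: 0.833
# sensitivity: 0.909
# misclassification: 0.091
```

So the model is right about 91% of the time overall, but only about 83% of the packets it calls safe are actually safe. For an IDS, that gap matters more than the headline accuracy.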
Machine Learning use cases are increasing day by day across domains, and Cybersecurity is one domain that is benefiting from the advancements in ML. Never before has the detection of cyberattack threats been so fast and easy. As technology advances, however, cybercrime is also increasing manyfold. Hence, it is the need of the hour to safeguard our community through better & faster threat detection systems.
Thank you for reading, hope you got to learn something from this!