CYBERCRIME — CONFUSION MATRIX

Saranya. S
4 min read · Jun 5, 2021


CYBERCRIME

  • Computer crime, or cybercrime, refers to any crime that involves a computer and a network. Net crime is the criminal exploitation of the Internet.
  • A cyber-attack is an exploitation of computer systems and networks. It uses malicious code to alter computer code, logic, or data, and can lead to cybercrimes such as information and identity theft.
  • Intrusion detection systems (IDS), which monitor network traffic and identify malicious behaviour, have been extensively researched and used in traditional IT infrastructures.
  • Such tools play a key role in understanding a cyber-attack that has occurred and can support a faster, more efficient incident response.

CONFUSION MATRIX

  • A confusion matrix is a performance measurement technique for machine learning classification problems.
  • It is a simple table that shows how a classification model performs on test data for which the true values are known.
  • A confusion matrix contains information about actual and predicted classifications done by a classification system.
  • Performance of such systems is commonly evaluated using the data in the matrix.
  • Looking at the confusion matrix is a much better way to evaluate a classifier than relying on a single accuracy figure, because it shows which kinds of errors the model makes.
  • A confusion matrix is also known as an “error matrix”.
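  • The following table shows the confusion matrix for a two-class classifier, with the predicted class in the rows and the actual class in the columns:

                        Actual positive    Actual negative
  Predicted positive          TP                 FP
  Predicted negative          FN                 TN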

● TP is the number of correct predictions that an instance is positive

● FN is the number of incorrect predictions that an instance is negative

● FP is the number of incorrect predictions that an instance is positive

● TN is the number of correct predictions that an instance is negative
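
The four counts above can be tallied directly from paired actual and predicted labels. The snippet below is a minimal Python sketch of that idea. The function name `confusion_counts`, the label encoding (1 = attack, 0 = normal traffic), and the sample data are assumptions made for illustration only.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (TP, FN, FP, TN) for a two-class problem.

    `positive` is the label treated as the positive class
    (assumed here to mean "attack").
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1  # attack correctly flagged
        elif a == positive:
            fn += 1  # attack missed
        elif p == positive:
            fp += 1  # normal traffic flagged as attack
        else:
            tn += 1  # normal traffic correctly passed
    return tp, fn, fp, tn


# Hypothetical IDS output: 1 = attack, 0 = normal traffic
actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fn, fp, tn = confusion_counts(actual, predicted)
print(tp, fn, fp, tn)  # 3 1 1 3
```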

  • Several standard terms have been defined for the 2 class matrix:

● The accuracy (AC) is the proportion of the total number of predictions that were correct. It is determined using the equation:

$$AC = \frac{TP + TN}{TP + TN + FP + FN}$$

● The recall or true positive rate (TPR), also called sensitivity, is the proportion of positive cases that were correctly identified, as calculated using the equation:

$$TPR = \frac{TP}{FN + TP}$$

● The false positive rate (FPR) is the proportion of negative cases that were incorrectly classified as positive, as calculated using the equation:

$$FPR = \frac{FP}{TN + FP}$$

● The true negative rate (TNR), also called specificity, is the proportion of negative cases that were classified correctly, as calculated using the equation:

$$TNR = \frac{TN}{TN + FP}$$

● The false negative rate (FNR) is the proportion of positive cases that were incorrectly classified as negative, as calculated using the equation:

$$FNR = \frac{FN}{FN + TP}$$

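● The negative predictive value (NPV) is the proportion of negative predictions that were correct, i.e., the share of true negatives among all cases predicted negative (true negatives plus false negatives), as calculated using the equation:

$$NPV = \frac{TN}{TN + FN}$$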

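● The positive predictive value (PPV), also called precision, is the proportion of positive predictions that were correct, i.e., the share of true positives among all cases predicted positive (true positives plus false positives), as calculated using the equation:

$$PPV = \frac{TP}{TP + FP}$$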
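
The seven formulas above translate almost line for line into code. The following is a minimal Python sketch, not a reference implementation. The helper names `safe_div` and `classification_metrics` and the example counts are hypothetical, chosen only to illustrate the arithmetic.

```python
def safe_div(numerator, denominator):
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(tp, fn, fp, tn):
    """Compute the standard two-class metrics from the four counts."""
    return {
        "AC":  safe_div(tp + tn, tp + tn + fp + fn),  # accuracy
        "TPR": safe_div(tp, fn + tp),                 # recall / sensitivity
        "FPR": safe_div(fp, tn + fp),                 # false positive rate
        "TNR": safe_div(tn, tn + fp),                 # specificity
        "FNR": safe_div(fn, fn + tp),                 # false negative rate
        "NPV": safe_div(tn, tn + fn),                 # negative predictive value
        "PPV": safe_div(tp, tp + fp),                 # precision
    }


# Hypothetical counts from an intrusion detection test set (assumed values)
m = classification_metrics(tp=90, fn=10, fp=30, tn=870)
for name, value in m.items():
    print(f"{name} = {value:.3f}")
```

Because TPR and FNR share the denominator FN + TP, they always sum to 1. The same holds for TNR and FPR, which share TN + FP.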
  • The types of error in a confusion matrix are:

● Type 1 error (FP):

* The model predicted yes, but the actual answer is no, i.e., a negative was wrongly predicted as positive. In a churn model, for example, we predicted that customers were leaving the network, but they were not. This is known as a “Type 1 error” or “false alarm”.
* In the case of cyber attacks, the system predicts that an attack is happening when in reality none is occurring.

● Type 2 error (FN):

* The model predicted no, but the actual answer is yes, i.e., a positive was wrongly predicted as negative. In the churn example, we predicted that customers were staying, but they were actually leaving the network. This is known as a “Type 2 error”.
* In the case of cyber attacks, the system predicts that no attack is happening when in reality one is under way, so the attack goes undetected and can do serious damage.
* That is why the Type 2 error is the most dangerous one in this setting.

CONCLUSION

|| In conclusion, the confusion matrix is widely used to evaluate classification models. It is a matrix used to determine the performance of a classification model on a given set of test data, and it can only be built if the true values for the test data are known. It is a square matrix in which the columns represent the actual values and the rows represent the predicted values, or vice versa. Type I and Type II errors present distinct problems in the case of cyber attacks. A Type I error (false positive) raises a false alarm, while a Type II error (false negative) lets a real attack go undetected, which makes the Type II error the most dangerous one. ||
