
Machine Learning - (LAB PROGRAMS)


Week 10

  Mini Project: Performance Analysis of Classification Algorithms on the Iris Dataset


Title:

Performance Analysis of Classification Algorithms on the Iris Dataset

Objective:

To evaluate and compare the performance of the following classification algorithms on the Iris dataset:

  • Logistic Regression
  • Decision Tree
  • K-Nearest Neighbors (KNN)
  • Support Vector Machine (SVM)
  • Random Forest

The models are compared using performance metrics such as accuracy, confusion matrix, precision, recall, and F1-score.

Dataset:

  • Dataset Name: Iris Dataset
  • Source: sklearn.datasets.load_iris() or UCI Machine Learning Repository
  • Features: Sepal Length, Sepal Width, Petal Length, Petal Width
  • Target: Iris species (setosa, versicolor, virginica)
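
For a quick sanity check before modelling, the object returned by load_iris() can be inspected directly. A minimal sketch using the standard scikit-learn Bunch attributes:

from sklearn.datasets import load_iris

iris = load_iris()
print(iris.feature_names)   # ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
print(iris.target_names)    # ['setosa' 'versicolor' 'virginica']
print(iris.data.shape)      # (150, 4): 150 samples, 4 features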

Tools & Libraries:

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

Step-by-Step Implementation:

1. Load and Prepare Data

iris = load_iris()
X = iris.data
y = iris.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Feature scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
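
Note that the split above is not stratified; with random_state=42 the test classes come out slightly unbalanced (19/13/13, as seen in the output below), which is harmless on Iris but worth knowing. A minimal variant that preserves the 50/50/50 class balance in both subsets:

# Optional: stratified split keeps class proportions equal in train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)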

2. Initialize Models

models = {
    "Logistic Regression": LogisticRegression(),
    "Decision Tree": DecisionTreeClassifier(),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Support Vector Machine": SVC(),
    "Random Forest": RandomForestClassifier()
}
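
All five models are created with scikit-learn defaults (for example, SVC uses an RBF kernel and KNeighborsClassifier uses 5 neighbors). For a more reproducible comparison, the key hyperparameters can be set explicitly; the values below are illustrative, not tuned:

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(max_depth=3, random_state=42),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
    "Support Vector Machine": SVC(kernel="rbf", C=1.0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42)
}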

3. Train, Predict, and Evaluate

for name, model in models.items():
    print(f"\nModel: {name}")
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    print("Accuracy:", accuracy_score(y_test, y_pred))
    print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
    print("Classification Report:\n", classification_report(y_test, y_pred))

Sample Output (for SVM):

Model: Support Vector Machine
Accuracy: 1.0
Confusion Matrix:
[[16  0  0]
 [ 0 14  0]
 [ 0  0 15]]
Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        16
           1       1.00      1.00      1.00        14
           2       1.00      1.00      1.00        15

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45

Metrics Used for Comparison:

  • Accuracy
  • Confusion Matrix
  • Precision
  • Recall
  • F1-score
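
For reference, precision, recall, and F1-score are computed per class (precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = 2 * precision * recall / (precision + recall)) and then averaged; classification_report shows both the macro (unweighted) and weighted averages. A minimal sketch of computing the same numbers directly, here using the predictions of the last evaluated model:

from sklearn.metrics import precision_recall_fscore_support

# Per-class values (arrays of length 3, one entry per species)
prec, rec, f1, support = precision_recall_fscore_support(y_test, y_pred, average=None)

# Single macro-averaged values across the three classes
macro_prec, macro_rec, macro_f1, _ = precision_recall_fscore_support(y_test, y_pred, average="macro")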

Conclusion:

Because the Iris dataset is small and its classes are well separated, all five models achieve near-perfect scores here; on larger, noisier real-world data the metrics will vary more clearly between models. In such settings, Random Forest and SVM generally provide better performance on complex datasets.

Optional Extensions:

  • Use different datasets: Breast Cancer, Wine, Titanic (Kaggle)
  • Add cross-validation (see the sketch below)
  • Use AUC-ROC curves for binary classification
  • Plot the confusion matrix as a seaborn heatmap (see the sketch below)
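
A minimal sketch of two of these extensions, 5-fold cross-validation and a confusion-matrix heatmap (assumes seaborn and matplotlib are installed):

from sklearn.model_selection import cross_val_score
import seaborn as sns
import matplotlib.pyplot as plt

# 5-fold cross-validation accuracy for each model on the full dataset
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f} (+/- {scores.std():.3f})")

# Confusion matrix of the last evaluated model, plotted as a heatmap
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
            xticklabels=iris.target_names, yticklabels=iris.target_names)
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()

For a fully leak-free workflow, the StandardScaler would be combined with each model in a Pipeline so that scaling is refit inside every cross-validation fold.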


Complete Source Code:

File Name: mini_project.py


import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

# Load dataset
iris = load_iris()
X = iris.data
y = iris.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Feature scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Define models
models = {
    "Logistic Regression": LogisticRegression(),
    "Decision Tree": DecisionTreeClassifier(),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Support Vector Machine": SVC(),
    "Random Forest": RandomForestClassifier()
}

# Train, Predict, and Evaluate
for name, model in models.items():
    print(f"\nModel: {name}")
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    print("Accuracy:", accuracy_score(y_test, y_pred))
    print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
    print("Classification Report:\n", classification_report(y_test, y_pred))

Output:


Sample Run:
--------------
$ python3 mini_project.py
Model: Logistic Regression
Accuracy: 1.0
Confusion Matrix:
 [[19  0  0]
 [ 0 13  0]
 [ 0  0 13]]
Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      1.00      1.00        13
           2       1.00      1.00      1.00        13

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45


Model: Decision Tree
Accuracy: 1.0
Confusion Matrix:
 [[19  0  0]
 [ 0 13  0]
 [ 0  0 13]]
Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      1.00      1.00        13
           2       1.00      1.00      1.00        13

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45


Model: K-Nearest Neighbors
Accuracy: 1.0
Confusion Matrix:
 [[19  0  0]
 [ 0 13  0]
 [ 0  0 13]]
Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      1.00      1.00        13
           2       1.00      1.00      1.00        13

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45


Model: Support Vector Machine
Accuracy: 1.0
Confusion Matrix:
 [[19  0  0]
 [ 0 13  0]
 [ 0  0 13]]
Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      1.00      1.00        13
           2       1.00      1.00      1.00        13

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45


Model: Random Forest
Accuracy: 1.0
Confusion Matrix:
 [[19  0  0]
 [ 0 13  0]
 [ 0  0 13]]
Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      1.00      1.00        13
           2       1.00      1.00      1.00        13

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45

