
Predictive Analytics - (LAB PROGRAMS)


Aim:

  To build and evaluate a CHAID (Chi-square Automatic Interaction Detector) decision tree classifier in Python.

Solution:


Library Installation:

To install the required libraries, open Command Prompt or Terminal and run the following commands:


$ pip install scikit-learn

$ pip install numpy

$ pip install pandas

$ pip install CHAID
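
To confirm the installation before running the program, a small check such as the following can be used. It is only a sketch: the helper name `installed_versions` is an assumption made for illustration, and it relies on the standard-library `importlib.metadata` module (Python 3.8 or later). `pandas` is included in the check because the program imports it.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Packages imported by CHAID.py (pip distribution names)
for name, ver in installed_versions(
    ["scikit-learn", "numpy", "pandas", "CHAID"]
).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED can be added with the matching `pip install` command.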

PROGRAM: (CHAID.py)

 
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
from CHAID import Tree

# Generate sample dataset
X, y = make_classification(
    n_samples=150,
    n_features=2,
    n_redundant=0,
    n_informative=2,
    n_clusters_per_class=1,
    random_state=42
)

# Put the two numerical features into a DataFrame
df = pd.DataFrame(X, columns=['Feature1', 'Feature2'])

# CHAID needs categorical predictors, so bin each continuous feature
# into 3 equal-width intervals labelled 0, 1 and 2
df['Feature1'] = pd.cut(df['Feature1'], bins=3, labels=[0, 1, 2])
df['Feature2'] = pd.cut(df['Feature2'], bins=3, labels=[0, 1, 2])

df['Target'] = y

# Split dataset
train_df, test_df = train_test_split(
    df, test_size=0.2, random_state=42
)

# Build CHAID tree
tree = Tree.from_pandas_df(
    train_df,
    dict(Feature1='nominal',
         Feature2='nominal'),
    'Target'
)

# Display tree structure
print("\nCHAID Tree:")
tree.print_tree()

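# Simple prediction function
def predict_sample(row):
    # Hand-written rule based on the root split on Feature1. It is NOT read
    # from the fitted tree object, so adjust it to match the generated tree.
    # As written, it sends categories 1 and 2 to class 1, although the node
    # for category 1 in the printed tree is mostly class 0.
    if row['Feature1'] == 0:
        return 0
    else:
        return 1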

# Predictions
y_test = test_df['Target']
y_pred = test_df.apply(predict_sample, axis=1)

# Evaluation
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))
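
The rule in `predict_sample` is written by hand. As a hedged alternative, the sketch below derives a rule from data by predicting the majority class within each category of the split variable. The function names `fit_majority_rule` and `predict_with_rule` are hypothetical helpers, not part of the CHAID package, and the demo data simply re-creates the three child-node counts shown in the printed tree under OUTPUT.

```python
from collections import Counter, defaultdict


def fit_majority_rule(categories, labels):
    """Map each category to the class label seen most often in it."""
    counts = defaultdict(Counter)
    for cat, label in zip(categories, labels):
        counts[cat][label] += 1
    return {cat: c.most_common(1)[0][0] for cat, c in counts.items()}


def predict_with_rule(rule, categories, default):
    """Look up each category in the rule; unseen categories get `default`."""
    return [rule.get(cat, default) for cat in categories]


# Child-node counts copied from the printed tree: {Feature1 category: {class: count}}
leaf_counts = {0: {0: 31, 1: 0}, 1: {0: 25, 1: 6}, 2: {0: 6, 1: 52}}

# Expand the counts back into one (category, label) pair per training row
cats, labs = [], []
for cat, class_counts in leaf_counts.items():
    for cls, n in class_counts.items():
        cats += [cat] * n
        labs += [cls] * n

rule = fit_majority_rule(cats, labs)
print(rule)  # {0: 0, 1: 0, 2: 1}
```

With the program's data, the same helpers could be called as `fit_majority_rule(train_df['Feature1'], train_df['Target'])`. Unlike the hand-written rule, the result assigns category 1 to class 0, which matches that node's majority in the printed tree.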

OUTPUT:

 
CHAID Tree:
([], {0: 62.0, 1: 58.0}, (Feature1, p=6.721448536214306e-18, score=79.08245597320469, groups=[[0], [1], [2]]), dof=2))
|-- ([0], {0: 31.0, 1: 0}, <Invalid Chaid Split> - the node only contains single category respondents)
|-- ([1], {0: 25.0, 1: 6.0}, <Invalid Chaid Split> - splitting would create nodes with less than the minimum child node size)
+-- ([2], {0: 6.0, 1: 52.0}, <Invalid Chaid Split> - p-value greater than alpha merge)
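
The `score` and `p` values in the root node can be checked by hand. They are consistent with Pearson's chi-square test of independence between the Feature1 categories and the target, computed from the three child-node counts printed above. The sketch below recomputes both in plain Python. The helper name `chi_square` is an assumption made for illustration, and the closed-form p-value `exp(-x/2)` holds only for 2 degrees of freedom, which is the case here.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Rows: Feature1 = 0, 1, 2; columns: Target = 0, 1 (counts from the tree above)
observed = [[31, 0], [25, 6], [6, 52]]

stat, dof = chi_square(observed)
p_value = math.exp(-stat / 2)  # chi-square upper tail, valid for dof == 2 only

print(f"chi-square = {stat:.4f}, dof = {dof}, p = {p_value:.3e}")
# chi-square = 79.0825, dof = 2, p = 6.721e-18
```

Because the p-value is far below the usual significance level of 0.05, Feature1 is accepted as the split variable at the root.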


Classification Report:
              precision    recall  f1-score   support

           0       1.00      0.54      0.70        13
           1       0.74      1.00      0.85        17

    accuracy                           0.80        30
   macro avg       0.87      0.77      0.77        30
weighted avg       0.85      0.80      0.78        30


Confusion Matrix:
[[ 7  6]
 [ 0 17]]
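
Each figure in the classification report follows directly from the confusion matrix, whose rows are the true classes and whose columns are the predicted classes (the scikit-learn convention). The sketch below recomputes them in plain Python; `per_class_metrics` is a hypothetical helper name, not a scikit-learn function.

```python
def per_class_metrics(matrix):
    """Return {class: (precision, recall, f1, support)} from a confusion matrix."""
    n = len(matrix)
    metrics = {}
    for k in range(n):
        true_pos = matrix[k][k]
        predicted_k = sum(matrix[i][k] for i in range(n))  # column total
        actual_k = sum(matrix[k])                          # row total (support)
        precision = true_pos / predicted_k if predicted_k else 0.0
        recall = true_pos / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[k] = (precision, recall, f1, actual_k)
    return metrics


cm = [[7, 6],
      [0, 17]]

results = per_class_metrics(cm)
for k, (p, r, f1, support) in results.items():
    print(f"class {k}: precision={p:.2f} recall={r:.2f} f1={f1:.2f} support={support}")

accuracy = sum(cm[k][k] for k in range(len(cm))) / sum(map(sum, cm))
print(f"accuracy={accuracy:.2f}")
```

For example, 6 of the 13 true class 0 samples are predicted as class 1, which gives the class 0 recall of 7/13 = 0.54 shown in the report.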



