To install the required libraries, open Command Prompt or Terminal and run the following commands:
$ pip install scikit-learn
$ pip install numpy
$ pip install pandas
$ pip install matplotlib
$ pip install CHAID
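The pip package name and the import name are not always the same. For example, scikit-learn is installed as scikit-learn but imported as sklearn. Below is a minimal sketch for checking that every package is importable before running the script. It uses only the standard library. The helper check_installed and the PACKAGES table are illustrative names and are not part of any of these libraries.

```python
import importlib.util

# pip package name -> module name used in the import statement
# (illustrative table; scikit-learn is the one whose names differ).
PACKAGES = {
    "scikit-learn": "sklearn",
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "CHAID": "CHAID",
}


def check_installed(packages=PACKAGES):
    """Return {pip name: True/False} depending on whether the module can be found."""
    return {
        pip_name: importlib.util.find_spec(module) is not None
        for pip_name, module in packages.items()
    }


if __name__ == "__main__":
    for name, ok in check_installed().items():
        print(f"{name:15s} {'OK' if ok else 'missing: pip install ' + name}")
```

find_spec only locates each module and does not import it, so the check runs quickly even for heavy packages.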
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
from CHAID import Tree
# Generate sample dataset
X, y = make_classification(
    n_samples=150,
    n_features=2,
    n_redundant=0,
    n_informative=2,
    n_clusters_per_class=1,
    random_state=42
)
# Put the two continuous features into a DataFrame
df = pd.DataFrame(X, columns=['Feature1', 'Feature2'])
# CHAID needs categorical predictors: bin each feature into
# 3 equal-width intervals labelled 0, 1 and 2
df['Feature1'] = pd.cut(df['Feature1'], bins=3, labels=[0, 1, 2])
df['Feature2'] = pd.cut(df['Feature2'], bins=3, labels=[0, 1, 2])
df['Target'] = y
# Split dataset (80% training, 20% test)
train_df, test_df = train_test_split(
    df, test_size=0.2, random_state=42
)
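# Build CHAID tree
# Arguments: training DataFrame, {predictor column: 'nominal' or 'ordinal'},
# name of the dependent (target) column
tree = Tree.from_pandas_df(
    train_df,
    dict(Feature1='nominal',
         Feature2='nominal'),
    'Target'
)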
# Display tree structure
print("\nCHAID Tree:")
tree.print_tree()
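# Simple rule-based prediction function
def predict_sample(row):
    # Hand-coded rule based on the root split on Feature1.
    # Update it to match the tree generated on your data.
    # (In the tree shown below, category 1 has a class-0 majority, 25 vs 6.)
    if row['Feature1'] == 0:
        return 0
    else:
        return 1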
# Predictions
y_test = test_df['Target']
y_pred = test_df.apply(predict_sample, axis=1)
# Evaluation
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))
Output:

CHAID Tree:
([], {0: 62.0, 1: 58.0}, (Feature1, p=6.721448536214306e-18, score=79.08245597320469, groups=[[0], [1], [2]]), dof=2))
|-- ([0], {0: 31.0, 1: 0}, <Invalid Chaid Split> - the node only contains single category respondents)
|-- ([1], {0: 25.0, 1: 6.0}, <Invalid Chaid Split> - splitting would create nodes with less than the minimum child node size)
+-- ([2], {0: 6.0, 1: 52.0}, <Invalid Chaid Split> - p-value greater than alpha merge)
Classification Report:
              precision    recall  f1-score   support

           0       1.00      0.54      0.70        13
           1       0.74      1.00      0.85        17

    accuracy                           0.80        30
   macro avg       0.87      0.77      0.77        30
weighted avg       0.85      0.80      0.78        30

Confusion Matrix:
[[ 7  6]
 [ 0 17]]
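CHAID chooses each split with a chi-square test of independence between a candidate predictor and the target. To check the root split, the leaf counts from the printed tree can be arranged as a 3 x 2 contingency table of Feature1 category against Target class and tested with SciPy's chi2_contingency. SciPy is assumed to be installed, since it is a scikit-learn dependency. This is a sketch that recomputes the reported numbers. It is not the CHAID package's internal code.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Contingency table: Feature1 category (rows) vs. Target class (columns),
# copied by hand from the leaf counts of the CHAID tree above.
observed = np.array([
    [31, 0],   # Feature1 == 0
    [25, 6],   # Feature1 == 1
    [6, 52],   # Feature1 == 2
])

# Pearson chi-square test of independence, without continuity correction.
chi2, p_value, dof, expected = chi2_contingency(observed, correction=False)
print(f"chi2={chi2:.4f}, p={p_value:.3e}, dof={dof}")
```

It should print a statistic of about 79.08, a p-value of about 6.72e-18 and 2 degrees of freedom. These are the score, p and dof reported for Feature1 at the root of the tree.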
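The hand-coded predict_sample in the script maps every Feature1 category other than 0 to class 1. However, the tree's leaf for category 1 holds 25 class-0 and 6 class-1 training rows. Below is a sketch that derives the rule from the leaf counts instead, predicting the majority class of each leaf. The counts are copied by hand from the tree output, and leaf_counts and leaf_rules are illustrative names, not part of the CHAID package. The sketch relies on the tree having a single split on Feature1. A deeper tree would need one rule per path from the root to a leaf.

```python
# Class counts per leaf, copied from the CHAID tree output
# (keys: Feature1 category; values: {target class: training rows}).
leaf_counts = {
    0: {0: 31, 1: 0},
    1: {0: 25, 1: 6},
    2: {0: 6, 1: 52},
}

# Majority class in each leaf; iterating over sorted labels makes ties
# go to the smaller class label.
leaf_rules = {
    category: max(sorted(counts), key=lambda cls: counts[cls])
    for category, counts in leaf_counts.items()
}


def predict_sample(row):
    """Predict the majority class of the leaf that row['Feature1'] falls into."""
    return leaf_rules[row["Feature1"]]
```

This function can replace the one in the script and is used the same way: test_df.apply(predict_sample, axis=1). Test rows in Feature1 category 1 are then predicted as class 0, so the classification report and confusion matrix will differ from the ones shown above.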