16. Machine Learning with Python

Machine Learning (ML) enables computers to learn from data and make predictions without being explicitly programmed.

Python is widely used in ML because of libraries like:

  • Scikit-learn
  • NumPy
  • Pandas
  • Matplotlib

What is Scikit-learn?

Scikit-learn is a popular open-source Python library for machine learning, built on top of NumPy and SciPy.

It is commonly used for:

  • Regression
  • Classification
  • Clustering
  • Model evaluation

Installing Scikit-learn

pip install scikit-learn

Importing Scikit-learn

The package is installed as scikit-learn but imported as sklearn:

from sklearn import datasets

Machine Learning Workflow

Typical ML process:

1. Collect Data
2. Prepare Data
3. Train Model
4. Test Model
5. Evaluate Results
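The five steps above can be sketched end to end as follows. This is only a minimal illustration: the bundled Iris dataset stands in for collected data, and the choice of LogisticRegression, test_size=0.25, random_state=42, and stratify=y are assumptions made for the example, not requirements of the workflow.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# 1. Collect data (the bundled Iris dataset stands in for real data)
X, y = load_iris(return_X_y=True)

# 2. Prepare data: hold back 25% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# 3. Train model on the training rows only
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# 4. Test model on rows it has not seen
y_pred = model.predict(X_test)

# 5. Evaluate results
accuracy = accuracy_score(y_test, y_pred)
print("Test accuracy:", accuracy)
```

Keeping the test rows out of training is what makes the final accuracy a fair estimate of how the model behaves on new data.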

Types of Machine Learning

Type                      Description
Supervised Learning       Uses labeled data
Unsupervised Learning     Uses unlabeled data
Reinforcement Learning    Learns using rewards

Regression

Regression predicts continuous numerical values.

Examples:

  • House prices
  • Temperature
  • Sales prediction

Linear Regression

Linear regression fits a straight-line relationship between the input variables and a continuous output variable.


Example: Linear Regression

from sklearn.linear_model import LinearRegression
import numpy as np

# Input data
X = np.array([[1], [2], [3], [4], [5]])

# Output data
y = np.array([2, 4, 6, 8, 10])

# Create model
model = LinearRegression()

# Train model
model.fit(X, y)

# Predict
prediction = model.predict([[6]])

print(prediction)

Output Example:

[12.]
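To see where the prediction comes from, the fitted line can be inspected directly. The sketch below refits the same data and reads the slope and intercept from the model's coef_ and intercept_ attributes; the variable names slope, intercept, and manual are only illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 8, 10])

model = LinearRegression().fit(X, y)

# One coefficient per feature; here there is a single feature
slope = model.coef_[0]
intercept = model.intercept_
print("slope:", round(slope, 6))
print("intercept:", round(intercept, 6))

# A prediction is just slope * x + intercept
manual = slope * 6 + intercept
print("manual prediction for x=6:", round(manual, 6))
```

Because the data follow y = 2x exactly, the slope comes out at about 2 and the intercept at about 0, up to floating-point rounding.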

Regression Terms

Term             Meaning
Features         Input variables
Target           Output variable
Training Data    Data used for learning
Prediction       Model output
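In code, these terms map onto array shapes. Scikit-learn estimators generally expect the features as a 2D array with one row per sample and one column per feature, and the target as a 1D array with one value per sample. A small sketch using the data from the linear regression example:

```python
import numpy as np

# Features: one row per sample, one column per input variable
X = np.array([[1], [2], [3], [4], [5]])

# Target: one output value per sample
y = np.array([2, 4, 6, 8, 10])

n_samples, n_features = X.shape
print(n_samples, n_features)  # 5 1
print(y.shape)                # (5,)
```

This is why single inputs are written as [[6]] rather than 6 when calling predict: the model expects a 2D array even for one sample.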

Train-Test Split

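train_test_split separates the data into a training set, used to fit the model, and a test set, used to check the model on data it has not seen. Here test_size=0.2 holds back 20% of the samples for testing.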

from sklearn.model_selection import train_test_split
import numpy as np

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 8, 10])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2
)

print(X_train)
print(X_test)
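The split is random, so the rows that land in the test set change from run to run unless a seed is fixed. The sketch below uses a slightly larger made-up dataset of 10 samples and passes random_state=0 (an arbitrary seed chosen for illustration) to show that the same seed reproduces the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(1, 11).reshape(-1, 1)  # 10 samples, 1 feature
y = 2 * X.ravel()

# The same random_state gives the same split every time
split_a = train_test_split(X, y, test_size=0.2, random_state=0)
split_b = train_test_split(X, y, test_size=0.2, random_state=0)

X_train, X_test, y_train, y_test = split_a
print(X_train.shape, X_test.shape)         # (8, 1) (2, 1)
print(np.array_equal(X_test, split_b[1]))  # True
```

Fixing the seed is useful when comparing models, since every model is then evaluated on exactly the same test rows.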

Classification

Classification predicts categories or labels.

Examples:

  • Spam detection
  • Disease prediction
  • Image recognition

Logistic Regression Example

from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

# Load dataset
iris = load_iris()

X = iris.data
y = iris.target

# Create model
model = LogisticRegression(max_iter=200)

# Train model
model.fit(X, y)

# Predict
prediction = model.predict([X[0]])

print(prediction)
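Besides a class label, a logistic regression model can report how confident it is in each class through predict_proba. The following sketch refits the model on Iris (with max_iter=1000, an illustrative setting) and prints the class probabilities for the first sample.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X, y = iris.data, iris.target

model = LogisticRegression(max_iter=1000).fit(X, y)

# One probability per class for the first sample; they sum to 1
probabilities = model.predict_proba(X[:1])[0]
predicted_class = model.predict(X[:1])[0]

for name, p in zip(iris.target_names, probabilities):
    print(f"{name}: {p:.3f}")
print("Predicted:", iris.target_names[predicted_class])
```

The predicted label is simply the class with the highest probability.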

Decision Tree Classifier

from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

iris = load_iris()

X = iris.data
y = iris.target

model = DecisionTreeClassifier()

model.fit(X, y)

print(model.predict([X[1]]))
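A trained decision tree also records how much each feature contributed to its splits. The sketch below, which assumes max_depth=3 and random_state=0 purely for a small, repeatable tree, prints the feature_importances_ attribute next to the Iris feature names.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# A shallow tree is easier to interpret and less prone to overfitting
model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(iris.data, iris.target)

importances = model.feature_importances_
for name, importance in zip(iris.feature_names, importances):
    print(f"{name}: {importance:.3f}")
```

The importances sum to 1, and a value of 0 means the tree never split on that feature.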

Clustering

Clustering groups similar data points. It is an unsupervised learning technique.

Examples:

  • Customer segmentation
  • Product grouping

K-Means Clustering

from sklearn.cluster import KMeans
import numpy as np

X = np.array([
    [1, 2],
    [1, 4],
    [5, 8],
    [8, 8]
])

model = KMeans(
    n_clusters=2,
    random_state=0
)

model.fit(X)

print(model.labels_)

Output Example (cluster numbers are arbitrary, so the same grouping may also appear as [0 0 1 1]):

[1 1 0 0]
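After fitting, a K-Means model stores one center per cluster and can assign new points to the nearest one. The sketch below reuses the four points above; n_init=10 and the new point [0, 3] are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [5, 8], [8, 8]])

# n_init=10 runs the algorithm from 10 starting positions and keeps the best
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(X)

# One row per cluster: the mean of the points assigned to it
print(model.cluster_centers_)

# A new point is assigned to the cluster with the nearest center
new_label = model.predict(np.array([[0, 3]]))[0]
print(new_label == labels[0])  # True: it joins the first point's cluster
```

Comparing labels with each other, rather than checking for a specific number, avoids depending on which cluster happens to be called 0 or 1.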

Model Evaluation

Model evaluation measures how closely a model's predictions match the actual values, ideally on data the model did not see during training.


Regression Evaluation


Mean Absolute Error (MAE)

from sklearn.metrics import mean_absolute_error

actual = [10, 20, 30]
predicted = [12, 18, 29]

mae = mean_absolute_error(actual, predicted)

print(mae)
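MAE is the average of the absolute differences between actual and predicted values. As a check on what the function computes, the same number can be worked out by hand with NumPy:

```python
import numpy as np

actual = np.array([10, 20, 30])
predicted = np.array([12, 18, 29])

# Absolute error per sample: [2 2 1]
errors = np.abs(actual - predicted)

# Average of the absolute errors: (2 + 2 + 1) / 3
mae = errors.mean()
print(mae)
```

The result is in the same units as the target, so here the predictions are off by about 1.67 on average.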

Mean Squared Error (MSE)

from sklearn.metrics import mean_squared_error

# Reuses actual and predicted from the MAE example above
mse = mean_squared_error(actual, predicted)

print(mse)
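MSE averages the squared differences, which penalizes large errors more heavily than MAE does. A hand computation with NumPy on the same values, plus the square root (RMSE) that brings the result back to the target's units:

```python
import numpy as np

actual = np.array([10, 20, 30])
predicted = np.array([12, 18, 29])

# Squared error per sample: [4 4 1]
squared_errors = (actual - predicted) ** 2

mse = squared_errors.mean()  # (4 + 4 + 1) / 3 = 3.0
rmse = np.sqrt(mse)          # square root of the MSE
print(mse)
print(rmse)
```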

R² Score

from sklearn.metrics import r2_score

# Reuses actual and predicted from the MAE example above
score = r2_score(actual, predicted)

print(score)
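The R² score compares the model's squared error with the squared error of always predicting the mean: R² = 1 - SS_res / SS_tot. A score of 1 means perfect predictions, and 0 means the model is no better than the mean. The names ss_res and ss_tot below are just labels for the two sums.

```python
import numpy as np

actual = np.array([10, 20, 30])
predicted = np.array([12, 18, 29])

# Squared error of the model: 4 + 4 + 1 = 9
ss_res = np.sum((actual - predicted) ** 2)

# Squared error of always predicting the mean (20): 100 + 0 + 100 = 200
ss_tot = np.sum((actual - actual.mean()) ** 2)

r2 = 1 - ss_res / ss_tot
print(r2)
```

For these values the score is about 0.955, so the predictions explain most of the variation in the actual values.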

Classification Evaluation


Accuracy Score

from sklearn.metrics import accuracy_score

actual = [1, 0, 1, 1]
predicted = [1, 0, 1, 0]

accuracy = accuracy_score(actual, predicted)

print(accuracy)
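Accuracy is the fraction of predictions that match the actual labels. The same value can be computed by hand:

```python
import numpy as np

actual = np.array([1, 0, 1, 1])
predicted = np.array([1, 0, 1, 0])

# True where the prediction matches the actual label
correct = actual == predicted

# 3 correct out of 4
accuracy = correct.mean()
print(accuracy)  # 0.75
```

Accuracy can be misleading when one class is much more common than the others, which is why the confusion matrix and classification report are often checked as well.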

Confusion Matrix

from sklearn.metrics import confusion_matrix

# Reuses actual and predicted from the accuracy example above
cm = confusion_matrix(actual, predicted)

print(cm)
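In scikit-learn's confusion matrix, rows are the actual classes and columns are the predicted classes, with labels in sorted order. For a binary problem the four cells can be unpacked into true negatives, false positives, false negatives, and true positives:

```python
from sklearn.metrics import confusion_matrix

actual = [1, 0, 1, 1]
predicted = [1, 0, 1, 0]

cm = confusion_matrix(actual, predicted)

# Binary case: ravel() flattens [[TN, FP], [FN, TP]] row by row
tn, fp, fn, tp = cm.ravel()
print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)  # TN: 1 FP: 0 FN: 1 TP: 2
```

Here the single mistake is a false negative: one sample whose actual class is 1 was predicted as 0.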

Classification Report

from sklearn.metrics import classification_report

# Reuses actual and predicted from the accuracy example above
print(classification_report(actual, predicted))
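The report lists precision, recall, and F1-score per class. Each of these can also be computed on its own; the sketch below does so for class 1, which these functions treat as the positive class by default.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

actual = [1, 0, 1, 1]
predicted = [1, 0, 1, 0]

# Precision: of the samples predicted as 1, how many really are 1 (2 of 2)
precision = precision_score(actual, predicted)

# Recall: of the samples that really are 1, how many were found (2 of 3)
recall = recall_score(actual, predicted)

# F1: harmonic mean of precision and recall
f1 = f1_score(actual, predicted)

print(precision, recall, f1)
```

Precision matters most when false alarms are costly, and recall matters most when missing a positive case is costly.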

Cross Validation

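Cross-validation trains and tests the model on several different splits of the data, which gives a more reliable performance estimate than a single train-test split. With cv=2 the data is split into two folds, and for a regressor such as LinearRegression each score is an R² value.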

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
import numpy as np

X = np.array([[1], [2], [3], [4]])
y = np.array([2, 4, 6, 8])

model = LinearRegression()

scores = cross_val_score(model, X, y, cv=2)

print(scores)
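On a dataset of realistic size, the fold scores are usually summarized by their mean and spread. The sketch below assumes the Iris dataset, a LogisticRegression classifier, and cv=5, all chosen for illustration; for a classifier each score is an accuracy.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Five folds: each fold is used once for testing and four times for training
scores = cross_val_score(model, X, y, cv=5)

print(scores)
print("mean:", scores.mean(), "std:", scores.std())
```

A small standard deviation suggests the model performs consistently across different subsets of the data.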

Saving and Loading Models

Use joblib to save a trained model to a file and load it again later without retraining.

Save Model

import joblib

joblib.dump(model, "model.pkl")

Load Model

model = joblib.load("model.pkl")
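A full round trip can be sketched as follows: train a model, save it, load it back, and confirm that the restored model gives the same prediction. The temporary directory is used here only so the example cleans up after itself; the variable name restored is illustrative.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 8, 10])
model = LinearRegression().fit(X, y)

# Save to a temporary file and load it back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model predicts the same value as the original
same = np.allclose(model.predict([[6]]), restored.predict([[6]]))
print(same)  # True
```

Because joblib files are based on pickle, load only model files from sources you trust, and preferably with the same scikit-learn version that saved them.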

Practical Example

House Price Prediction

from sklearn.linear_model import LinearRegression
import numpy as np

# Area of houses
X = np.array([
    [500],
    [1000],
    [1500],
    [2000]
])

# Prices
y = np.array([
    100000,
    200000,
    300000,
    400000
])

model = LinearRegression()

model.fit(X, y)

price = model.predict([[2500]])

print("Predicted Price:", price[0])

Advantages of Scikit-learn

✅ Simple and easy API
✅ Fast machine learning models
✅ Good documentation
✅ Supports many algorithms
✅ Works well with NumPy and Pandas


Summary

In this chapter you learned:

✅ Scikit-learn basics
✅ Regression
✅ Classification
✅ Clustering
✅ Model evaluation