Machine Learning (ML) enables computers to learn from data and make predictions without being explicitly programmed.
Python is widely used in ML because of libraries like:
- Scikit-learn
- NumPy
- Pandas
- Matplotlib
What is Scikit-learn?
Scikit-learn is a popular open-source Python library for machine learning.
It is used for:
- Regression
- Classification
- Clustering
- Model evaluation
Installing Scikit-learn
pip install scikit-learn
Importing Scikit-learn
from sklearn import datasets
Machine Learning Workflow
Typical ML process:
1. Collect Data
2. Prepare Data
3. Train Model
4. Test Model
5. Evaluate Results
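The five steps above can be sketched end to end. This is a minimal illustration, assuming the built-in Iris dataset and a decision tree; the 20% test size and the `random_state` values are arbitrary choices made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 1. Collect data (a built-in dataset stands in for real data collection)
iris = load_iris()
X, y = iris.data, iris.target

# 2. Prepare data: hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 3. Train model on the training portion only
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# 4. Test model: predict labels for the unseen samples
y_pred = model.predict(X_test)

# 5. Evaluate results: fraction of test labels predicted correctly
accuracy = accuracy_score(y_test, y_pred)
print("Test accuracy:", accuracy)
```

Real projects usually spend most of the effort on steps 1 and 2, which are reduced to a single call each here.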
Types of Machine Learning
| Type | Description |
|---|---|
| Supervised Learning | Uses labeled data |
| Unsupervised Learning | Uses unlabeled data |
| Reinforcement Learning | Learns using rewards |
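In Scikit-learn, the difference between the first two types shows up in how `fit` is called: supervised estimators receive features and labels, unsupervised ones receive features only. The sketch below assumes the Iris dataset and two estimators chosen for illustration. Reinforcement learning is not part of Scikit-learn, so it is left out.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X, y = iris.data, iris.target

# Supervised: the labels y are passed to fit()
classifier = LogisticRegression(max_iter=1000)
classifier.fit(X, y)

# Unsupervised: only the features X are passed to fit()
clusterer = KMeans(n_clusters=3, n_init=10, random_state=0)
clusterer.fit(X)

class_predictions = classifier.predict(X[:3])  # predicted class labels
cluster_ids = clusterer.labels_[:3]            # cluster ids (numbering is arbitrary)
print(class_predictions)
print(cluster_ids)
```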
Regression
Regression predicts continuous numerical values.
Examples:
- House prices
- Temperature
- Sales prediction
Linear Regression
Linear regression fits a straight-line relationship between one or more input features and a continuous target.
Example: Linear Regression
from sklearn.linear_model import LinearRegression
import numpy as np
# Input data
X = np.array([[1], [2], [3], [4], [5]])
# Output data
y = np.array([2, 4, 6, 8, 10])
# Create model
model = LinearRegression()
# Train model
model.fit(X, y)
# Predict
prediction = model.predict([[6]])
print(prediction)
Output Example:
[12.]
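To see what the model actually learned, the fitted slope and intercept can be inspected. The sketch below reuses the same data; the variable names `slope`, `intercept`, and `manual_prediction` are introduced only for this illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 8, 10])

model = LinearRegression()
model.fit(X, y)

# The fitted line is y = coef_[0] * x + intercept_
slope = model.coef_[0]
intercept = model.intercept_
print("slope:", slope)          # approximately 2
print("intercept:", intercept)  # approximately 0, up to floating-point error

# Reproduce the prediction for x = 6 by hand
manual_prediction = slope * 6 + intercept
print(manual_prediction)        # approximately 12
```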
Regression Terms
| Term | Meaning |
|---|---|
| Features | Input variables |
| Target | Output variable |
| Training Data | Data used for learning |
| Prediction | Model output |
Train-Test Split
train_test_split separates a dataset into training and testing subsets, so the model can be evaluated on data it did not see during training.
from sklearn.model_selection import train_test_split
import numpy as np
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 8, 10])
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2
)
print(X_train)
print(X_test)
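Because the split is shuffled at random, the rows printed above change from run to run. A sketch of a reproducible split, assuming `random_state=0` (any fixed integer works), followed by a fit on the training part only:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 8, 10])

# random_state fixes the shuffle so the same split is produced every run
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(len(X_train), len(X_test))  # 4 training samples, 1 test sample

# Fit on the training part, then predict the held-out part
model = LinearRegression()
model.fit(X_train, y_train)
test_prediction = model.predict(X_test)
print(test_prediction, y_test)
```

Since this toy data lies exactly on a line, the prediction matches the held-out value; real data will show some error.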
Classification
Classification predicts categories or labels.
Examples:
- Spam detection
- Disease prediction
- Image recognition
Logistic Regression Example
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
# Load dataset
iris = load_iris()
X = iris.data
y = iris.target
# Create model
model = LogisticRegression(max_iter=200)
# Train model
model.fit(X, y)
# Predict
prediction = model.predict([X[0]])
print(prediction)
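The prediction above is a class index. A short sketch of how to get per-class probabilities and the species name for the same sample, assuming the same model settings:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X, y = iris.data, iris.target

model = LogisticRegression(max_iter=200)
model.fit(X, y)

# One probability per class for the first sample; the row sums to 1
probabilities = model.predict_proba([X[0]])
print(probabilities)

# Map the predicted class index to its species name
predicted_index = model.predict([X[0]])[0]
predicted_name = iris.target_names[predicted_index]
print(predicted_name)
```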
Decision Tree Classifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target
model = DecisionTreeClassifier()
model.fit(X, y)
print(model.predict([X[1]]))
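A tree trained and checked on the same data can simply memorize it. The sketch below is one way to get a fairer picture, assuming a 30% hold-out and a depth limit of 3 (both arbitrary choices for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0
)

# max_depth caps how many levels of splits the tree may grow
model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_train, y_train)

# score() returns accuracy on the held-out samples
test_accuracy = model.score(X_test, y_test)
print("Test accuracy:", test_accuracy)

# feature_importances_ sums to 1; larger values mean the feature
# contributed more to the splits the tree chose
for name, importance in zip(iris.feature_names, model.feature_importances_):
    print(name, round(importance, 3))
```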
Clustering
Clustering groups similar data points.
It is an unsupervised learning technique.
Examples:
- Customer segmentation
- Product grouping
K-Means Clustering
from sklearn.cluster import KMeans
import numpy as np
X = np.array([
    [1, 2],
    [1, 4],
    [5, 8],
    [8, 8]
])
model = KMeans(
    n_clusters=2,
    random_state=0
)
model.fit(X)
print(model.labels_)
Output Example (cluster numbering is arbitrary, so the 0s and 1s may be swapped):
[1 1 0 0]
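After fitting, the model also exposes the cluster centers and can assign new points to the nearest one. A minimal sketch on the same data; `n_init=10` is set explicitly as an assumption so the result does not depend on the installed version's default, and `new_points` is made up for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [5, 8], [8, 8]])

model = KMeans(n_clusters=2, n_init=10, random_state=0)
model.fit(X)

# One center per cluster: the mean of the points assigned to it
# (here [1, 3] and [6.5, 8], in either order)
print(model.cluster_centers_)

# Assign new points to the nearest learned center
new_points = np.array([[0, 3], [7, 9]])
new_labels = model.predict(new_points)
print(new_labels)
```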
Model Evaluation
Model evaluation measures how closely a model's predictions match the actual values.
Regression Evaluation
Mean Absolute Error (MAE)
from sklearn.metrics import mean_absolute_error
actual = [10, 20, 30]
predicted = [12, 18, 29]
mae = mean_absolute_error(
actual,
predicted
)
print(mae)
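MAE is the average of the absolute differences between actual and predicted values. The sketch below recomputes it by hand with NumPy to show what the function does; `manual_mae` is a name introduced for this illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

actual = np.array([10, 20, 30])
predicted = np.array([12, 18, 29])

# Absolute errors are 2, 2 and 1, so MAE = (2 + 2 + 1) / 3
manual_mae = np.mean(np.abs(actual - predicted))
library_mae = mean_absolute_error(actual, predicted)
print(manual_mae)   # approximately 1.667
print(library_mae)  # same value
```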
Mean Squared Error (MSE)
from sklearn.metrics import mean_squared_error
# Same values as in the MAE example
actual = [10, 20, 30]
predicted = [12, 18, 29]
mse = mean_squared_error(actual, predicted)
print(mse)
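MSE averages the squared errors, so large mistakes weigh more than in MAE. A hand computation on the same values, plus the root mean squared error (RMSE), which brings the result back to the units of the target. Taking the square root with NumPy is a choice made here for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

actual = np.array([10, 20, 30])
predicted = np.array([12, 18, 29])

# Squared errors are 4, 4 and 1, so MSE = (4 + 4 + 1) / 3 = 3.0
manual_mse = np.mean((actual - predicted) ** 2)
library_mse = mean_squared_error(actual, predicted)
print(manual_mse, library_mse)

# RMSE is the square root of MSE
rmse = np.sqrt(library_mse)
print(rmse)  # approximately 1.732
```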
R² Score
from sklearn.metrics import r2_score
# Same values as in the MAE example
actual = [10, 20, 30]
predicted = [12, 18, 29]
score = r2_score(actual, predicted)
print(score)
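R² compares the model's squared error with that of a baseline that always predicts the mean of the actual values: 1.0 is a perfect fit, 0.0 is no better than the baseline. A hand computation on the same values, with `ss_res` and `ss_tot` as illustrative names:

```python
import numpy as np
from sklearn.metrics import r2_score

actual = np.array([10, 20, 30])
predicted = np.array([12, 18, 29])

# Residual sum of squares: 4 + 4 + 1 = 9
ss_res = np.sum((actual - predicted) ** 2)
# Total sum of squares around the mean (20): 100 + 0 + 100 = 200
ss_tot = np.sum((actual - actual.mean()) ** 2)

manual_r2 = 1 - ss_res / ss_tot
library_r2 = r2_score(actual, predicted)
print(manual_r2)   # approximately 0.955
print(library_r2)  # same value
```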
Classification Evaluation
Accuracy Score
from sklearn.metrics import accuracy_score
actual = [1, 0, 1, 1]
predicted = [1, 0, 1, 0]
accuracy = accuracy_score(
actual,
predicted
)
print(accuracy)
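Accuracy is the number of correct predictions divided by the total. The sketch below counts the matches by hand and also uses the `normalize=False` option, which returns the count instead of the fraction.

```python
from sklearn.metrics import accuracy_score

actual = [1, 0, 1, 1]
predicted = [1, 0, 1, 0]

# Three of the four predictions match the actual labels
correct = sum(a == p for a, p in zip(actual, predicted))
manual_accuracy = correct / len(actual)
print(manual_accuracy)  # 0.75

# normalize=False returns the number of correct predictions
correct_count = accuracy_score(actual, predicted, normalize=False)
print(correct_count)
```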
Confusion Matrix
from sklearn.metrics import confusion_matrix
# Same labels as in the accuracy example
actual = [1, 0, 1, 1]
predicted = [1, 0, 1, 0]
cm = confusion_matrix(actual, predicted)
print(cm)
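In the matrix, rows are actual classes and columns are predicted classes, both in sorted label order. For a binary problem the four cells can be unpacked as shown in this sketch; the precision and recall lines are added here for illustration.

```python
from sklearn.metrics import confusion_matrix

actual = [1, 0, 1, 1]
predicted = [1, 0, 1, 0]

cm = confusion_matrix(actual, predicted)
print(cm)
# [[1 0]
#  [1 2]]

# Flattening a 2x2 matrix gives the cells in this order
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)  # of the samples predicted as 1, how many were 1
recall = tp / (tp + fn)     # of the samples that were 1, how many were found
print(precision, recall)
```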
Classification Report
from sklearn.metrics import classification_report
# Same labels as in the accuracy example
actual = [1, 0, 1, 1]
predicted = [1, 0, 1, 0]
print(classification_report(actual, predicted))
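The printed report is meant for reading. When the numbers are needed in code, `output_dict=True` returns the same figures as a nested dictionary keyed by the class label as a string. A short sketch on the same labels:

```python
from sklearn.metrics import classification_report

actual = [1, 0, 1, 1]
predicted = [1, 0, 1, 0]

report = classification_report(actual, predicted, output_dict=True)

# Per-class metrics are stored under the label converted to a string
print(report["1"]["precision"])  # 1.0
print(report["1"]["recall"])     # approximately 0.667
print(report["0"]["precision"])  # 0.5
print(report["accuracy"])        # 0.75
```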
Cross Validation
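Cross-validation trains and tests the model on several different splits of the data, which gives a more reliable performance estimate than a single train-test split.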
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
import numpy as np
X = np.array([[1], [2], [3], [4]])
y = np.array([2, 4, 6, 8])
model = LinearRegression()
scores = cross_val_score(
model,
X,
y,
cv=2
)
print(scores)
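With only four points, the example above mainly demonstrates the call. A slightly more realistic sketch, assuming the Iris dataset, a logistic regression classifier, and five folds (all arbitrary choices for illustration), reports the mean and spread of the fold scores:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

iris = load_iris()
X, y = iris.data, iris.target

model = LogisticRegression(max_iter=1000)

# cv=5 splits the data into 5 folds; each fold is used once for testing
# while the model is trained on the other 4
scores = cross_val_score(model, X, y, cv=5)

print(scores)         # one accuracy value per fold
print(scores.mean())  # average accuracy across folds
print(scores.std())   # how much the folds disagree
```

For classifiers the default score is accuracy; for regressors such as LinearRegression it is R².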
Saving and Loading Models
Use joblib to save a trained model to disk and load it again later without retraining.
Save Model
import joblib
# model is any fitted estimator, such as one trained in the examples above
joblib.dump(model, "model.pkl")
Load Model
import joblib
model = joblib.load("model.pkl")
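A complete round trip can be sketched as follows. The temporary directory and the names `loaded_model`, `original`, and `restored` are assumptions made for the example, so that no file is left behind.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 8, 10])
model = LinearRegression().fit(X, y)

# Save to a temporary file, then load it back as a new object
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")
    joblib.dump(model, path)
    loaded_model = joblib.load(path)

# The restored model gives the same prediction as the original
original = model.predict([[6]])
restored = loaded_model.predict([[6]])
print(np.allclose(original, restored))  # True
```

joblib files are based on pickle, so only load model files from sources you trust, and prefer loading with the same Scikit-learn version that saved them.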
Practical Example
House Price Prediction
from sklearn.linear_model import LinearRegression
import numpy as np
# Area of houses
X = np.array([
    [500],
    [1000],
    [1500],
    [2000]
])
# Prices
y = np.array([
    100000,
    200000,
    300000,
    400000
])
model = LinearRegression()
model.fit(X, y)
price = model.predict([[2500]])
print("Predicted Price:", price[0])
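In this toy data every price is exactly 200 times the area, so the fitted model can be read directly. The sketch below inspects the learned price per unit of area and predicts several houses at once; the areas in `new_areas` are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[500], [1000], [1500], [2000]])
y = np.array([100000, 200000, 300000, 400000])

model = LinearRegression()
model.fit(X, y)

# coef_[0] is the price increase per additional unit of area
price_per_unit = model.coef_[0]
print("Price per unit area:", price_per_unit)  # approximately 200
print("Base price:", model.intercept_)         # approximately 0

# predict() accepts several rows at once
new_areas = np.array([[750], [2500]])
prices = model.predict(new_areas)
print(prices)  # approximately 150000 and 500000
```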
Advantages of Scikit-learn
✅ Simple, consistent API
✅ Efficient implementations of common models
✅ Good documentation
✅ Supports many algorithms
✅ Works well with NumPy and Pandas
Summary
In this chapter you learned:
✅ Scikit-learn basics
✅ Regression
✅ Classification
✅ Clustering
✅ Model evaluation






