Scikit-Learn is a Python library that makes machine learning accessible to beginners. This article walks you through installing it and building your first model.
What is Scikit-Learn?
Scikit-Learn is an open-source library built on NumPy and SciPy, and it is commonly paired with Matplotlib for plotting. It offers a consistent set of tools for data analysis and modeling, covering tasks like classification, regression, and clustering.
Installation
Install it via pip:
pip install scikit-learn
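pip installs NumPy and SciPy automatically as dependencies. The minimum Python version depends on the Scikit-Learn release: Python 3.8 is enough for some older releases, while newer ones require a more recent interpreter, so check the installation guide for the version you install.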
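A quick way to confirm the installation worked is to import the package and print its version. This is a minimal sketch; the variable name installed_version is only illustrative.

```python
import sklearn

# If the import succeeds, the package is installed in the active environment
installed_version = sklearn.__version__
print(f"scikit-learn version: {installed_version}")
```

If the import fails with ModuleNotFoundError, pip most likely installed the package into a different Python environment than the one running the script.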
Your First Model
Let’s build a classifier using the Iris dataset:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
# Load data
iris = load_iris()
X, y = iris.data, iris.target
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train model
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)
# Test accuracy
accuracy = model.score(X_test, y_test)
print(f"Accuracy: {accuracy:.2f}")
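This code loads the Iris dataset, holds out 20% of the samples for testing, trains a k-nearest neighbors classifier (k=3) on the remaining 80%, and prints the fraction of test samples it classifies correctly.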
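Once a classifier is trained, it can also label samples it has never seen. The sketch below repeats the training steps so it runs on its own, then predicts the species of a single flower. The measurements in new_flower are made-up values chosen for illustration, not part of the original example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# Illustrative measurements in cm:
# sepal length, sepal width, petal length, petal width
new_flower = [[5.1, 3.5, 1.4, 0.2]]

# predict() returns an array of class indices, one per input row
predicted_class = model.predict(new_flower)[0]

# Map the class index back to a readable species name
predicted_name = iris.target_names[predicted_class]
print(f"Predicted species: {predicted_name}")
```

Note that predict() expects a 2D input (a list of samples), which is why the single flower is wrapped in an extra pair of brackets.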
Key Features
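- Data Preprocessing: Standardize features to zero mean and unit variance. Fit the scaler on the training data only, then reuse it on the test data so no test information leaks into training:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# Learn each feature's mean and standard deviation from the training set
X_train_scaled = scaler.fit_transform(X_train)
# Apply the same statistics to the test set
X_test_scaled = scaler.transform(X_test)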
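Scaling and classification can also be chained into a single object, which keeps the scaler from ever being fitted on test data. The following is one possible sketch using make_pipeline. It reuses the Iris data and k=3 from the earlier example, and the names pipeline and pipeline_accuracy are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# fit() standardizes using training statistics, then trains the classifier
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
pipeline.fit(X_train, y_train)

# score() applies the already-fitted scaler to X_test before predicting
pipeline_accuracy = pipeline.score(X_test, y_test)
print(f"Accuracy with scaling: {pipeline_accuracy:.2f}")
```

Distance-based models such as k-nearest neighbors tend to benefit from scaling because features with larger numeric ranges would otherwise dominate the distance calculation.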
Conclusion
Scikit-Learn simplifies machine learning with a consistent API: models are trained with fit and evaluated with score. Start with small datasets like Iris, then carry the same workflow over to real-world problems.