
Commit 7305860

Add New snippets Codes ...
1 parent 26943ec commit 7305860


1 file changed

+89
-0
lines changed

MostCommonDataScienceGist.py

@@ -0,0 +1,89 @@
# 10 common Python code snippets used in data science
# These snippets cover data loading, exploration, visualization, cleaning, preprocessing, model training, evaluation, data splitting, and feature selection. Adapt and extend them to suit your own data science projects.
# 1. Importing Libraries:

# Import commonly used data science libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Description: Import essential data science libraries like NumPy, Pandas, Matplotlib, and Seaborn.
# -------------------------------------------------------------------------------------------------------------------------
# 2. Loading Data:

# Load a CSV file into a Pandas DataFrame
data = pd.read_csv('data.csv')

# Description: Load data from a CSV file into a Pandas DataFrame for analysis.
# -------------------------------------------------------------------------------------------------------------------------
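# A minimal sketch of a few pd.read_csv options that often matter in practice.
# The CSV text, the column names, and the "NA" marker below are made up for
# illustration; io.StringIO stands in for a real file such as 'data.csv'.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for 'data.csv'
csv_text = """id,signup_date,score
1,2023-01-05,7.5
2,2023-02-11,NA
3,2023-03-20,9.0
"""

# parse_dates converts the named column to datetime64;
# na_values lists strings that should be read as missing values
data = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["signup_date"],
    na_values=["NA"],
)

print(data.dtypes)  # column types after parsing
print(data.shape)   # (rows, columns)
```

# Checking dtypes and shape right after loading is a cheap way to catch
# columns that were parsed as text by accident.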
# 3. Data Exploration:

# Display basic statistics of the DataFrame
print(data.describe())

# Description: Get a summary of the data, including mean, standard deviation, and other statistical measures.
# -------------------------------------------------------------------------------------------------------------------------
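# describe() summarizes only numeric columns by default, so it is usually
# paired with a few other calls. The sketch below uses a small made-up
# DataFrame (the "age" and "city" columns are assumptions, not from the file).

```python
import pandas as pd

# Hypothetical data for illustration only
data = pd.DataFrame({
    "age": [25, 32, 47, 51, 38],
    "city": ["Oslo", "Lima", "Oslo", "Pune", "Lima"],
})

print(data.head(3))        # first three rows
data.info()                # dtypes and non-null counts (prints directly)

summary = data.describe()  # numeric columns only by default
print(summary)

city_counts = data["city"].value_counts()  # frequency of each category
print(city_counts)
```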
# 4. Data Visualization:

# Create a histogram to visualize a numeric variable
plt.hist(data['column_name'], bins=20)
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('Histogram of Column')
plt.show()

# Description: Create a histogram to visualize the distribution of a numeric variable.
# -------------------------------------------------------------------------------------------------------------------------
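# To see what the bins argument does without opening a plot window, the
# counts can be computed directly with np.histogram, which plt.hist uses for
# binning. The sample values below are made up and stand in for
# data['column_name'].

```python
import numpy as np

# Hypothetical sample standing in for data['column_name']
values = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 9])

# Four equal-width bins spanning the minimum to the maximum of the sample
counts, edges = np.histogram(values, bins=4)

# Each bin includes its left edge; only the last bin also includes its right edge
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.1f} to {right:.1f}: {count}")
```

# With too few bins the outlier at 9 merges into its neighbours; with too many,
# most bins are empty. Trying a couple of values is usually worthwhile.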
# 5. Data Cleaning:

# Check for and handle missing values
print(data.isnull().sum())  # number of missing values per column
value = 0  # placeholder fill value; choose one that suits your data
data = data.fillna(value)

# Description: Identify and handle missing values in the dataset.
# -------------------------------------------------------------------------------------------------------------------------
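# A single fill value rarely suits every column. The sketch below shows one
# common alternative, filling numeric columns with the median and text columns
# with the most frequent value. The DataFrame and its column names are made up
# for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in a numeric and a text column
data = pd.DataFrame({
    "price": [10.0, np.nan, 14.0, 12.0],
    "color": ["red", None, "blue", "red"],
})

missing_per_column = data.isnull().sum()
print(missing_per_column)

# Numeric column: fill with the median of the observed values
data["price"] = data["price"].fillna(data["price"].median())

# Text column: fill with the most frequent observed value (the mode)
data["color"] = data["color"].fillna(data["color"].mode()[0])

print(data)
```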
# 6. Data Preprocessing:

# Encode categorical variables using one-hot encoding
data = pd.get_dummies(data, columns=['categorical_column'])

# Description: Convert categorical variables into numerical format using one-hot encoding.
# -------------------------------------------------------------------------------------------------------------------------
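# A small illustration of what one-hot encoding produces. The "size" and
# "price" columns are made up. drop_first=True is an optional choice, not part
# of the snippet above: it drops one indicator per category, which avoids
# perfectly collinear columns in linear models.

```python
import pandas as pd

# Hypothetical data with one categorical and one numeric column
data = pd.DataFrame({
    "size": ["S", "M", "L", "M"],
    "price": [5, 7, 9, 7],
})

# Categories are ordered alphabetically (L, M, S), so drop_first removes size_L;
# dtype=int yields 0/1 integers instead of booleans
encoded = pd.get_dummies(data, columns=["size"], drop_first=True, dtype=int)
print(encoded)
```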
# 7. Machine Learning Model:

# Train a machine learning model (e.g., Linear Regression)
# X_train and y_train come from the train/test split in snippet 9, which must run first
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)

# Description: Train a machine learning model, such as linear regression, on a training dataset.
# -------------------------------------------------------------------------------------------------------------------------
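# A quick sanity check for the training step: fit on synthetic data whose true
# coefficients are known and confirm the model recovers them. The data, the
# seed, and the coefficients (3, -2, intercept 5) are assumptions chosen for
# illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 2))

# Noise-free target: y = 3*x0 - 2*x1 + 5
y_train = 3 * X_train[:, 0] - 2 * X_train[:, 1] + 5

model = LinearRegression()
model.fit(X_train, y_train)

print(model.coef_)       # fitted slopes, one per feature
print(model.intercept_)  # fitted intercept
```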
# 8. Model Evaluation:

# Evaluate the model's performance using metrics like RMSE
from sklearn.metrics import mean_squared_error
y_pred = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print('Root Mean Squared Error:', rmse)

# Description: Assess the model's performance using evaluation metrics like RMSE.
# -------------------------------------------------------------------------------------------------------------------------
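# RMSE is the square root of the mean of the squared prediction errors. The
# sketch below computes it by hand with NumPy on made-up values, alongside the
# mean absolute error (MAE) for comparison; it is the same quantity as
# np.sqrt(mean_squared_error(y_test, y_pred)).

```python
import numpy as np

# Hypothetical true values and predictions
y_test = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.0, 5.0, 4.0, 8.0])

errors = y_pred - y_test        # [-1, 0, 2, 1]
mse = np.mean(errors ** 2)      # (1 + 0 + 4 + 1) / 4 = 1.5
rmse = np.sqrt(mse)             # square root of 1.5
mae = np.mean(np.abs(errors))   # (1 + 0 + 2 + 1) / 4 = 1.0

print("MSE:", mse)
print("RMSE:", rmse)
print("MAE:", mae)
```

# RMSE penalizes large errors more heavily than MAE, so it is never smaller
# than MAE on the same predictions.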
# 9. Data Splitting:

# Split the dataset into training and testing sets
# X holds the feature columns and y the target column, both prepared beforehand
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Description: Divide the data into training and testing subsets for model validation.
# -------------------------------------------------------------------------------------------------------------------------
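# Splitting has to happen before training and evaluation, so snippets 7 to 9
# run in the order split, fit, evaluate. A minimal end-to-end sketch on
# synthetic data (the coefficients, noise level, and seeds are assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data: 100 rows, 3 features, small Gaussian noise
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# 1) Split: 80% for training, 20% held out for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 2) Fit on the training portion only
model = LinearRegression().fit(X_train, y_train)

# 3) Evaluate on the held-out portion
rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))

print(X_train.shape, X_test.shape)
print("RMSE on the test set:", rmse)
```

# Fixing random_state makes the split reproducible between runs.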
# 10. Feature Selection:

# Rank features by importance using a random forest
# (RandomForestClassifier expects a categorical target y; use RandomForestRegressor for a numeric one)
from sklearn.ensemble import RandomForestClassifier
forest = RandomForestClassifier(random_state=42)
forest.fit(X, y)
feature_importance = forest.feature_importances_

# Description: Estimate which features matter most using a random forest's feature importances.
# -------------------------------------------------------------------------------------------------------------------------
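# feature_importances_ is only an array of scores; selecting features means
# ranking those scores and keeping the top ones. A sketch on synthetic data in
# which only one column drives the label (the column names, the seed, and the
# choice to keep a single feature are assumptions for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: the label depends on "signal" only
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "signal": rng.normal(size=200),
    "noise_a": rng.normal(size=200),
    "noise_b": rng.normal(size=200),
})
y = (X["signal"] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# Pair each score with its column name and sort from most to least important
importance = pd.Series(
    forest.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importance)

# Keep the highest-ranked feature(s)
top_features = importance.head(1).index.tolist()
X_selected = X[top_features]
print(top_features)
```

# Impurity-based importances sum to 1 and can favor high-cardinality features,
# so they are best treated as a first-pass ranking.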
