| -# Python-BasicOperations-and-DataAnalysis |
| 1 | +# Python-BasicOperations-and-DataAnalysis |
| 2 | + |
| 3 | +## This repository contains two notebooks |
| 4 | + |
| 5 | +# Notebook 1: "Python-BasicOperations.ipynb": basic operations with pandas dataframes. This notebook covers the topics below. |
| 6 | + |
| 7 | +## Topic 1: Basic Dataframe Reading/Operations |
| 8 | +###### Code Block 1.1: Reading the dataframe |
| 9 | +###### Code Block 1.2: Getting to know the shape of the dataset (Rows and Columns) |
| 10 | +###### Code Block 1.3: Length of dataframe. |
| 11 | +###### Code Block 1.4: Getting to know the data type of the dataset |
| 12 | +###### Code Block 1.5: Extracting one column from the dataframe and getting to know the data type and the size (of Series) |
| 13 | +###### Code Block 1.6: Printing the size of the series |
| 14 | +###### Code Block 1.7: Data types for whole dataframe (variables) |
| 15 | +###### Code Block 1.8: Working on creating specific indexes |
| 16 | +###### Code Block 1.9: Printing the first 5 rows of the dataframe |
| 17 | +###### Code Block 1.10: Printing the last 5 rows of the dataframe |
| 18 | +###### Code Block 1.11: Displaying the information of the dataframe |
| 19 | +###### Code Block 1.12: Extracting all rows from the dataframe with only one column |
| 20 | +###### Code Block 1.13: Understanding difference between Series and Dataframe |
| 21 | +###### Code Block 1.14: Extracting a range of columns, for example all columns from country to the right end |
| 22 | +###### Code Block 1.15: Selection based on a single index column |
| 23 | +###### Code Block 1.16: Selection based on multiple index column values |
| 24 | +###### Code Block 1.17: Selection based on multiple index column values |
| 25 | + |
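The Topic 1 operations can be sketched in a few lines of pandas. This is a minimal illustration, not the notebook's own code: the sample data, the column names, and the variable names are assumptions made up for the example.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for the notebook's input file.
csv_text = """year,country,population,continent
2020,India,1380,Asia
2020,Brazil,213,South America
2020,Kenya,54,Africa
2020,Japan,126,Asia
"""

df = pd.read_csv(io.StringIO(csv_text))  # reading the dataframe

print(df.shape)    # (rows, columns)
print(len(df))     # number of rows
print(df.dtypes)   # data type of every column
print(df.head())   # first 5 rows (here: all 4)
print(df.tail())   # last 5 rows (here: all 4)
df.info()          # column names, non-null counts, dtypes

# Single brackets return a Series, double brackets return a DataFrame.
country_series = df["country"]
country_frame = df[["country"]]
print(type(country_series), country_series.size)

# Label-based slice: every column from "country" to the right end.
from_country = df.loc[:, "country":]

# Use a column as the index, then select rows by index label.
indexed = df.set_index("country")
one_row = indexed.loc["India"]               # single label -> Series
two_rows = indexed.loc[["India", "Japan"]]   # list of labels -> DataFrame
```

Selecting with a list of labels always returns a DataFrame, even when the list holds a single label, which is a common source of confusion when comparing Series and DataFrame results.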
| 26 | +## Topic 2: Conversion of operations/code from SQL to Python |
| 27 | +###### Code Block 2.1: SQL (where clause with single condition)-->Python Code |
| 28 | +###### Code Block 2.2: SQL (where clause with multiple conditions)-->Python Code |
| 29 | +###### Code Block 2.3: SQL (where clause with multiple conditions using NOT IN)-->Python Code |
| 30 | +###### Code Block 2.4: SQL (where clause with order by on single variable)-->Python Code |
| 31 | +###### Code Block 2.5: SQL (where clause with order by on multiple variables)-->Python Code |
| 32 | + |
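As a rough guide to the SQL-to-pandas mapping listed in Topic 2, the sketch below pairs each SQL clause with one possible pandas equivalent. The table contents and column names are hypothetical and chosen only for the example.

```python
import pandas as pd

# Hypothetical data; the notebook's dataset may differ.
df = pd.DataFrame({
    "country": ["India", "Brazil", "Kenya", "Japan"],
    "continent": ["Asia", "South America", "Africa", "Asia"],
    "population": [1380, 213, 54, 126],
})

# SELECT * FROM df WHERE continent = 'Asia'
asia = df[df["continent"] == "Asia"]

# SELECT * FROM df WHERE continent = 'Asia' AND population > 500
big_asia = df[(df["continent"] == "Asia") & (df["population"] > 500)]

# SELECT * FROM df WHERE continent NOT IN ('Asia', 'Africa')
others = df[~df["continent"].isin(["Asia", "Africa"])]

# SELECT * FROM df WHERE population > 100 ORDER BY population DESC
ordered = df[df["population"] > 100].sort_values("population", ascending=False)

# SELECT * FROM df ORDER BY continent ASC, population DESC
multi_ordered = df.sort_values(["continent", "population"], ascending=[True, False])
```

Each condition needs its own parentheses when combined with `&` or `|`, because those operators bind more tightly than comparisons in Python.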
| 33 | +## Topic 3: Data Exploration |
| 34 | +###### Code Block 3.1: Checking various statistics for a single categorical variable (Series) |
| 35 | +###### Code Block 3.2: Checking various statistics for a single integer variable (Series) |
| 36 | +###### Code Block 3.3: Using the describe method on the whole dataframe, which consists of a mix of numeric and categorical variables |
| 37 | +###### Code Block 3.4: Data exploration methods for Series vs Dataframe |
| 38 | +###### Code Block 3.5: Checking the median of all integer columns in the dataframe |
| 39 | +###### Code Block 3.6: Select distinct values of any column |
| 40 | + |
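The exploration steps in Topic 3 might look like the following. The data and variable names are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data mixing categorical and numeric columns.
df = pd.DataFrame({
    "country": ["India", "Brazil", "Kenya", "Japan"],
    "continent": ["Asia", "South America", "Africa", "Asia"],
    "population": [1380, 213, 54, 126],
})

# Categorical Series: describe() reports count, unique, top and freq.
cat_stats = df["continent"].describe()

# Numeric Series: describe() reports count, mean, std, min, quartiles and max.
num_stats = df["population"].describe()

# Whole dataframe: include="all" covers numeric and non-numeric columns together.
all_stats = df.describe(include="all")

# Median of the numeric columns only.
medians = df.median(numeric_only=True)

# Distinct values of a column, similar to SELECT DISTINCT, plus their frequencies.
distinct_continents = df["continent"].unique()
continent_counts = df["continent"].value_counts()

print(all_stats)
```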
| 41 | + |
| 42 | +## Topic 4: Creating a new column, calculated columns, cleaning the column names, dropping rows/columns |
| 43 | +###### 4.1 Creating a new column within the dataframe |
| 44 | +###### 4.2 Printing all the columns from the dataset/Getting to know the column names |
| 45 | +###### 4.3 Cleaning the columns |
| 46 | +###### 4.4 Converting the data type of the column |
| 47 | +###### 4.5 Renaming the column names |
| 48 | +###### 4.6 Counting the missing values for each column in the dataset |
| 49 | +###### 4.7 Dropping rows and columns that have missing values |
| 50 | +###### 4.8 Replacing the missing values with some value (provided with some conditional logic) |
| 51 | +###### 4.9 Map function |
| 52 | +###### 4.10 Writing the final (cleaned) dataset to CSV |
| 53 | + |
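A compact sketch of the Topic 4 cleaning steps follows. The messy column names, the values, and the fill strategy (median) are assumptions picked for the example, not the notebook's actual choices.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with untidy column names and missing values.
df = pd.DataFrame({
    " Country Name ": ["India", "Brazil", "Kenya", "Japan"],
    "Population (M)": ["1380", "213", None, "126"],
    "Area": [3.29, 8.52, 0.58, np.nan],
})

print(list(df.columns))  # getting to know the column names

# Cleaning the column names: trim, lower-case, replace other characters with "_".
df.columns = (
    df.columns.str.strip()
    .str.lower()
    .str.replace(r"[^a-z0-9]+", "_", regex=True)
    .str.strip("_")
)

# Renaming a column and converting its data type from text to numbers.
df = df.rename(columns={"population_m": "population"})
df["population"] = pd.to_numeric(df["population"])

# Counting the missing values for each column.
missing = df.isnull().sum()

# Replacing missing values (here with the column median, as one option).
df["population"] = df["population"].fillna(df["population"].median())

# Creating a calculated column.
df["density"] = df["population"] / df["area"]

# map(): translate each value through a dictionary.
df["size"] = (df["population"] > 200).map({True: "large", False: "small"})

# Dropping rows, or columns, that still contain missing values.
complete_rows = df.dropna()
complete_cols = df.dropna(axis=1)

# Writing the cleaned data as CSV; pass a file path to write to disk instead.
cleaned_csv = df.to_csv(index=False)
```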
| 54 | +## Topic 5: Plotting |
| 55 | +###### 5.1 Plotting a horizontal bar/histogram |
| 56 | + |
| 57 | +# Notebook 2: "Python-Data Combining.ipynb": various techniques for combining/merging dataframes. This notebook covers the topics below. |
| 58 | + |
| 59 | + |
| 60 | +## Topic 1: Data Combine |
| 61 | +###### 1.1 Combining dataframes using concat function |
| 62 | +###### 1.2 Combining dataframes using concat function with the ignore_index option |
| 63 | +###### 1.4 Combining dataframes using Merge function (Inner Join) |
| 64 | +###### 1.5 Combining dataframes using Merge function (left Join) |
| 65 | +###### 1.6 Combining dataframes using Merge function (right Join) |
| 66 | +###### 1.7 Combining dataframes using Merge function (outer Join) |
| 67 | +###### 1.8 Use of suffixes |
| 68 | + |
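The concat and merge variants listed in Topic 1 can be compared side by side on two small frames. The frames below are hypothetical and exist only to make the differences between the join types visible.

```python
import pandas as pd

# Two hypothetical frames sharing a "key" column and a "value" column.
left = pd.DataFrame({"key": ["a", "b", "c"], "value": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "value": [20, 30, 40]})

# concat stacks rows; the original index labels are kept (0, 1, 2, 0, 1, 2).
stacked = pd.concat([left, right])

# ignore_index=True renumbers the result from 0.
stacked_reset = pd.concat([left, right], ignore_index=True)

# merge joins on a key, like SQL joins.
inner = pd.merge(left, right, on="key", how="inner")       # keys in both
left_join = pd.merge(left, right, on="key", how="left")    # all keys from left
right_join = pd.merge(left, right, on="key", how="right")  # all keys from right
outer = pd.merge(left, right, on="key", how="outer")       # keys from either

# Overlapping column names get "_x"/"_y" by default; suffixes overrides that.
labelled = pd.merge(left, right, on="key", suffixes=("_left", "_right"))

print(outer)
```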
| 69 | +## Topic 2: Transforming Data with Pandas using map(), apply(), applymap(), melt() |
| 70 | +###### 2.1 Creation of a new column based on cases (similar to a CASE statement in SAS/SQL) |
| 71 | +###### 2.2 Difference between apply() and map() |
| 72 | +###### 2.3 Use of applymap() |
| 73 | +###### 2.4 Using pd.melt(): unpivots a Dataframe from wide format to long format |
| 74 | + |
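One way to see how the Topic 2 functions differ is to run them on the same small frame. The data, the `grade` helper, and the thresholds are hypothetical. Because `DataFrame.applymap()` was deprecated in favor of `DataFrame.map()` in pandas 2.1, the sketch picks whichever one the installed version provides.

```python
import pandas as pd

# Hypothetical scores table in wide format.
scores = pd.DataFrame({
    "student": ["Ann", "Ben", "Cy"],
    "math": [82, 55, 67],
    "art": [91, 48, 73],
})

def grade(score):
    """Label one score, similar to a CASE expression in SQL."""
    if score >= 80:
        return "high"
    if score >= 60:
        return "medium"
    return "low"

# Series.map(): applies the function to each element of one column.
scores["math_grade"] = scores["math"].map(grade)

# DataFrame.apply() with axis=1: the function receives one whole row at a time.
scores["best"] = scores[["math", "art"]].apply(lambda row: row.max(), axis=1)

# Element-wise over a whole DataFrame: DataFrame.map() in pandas >= 2.1,
# applymap() in older versions.
numeric = scores[["math", "art"]]
elementwise = numeric.map if hasattr(numeric, "map") else numeric.applymap
grades = elementwise(grade)

# pd.melt(): unpivot from wide (one column per subject) to long format.
long_scores = pd.melt(
    scores,
    id_vars="student",
    value_vars=["math", "art"],
    var_name="subject",
    value_name="score",
)

print(long_scores)
```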
| 75 | +## Topic 3: Working with Strings in Pandas |
| 76 | +###### 3.1 Renaming one of the columns |
| 77 | +###### 3.2 Commonly used String functions |
| 78 | +###### 3.3 Calculating the length of a string in one column and storing it in a different column |
| 79 | +###### 3.4 Creating a new calculated column by converting the string stored in one column to upper case letters |
| 80 | +###### 3.5 Pattern Searching- Using contains |
| 81 | +###### 3.6 extract() and extractall() |
| 82 | + |
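The string operations in Topic 3 are all available through the `.str` accessor. The sketch below uses made-up names and codes; the regular expressions are illustrative assumptions.

```python
import pandas as pd

# Hypothetical text data.
df = pd.DataFrame({
    "Name": ["alice smith", "bob jones", "carol smith"],
    "code": ["AB-2019", "CD-2020", "EF-2021"],
})

# Renaming one column.
df = df.rename(columns={"Name": "name"})

# Length of each string, stored in a new column.
df["name_length"] = df["name"].str.len()

# Upper-case copy of a column.
df["name_upper"] = df["name"].str.upper()

# Pattern search: boolean mask of rows whose name contains "smith".
is_smith = df["name"].str.contains("smith")

# extract(): first match of the capture group per row (NaN if no match).
df["year"] = df["code"].str.extract(r"(\d{4})", expand=False)

# extractall(): every match per row, indexed by (row, match number).
words = df["name"].str.extractall(r"(?P<word>\w+)")

print(df[is_smith])
print(words)
```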
| 83 | +## Topic 4: Working with Missing and Duplicate Data |
| 84 | +###### 4.1 Identifying missing values |
| 85 | +###### 4.2 Identifying the duplicate values |
| 86 | +###### 4.3 Dropping the duplicates |
| 87 | +###### 4.4 Imputation of missing values with the mean or any fixed value using fillna() |
| 88 | +###### 4.5 Dropping rows/columns that contain missing values |
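The missing-value and duplicate handling in Topic 4 could be sketched as below. The data and the imputation choices (mean for one column, a fixed value for another) are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values and one duplicated row.
df = pd.DataFrame({
    "city": ["Pune", "Lima", "Lima", "Oslo"],
    "temp": [31.0, np.nan, np.nan, 4.0],
    "rain": [np.nan, 12.0, 12.0, 80.0],
})

# Identifying missing values: count of NaN per column.
missing_per_column = df.isnull().sum()

# Identifying duplicates: True for each row that repeats an earlier row.
duplicate_mask = df.duplicated()

# Dropping the duplicates (the first occurrence is kept).
deduped = df.drop_duplicates()

# Imputation with fillna(): column mean for "temp", a fixed value for "rain".
filled = deduped.fillna({"temp": deduped["temp"].mean(), "rain": 0})

# Dropping rows, or columns, that contain any missing value.
complete_rows = deduped.dropna()
complete_cols = deduped.dropna(axis=1)

print(filled)
```

Imputing after removing duplicates keeps repeated rows from skewing the mean used as the fill value.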