Folder #29

Open · wants to merge 4 commits into master
50 changes: 9 additions & 41 deletions README.md
@@ -1,41 +1,9 @@
# ML-Logistic-regression-algorithm-challenge


|![DSN logo](DSN_logo.png)|DSN Algorithm Challenge|
|---|---|

Many data scientists and machine learning enthusiasts use machine learning algorithms as black boxes, without knowing how they work or the mathematics behind them. The purpose of this challenge is to encourage a mathematical understanding of machine learning algorithms, including their breaking and yield points.

In summary, participants are encouraged to understand the fundamental concepts behind machine learning algorithms/models.


The rules and guidelines for this challenge are as follows:

1. Ensure you register at https://bit.ly/dsnmlhack

2. The algorithm challenge is open to all.

3. Participants are expected to design and develop the Logistic Regression algorithm from scratch using Python or R programming.

4. For Python developers, NumPy is advisable.

5. To push your solution to us, make a [pull request](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests) to DSN's GitHub page at https://www.github.com/datasciencenigeria/ML-Logistic-regression-algorithm-challenge. Ensure you add a README file that explains your code.

6. The top 3 most optimized solutions will be rewarded as follows:

- **1st position**: 20GB data plan.
- **2nd position**: 15GB data plan.
- **3rd position**: 10GB data plan.

7. Add your scripts and README.md file in a folder named after your full name (surname_first_middlename) by making a pull request to the repository.

---
For issues with this challenge, kindly reach out to the AI+ campus/city managers.

**Twitter**: [@DataScienceNIG](https://twitter.com/DataScienceNIG), [@elishatofunmi](https://twitter.com/Elishatofunmi), [@o_funminiyi](https://twitter.com/o_funminiyi), [@gbganalyst](https://twitter.com/gbganalyst)

or

**Call**: +2349062000119, +2349080564419.

Good luck!
# olawale_abdulrasheed_olamide
Consider a dataset with multiple features. Each feature contributes to the outcome with a certain strength, which we refer to as its weight; the weighted combination of the features lets us determine the outcome (in this case, a class label). Logistic regression is a widely used algorithm for classification, and I'll walk through the steps that let it accomplish this goal.

Using the sigmoid function, logistic regression predicts an outcome between 0 and 1. For classification, this output is used to predict whether or not a sample belongs to a particular group (0 or 1). Taking a threshold of 0.5 (a personal preference) and grouping based on that, any sample whose prediction is less than the threshold is grouped separately from one whose prediction is greater than or equal to the threshold, as sketched below.
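A minimal sketch of that decision rule (the helper name `sigmoid` and the example scores are illustrative, not taken from the notebook):

```python
import numpy as np

def sigmoid(z):
    # squashes any real number into the open interval (0, 1)
    return 1 / (1 + np.exp(-z))

scores = np.array([-2.0, 0.0, 3.0])   # raw linear scores (w . x)
probs = sigmoid(scores)               # approx [0.119, 0.5, 0.953]
labels = (probs >= 0.5).astype(int)   # threshold at 0.5 -> [0, 1, 1]
```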
With our prediction function determined, we can now determine our weights (the relationships between the features and the outcome; some features are more important than others, hence larger weights).

Now, the problem is finding the best weights, since we have no idea what they are up front. We can't brute-force our way through (that's far too many combinations), but we can use gradient descent, which, as its name suggests, descends gradually toward minimal loss and near-optimal weights. Using a loss function, we compare the predictions with the true labels and continuously adjust the weights to minimize this loss, scaling each step by a learning rate (so as not to overstep the minimum). The number of times you check and update is, of course, up to you. A single update step looks like the sketch below.
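A minimal sketch of one gradient-descent step, assuming a feature matrix `X` of shape `(n_samples, n_features)` and binary labels `y` in {0, 1}; the function name and default learning rate are illustrative:

```python
import numpy as np

def gradient_step(X, y, weights, learning_rate=0.001):
    z = X @ weights                        # linear scores, one per sample
    y_hat = 1 / (1 + np.exp(-z))           # sigmoid predictions in (0, 1)
    # gradient of the cross-entropy loss, averaged over the samples
    grad = X.T @ (y_hat - y) / y.shape[0]
    # step against the gradient; a small learning rate avoids overstepping
    return weights - learning_rate * grad
```

Repeating this step for a fixed number of epochs is what the `fit` method in the notebook below does.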
139 changes: 139 additions & 0 deletions Untitled4.ipynb
@@ -0,0 +1,139 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [],
"source": [
"class logistic_regression():\n",
" #terms we'll use -epochs (how many times well descend or ascend to get our bwst weights)\n",
"#learning_rate = amount how fast we descend\n",
" def __init__(self,epochs=1000,learning_rate=0.001):\n",
" self.epochs = epochs\n",
" self.learning_rate = learning_rate\n",
" \n",
" \n",
" def fit(self,x,y):\n",
" #first we guess weights to pass our sigmoid function (we intialize our weight)\n",
" self.weights = 2*np.random.random((x.shape[1],x.shape[0]))-1\n",
" for i in range(self.epochs):\n",
" z = np.dot(x,self.weights)\n",
" #new prediction\n",
" self.y_new = (1/(1+ np.exp(-z)))\n",
" self.gradient = np.dot(x.T,(self.y_new-y))/y.shape\n",
" self.weights = self.weights - self.learning_rate*self.gradient\n",
" \n",
" def predict(self,x):\n",
" \n",
" z=np.dot(x,self.weights)\n",
" y = (1/(1+ np.exp(-z)))\n",
" return y>=0.5\n",
" \n",
" def confirm(self,y1,y):\n",
" return (y1==y).mean()\n",
" \n",
"\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"\n",
"from sklearn.datasets import load_breast_cancer\n",
"breast = load_breast_cancer()\n",
"x = breast.data\n",
"y = breast.target"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/olamide/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:15: RuntimeWarning: overflow encountered in exp\n",
" from ipykernel import kernelapp as app\n"
]
}
],
"source": [
"import numpy as np\n",
"ola = logistic_regression()\n",
"ola.fit(x,y)"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/olamide/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:22: RuntimeWarning: overflow encountered in exp\n"
]
}
],
"source": [
"p = ola.predict(x)"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.9191564147627417"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ola.confirm(p,y)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}