Missing-Data-In-CSV File-Through-R-AND-PYTHON

IDE USED -> For R: RStudio, For Python: Spyder

DATASET

Country   Age   Salary   Purchased
France    44    72000    No
Spain     27    48000    Yes
Germany   30    54000    No
Spain     38    61000    No
Germany   40    NaN      Yes
France    35    58000    Yes
Spain     NaN   52000    No
France    48    79000    Yes
Germany   50    83000    No
France    37    67000    Yes
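As a quick check of where the gaps are, the sketch below parses an inline copy of the dataset with Python's standard csv module and counts the missing cells per column. It assumes the gaps in editeddata.csv are stored as empty fields or as the literal text NaN; the real file may encode them differently. load_rows and count_missing are hypothetical helpers written for this illustration.

```python
import csv
import io

# Inline copy of the dataset. The missing cells are written as empty
# fields here, which is an assumption about how editeddata.csv stores them.
CSV_TEXT = """Country,Age,Salary,Purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,No
Spain,38,61000,No
Germany,40,,Yes
France,35,58000,Yes
Spain,,52000,No
France,48,79000,Yes
Germany,50,83000,No
France,37,67000,Yes
"""

# Tokens treated as "no value" when reading the file.
MISSING_TOKENS = {"", "NaN", "nan", "NA"}


def load_rows(text):
    """Parse CSV text into a list of dicts, mapping missing tokens to None."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for record in reader:
        rows.append({k: (None if v.strip() in MISSING_TOKENS else v)
                     for k, v in record.items()})
    return rows


def count_missing(rows):
    """Return {column name: number of None cells in that column}."""
    counts = {name: 0 for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            if value is None:
                counts[name] += 1
    return counts


rows = load_rows(CSV_TEXT)
print(count_missing(rows))  # {'Country': 0, 'Age': 1, 'Salary': 1, 'Purchased': 0}
```

Only Age and Salary have gaps, which is why both the Python and the R code below impute just those two columns.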

PYTHON CODE

#DATA PREPROCESSING

#Importing the libraries

import numpy as np

import matplotlib.pyplot as plt

import pandas as pd

#IMPORTING DATA SET

dataset = pd.read_csv('editeddata.csv')

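X = dataset.iloc[:, :-1].values # Independent variables: every column except the last (Country, Age, Salary)

#print(X)

Y = dataset.iloc[:, 3].values # Dependent variable: column 3 (Purchased)

#print(Y)

#TAKING CARE OF MISSING DATA (Imputer was removed in scikit-learn 0.22; SimpleImputer replaces it and always works column by column)

from sklearn.impute import SimpleImputer

imputer = SimpleImputer(missing_values = np.nan, strategy = 'mean')

imputer = imputer.fit(X[:, 1:3]) # Learn the mean of the Age and Salary columns

X[:, 1:3] = imputer.transform(X[:, 1:3]) # Replace each NaN with the mean of its column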

DESCRIPTION

In dataset.iloc[:, :-1].values, the part before the comma selects rows and the part after the comma selects columns. A bare : means all rows, and :-1 means every column except the last one, so X holds Country, Age and Salary. .values converts the result into a NumPy array.
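The same slice notation works on plain Python lists, which makes it easy to see what each selection keeps. The sketch below applies the column part of each slice to every row by hand (this is a stdlib illustration, not pandas itself); the three rows are taken from the dataset, with None standing in for the missing salary.

```python
header = ["Country", "Age", "Salary", "Purchased"]
rows = [
    ["France", 44, 72000, "No"],
    ["Spain", 27, 48000, "Yes"],
    ["Germany", 40, None, "Yes"],  # None stands in for the missing salary
]

# Like dataset.iloc[:, :-1]: every row, all columns except the last.
X = [row[:-1] for row in rows]
# Like dataset.iloc[:, 3]: every row, only the column at index 3.
Y = [row[3] for row in rows]
# Like X[:, 1:3]: columns 1 and 2; the stop index 3 is excluded.
numeric = [row[1:3] for row in X]

print(header[:-1])  # ['Country', 'Age', 'Salary']
print(header[1:3])  # ['Age', 'Salary']
print(numeric)      # [[44, 72000], [27, 48000], [40, None]]
print(Y)            # ['No', 'Yes', 'Yes']
```

Because the stop index is exclusive, 1:3 picks exactly the two numeric columns, Age and Salary, which are the ones passed to the imputer.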

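Syntax of the old Imputer (scikit-learn versions before 0.22): Imputer(missing_values = "NaN", strategy = "mean", axis = 0, verbose = 0, copy = True)

Imputer(missing_values = "NaN", strategy = 'mean', axis = 0)

Imputer was removed in scikit-learn 0.22. Its replacement is SimpleImputer in sklearn.impute, which has no axis parameter, always imputes column by column, and expects missing_values = np.nan rather than the string "NaN": SimpleImputer(missing_values = np.nan, strategy = 'mean')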

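missing_values is the placeholder that marks a missing cell, so it must match how the gaps actually appear in the data. pandas loads the missing cells of this dataset as NaN, so NaN is the placeholder here.

strategy is the statistic used to fill each gap: 'mean', 'median' or 'most_frequent' (the mode). SimpleImputer also offers 'constant', which fills every gap with fill_value.

axis chose the direction of the calculation in the old Imputer: axis = 0 computed the statistic down each column, axis = 1 along each row. Column-wise (axis = 0) is what is needed here, because each column holds one variable.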
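To make the strategies concrete, here is a minimal stdlib sketch of what each one computes for a single column, which is the column-wise behaviour that axis = 0 selected. impute_column and is_missing are hypothetical helpers, not scikit-learn functions; the missing parameter plays the role of missing_values, and the strategy names follow scikit-learn's spelling.

```python
import math
import statistics


def is_missing(value, missing):
    """True if value matches the placeholder (NaN needs a special check)."""
    if isinstance(missing, float) and math.isnan(missing):
        return isinstance(value, float) and math.isnan(value)
    return value == missing


def impute_column(values, strategy="mean", missing=float("nan")):
    """Replace placeholder cells in one column with a statistic of the rest."""
    present = [v for v in values if not is_missing(v, missing)]
    if strategy == "mean":
        fill = statistics.mean(present)
    elif strategy == "median":
        fill = statistics.median(present)
    elif strategy == "most_frequent":
        fill = statistics.mode(present)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if is_missing(v, missing) else v for v in values]


nan = float("nan")
age = [44, 27, 30, 38, 40, 35, nan, 48, 50, 37]  # Age column of the dataset

print(round(impute_column(age, "mean")[6], 2))  # 38.78
print(impute_column(age, "median")[6])          # 38
# Mode also works on text columns; here None is the placeholder.
print(impute_column(["France", "Spain", None, "France"], "most_frequent", missing=None))
# ['France', 'Spain', 'France', 'France']
```

The mean is the usual default, but the median is less affected by extreme values, and the mode is the only one of the three that also works on categorical columns such as Country.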

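imputer.fit(X[:, 1:3])

fit does not change the dataset. It only computes and stores the chosen statistic (here the mean) of each selected column, Age and Salary, ignoring the missing cells.

imputer.transform(X[:,1:3])

transform returns a copy of those columns in which every missing value is replaced by the stored mean of its column; assigning the result back to X[:, 1:3] is what updates X.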
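The division of work between fit and transform is easiest to see in a small stand-in class. MeanImputer below is hypothetical: it imitates the fit/transform pattern for the mean strategy only and is not scikit-learn's implementation. The four rows are toy values shaped like the Age and Salary columns.

```python
import math


class MeanImputer:
    """Toy column-wise mean imputer imitating the fit/transform pattern."""

    def fit(self, rows):
        # Learn one mean per column, skipping NaN cells. The input is not modified.
        n_cols = len(rows[0])
        self.statistics_ = []
        for j in range(n_cols):
            present = [r[j] for r in rows if not math.isnan(r[j])]
            self.statistics_.append(sum(present) / len(present))
        return self  # returning self allows MeanImputer().fit(X).transform(X)

    def transform(self, rows):
        # Build a new table, replacing each NaN with its column's learned mean.
        return [
            [self.statistics_[j] if math.isnan(v) else v for j, v in enumerate(r)]
            for r in rows
        ]


nan = float("nan")
X = [[44, 72000], [30, 54000], [40, nan], [nan, 48000]]

imputer = MeanImputer().fit(X)
print(imputer.statistics_)      # [38.0, 58000.0]
print(math.isnan(X[2][1]))      # True: fit left X unchanged
print(imputer.transform(X)[2])  # [40, 58000.0]
```

Keeping the two steps separate means the means can be learned once (for example on training data) and then reused to fill gaps in other data.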

R CODE

#MISSING DATA

#Importing Dataset

dataset = read.csv('editeddata.csv')

#Taking care of missing data in the dataset

#Here each empty position is filled with the mean of its column

dataset$Age = ifelse(is.na(dataset$Age),

                 ave(dataset$Age, FUN = function(x) mean(x, na.rm = TRUE)),
                 
                 dataset$Age)

dataset$Salary = ifelse(is.na(dataset$Salary),

                 ave(dataset$Salary, FUN = function(x) mean(x, na.rm = TRUE)),
                 
                 dataset$Salary)

DESCRIPTION

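$Age -> selects the Age column of the data frame; write the name of whichever column should be processed after the $. is.na() flags the missing entries, ave(..., FUN = function(x) mean(x, na.rm = TRUE)) returns the column mean (computed with the NA values ignored) at every position, and ifelse() keeps the original value where it is present and takes the mean where it is NA.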
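For comparison with the Python section, the R statement can be mirrored element by element in plain Python. This sketch assumes lists with None in place of NA and a dict keyed by column name in place of the data frame; ave_mean and ifelse are hypothetical helpers written to imitate R's ave() (with no grouping factor, it repeats FUN(x) for every element) and ifelse().

```python
def ave_mean(x):
    """Imitates R's ave(x, FUN = function(x) mean(x, na.rm = TRUE)) with no groups."""
    present = [v for v in x if v is not None]
    m = sum(present) / len(present)
    return [m] * len(x)


def ifelse(test, yes, no):
    """Element-wise choice, imitating R's ifelse()."""
    return [a if t else b for t, a, b in zip(test, yes, no)]


dataset = {
    "Age": [44, 27, 30, 38, 40, 35, None, 48, 50, 37],
    "Salary": [72000, 48000, 54000, 61000, None, 58000, 52000, 79000, 83000, 67000],
}

# The same statement as in the R code, applied once per column name.
for name in ("Age", "Salary"):
    col = dataset[name]
    dataset[name] = ifelse([v is None for v in col], ave_mean(col), col)

print(round(dataset["Age"][6], 2))     # 38.78
print(round(dataset["Salary"][4], 2))  # 63777.78
```

Looping over column names avoids repeating the statement for each column, as the R code does for Age and Salary.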

Repository: ks1912/Missing-Data

Folders and files

Latest commit

History

Repository files navigation

Missing-Data-In-CSV File-Through-R-AND-PYTHON

IDE USED-> For R -> R Studios, For Python -> Spyder

DATASET

PYTHON CODE

DESCRIPTION

R CODE

DESCRIPTION

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages