Manipulating Data with Pandas

Starting with Kaggle Tutorials (https://www.kaggle.com/learn/pandas)

Although I had already gone through the 'Introduction to Machine Learning' course, it offered little chance to handle datasets for specific purposes while programming. This course helped me understand the background and purpose of manipulating raw data into something useful for a programmer, and it improved my Python coding skills a little.

1. Creating, Reading and Writing

We start by creating a single DataFrame, but not by reading a CSV file as before. This time, we build it directly with the pd.DataFrame constructor. The target DataFrame has one row and two columns, Apples and Bananas, and is created as follows.

import pandas as pd
fruits = pd.DataFrame([[30, 21]], columns=['Apples', 'Bananas'])

Next, we build a DataFrame with two rows and replace the default integer index with descriptive labels. It is created as follows.

data = [[35, 21], [41, 34]]
columns = ['Apples', 'Bananas']
index = ['2017 Sales', '2018 Sales']
fruits = pd.DataFrame(data, columns=columns, index=index)
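To see what the custom index is good for, here is a minimal sketch, assuming pandas is installed and imported as pd. It rebuilds the same DataFrame from a dictionary of columns (the name fruits_from_dict is only illustrative) and then looks up a single value by its row and column labels.

```python
import pandas as pd

# Row-oriented construction, as in the text above
data = [[35, 21], [41, 34]]
columns = ['Apples', 'Bananas']
index = ['2017 Sales', '2018 Sales']
fruits = pd.DataFrame(data, columns=columns, index=index)

# Column-oriented construction: each dict key becomes a column name
fruits_from_dict = pd.DataFrame(
    {'Apples': [35, 41], 'Bananas': [21, 34]},
    index=index,
)

# Both constructions should describe the same table
same = fruits.equals(fruits_from_dict)

# Label-based lookup: row '2018 Sales', column 'Apples'
apples_2018 = fruits.loc['2018 Sales', 'Apples']

print(same)
print(apples_2018)
```

Which form to use is mostly a matter of how the raw data arrives: a list of rows suits record-like input, while a dictionary suits data that is already grouped by column.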

Now, we are going to create something called a Series. A pandas Series is a one-dimensional labeled array, much like a single column of a DataFrame.

data = ['4 cups', '1 cup', '2 large', '1 can']
index = ['Flour', 'Milk', 'Eggs', 'Spam']
ingredients = pd.Series(data, index=index, name='Dinner')

We can create the same Series more concisely by passing a Python dictionary: its keys become the index labels and its values become the data.

dic = {'Flour':'4 cups', 'Milk':'1 cup', 'Eggs':'2 large', 'Spam':'1 can'}
ingredients = pd.Series(dic, name='Dinner')
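As a sanity check, the following sketch (again assuming pandas is imported as pd; the names from_list and from_dict are only illustrative) builds the Series both ways and compares them. It relies on Python dictionaries preserving insertion order, so the index labels come out in the same order as in the list-based version.

```python
import pandas as pd

# List of values plus an explicit index
from_list = pd.Series(
    ['4 cups', '1 cup', '2 large', '1 can'],
    index=['Flour', 'Milk', 'Eggs', 'Spam'],
    name='Dinner',
)

# Dictionary: keys become the index, values become the data
from_dict = pd.Series(
    {'Flour': '4 cups', 'Milk': '1 cup', 'Eggs': '2 large', 'Spam': '1 can'},
    name='Dinner',
)

# Compare the index labels and values of the two Series
identical = from_list.equals(from_dict)

# Elements can be looked up by their index label
milk = from_dict['Milk']

print(identical)
print(milk)
```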

This time, we will read a CSV file that contains wine reviews from all around the world and store the dataset in a DataFrame. The first few lines of the dataset look like the following.