Top 25 pandas tricks.ipynb
url: https://nbviewer.org/github/justmarkham/pandas-videos/blob/master/top_25_pandas_tricks.ipynb
In [1]:
import pandas as pd
import numpy as np
In [2]:
drinks = pd.read_csv('http://bit.ly/drinksbycountry')
movies = pd.read_csv('http://bit.ly/imdbratings')
orders = pd.read_csv('http://bit.ly/chiporders', sep='\t')
# '$' is a regex metacharacter, so match it literally with regex=False
orders['item_price'] = orders.item_price.str.replace('$', '', regex=False).astype('float')
stocks = pd.read_csv('http://bit.ly/smallstocks', parse_dates=['Date'])
titanic = pd.read_csv('http://bit.ly/kaggletrain')
ufo = pd.read_csv('http://bit.ly/uforeports', parse_dates=['Time'])
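Stripping the dollar sign from `item_price` needs some care, because in a regular expression `$` anchors the end of the string instead of matching a literal dollar sign. Here is a minimal sketch of two ways to remove it. The `prices` Series is made-up sample data, not the real orders file.

```python
import pandas as pd

# Hypothetical stand-in for the item_price column
prices = pd.Series(['$2.39', '$3.39', '$16.98'])

# Option 1: regex=False treats '$' as a plain character
literal = prices.str.replace('$', '', regex=False).astype('float')

# Option 2: keep regex matching but escape the dollar sign
escaped = prices.str.replace(r'\$', '', regex=True).astype('float')

print(literal.tolist())
print(literal.equals(escaped))
```

Passing `regex` explicitly keeps the behavior the same across pandas versions, whatever the default happens to be.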
3. Renaming columns
In [3]:
# Create a DataFrame
df = pd.DataFrame({'col one':[100, 200], 'col two':[300, 400]})
df
Out[3]:
| | col one | col two |
---|---|---|
0 | 100 | 300 |
1 | 200 | 400 |
In [4]:
df.rename(mapper={'col one': 'col_one', 'col two': 'col_two'}, axis='columns')
Out[4]:
| | col_one | col_two |
---|---|---|
0 | 100 | 300 |
1 | 200 | 400 |
In [5]:
df.rename(columns={'col one': 'col_one', 'col two': 'col_two'})
Out[5]:
| | col_one | col_two |
---|---|---|
0 | 100 | 300 |
1 | 200 | 400 |
In [6]:
df.rename(mapper={0: 1, 1: 2}, axis='index')
Out[6]:
| | col one | col two |
---|---|---|
1 | 100 | 300 |
2 | 200 | 400 |
In [7]:
df.rename(index={0: 1, 1: 2})
Out[7]:
| | col one | col two |
---|---|---|
1 | 100 | 300 |
2 | 200 | 400 |
Using pandas.DataFrame.columns
In [8]:
df = pd.DataFrame({'col one':[100, 200], 'col two':[300, 400]})
df
Out[8]:
| | col one | col two |
---|---|---|
0 | 100 | 300 |
1 | 200 | 400 |
In [9]:
# Assign both column names directly
df.columns = ['col_one', 'col_two']
df
Out[9]:
| | col_one | col_two |
---|---|---|
0 | 100 | 300 |
1 | 200 | 400 |
In [10]:
df = pd.DataFrame({'col one':[100, 200], 'col two':[300, 400]})
df
Out[10]:
| | col one | col two |
---|---|---|
0 | 100 | 300 |
1 | 200 | 400 |
In [11]:
df.columns = df.columns.str.replace(' ', '_')
df
Out[11]:
| | col_one | col_two |
---|---|---|
0 | 100 | 300 |
1 | 200 | 400 |
In [12]:
df.add_prefix(prefix='X_')
Out[12]:
| | X_col_one | X_col_two |
---|---|---|
0 | 100 | 300 |
1 | 200 | 400 |
In [13]:
df.add_suffix(suffix='_Y')
Out[13]:
| | col_one_Y | col_two_Y |
---|---|---|
0 | 100 | 300 |
1 | 200 | 400 |
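Like `rename`, `add_prefix` and `add_suffix` return a new DataFrame and do not modify `df`. As a rough sketch, the prefix case can also be written with `rename` and a function. This equivalence is shown only for illustration and is not something the original notebook does.

```python
import pandas as pd

df = pd.DataFrame({'col_one': [100, 200], 'col_two': [300, 400]})

# add_prefix builds new labels and returns a new DataFrame
prefixed = df.add_prefix('X_')

# The same result written with rename and a function on each label
via_rename = df.rename(columns=lambda name: 'X_' + name)

print(list(prefixed.columns))
print(prefixed.equals(via_rename))
print(list(df.columns))  # df itself is unchanged
```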
4. Reversing row order
In [14]:
drinks.head()
Out[14]:
| | country | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol | continent |
---|---|---|---|---|---|---|
0 | Afghanistan | 0 | 0 | 0 | 0.0 | Asia |
1 | Albania | 89 | 132 | 54 | 4.9 | Europe |
2 | Algeria | 25 | 0 | 14 | 0.7 | Africa |
3 | Andorra | 245 | 138 | 312 | 12.4 | Europe |
4 | Angola | 217 | 57 | 45 | 5.9 | Africa |
In [15]:
# loc[::-1] reverses the row order
drinks.loc[::-1].head()
Out[15]:
| | country | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol | continent |
---|---|---|---|---|---|---|
192 | Zimbabwe | 64 | 18 | 4 | 4.7 | Africa |
191 | Zambia | 32 | 19 | 4 | 2.5 | Africa |
190 | Yemen | 6 | 0 | 0 | 0.1 | Asia |
189 | Vietnam | 111 | 2 | 1 | 2.0 | Asia |
188 | Venezuela | 333 | 100 | 3 | 7.7 | South America |
In [16]:
# reset_index(drop=True) also renumbers the index so it starts from 0
drinks.loc[::-1].reset_index(drop=True).head()
Out[16]:
| | country | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol | continent |
---|---|---|---|---|---|---|
0 | Zimbabwe | 64 | 18 | 4 | 4.7 | Africa |
1 | Zambia | 32 | 19 | 4 | 2.5 | Africa |
2 | Yemen | 6 | 0 | 0 | 0.1 | Asia |
3 | Vietnam | 111 | 2 | 1 | 2.0 | Asia |
4 | Venezuela | 333 | 100 | 3 | 7.7 | South America |
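The same two steps can be tried without downloading anything. Below is a minimal sketch on a three-row frame; `small` is a hypothetical sample that reuses the first few values from `drinks`.

```python
import pandas as pd

# Hypothetical three-row sample built from the first rows of drinks
small = pd.DataFrame({
    'country': ['Afghanistan', 'Albania', 'Algeria'],
    'beer_servings': [0, 89, 25],
})

# Step 1: reverse the rows; the old index labels travel with them
reversed_rows = small.loc[::-1]
print(reversed_rows.index.tolist())

# Step 2: drop the old labels and renumber from 0
renumbered = reversed_rows.reset_index(drop=True)
print(renumbered.index.tolist())
print(renumbered['country'].tolist())
```

Without `drop=True`, `reset_index` would keep the old labels as a new column named `index`.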
5. Reversing column order
In [17]:
# loc[:, ::-1] reverses the column order
drinks.loc[:, ::-1].head()
Out[17]:
| | continent | total_litres_of_pure_alcohol | wine_servings | spirit_servings | beer_servings | country |
---|---|---|---|---|---|---|
0 | Asia | 0.0 | 0 | 0 | 0 | Afghanistan |
1 | Europe | 4.9 | 54 | 132 | 89 | Albania |
2 | Africa | 0.7 | 14 | 0 | 25 | Algeria |
3 | Europe | 12.4 | 312 | 138 | 245 | Andorra |
4 | Africa | 5.9 | 45 | 57 | 217 | Angola |
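In `loc[:, ::-1]` the first slice keeps every row and the second walks the columns backwards. Here is a small sketch on a hypothetical sample frame, together with an equivalent spelling that indexes with the reversed column labels:

```python
import pandas as pd

# Hypothetical sample reusing a few values from drinks
small = pd.DataFrame({
    'country': ['Afghanistan', 'Albania'],
    'beer_servings': [0, 89],
    'continent': ['Asia', 'Europe'],
})

# ':' selects all rows, '::-1' selects the columns in reverse order
flipped = small.loc[:, ::-1]

# Same result by selecting with the reversed column labels
flipped_alt = small[small.columns[::-1]]

print(list(flipped.columns))
print(flipped.equals(flipped_alt))
```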
6. Selecting columns by data type
In [18]:
drinks.dtypes
Out[18]:
country                          object
beer_servings                     int64
spirit_servings                   int64
wine_servings                     int64
total_litres_of_pure_alcohol    float64
continent                        object
dtype: object
In [20]:
# 'number' selects both float and int columns
drinks.select_dtypes(include='number').head()
Out[20]:
| | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol |
---|---|---|---|---|
0 | 0 | 0 | 0 | 0.0 |
1 | 89 | 132 | 54 | 4.9 |
2 | 25 | 0 | 14 | 0.7 |
3 | 245 | 138 | 312 | 12.4 |
4 | 217 | 57 | 45 | 5.9 |
In [21]:
# 'object' selects object-dtype columns (here, the string columns)
drinks.select_dtypes(include='object').head()
Out[21]:
| | country | continent |
---|---|---|
0 | Afghanistan | Asia |
1 | Albania | Europe |
2 | Algeria | Africa |
3 | Andorra | Europe |
4 | Angola | Africa |
In [22]:
# Pass a list to select several dtypes at once
drinks.select_dtypes(include=['number', 'object', 'category', 'datetime']).head()
Out[22]:
| | country | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol | continent |
---|---|---|---|---|---|---|
0 | Afghanistan | 0 | 0 | 0 | 0.0 | Asia |
1 | Albania | 89 | 132 | 54 | 4.9 | Europe |
2 | Algeria | 25 | 0 | 14 | 0.7 | Africa |
3 | Andorra | 245 | 138 | 312 | 12.4 | Europe |
4 | Angola | 217 | 57 | 45 | 5.9 | Africa |
In [23]:
# exclude selects every column except those with a number dtype
drinks.select_dtypes(exclude='number').head()
Out[23]:
| | country | continent |
---|---|---|
0 | Afghanistan | Asia |
1 | Albania | Europe |
2 | Algeria | Africa |
3 | Andorra | Europe |
4 | Angola | Africa |
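For a single dtype group, `include` and `exclude` split the columns into two complementary sets. A minimal sketch on a hypothetical sample frame, so it runs without the drinks download:

```python
import pandas as pd

# Hypothetical sample reusing a few values from drinks
small = pd.DataFrame({
    'country': ['Afghanistan', 'Albania'],
    'beer_servings': [0, 89],
    'total_litres_of_pure_alcohol': [0.0, 4.9],
})

# 'number' matches both the int and the float column
numeric = small.select_dtypes(include='number')

# exclude='number' returns whatever is left over
non_numeric = small.select_dtypes(exclude='number')

print(list(numeric.columns))
print(list(non_numeric.columns))
```

Together the two results cover every column of `small` exactly once.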