Posts in the 'Python Library/Pandas' category (Page 2)

[Pandas - Python] Hierarchical Indexing

Hierarchical Indexing Hierarchical indexing means an index with more than one level: an outer label a holds the inner labels 1, 2, 3, and another outer label c again holds 1, 2, 3. For example, take the name Hong Gildong. It sounds like a man's name, but a woman could have it too. Under a male index you can store [Hong Gildong, 24], and under a female index you can store [Hong Gildong, 23]. Hierarchical indexing is what you use in this situation. data = pd.Series(np.random.randn(9), index=[['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'], [1, 2, 3, 1, 3, 1, 2, 2, 3]]) data a 1 -0..
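The excerpt is cut off before any selection is shown, so here is a minimal sketch of what a two-level index allows. It reuses the Series from the excerpt; the fixed seed and the variable names inner_b, level_2, and wide are assumptions added for illustration and are not taken from the post.

```python
import numpy as np
import pandas as pd

# Fixed seed so the sketch is reproducible (assumption; the post uses unseeded randn)
rng = np.random.default_rng(0)

data = pd.Series(
    rng.standard_normal(9),
    index=[
        ["a", "a", "a", "b", "b", "c", "c", "d", "d"],
        [1, 2, 3, 1, 3, 1, 2, 2, 3],
    ],
)

# Select everything under the outer label "b"; the result is indexed by the inner level
inner_b = data["b"]

# Select the inner label 2 across all outer labels that have it
level_2 = data.loc[:, 2]

# Move the inner level into columns; (outer, inner) pairs that do not exist become NaN
wide = data.unstack()

print(inner_b)
print(level_2)
print(wide)
```

Selecting by the outer label returns a smaller Series, while unstack() turns the same data into a DataFrame, which is often easier to read.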

Python Library/Pandas 2022.06.14

[Pandas - Python] Computing Indicator/Dummy Variables - get_dummies() (Creating Dummy Variables)

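Computing Indicator/Dummy Variables - get_dummies() Why create dummy variables? Feeding data into a machine learning model requires converting it to numbers, because a model cannot interpret strings such as Monday or Tuesday. Suppose we encode Monday as 1, Tuesday as 2, and Wednesday as 3. The three days have no arithmetic relationship to one another, but once they are encoded this way the numbers imply one: 1 + 2 = 3, that is, Monday + Tuesday = Wednesday? The original data has no such relationship, yet the numeric encoding creates it. Creating dummy variables prevents this problem. In effect, it is the same as one-hot encoding. df = pd.D..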
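Since the excerpt's own DataFrame is truncated, the following is a minimal sketch of the idea with a made-up frame. The column names day and sales and their values are hypothetical, chosen only to mirror the weekday example above.

```python
import pandas as pd

# Hypothetical data: a categorical weekday column plus one numeric column
df = pd.DataFrame(
    {"day": ["Mon", "Tue", "Wed", "Mon"], "sales": [10, 20, 30, 40]}
)

# One indicator column per distinct weekday; dtype=int yields 0/1 instead of booleans
dummies = pd.get_dummies(df["day"], prefix="day", dtype=int)

# Replace the string column with its indicator columns
encoded = df[["sales"]].join(dummies)

print(encoded)
```

Each row has exactly one 1 among the day_ columns, so no ordering or arithmetic relationship between the weekdays is implied.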

Python Library/Pandas 2022.06.14

[Pandas - Python] Permutation and Random Sampling - permutation(), sample()

Permutation and Random Sampling - permutation(), sample() You can use a random permutation to reorder the rows of your data. df = pd.DataFrame(np.arange(5 * 4).reshape((5, 4))) sampler = np.random.permutation(5) sampler array([4, 0, 2, 3, 1]) After shuffling the five integers like this, calling df.take(sampler) returns the rows in that shuffled order. df.take(sampler)
    0   1   2   3
4  16  17  18  19
0   0   1   2   3
2   8   9  10  11
3  12  13  14  15
1   4   5   6   7
To draw 3 random rows from df, use sample(n=3). df.sample(n=3)
    0   1   2   3
4  16  17  18  1..
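A reproducible version of the two steps above might look like the sketch below. The seeded generator and the random_state argument are assumptions added so that repeated runs give the same rows; the post itself calls np.random.permutation without a seed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(5 * 4).reshape((5, 4)))

# Seeded generator (assumption) so the shuffle is repeatable
rng = np.random.default_rng(42)
sampler = rng.permutation(5)

# take() selects rows by integer position, in the order given by sampler
shuffled = df.take(sampler)

# sample() draws rows without replacement by default
subset = df.sample(n=3, random_state=42)

print(shuffled)
print(subset)
```

Passing replace=True to sample() would allow the same row to be drawn more than once.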

Python Library/Pandas 2022.06.14

[Pandas - Python] Detecting and Filtering Outliers (Filtering Data)

Detecting and Filtering Outliers This time, let's extract the values that fall outside a given range. data = pd.DataFrame(np.random.randn(1000, 4)) data.describe()
                 0            1            2            3
count  1000.000000  1000.000000  1000.000000  1000.000000
mean     -0.047439     0.046069     0.024366    -0.006350
std       0.997187     0.998359     1.008925     0.993665
min      -3.428254    -3.645860    -3.184377    -3.745356
25%      -0.743886    -0.599807    -0.612162    -0.697084
50%      -0.086309     0.043663    -0.013609    -0.026381
75%       0.624413     0.746527     0.6..
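The excerpt ends before the filtering step, so this is only a sketch of one common way to do it. The threshold of 3 and the fixed seed are assumptions; the post may use a different cutoff.

```python
import numpy as np
import pandas as pd

# Seeded generator (assumption) so the sketch is reproducible
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.standard_normal((1000, 4)))

threshold = 3  # assumed cutoff; not taken from the post

# Keep only the rows where at least one column exceeds the threshold in absolute value
outlier_rows = data[(data.abs() > threshold).any(axis=1)]

# Alternatively, cap every value to the interval [-threshold, threshold]
capped = data.clip(lower=-threshold, upper=threshold)

print(outlier_rows)
print(capped.describe())
```

Filtering keeps the offending rows for inspection, while clipping keeps every row and only limits the extreme values.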

Python Library/Pandas 2022.06.14

[Pandas - Python] Discretization and Binning - cut()

Discretization and Binning - cut() Let's look at how to sort data into bins defined by ranges. ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32] bins = [18, 25, 35, 60, 100] cats = pd.cut(ages, bins) cats [(18, 25], (18, 25], (18, 25], (25, 35], (18, 25], ..., (25, 35], (60, 100], (35, 60], (35, 60], (25, 35]] Length: 12 Categories (4, interval[int64, right]): [(18, 25] < (25, 35] < (35, 60] < (60, 100]] Using the cut function..
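As a rough sketch of what can be done with the result of cut(), the example below reads the bin code of each age and attaches readable names to the bins. The label strings are hypothetical and only illustrate the labels parameter.

```python
import pandas as pd

ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]
bins = [18, 25, 35, 60, 100]

# Each age is assigned to a right-closed interval such as (18, 25]
cats = pd.cut(ages, bins)

# Integer position of the bin each age fell into
codes = cats.codes

# Number of ages per bin
counts = pd.Series(cats).value_counts()

# Hypothetical bin names, one per interval
labels = ["Youth", "YoungAdult", "MiddleAged", "Senior"]
named = pd.cut(ages, bins, labels=labels)

print(codes)
print(counts)
print(named)
```

Because the intervals are closed on the right by default, the age 25 lands in (18, 25] rather than (25, 35]; passing right=False flips that.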

Python Library/Pandas 2022.06.14

[Pandas - Python] Renaming Axis Indexes - rename()

Renaming Axis Indexes - rename() This time, let's change the labels on an axis index. data = pd.DataFrame(np.arange(12).reshape((3, 4)), index=['Ohio', 'Colorado', 'New York'], columns=['one', 'two', 'three', 'four']) data
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
New York    8    9     10    11
transform = lambda x: x[:4].upper() data.index.map(transform) Index(['OHIO', 'COLO', 'NEW '], dtype='object') The first four characters of each index label were taken and converted to uppercase. data.index..
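The excerpt stops before rename() itself appears, so here is a minimal sketch of how it could be applied to the same DataFrame. The replacement labels INDIANA and peekaboo are hypothetical, and transform is written as a named function rather than the lambda used in the post.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(12).reshape((3, 4)),
    index=["Ohio", "Colorado", "New York"],
    columns=["one", "two", "three", "four"],
)


def transform(label):
    """Return the first four characters of a label in uppercase."""
    return label[:4].upper()


# rename() accepts a function per axis and returns a new DataFrame
renamed = data.rename(index=transform, columns=str.title)

# A dict renames only the listed labels (hypothetical replacements)
partial = data.rename(index={"Ohio": "INDIANA"}, columns={"three": "peekaboo"})

print(renamed)
print(partial)
```

Unlike assigning to data.index directly, rename() leaves the original data untouched unless the result is assigned back.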