Python - matplotlib ② histogram

해피밀세트 2020. 3. 23. 20:05

histogram

A histogram shows at a glance where the data cluster and roughly how they are distributed, but it makes the exact values hard to read off.
Example: ages = [21,24,26,27,29,31,37,39,40,42,45,50,51,59,60,68]
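
As a rough illustration of that trade-off, the sketch below counts the example ages per decade with `np.histogram`. The decade-wide bin edges are an assumption made for illustration; the example above does not specify any bins.

```python
import numpy as np

ages = [21, 24, 26, 27, 29, 31, 37, 39, 40, 42, 45, 50, 51, 59, 60, 68]

# Assumed bin edges: one bin per decade (20s, 30s, ..., 60s).
edges = [20, 30, 40, 50, 60, 70]

# np.histogram returns the count in each bin and the edges it used.
counts, used_edges = np.histogram(ages, bins=edges)

# Print a small text histogram: the shape is visible, the exact ages are not.
for lo, hi, n in zip(used_edges[:-1], used_edges[1:], counts):
    print(f"{lo}-{hi - 1}: {'*' * int(n)} ({n})")
```

The bars show that the 20s are the most common decade, but the individual ages can no longer be recovered from the counts alone.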

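1. List format

# Height data (whole-number measurements)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

height = [157,163,180,162,186,178,173,152,156,184,170,171,172]
type(height) # list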

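1) Building a frequency table

# Define the bin edges (150~160, 160~170, 170~180, 180~190)

bins = [150,160,170,180,190]

# pd.cut(data, bins, right)

# right defaults to True, so each interval includes its right edge

height_cut = pd.cut(height,bins,right=True) # 150 < height <= 160
height_cut

# pd.value_counts is deprecated in recent pandas, so count through a Series

pd.Series(height_cut).value_counts()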

bins = [150,160,170,180,190]
height_cut = pd.cut(height,bins,right=False) # 150 <= height < 160
height_cut
pd.Series(height_cut).value_counts()
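
The `right` setting only matters for values that sit exactly on a bin edge. The following sketch, which reuses the height list and bins from above, lines up both settings and keeps only the values whose bin changes. The `compare` and `moved` names are introduced here for illustration.

```python
import pandas as pd

height = [157, 163, 180, 162, 186, 178, 173, 152, 156, 184, 170, 171, 172]
bins = [150, 160, 170, 180, 190]

closed_right = pd.cut(height, bins, right=True)   # (150, 160], (160, 170], ...
closed_left = pd.cut(height, bins, right=False)   # [150, 160), [160, 170), ...

# Show each value next to the interval it falls into under both settings.
compare = pd.DataFrame({
    "height": height,
    "right=True": [str(iv) for iv in closed_right],
    "right=False": [str(iv) for iv in closed_left],
})

# .codes is the position of the bin (0 = first bin), so differing codes
# mean the value moved to another bin.
moved = closed_right.codes != closed_left.codes
print(compare[moved])
```

Only 170 and 180 change bins here, since they are the only values equal to a bin edge.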

# Label the bins and sort by index

bins = [150,160,170,180,190]
label = ['150s','160s','170s','180s']
# right=False so that 180 is counted in '180s' rather than '170s'
height_cut = pd.cut(height,bins,right=False, labels=label)
pd.Series(height_cut).value_counts().sort_index()

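2) Drawing a histogram

# Basic form

# plt.hist(data, bins)

bins = [150,160,170,180,190]

plt.hist(height, bins = bins)

# Specify the number of bins instead of the edges

plt.hist(height, bins = 4)

# Let the bin edges be chosen automatically

plt.hist(height, bins = 'auto')

# Adjust the bar width (rwidth is the bar width relative to the bin width)

plt.hist(height, bins = 'auto', rwidth = 0.8)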
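
When `bins` is a number or `'auto'`, it is not obvious where the edges end up. A minimal sketch for checking them is shown below: `plt.hist` returns the counts and edges it used, and `np.histogram_bin_edges` reports the edges for a given `bins` setting without drawing. The `Agg` backend is an assumption so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

height = [157, 163, 180, 162, 186, 178, 173, 152, 156, 184, 170, 171, 172]

# plt.hist returns (counts, bin edges, bar patches).
counts, edges, patches = plt.hist(height, bins=4, rwidth=0.8)
print(counts)  # number of values in each of the 4 bins
print(edges)   # 5 equally spaced edges from min(height) to max(height)

# Edges that bins='auto' would choose, computed without plotting.
auto_edges = np.histogram_bin_edges(height, bins="auto")
print(auto_edges)

plt.close()
```

Note that with `bins = 4` the edges run from the smallest to the largest value, so they differ from the hand-written 150~190 edges used earlier.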

2. ndarray format

Books read per month (book)

book = [[1,2,8,13,9,4],
[5,19,9,5,11,3],
[2,3,8,4,15,6]]
book = np.array(book)
book

Building a frequency table

# Flatten the 2-D array into a 1-D array

book.shape # (3, 6)
book = book.reshape((18,)) # (18, )
book
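
Hard-coding the length 18 works only for this particular array. As a sketch of some alternatives, NumPy can infer the length or flatten directly. The `flat_a`, `flat_b` and `flat_c` names are introduced here for illustration.

```python
import numpy as np

book = np.array([[1, 2, 8, 13, 9, 4],
                 [5, 19, 9, 5, 11, 3],
                 [2, 3, 8, 4, 15, 6]])

flat_a = book.reshape(-1)  # -1 lets NumPy infer the length (18 here)
flat_b = book.ravel()      # returns a view when possible
flat_c = book.flatten()    # always returns a copy

print(flat_a.shape, flat_b.shape, flat_c.shape)
```

All three give the same 1-D result; they differ only in whether the data is copied.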

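# Build the frequency table as a DataFrame

bins = list(range(0,21,5))
# right=False gives [0,5), [5,10), [10,15), [15,20), so the labels follow those ranges
label = ['0~4 books','5~9 books','10~14 books','15~19 books']
book_cut = pd.cut(book,bins,right=False, labels = label)
frq = pd.Series(book_cut).value_counts().sort_index()
frq_sum = frq.sum()
pd.DataFrame({'frequency':frq,
'relative frequency':frq / frq_sum,
'cumulative frequency' : frq.cumsum()})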

Drawing a histogram

plt.hist(book, bins = bins,rwidth = 0.8,color='red')

3. DataFrame format

Height (height)

height = pd.read_csv("C:/data/height.csv",header = None)
height

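Building a frequency table

# Convert the DataFrame into a 1-D array

height.shape # (1,24)
height = height.T # (24,1); a DataFrame has no reshape, only arrays do
# pd.cut raises an error here because the data is not 1-D yet
height = height[0] # (24, ), column 0 as a Series

# Alternatively, convert the transposed (24,1) DataFrame to an array and reshape it

np.array(height).shape # (24,1) if height is still the transposed DataFrame

height = np.array(height).reshape(24,) # (24, )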
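
The same conversion can be tried without the original file. The sketch below reads a one-row CSV from an in-memory string; the six values are hypothetical stand-ins, not the contents of height.csv.

```python
import io
import pandas as pd

# Hypothetical stand-in for height.csv: a single row of values, no header.
csv_text = "157,163,180,162,186,178\n"
df = pd.read_csv(io.StringIO(csv_text), header=None)
print(df.shape)  # one row, six columns

row = df.iloc[0]              # first row as a 1-D Series
flat = df.to_numpy().ravel()  # same values as a 1-D NumPy array
print(row.shape, flat.shape)
```

Either `row` or `flat` can be passed to `pd.cut` or `plt.hist`, since both are one-dimensional.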

# Build the frequency table as a DataFrame

bins = list(range(150,191,10))

height_cut = pd.cut(height,bins,right=True)
frq = pd.Series(height_cut).value_counts().sort_index()
frq_sum = frq.sum()
pd.DataFrame({'frequency':frq,
'relative frequency':frq / frq_sum,
'cumulative frequency' : frq.cumsum()})
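
Sections 2 and 3 repeat the same steps: cut, count, then derive the relative and cumulative frequencies. One way to avoid the repetition is a small helper. `frequency_table` is a hypothetical name introduced here, and the list from section 1 is used as sample data because the contents of height.csv are not shown.

```python
import pandas as pd

def frequency_table(values, bins, right=True):
    """Hypothetical helper: per-bin count, relative and cumulative frequency."""
    cut = pd.cut(values, bins, right=right)
    # Count per bin, then order the rows by bin instead of by count.
    freq = pd.Series(cut).value_counts().sort_index()
    return pd.DataFrame({
        "frequency": freq,
        "relative frequency": freq / freq.sum(),
        "cumulative frequency": freq.cumsum(),
    })

# Sample data: the height list from section 1.
height = [157, 163, 180, 162, 186, 178, 173, 152, 156, 184, 170, 171, 172]
table = frequency_table(height, bins=list(range(150, 191, 10)))
print(table)
```

The last cumulative frequency equals the number of values, and the relative frequencies add up to 1, which is a quick sanity check on the table.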

Drawing a histogram

plt.hist(height, bins = bins,rwidth = 0.8,color='blue')
