Python - Crawling Practice ② Collecting the National Petitions List (Sorted by Recommendations)



해피밀세트 2020. 4. 3. 19:02

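```python
# Connect to National Petitions and collect the petition list URLs
import re
from urllib.request import urlopen

from bs4 import BeautifulSoup

url = []
for i in range(1, 21):  # best-petition list pages 1 to 20
    html = urlopen("https://www1.president.go.kr/petitions/best?page={}".format(i))
    soup = BeautifulSoup(html, 'html.parser')
    for j in soup.find_all('div', {'class': 'bl_body'}):
        for k in j.find_all('div', {'class': 'bl_subject'}):
            a = k.find('a')['href']
            # keep only links to individual petitions; raw string so "\?" is a literal "?"
            if re.match(r'/[a-z].*/[0-9].*\?navigation=best', a):
                url.append(a)
```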
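The link filter can be checked offline before sending any requests to the site. Below is a minimal sketch that assumes hypothetical href values; the petition number 584936 is made up. It builds the same 20 page URLs and keeps only the hrefs that match the regex above. A match needs a lowercase path segment, then a segment starting with a digit, then the `?navigation=best` suffix. `page_urls` and `filter_petition_links` are hypothetical helper names.

```python
import re

# Same pattern as in the crawler, compiled once; match() anchors at the start
PETITION_HREF = re.compile(r'/[a-z].*/[0-9].*\?navigation=best')

def page_urls(last_page=20):
    """Build the best-petition list URLs for pages 1..last_page."""
    return ["https://www1.president.go.kr/petitions/best?page={}".format(p)
            for p in range(1, last_page + 1)]

def filter_petition_links(hrefs):
    """Keep only hrefs that look like individual petition pages."""
    return [h for h in hrefs if PETITION_HREF.match(h)]

# Hypothetical href values, for illustration only
sample = ["/petitions/584936?navigation=best", "/petitions/best?page=2", "#none"]
print(filter_petition_links(sample))  # ['/petitions/584936?navigation=best']
```

The list-page link `/petitions/best?page=2` is rejected because its second path segment does not start with a digit.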

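```python
# Collect petition titles
title = []
for link in url:
    # the hrefs already start with "/", so join without an extra slash
    html = urlopen('https://www1.president.go.kr' + link)
    soup = BeautifulSoup(html, 'html.parser')
    for h3 in soup.find_all('h3', {'class': 'petitionsView_title'}):
        title.append(h3.text)
```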
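The hrefs collected in the first block are root-relative, meaning they begin with `/`. If you glue one onto a base URL that already ends in `/`, you get a doubled slash. The sketch below uses the standard library's `urllib.parse.urljoin` on a hypothetical href to show the difference. `petition_url` is a hypothetical helper name.

```python
from urllib.parse import urljoin

BASE = "https://www1.president.go.kr/"
href = "/petitions/584936?navigation=best"  # hypothetical href from the first block

def petition_url(href, base=BASE):
    """Resolve a root-relative href against the site root (hypothetical helper)."""
    return urljoin(base, href)

print(BASE + href)         # https://www1.president.go.kr//petitions/584936?navigation=best
print(petition_url(href))  # https://www1.president.go.kr/petitions/584936?navigation=best
```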

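```python
# Text cleanup: replace punctuation with spaces and join all titles
# Inside [] every character is literal, so '|' separators are not needed
txt = ' '.join(re.sub(r'[<>()!,/.]', ' ', t) for t in title)
```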
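Here is a quick check of the cleanup regex on hypothetical titles. Inside a character class, `|` is an ordinary character. So the pattern `[<|>|(|)|!|\,|/|\.]` also replaces pipes, while `[<>()!,/.]` removes only the intended punctuation. `clean_titles` is a hypothetical helper name.

```python
import re

OLD = re.compile(r'[<|>|(|)|!|\,|/|\.]')  # original pattern: '|' is literal here
NEW = re.compile(r'[<>()!,/.]')           # same punctuation without the stray '|'

def clean_titles(titles, pattern=NEW):
    """Replace punctuation with spaces and join all titles into one string."""
    return ' '.join(pattern.sub(' ', t) for t in titles)

print(OLD.sub(' ', 'A|B'))  # A B  (the pipe is removed too)
print(NEW.sub(' ', 'A|B'))  # A|B
print(clean_titles(['코로나19 (긴급) 지원!', 'A/B.C']))  # hypothetical titles
```

Joining with `' '.join(...)` also avoids rebuilding the string on every loop iteration, which is what `txt = txt + ...` does.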

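```python
# Word frequency count / stopword removal
import nltk
from konlpy.tag import Hannanum

hannanum = Hannanum()
text_list = hannanum.nouns(txt)  # extract nouns from the cleaned titles
stopword = ['및', '등']            # "and", "etc."
ko = [w for w in text_list if w not in stopword]
ko = nltk.Text(ko)
data = ko.vocab()                 # FreqDist: noun -> count
```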
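`ko.vocab()` returns an NLTK `FreqDist`, which is a subclass of `collections.Counter`. That means the same counts can be sketched with the standard library alone. The noun list below is hypothetical. It stands in for `hannanum.nouns(txt)`, which needs KoNLPy and a Java runtime.

```python
from collections import Counter

# Hypothetical stand-in for hannanum.nouns(txt)
nouns = ['코로나', '지원', '및', '코로나', '처벌', '등', '지원', '코로나']
stopwords = {'및', '등'}  # a set makes each membership test O(1)

freq = Counter(n for n in nouns if n not in stopwords)
print(freq.most_common(2))  # [('코로나', 3), ('지원', 2)]
```

`dict(freq)` has the same word-to-count shape as `dict(data)`, so it could be passed to `generate_from_frequencies` in the same way.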

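```python
# Build the word cloud
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Malgun Gothic ships with Windows and includes Hangul glyphs
wordcloud = WordCloud(font_path='C:/Windows/Fonts/malgun.ttf',
                      background_color='white',
                      width=1000, height=800).generate_from_frequencies(dict(data))
plt.figure(figsize=(20, 20))
plt.imshow(wordcloud)
plt.axis("off")
plt.show()
```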
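The font path above only exists on Windows. WordCloud needs a font with Hangul glyphs, otherwise the Korean words will not render correctly. Below is a sketch of a hypothetical `find_font` helper that returns the first existing font file from a candidate list. The NanumGothic path is an assumption (it is where Ubuntu's `fonts-nanum` package typically installs the font), so adjust the list for your machine.

```python
import os

# Candidate Hangul-capable fonts. The first is the path used above;
# the second is an assumption (NanumGothic from Ubuntu's fonts-nanum package).
FONT_CANDIDATES = [
    "C:/Windows/Fonts/malgun.ttf",
    "/usr/share/fonts/truetype/nanum/NanumGothic.ttf",
]

def find_font(candidates=FONT_CANDIDATES):
    """Return the first candidate that is an existing file, or None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

The result could then be passed as `WordCloud(font_path=find_font(), ...)`. If it returns `None`, WordCloud falls back to its default font, which likely lacks Hangul.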

License: Attribution
