- Course ID: 219152
- Course Name: Web Scraping with Python (Python網頁擷取程式設計)
- Instructor: Dr. Quincy Wu
- Target Students: Juniors, Seniors, and Postgraduates
- Upper limit: 10 students
- Credit: 3
- Overview: This course teaches web scraping and crawling techniques
for collecting data from a wide range of web sources and formats. In this
hands-on course, you’ll learn how to use Python scripts and web APIs to
gather and process data from thousands, or even millions, of web pages. It
is intended for programmers, security professionals, and web
administrators familiar with Python. Beyond basic web scraping mechanics,
the course covers more advanced topics, such as analyzing raw data and
using scrapers for frontend website testing.
- Evaluation:
- Homework (60%)
- Term Project (40%)
- Textbook:
- Outline:
- Python at a Glance
- BeautifulSoup - Your First Web Scraper
- Advanced HTML Parsing
- Starting to Crawl
- Using APIs
- Storing Data
- Reading Documents
- Cleaning Your Dirty Data
- Reading and Writing Natural Languages
- Crawling Through Forms and Logins
- Scraping JavaScript
- Image Processing and Text Recognition
- Avoiding Scraping Traps
- Rejected:
- [HW] http://www.nthu.edu.tw/periodical
- Testing Your Website with Scrapers
- Scraping Remotely
- Final Presentation
- Hours Arrangement:
- Problem statement and demonstration
- Tools introduction
- Design and hands-on trial
- Bonus: Your own problem
- Bonus: Easier (not necessarily more powerful) tools