  1. Course ID: 219152
  2. Course Name: Web Scraping with Python (Python網頁擷取程式設計)
  3. Instructor: Dr. Quincy Wu
  4. Target Students: Juniors, Seniors, and Postgraduates
  5. Enrollment Limit: 10 students
  6. Credits: 3
  7. Software:
    1. PuTTY 0.73
    2. Anaconda3 5.3.1 (64-bit)
    3. EverCam 9.0 (needs to be installed on the instructor's computer only)
  8. Overview: This course teaches web scraping and crawling techniques for collecting data from a wide range of web sources in many formats. Through hands-on practice, you will learn how to use Python scripts and web APIs to gather and process data from thousands, or even millions, of web pages. The course is intended for programmers, security professionals, and web administrators who are already familiar with Python. Beyond the basic mechanics of web scraping, it also covers more advanced topics, such as analyzing raw data and using scrapers to test the front end of a website.
  9. Evaluation:
  10. Textbook:
  11. Outline:
    1. Python at a Glance
    2. BeautifulSoup - Your First Web Scraper
    3. Advanced HTML Parsing
    4. Starting to Crawl
    5. Using APIs
    6. Storing Data
    7. Reading Documents
    8. Cleaning Your Dirty Data
    9. Reading and Writing Natural Languages
    10. Crawling Through Forms and Logins
    11. Scraping JavaScript
    12. Image Processing and Text Recognition
    13. Avoiding Scraping Traps
      • Rejected:
      • [HW] http://www.nthu.edu.tw/periodical
    14. Testing Your Website with Scrapers
    15. Scraping Remotely
    16. Final Presentation
  12. Q & A

Hours Arrangement

Exercises

  1. Search all of my HackMD notes