1. (20%) Professors' Emails
    There are three groups in the Department of EE at NCNU: the System Group, the Electronics Group, and the Communication Group.
    These groups have 15, 11, and 13 members, respectively, and the member lists are published on the department's webpages. The department office needs a program that dynamically retrieves these webpages and generates a list of email addresses, so that it can send notifications to those professors. Sometimes the office needs to notify only the members of the System Group; at other times it may notify the members of two or three groups. The program may run as follows:
    
    $ python urllib_get_email.py  ee_electronics 
    "林佑昇" <stephenlin@ncnu.edu.tw>
    "王義明(Yi-Ming Wang)" <renowang@ncnu.edu.tw>
    "吳幼麟" <ylwu@ncnu.edu.tw>
    "張振豪" <chchang1@ncnu.edu.tw>
    "施君興 (Chun-Hsing Shih) " <shihch@ncnu.edu.tw>
    "許孟烈" <sheu@ncnu.edu.tw>
    "孫台平" <tps@ncnu.edu.tw>
    "鄭義榮 (Yi-Jung Cheng) " <yjcheng@ncnu.edu.tw>
    "吳俊德" <ginderwu@ncnu.edu.tw>
    "程德勝 (Tak-Shing Ching) " <tsching@ncnu.edu.tw>
    "陳建亨 (Henry J. H. Chen) " <henry@ncnu.edu.tw>
    
    
    Please note that a member may participate in multiple groups, so if you list the members of all three groups, there will be only 23 distinct members.
    
    $ python urllib_get_email.py  ee_electronics ee_system ee_communication 
    "李彥文" <ymlee@ncnu.edu.tw>
    "魏學文" <will@ncnu.edu.tw>
    "吳幼麟" <ylwu@ncnu.edu.tw>
    "陳建亨 (Henry J. H. Chen) " <henry@ncnu.edu.tw>
    "王瑞騰" <jtwang@ncnu.edu.tw>
    "孫台平" <tps@ncnu.edu.tw>
    "林佑昇" <stephenlin@ncnu.edu.tw>
    "黃建華" <jhhuang@ncnu.edu.tw>
    "翁偉中" <wcweng@ncnu.edu.tw>
    "李佩君" <pjlee@ncnu.edu.tw>
    "林容杉" <jslin@ncnu.edu.tw>
    "張進福" <jfchang@ncnu.edu.tw>
    "王義明(Yi-Ming Wang)" <renowang@ncnu.edu.tw>
    "許孟烈" <sheu@ncnu.edu.tw>
    "張振豪" <chchang1@ncnu.edu.tw>
    "程德勝 (Tak-Shing Ching) " <tsching@ncnu.edu.tw>
    "洪志偉" <jwhung@ncnu.edu.tw>
    "鄭義榮 (Yi-Jung Cheng) " <yjcheng@ncnu.edu.tw>
    "郭耀文" <ywkuo@ncnu.edu.tw>
    "吳俊德" <ginderwu@ncnu.edu.tw>
    "陳文雄" <wschen@ncnu.edu.tw>
    "林繼耀" <kylum@ncnu.edu.tw>
    "施君興 (Chun-Hsing Shih) " <shihch@ncnu.edu.tw>
    
    
    You are required to develop a Python program that uses urllib (webpage retrieval) and re (regular expressions) to generate the list dynamically. If you hard-code the lists in your program, you will receive only 7 points for this question.
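    A minimal sketch of one possible approach follows. It assumes (the assignment does not say so) that each professor appears on a member-list page as a mailto: link such as <a href="mailto:ylwu@ncnu.edu.tw">吳幼麟</a>. The GROUP_URLS entries are hypothetical placeholders, because the real page addresses are not given here. Professors who belong to several groups are printed only once, by keying a dict on the email address.

```python
import re
import sys
import urllib.request

# Hypothetical placeholders: the member-list URLs are not preserved in the
# assignment text, so replace these with the department's real pages.
GROUP_URLS = {
    "ee_system": "https://example.com/ee_system",
    "ee_electronics": "https://example.com/ee_electronics",
    "ee_communication": "https://example.com/ee_communication",
}

# Assumes each member is listed as <a href="mailto:ADDRESS">NAME</a>;
# adjust the pattern if the real pages use different markup.
MAILTO_RE = re.compile(r'<a\s[^>]*href="mailto:([^"]+)"[^>]*>(.*?)</a>',
                       re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def extract_members(page):
    """Return (name, email) pairs from every mailto: link in the page."""
    return [(TAG_RE.sub("", name).strip(), email)
            for email, name in MAILTO_RE.findall(page)]


def merge_groups(pages):
    """Merge several member lists into {email: name}, dropping duplicates.

    A professor in several groups is kept once, in order of first
    appearance (dicts preserve insertion order).
    """
    members = {}
    for page in pages:
        for name, email in extract_members(page):
            members.setdefault(email, name)
    return members


def fetch(url):
    """Download a page and decode it with the charset the server declares."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def main(groups):
    unknown = [g for g in groups if g not in GROUP_URLS]
    if not groups or unknown:
        print("usage: python urllib_get_email.py GROUP [GROUP ...]")
        print("groups:", ", ".join(GROUP_URLS))
        return
    pages = (fetch(GROUP_URLS[g]) for g in groups)
    for email, name in merge_groups(pages).items():
        print(f'"{name}" <{email}>')


if __name__ == "__main__":
    main(sys.argv[1:])
```

Because merge_groups() only sees page text, the duplicate-removal logic can be checked on small HTML strings without touching the network.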
  2. (20%) Login Failure
    As an administrator, you need to review log files occasionally to find out whether your server is under attack.
    1. Suppose we have a file "auth.log" that records each successful or failed login when a user connects to the server via SSH (with PuTTY, for example).
    2. A successful login looks like "Apr 5 19:39:24 STU sshd[9007]: Accepted keyboard-interactive/pam for s104321014 from 10.49.21.113 port 58943 ssh2"
    3. A login failure (due to incorrect password) looks like "Apr 4 22:31:34 STU sshd[64570]: error: PAM: authentication error for s104321014 from 111-242-75-125.dynamic.hinet.net"
    4. Write a Python script to read the log file at "/var/log/auth.log.for.midterm".
    5. Sort the users in descending order of their number of login failures (naturally, we are most concerned about the user with the most failures).
    6. For each user, list all of his/her login records, showing:
      • Login date and time
      • Login success/failure
      • Remote IP address
    7. A sample output may look like
      
      s104321014 5
               Apr  4 22:31:34 Failure 111-242-75-125.dynamic.hinet.net
               Apr  4 22:31:34 Failure 111-242-75-125.dynamic.hinet.net
               Apr  4 22:31:37 OK      111.242.75.125
               Apr  4 23:06:54 OK      111.242.75.125
               Apr  5 09:47:13 Failure 111-242-75-125.dynamic.hinet.net
               Apr  5 09:47:15 OK      111.242.75.125
               Apr  5 10:02:40 Failure 111-242-75-125.dynamic.hinet.net
               Apr  5 10:02:42 OK      111.242.75.125
               Apr  5 10:11:42 OK      111.242.75.125
               Apr  5 19:32:31 OK      10.49.21.113
               Apr  5 19:39:22 Failure ip113.puli21-49-10.ncnu.edu.tw
               Apr  5 19:39:24 OK      10.49.21.113
      s102321030 4
               Apr  4 21:41:49 OK      36.232.67.233
               Apr  5 12:24:30 OK      36.232.79.7
               Apr  5 15:26:58 OK      36.232.79.7
               Apr  5 16:05:54 OK      36.232.79.7
               Apr  5 16:24:53 OK      36.232.79.7
               Apr  5 16:36:51 OK      36.232.79.7
               Apr  5 16:43:10 OK      36.232.79.7
               Apr  5 17:16:50 Failure 36-232-79-7.dynamic-ip.hinet.net
               Apr  5 17:16:51 Failure 36-232-79-7.dynamic-ip.hinet.net
               Apr  5 17:16:54 OK      36.232.79.7
               Apr  5 18:37:37 OK      36.232.79.7
               Apr  5 19:01:15 Failure 36-232-79-7.dynamic-ip.hinet.net
               Apr  5 19:01:17 OK      36.232.79.7
               Apr  5 19:10:37 Failure 36-232-79-7.dynamic-ip.hinet.net
               Apr  5 19:10:41 OK      36.232.79.7
               Apr  5 20:13:06 OK      36.232.79.7
               Apr  5 20:31:46 OK      36.232.79.7
      s102321023 4
               Apr  4 21:51:06 OK      61.227.52.39
               Apr  5 15:29:26 Failure 2001:e10:6840:21:5c6d:15e7:67d6:204
               Apr  5 15:29:28 Failure 2001:e10:6840:21:5c6d:15e7:67d6:204
               Apr  5 15:29:28 Failure 2001:e10:6840:21:5c6d:15e7:67d6:204
               Apr  5 16:45:09 OK      122.117.95.183
               Apr  5 16:53:31 OK      122.117.95.183
               Apr  5 16:54:14 OK      122.117.95.183
               Apr  5 17:04:04 OK      122.117.95.183
               Apr  5 20:27:17 Failure 122-117-95-183.hinet-ip.hinet.net
               Apr  5 20:27:18 OK      122.117.95.183
               Apr  5 20:28:01 OK      122.117.95.183
      s103321039 3
               Apr  5 10:29:25 OK      114.46.59.243
               Apr  5 11:13:57 OK      114.46.59.243
               Apr  5 11:22:59 Failure 114-46-59-243.dynamic.hinet.net
               Apr  5 11:23:01 OK      114.46.59.243
               Apr  5 12:09:30 Failure 114-46-59-243.dynamic.hinet.net
               Apr  5 12:09:32 OK      114.46.59.243
               Apr  5 19:08:46 Failure 114-46-59-243.dynamic.hinet.net
               Apr  5 19:08:48 OK      114.46.59.243
               Apr  5 19:09:07 OK      114.46.59.243
               Apr  5 19:46:27 OK      114.46.59.243
               Apr  5 20:02:31 OK      114.46.59.243
      
      
    8. If a failed login is followed by a successful login, the failure is usually caused by a mistyped password, so we don't need to worry about it.
    9. However, repeated login failures from foreign IP addresses may be a symptom of a brute-force attack, in which a malicious hacker is guessing the password.
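    The steps above can be sketched as follows, assuming the log's relevant lines follow the two formats shown in items 2 and 3 (any "Accepted <method>" line is counted as a success). The names parse_line and summarize are illustrative; the timestamp is re-padded to the "Apr  4" form used in the sample output, and the file path can optionally be overridden on the command line.

```python
import re
import sys
from collections import defaultdict

LOG_PATH = "/var/log/auth.log.for.midterm"

# "Mon DD HH:MM:SS" prefix shared by both record types.
STAMP = r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s"
SUCCESS_RE = re.compile(STAMP + r".*sshd\[\d+\]: Accepted \S+ for (\S+) from (\S+)")
FAILURE_RE = re.compile(
    STAMP + r".*sshd\[\d+\]: error: PAM: authentication error for (\S+) from (\S+)")


def parse_line(line):
    """Return (user, timestamp, status, remote) or None for other lines."""
    for regex, status in ((SUCCESS_RE, "OK"), (FAILURE_RE, "Failure")):
        m = regex.search(line)
        if m:
            stamp, user, remote = m.groups()
            month, day, clock = stamp.split()
            # Pad the day to two columns, e.g. "Apr  4", as syslog does.
            return user, f"{month} {int(day):2d} {clock}", status, remote
    return None


def summarize(lines):
    """Group records by user; sort users by failure count, descending."""
    records = defaultdict(list)
    failures = defaultdict(int)
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        user, stamp, status, remote = parsed
        records[user].append((stamp, status, remote))
        if status == "Failure":
            failures[user] += 1
    # sorted() is stable, so users with equal counts keep log order.
    users = sorted(records, key=lambda u: failures[u], reverse=True)
    return [(u, failures[u], records[u]) for u in users]


def main(path):
    try:
        log = open(path, encoding="utf-8", errors="replace")
    except OSError as exc:
        print(f"Cannot read {path}: {exc}", file=sys.stderr)
        return
    with log:
        for user, count, recs in summarize(log):
            print(user, count)
            for stamp, status, remote in recs:
                print(f"         {stamp} {status:<7} {remote}")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else LOG_PATH)
```

Keeping each user's records in log order makes the pattern in item 8 easy to spot: a "Failure" line immediately followed by an "OK" line from the same address.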
  3. (20%) Extract hyperlinks
    Write a Python program that takes a URL as its command-line argument.
    1. You'll need to import sys to access sys.argv.
    2. The program will retrieve the webpage specified by the URL and extract the hyperlinks (the HREF attribute of each HTML <A> tag).
    3. List all hyperlinks in the webpage. For example, if the given URL is 'http://Course.ipv6.club.tw/Python.1042/', the output may look like
      
      [1] http://Course.ipv6.club.tw/Python.1042/student.html
      [2] http://www.amazon.com/Python-Unix-Linux-System-Administration/dp/0596515820/ref=sr_1_2?s=books&ie=UTF8&qid=1448019854&sr=1-2&keywords=Python+system+Administration
      [3] http://www.amazon.com/Essential-SNMP-Second-Douglas-Mauro/dp/0596008406/ref=sr_1_1?s=books&ie=UTF8&qid=1448020628&sr=1-1&keywords=essential+snmp
      [4] https://automatetheboringstuff.com
      [5] https://talkpython.fm/episodes/show/44/project-jupyter-and-ipython
      [6] http://examples.oreilly.com/9780596515829/
      [7] http://docs.python.org/3/
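    A possible sketch for this exercise, assuming HREF values are quoted: a regular expression pulls the HREF of each <A> tag, html.unescape() decodes entities such as &amp;, and urllib.parse.urljoin() turns relative links like student.html into absolute URLs. The script name get_links.py in the usage message is hypothetical. A regex is a simplification; the standard html.parser module copes better with unusual markup.

```python
import re
import sys
import urllib.request
from html import unescape
from urllib.parse import urljoin

# Simplified pattern: the quoted HREF value of each <A ...> tag. Unquoted
# values (href=page.html) are not matched by this sketch.
HREF_RE = re.compile(r"""<a\s[^>]*?href\s*=\s*["']([^"']*)["']""", re.IGNORECASE)


def extract_links(page, base_url):
    """Return the absolute URL of every <A HREF=...> link, in page order."""
    return [urljoin(base_url, unescape(href)) for href in HREF_RE.findall(page)]


def fetch(url):
    """Download a page and decode it with the charset the server declares."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def main(argv):
    if len(argv) != 1:
        print("usage: python get_links.py URL")
        return
    url = argv[0]
    for i, link in enumerate(extract_links(fetch(url), url), start=1):
        print(f"[{i}] {link}")


if __name__ == "__main__":
    main(sys.argv[1:])
```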
      
      
  4. (20%) Verify hyperlinks
    Extend your previous program so that it verifies whether the hyperlinks in the webpage are valid.
    1. Given a URL, if every hyperlink in the webpage can be retrieved, your program simply reports that "Every link is all right!".
    2. If some hyperlinks cannot be retrieved, report that these hyperlinks cannot be found. (You need not report valid hyperlinks, because only invalid hyperlinks need to be corrected.)
    3. You may test your program with the following URLs:
      • http://solomon.ipv6.club.tw/Course/Python.1042/
      • http://solomon.ipv6.club.tw/Course/Python.1042/index.html
      • http://solomon.ipv6.club.tw/Course/Python.1042/index2.html
      • http://solomon.ipv6.club.tw/Course/Python.1042/index3.html
    4. The output of the program may look like:
      
      Now verifying http://solomon.ipv6.club.tw/Course/Python.1042/index.html
          Every link is all right!
      
      Now verifying http://solomon.ipv6.club.tw/Course/Python.1042/index2.html
          Not Found [4] https://automatetheboringstuff.con
          Not Found [7] http://docs.python.org/3/non_exist.html
    5. Hint: You may need to use Python's exception handling (try/except) so that your program won't crash on an invalid URL.
    6. You may also need urllib.parse.urlparse() and os.path.dirname() to help you parse the path.
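    One way to sketch the verifier, assuming a link counts as valid whenever urllib.request.urlopen() can open it without raising an exception. Instead of the urlparse()/os.path.dirname() combination suggested in the hint, this sketch swaps in urllib.parse.urljoin(), which resolves a relative link against the page URL in one call. The opener parameter is a hypothetical hook that lets the link checking be tested without network access.

```python
import http.client
import re
import sys
import urllib.error
import urllib.request
from html import unescape
from urllib.parse import urljoin

# Simplified pattern for the quoted HREF value of each <A ...> tag.
HREF_RE = re.compile(r"""<a\s[^>]*?href\s*=\s*["']([^"']*)["']""", re.IGNORECASE)
FETCH_ERRORS = (ValueError, OSError, http.client.HTTPException)


def is_reachable(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if the URL can be retrieved, False on any fetch error.

    HTTPError (e.g. 404) and DNS failures are both URLError, a subclass
    of OSError; URLs without a scheme raise ValueError.
    """
    try:
        with opener(url, timeout=timeout):
            return True
    except FETCH_ERRORS:
        return False


def broken_links(page, base_url, opener=urllib.request.urlopen):
    """Return (index, url) for each hyperlink that cannot be retrieved."""
    links = [urljoin(base_url, unescape(h)) for h in HREF_RE.findall(page)]
    return [(i, link) for i, link in enumerate(links, start=1)
            if not is_reachable(link, opener)]


def verify(url):
    print("Now verifying", url)
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            charset = resp.headers.get_content_charset() or "utf-8"
            page = resp.read().decode(charset, errors="replace")
    except FETCH_ERRORS as exc:
        print("    Cannot retrieve the page itself:", exc)
        return
    bad = broken_links(page, url)
    if not bad:
        print("    Every link is all right!")
    for i, link in bad:
        print(f"    Not Found [{i}] {link}")


if __name__ == "__main__":
    for url in sys.argv[1:]:
        verify(url)
        print()
```

Each URL given on the command line is verified in turn, so all four test pages can be checked in one run. The index printed with "Not Found" is the link's position in the page, matching the numbering of the previous exercise.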
  5. (20%) HTML Form Submission
    Consider the TANet VoIP Phonebook, which provides an HTML form so that users can query the VoIP number of a specific user or a specific organization.
    1. Inspecting the HTTP request shows that this form submits two values to the server:
      1. 'action': its value will be 'searchKeyword'
      2. 'keyWord': its value will be the string you want to search
    2. Please write a Python program that lets users query this phonebook through a plain-text user interface.
    3. The program may run as follows.
      
      Please input the string to query -- 香山高中
      縣市別  區鄉鎮  學校名稱        職    稱        姓    名        網路電話        市話代表號
      新竹市  香山區  香山高中        校  長        校  長        91841401        N/A
      新竹市  香山區  香山高中        教務主任        教務主任        91841402        N/A
      新竹市  香山區  香山高中        學務主任        學務主任        91841403        N/A
      新竹市  香山區  香山高中        總務主任        總務主任        91841404        N/A
      新竹市  香山區  香山高中        輔導主任        輔導主任        91841405        N/A
      新竹市  香山區  香山高中        文書組長        文書組長        91841406        N/A
      新竹市  香山區  香山高中        資訊組長        資訊組長        91841407        N/A
      新竹市  香山區  香山高中        人事主任        人事主任        91841408        N/A
      新竹市  香山區  香山高中        衛生組長        衛生組長        91841414        N/A
      新竹市  香山區  香山高中        教學組長        教學組長        91841416        N/A
      新竹市  香山區  香山高中        註冊組長        註冊組長        91841418        N/A
      新竹市  香山區  香山高中        主計主任        主計主任        91841423        N/A
      
      
    4. You may test your program with the keywords '香山高中' (Hsiangshan High School in Hsinchu City) or '彰化縣' (Changhua County).
    5. Please note that it is perfectly all right if you cannot align the data entries. Some cities/counties did not provide data for every field, and output with incomplete data is hard to align.
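    A minimal sketch for the phonebook query, under two assumptions the assignment does not confirm: the form is submitted with POST, and the results come back as an HTML table whose columns are 縣市別 (city/county), 區鄉鎮 (district/township), 學校名稱 (school name), 職稱 (job title), 姓名 (name), 網路電話 (VoIP number), and 市話代表號 (main landline number). PHONEBOOK_URL is a hypothetical placeholder for the form's real action URL.

```python
import re
import urllib.parse
import urllib.request
from html import unescape

# Hypothetical placeholder: the assignment does not give the phonebook's
# address, so replace it with the "action" URL of the real HTML form.
PHONEBOOK_URL = "https://example.com/phonebook"

ROW_RE = re.compile(r"<tr(?:\s[^>]*)?>(.*?)</tr>", re.IGNORECASE | re.DOTALL)
CELL_RE = re.compile(r"<t[dh](?:\s[^>]*)?>(.*?)</t[dh]>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def build_request(keyword):
    """Encode the two form fields; passing data makes urllib send a POST."""
    data = urllib.parse.urlencode({"action": "searchKeyword", "keyWord": keyword})
    return urllib.request.Request(PHONEBOOK_URL, data=data.encode("utf-8"))


def parse_rows(page):
    """Return the text of every table row as a list of cell strings."""
    rows = []
    for row in ROW_RE.findall(page):
        cells = [unescape(TAG_RE.sub("", c)).strip() for c in CELL_RE.findall(row)]
        if cells:
            rows.append(cells)
    return rows


def main():
    try:
        keyword = input("Please input the string to query -- ")
    except EOFError:  # no input (e.g. Ctrl-D): nothing to query
        return
    with urllib.request.urlopen(build_request(keyword), timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        page = resp.read().decode(charset, errors="replace")
    # Tabs keep the columns roughly aligned; perfect alignment is not required.
    for cells in parse_rows(page):
        print("\t".join(cells))


if __name__ == "__main__":
    main()
```

If the form turns out to use GET instead, append the encoded string to the URL after a "?" rather than passing it as data.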