CSCI 5541, NLP
Spring 2025, Tuesdays and Thursdays, 4:00pm to 5:15pm, Lind Hall L125
Course Information
Summary: The purpose of this course is to provide an overview of the computational techniques developed to enable computers to interpret and respond appropriately to ideas expressed using natural languages, rather than formal languages such as C++ or Python. This course will cover text classification, distributional representation methods of language, large language models, and advanced techniques in ChatGPT. The course will cover a wide range of topics related to NLP, including theories, computational models, and applications with their societal and ethical impacts. Prerequisite: Maturity in linear algebra, calculus, and basic probability. Familiarity with Python. 5521 (recommended) or grad.
Natural Language Processing (NLP) is an interdisciplinary field that is based on theories in linguistics, cognitive science, and social science. The main focus of NLP is building computational models for applications such as machine translation and dialogue systems that can then interact with real users. Research and development in NLP therefore also includes considering important issues related to real-world AI systems, such as bias, controllability, interpretability, and ethics. This course will cover a broad range of topics related to NLP, from theories to computational models and applications to data annotation and evaluation. Students will read papers on those topics, create an annotated dataset, and implement algorithms on applications they are interested in. There will be a semester-long class project where you collect your own dataset, ensure it is accurate, develop a model using existing computing tools, evaluate the system, and consider its ethical and societal impacts.
The grade will be based on the course project, participation, and programming and reading assignments. All class material will be posted on the class site. We will use Canvas for homework and project submissions and grading, and Slack for discussion and Q&A. Email inquiries will not be answered.
- Instructors
-
- Class meets
- Tuesday and Thursday, 4PM to 5:15PM, Lind Hall L125
- Office hours
- James: Friday 3pm - 3:30pm via Zoom
- Risako: Wednesday 10-10:30AM Shepherd 159
- Bin: Monday 10-10:30AM Keller 1-213
- Junhan: Tuesday 1:30-2PM Keller 1-213
- Class page
- https://jimtmooney.github.io/Courses/S25/index.html
- Slack
- https://csci5541s25.slack.com/
- Canvas
- canvas.umn.edu/courses/483164
Grading and Late Policy
Grading
- 60% Homework (hw1/2/3/6 for individual, hw4/5 for team)
- 30% Project (team)
- 10% Class Participation (individual)
Late policy for deliverables
Each student will be granted 5 late days to use for homework over the duration of the semester. After all free late days are used up, the penalty is 1 point for each additional late day. Late days and penalties will be applied to all team members for group homework and the project.
Schedule
We will cover basic NLP representations g(x) to build text classifiers P_theta(y|g(x)), language models P_theta(g(x)), and large language models P_{theta is large}(g(x)). Based on the knowledge you gain during the class, your team will develop its own NLP system during the semester-long project. Pay attention to due dates and homework release dates. Lecture slides and homework/project descriptions will be available in .
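To make the notation above concrete, here is a minimal sketch of a representation g(x) and a text classifier P_theta(y|g(x)). It assumes a bag-of-words representation and a softmax over linear scores; the vocabulary, labels, and weights (`VOCAB`, `LABELS`, `THETA`) are hypothetical values chosen only for illustration and are not taken from the course materials.

```python
import math
from collections import Counter

# Hypothetical toy vocabulary, labels, and weights (illustration only).
VOCAB = ["good", "bad", "movie"]
LABELS = ["positive", "negative"]
# THETA[label][i] is the weight of vocabulary word i for that label.
THETA = {
    "positive": [1.0, -1.0, 0.0],
    "negative": [-1.0, 1.0, 0.0],
}


def g(text):
    """Represent a text x as a bag-of-words count vector g(x)."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in VOCAB]


def classify(text):
    """Return P_theta(y | g(x)) for every label y using a softmax."""
    features = g(text)
    scores = {
        y: sum(w * f for w, f in zip(THETA[y], features)) for y in LABELS
    }
    max_score = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - max_score) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}


probs = classify("good good movie")
print(probs)
```

In this sketch, theta is just a small table of hand-set weights. In practice the weights would be learned from labeled data, and g(x) could be replaced by a learned representation.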
Homework Details (60%)
All questions regarding homework MUST be communicated to the lead TA over the Slack homework channels (e.g., #hw1, #hw2) or during their office hours. Homework 1, 2, 3, and 6 should be done individually, while homework 4 and 5 are team-based (maximum of 4 people). Your team for homework 4 and 5 should be the same as your project team. The use of outside resources (books, research papers, websites, etc.) or collaboration (students, professors, ChatGPT, etc.) must be explicitly acknowledged in your report. See the notes on academic integrity.
The deadline for all homework is midnight (11:59PM) on the due date. Due to a tight schedule, there will be no deadline extensions, but you can still use your late days. For late team homework, late days will be counted for every team member.
Here are the homework assignments with their due dates:
- HW1: Building MLP-based text classifier with PyTorch (5 points, Individual, due: Feb 4)
- HW2: Finetuning text classifier using HuggingFace (10 points, Individual, due: Feb 11)
- HW3: Authorship attribution using language models (LMs) (10 points, Team, due: Mar 4)
- HW4: Generating and evaluating text generated from pretrained LMs (15 points, Team, due: Mar 27)
- HW5: Prompting with large language models (LLMs) (15 points, Team, due: Apr 17)
- HW6: Essay writing with ChatGPT (5 points, Individual, due: May 1)
Project Details (30%)
First, carefully read the project description, as most project information, due dates, the rubric, and answers to your questions are in the description document. It is your responsibility not to miss any information regarding the project. Your team (maximum of 4 people) should submit its report, a link to the code (or zipped code), and presentation slides/poster to Canvas before the deadline. Use the official ACL style templates (Overleaf or links). Here are the project deliverables and their due dates (note that some due dates fall on weekdays):
- Team formation (1 point, due: Feb 6)
- Project brainstorming (1 point, due: Feb 18)
- Proposal pitch (3 points, due: Feb 25 and 27) (Slide decks for Group A and Group B)
- Proposal report (5 points, due: Mar 6)
- Midterm office hour participation (5 points, due: Apr 8)
- Poster presentation (5 points, due: Apr 29 and May 1)
- Final report (10 points, due: May 8)
You can find some selected project reports and posters from previous years' NLP classes below. Some projects were extended and published at top-tier workshops and conferences:
- [CSCI 5541 S23] Simulating Everyone's Voice: Exploring ChatGPTs Ability to Simulate Human Annotators
- [CSCI 5541 S23] Vision & Language-guided Generalized Object Grasping
- [CSCI 5541 S23] Generalizability of FLAN-T5 Model Using Composite Task Prompting
- [CSCI 5541 S23] Comparing the Effectiveness of Fine-tuning vs. One-Shot Learning on the Kidz Bopification Task
- [CSCI 5980 F22] Generating Controllable Long-dialogue with Coherence → Published in AAAI 2024
- [CSCI 8980 S22] Understanding Narrative Transportation in Fantasy Fanfiction → Published in Workshop on Narrative Understanding (WNU) @ACL 2023
Class Participation (10%)
Your class participation is thoroughly evaluated. Put your profile picture on Canvas and Slack so we can match you for the final evaluation. The following metrics will be used to grade your participation:
- Participation and discussion in class
- Discussion on Slack and during Office Hours for both instructor and TAs
- Discussion and QA during the presentation of the project proposal and poster
Prerequisites
Required: CSCI 2041 Advanced Programming Principles
Recommended: CSCI 5521 Introduction to Machine Learning or any other course that covers fundamental machine learning algorithms.
Furthermore, this course assumes:
- Good coding ability, corresponding to at least a third or fourth-year undergraduate CS major. Assignments will be in Python.
- Background in basic probability, linear algebra, and calculus.
Notes to students
Academic Integrity
Assignments and project reports for the class must represent individual effort unless group work is explicitly allowed. Verbal collaboration on your assignments or class projects with your classmates and instructor is acceptable. However, everything you turn in must be your own work, and you must note the names of anyone you collaborated with on each problem and cite the resources that you used to learn about the problem. If you have any doubts about whether a particular action may be construed as cheating, ask the instructor for clarification before you do it. Cheating in this course will result in a grade of F for the course, and University policies will be followed.
Students with Disabilities
If you have a disability for which you are or may be requesting an accommodation, you are encouraged to contact both your instructor and the Disability Resource Center (DRC).
COVID-19
All students are expected to abide by campus policies regarding COVID-19 including masking and vaccination requirements. This is an in-person class with daily in-person activities, but we may consider a hybrid or online option. If you're feeling sick, stay at home and catch up with the course materials instead of coming to class!