Unit Overview

Description

Natural language has been, and will remain, the preferred way to store and transfer knowledge. More than 80% of electronic data in modern societies is generated and stored as text. Processing unstructured text to extract useful insights, support actionable decision making, and uncover the hidden treasure of collective intelligence is therefore of enormous value. In this unit, we start with traditional text processing techniques using regular expressions and discuss the need for text processing and normalisation. We then introduce the fundamental pipelines of natural language processing (NLP), including part-of-speech tagging and various approaches to sentence parsing, with the aim of introducing traditional text feature extraction techniques for higher-level tasks such as sentiment or document classification. Building on an understanding of the pros and cons of feature-based NLP pipelines, the unit moves on to the modern approach of deep learning for NLP, focusing on word vector representations, neural language models, and recurrent neural networks. The unit situates these techniques around major NLP tasks, including information extraction, sentiment detection, dialogue systems and machine translation.

Credit
6 points
Offering
Availability | Location | Mode | First year of offer
Not available in 2026 | India - Mumbai | On-campus |
Outcomes

Students are able to (1) apply pre-processing techniques for textual data preparation; (2) build pipelines for core NLP tasks; (3) critically analyse different language models; (4) explain how vector representations of words can be obtained; (5) evaluate performance of NLP solutions, both traditional and neural; and (6) undertake core components of major NLP tasks.

Assessment

Indicative assessments in this unit are as follows: (1) a programming assignment and (2) a final examination. Further information is available in the unit outline.

Students may be offered supplementary assessment in this unit if they meet the eligibility criteria.

Unit Coordinator(s)
TBA
Unit rules
Prerequisites
Successful completion of 96 points
Incompatibility
CITS4012 Natural Language Processing
Contact hours
One 2-hour lecture and one 2-hour laboratory per week
  • The availability of units in Semester 1, 2, etc. was correct at the time of publication but may be subject to change.
  • All students are responsible for identifying when they need assistance to improve their academic learning, research, English language and numeracy skills; seeking out the services and resources available to help them; and applying what they learn. Students are encouraged to register for free online support through GETSmart; to help themselves to the extensive range of resources on UWA's STUDYSmarter website; and to participate in WRITESmart and (ma+hs)Smart drop-ins and workshops.
  • Visit the Essential Textbooks website to see if any textbooks are required for this Unit. The website is updated regularly so content may change. Students are recommended to purchase Essential Textbooks, but a limited number of copies of all Essential Textbooks are held in the Library in print, and as an ebook where possible. Recommended readings for the unit can be accessed in Unit Readings directly through the Learning Management System (LMS).
  • Contact hours provide an indication of the type and extent of in-class activities this unit may contain. The total amount of student work (including contact hours, assessment time, and self-study) will approximate 150 hours per 6 credit points.