CISC 372 Advanced Data Analytics

- The Data-Driven Mindset

Queen's University, 2019-winter

Professor: Steven Ding

“Information is the oil of the 21st century, and analytics is the combustion engine.” – By Peter Sondergaard

“Data analytics is not just a skill set, but a habit and a mindset.” - By Steven Ding

Course Description: Inductive modelling of data, especially counting models; ensemble approaches to modelling; maximum likelihood and density-based approaches to clustering, visualization. Applications to non-numeric datasets such as natural language, social networks, Internet search, recommender systems. Introduction to deep learning. Ethics of data analytics.

Degree Planning: This course is required for the Data Analytics focus of the COMP degree plan. It is a direct prerequisite to CISC 451/3.0 (Topics in Data Analytics).

Textbook: Lectures plus a range of library and web resources (for the main course content). Optional textbooks: Jiawei Han, Micheline Kamber, and Jian Pei, Data Mining: Concepts and Techniques, 3rd edition, 2012; Zaki and Meira, Data Mining and Analysis: Fundamental Concepts and Algorithms, Cambridge University Press.

Resources: For this course, we will exclusively use Python and Google Colab for all exercises and assignments. All course notebooks will be posted on our GitHub page here.

Topics to be Covered (tentative)

Working with data records:

  • Intro, review of linear methods, ethics/security/privacy
  • Model tuning and experimental protocol, data preprocessing & visualization, ensemble methods
  • Decision tree + random forest, XGBoost, CNN, tensor-based transformation
  • Instance-based learning, Bayesian methods, density-based clustering

Working with sequential data:

  • Use cases & data preprocessing, representation, visualization
  • Time series statistical learning, association rule mining, Apriori algorithm & FP-tree
  • Sequential data mining, NLP, RNN + attention mechanism
  • Graphical models: topic modeling, word/paragraph embedding, BERT

Working with graph data:

  • From social networks to heterogeneous information networks (HIN), preprocessing and visualization, use cases
  • Network-based statistical modeling, HITS & PageRank, recommendation systems
  • Community detection, graph embedding, DeepWalk, metapath2vec
  • LINE, GraphSAGE, presentation

Lectures, schedules, and locations: See the full calendar events and lecture details below. All the information below is generated in real time from a Google Calendar. If you would like to use a calendar application to manage your time, you can subscribe directly through Google Calendar by clicking here, or through any other calendar application that supports the iCal link here. Please note that some calendar applications may be slow to synchronize event changes. OnQ use will be minimal. The schedule is subject to change as the course progresses.