Introduction to Machine Learning and Data Mining

Semester: Spring
Year offered: 2023

 

PSYC 5710: Introduction to Data Mining and Machine Learning

Spring 2023

Instructor: Hudson F. Golino, Ph.D.

Email: hfg9s@virginia.edu

Classroom: This course will meet in Gilmer 250.

Instructional Mode:

 

Our course is IN PERSON. All essential information concerning our class will be shared through general announcements on our Canvas website, and you’ll receive them by e-mail.

 

Course Description:

 

In their 2012 Harvard Business Review article, Thomas Davenport and D.J. Patil argued that data scientists have the sexiest job of the 21st century. Data scientists are usually at the forefront of technical and scientific development. They can extract incredibly useful information from all sorts of data, generate new discoveries, and develop interesting new technologies. As Davenport and Patil put it in that article, “data scientists want to be in the thick of a developing situation.”

 

Machine learning and data mining are two interrelated fields that are an integral part of a data scientist’s activities. Machine learning is a relatively new scientific field comprising a broad class of competing computational and statistical methods used to accomplish tasks such as prediction and pattern recognition. Data mining, in turn, refers to the discovery of useful knowledge from data. In psychology, machine learning and data mining techniques have been applied to diagnose ADHD, predict educational outcomes, understand how the brain works, assess the link between multiple mental disorders, and develop inclusive technologies, among other exciting applications.

 

You may have read several news articles praising the field of data science, and you certainly take advantage of its developments daily, even if you don’t realize it (you have a smartphone with you right now, and you certainly use Google as a search engine). This is an introductory, hands-on course covering several basic techniques and methods used in machine learning and data mining. By taking this course, you will achieve several goals relevant to your professional and academic life. I want to make a long-lasting impact on your life.

 

Goals:

 

I want to help you appreciate how machine learning and data mining procedures impact the science of psychology, transform unstructured text data into analyzable datasets, extract useful information, and generate insight and knowledge from those data. It is also a goal of this course to help you generate knowledge and insight using structured data and develop the practice of reflecting on how to improve the prediction of a given variable. Five years from now, I want you to still be learning machine learning and data mining methods and procedures on your own and to communicate the results of a machine learning model or a data mining procedure efficiently to both experts and non-experts. In this course, you will help your peers understand the techniques you learned. You will also work efficiently in groups, respecting the opinions of others, knowing there are multiple ways to accomplish each task, and using “the wisdom of the crowd” to solve your data analysis issues.

By the end of the semester, you will be able to:

 

  1. Pre-process text data to generate analyzable datasets;
  2. Use both unstructured text data and structured data to predict given outcomes using non-parametric, non-linear techniques;
  3. Evaluate the accuracy or efficiency of a given prediction, for both regression and classification tasks, while controlling overfitting and variance;
  4. Identify groups of variables to investigate the underlying structure of multivariate data via Exploratory Graph Analysis;
  5. Know which models to use, depending on the nature of the data, the goals of the analysis, and the tasks to be accomplished;
  6. Help your peers understand the techniques you learned and work efficiently in groups, respecting the opinions of others, knowing there are multiple ways to accomplish each task, and using “the wisdom of the crowd” to solve your data analysis issues.

 

 

Other general goals about Integration and Active Learning:

 

Integrate all the techniques to extract useful information and knowledge from data;

 

Know how to answer your questions using available resources (books, papers, online forums, etc.);

 

Be able to learn actively.

 

 

 

Schedule:

Class Number | Date | Topic | Activity
1 | 18-Jan | Introduction and Welcome |
2 | 25-Jan | Text Mining |
3 | 1-Feb | Mining Twitter/Sentiment Analysis |
4 | 8-Feb | Practice Week (Text Mining and Sentiment Analysis) (Online – No in-person class) |
5 | 15-Feb | Intro to Linear Algebra |
6 | 22-Feb | Networks in Psychology |
7 | 1-Mar | Exploratory Graph Analysis (Online – No in-person class) | Hands-On Project 1 - Starts
  | 3/4-3/12 | SPRING BREAK |
8 | 15-Mar | Dynamic Exploratory Graph Analysis |
  | 22-Mar | Linear Regression (Online – No in-person class) | Kaggle Competition Starts
9 | 29-Mar | Multivariate Adaptive Regression Splines | Hands-On Project 1 - Ends
10 | 5-Apr | Resampling Methods/MARS |
11 | 12-Apr | Tree-Based Models |
12 | 19-Apr | Kaggle Week (1) |
13 | 26-Apr | Kaggle Week (2) | Kaggle Competition Ends