The Introduction to Machine Learning for Data Science seminar was created as part of the NSF-funded Data Science for All project, whose goal is to produce seminars that introduce data science to students from across a spectrum of majors (upper and lower division). We have also presented it multiple times at the community college level. In this seminar, we introduce students to data science using Python. Students run Python programs in Jupyter notebooks in Google Colaboratory (requires a Google account, such as a Gmail account).
By attending this seminar, students will
- Learn and apply fundamental machine learning concepts to solve real-world problems in Data Science
- Apply ML tools and various Python libraries to get hands-on programming experience
- Understand the skills necessary and the step-by-step process for machine learning
- Understand various machine learning techniques and their applications, and analyze regression and classification problems
- Learn supervised learning and linear regression analysis using Python libraries: NumPy, pandas, and scikit-learn (sklearn)
- Learn unsupervised learning by solving clustering problems using K-Means clustering in Python
- Evaluate the accuracy of machine learning models using commonly used loss functions
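To give a flavor of the techniques listed above, here is a minimal sketch of a linear regression scored with a mean squared error loss, followed by K-Means clustering. It is not taken from the seminar notebook: the synthetic data, the column names `hours` and `score`, and all parameter values are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Fixed seed so the illustrative (made-up) data is reproducible.
rng = np.random.default_rng(0)

# --- Supervised learning: linear regression ---
# Hypothetical data: a score that rises roughly linearly with hours studied.
hours = rng.uniform(0, 10, size=50)
score = 5.0 * hours + 40.0 + rng.normal(0, 2.0, size=50)
df = pd.DataFrame({"hours": hours, "score": score})

model = LinearRegression().fit(df[["hours"]], df["score"])
predictions = model.predict(df[["hours"]])

# Mean squared error is one commonly used loss function for regression.
mse = mean_squared_error(df["score"], predictions)
print("slope:", model.coef_[0], "intercept:", model.intercept_, "MSE:", mse)

# --- Unsupervised learning: K-Means clustering ---
# Two hypothetical, well-separated groups of 2-D points, with no labels given.
group_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(30, 2))
group_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(30, 2))
points = np.vstack([group_a, group_b])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
labels = kmeans.labels_
print("cluster centers:", kmeans.cluster_centers_)
```

The regression is evaluated on the same data it was fit to, which keeps the sketch short; in practice a held-out test set gives a more honest estimate of model accuracy.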
Although this technology is new to most of the students, we use an interactive, hands-on approach so that they can keep up. The materials are designed as a stand-alone 3-hour seminar, but they could be broken up and used as a module in a course. The link provided here is to our project's website at San Jose State University (where the faculty involved in the project teach in MIS and AIS).
The above link will take you to the webpage for this particular seminar, but there are currently 8 seminars, all of which provide their materials under the same Creative Commons license. The webpage provides links to some of the materials, including a PDF of the slides, the datasets created for the seminar, the notebook used, and some additional materials.
We also make available additional teaching materials, all of the materials bundled in a Canvas cartridge, the PowerPoint slides in case you want to edit them, and a notebook and test materials we use for creating digital badges for participants. The additional materials and test questions will be provided to anyone with a faculty email address and webpage (it can even be a page that just lists you with your email as the instructor at an actual university or college). The instructions for requesting the additional materials will always be at this address on the Data Science for All website.
All of these materials are available to any faculty member to use or modify under the CC license. We just don't put them directly on the website, in case anyone is using the test questions. (Yes, we know they will end up on the web eventually, but we try our best not to disclose the materials to students in case any faculty are using the questions on a test or quiz, and we ask you to do the same.)
Development of the Data Science for All Seminar Series is funded under NSF grant #1829622.