Assignment 2 - CS7800 Summer 2022 Python (Solved)

Updated: Jul 24

Wright State University Department of Computer Science and Engineering

CS7800 Summer 2022 Python Assignment 2 (Solution)

In this project, you will implement several different classifiers in Python using scikit-learn APIs. The project has two phases. In Phase I, you will build classifiers and run them on the same modified 20 Newsgroups dataset. In Phase II, you will evaluate these classifiers to assess the quality of their predictions (e.g., using accuracy and confusion matrices).
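
As a rough illustration of the Phase II evaluation, the sketch below computes accuracy and a confusion matrix with scikit-learn's `accuracy_score` and `confusion_matrix`. The label names, gold labels, and predictions are invented for illustration and are not taken from the assignment dataset.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical class names and labels for six documents (illustrative only).
labels = ["comp.graphics", "rec.autos", "sci.space"]
y_true = ["comp.graphics", "comp.graphics", "rec.autos",
          "rec.autos", "sci.space", "sci.space"]
y_pred = ["comp.graphics", "rec.autos", "rec.autos",
          "rec.autos", "sci.space", "comp.graphics"]

# Fraction of documents whose predicted label matches the gold label.
accuracy = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes, both in `labels` order.
cm = confusion_matrix(y_true, y_pred, labels=labels)

print(f"accuracy = {accuracy:.3f}")
print(cm)
```

Passing `labels` explicitly fixes the row and column order, which makes matrices from different classifiers directly comparable.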


The project may be done in a team of two or three members, as before, to promote discussion and insight. If a fourth member is included because someone cannot find a teammate, the team must also implement and evaluate two of the classifiers (e.g., kNN and Rocchio) manually from scratch, as required below. All team members are expected to contribute to all aspects of the project: design, implementation, documentation, and testing, for their own good.
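
For teams that need a from-scratch classifier, a Rocchio-style (nearest-centroid) classifier could be sketched roughly as follows. This is only one possible design, under stated assumptions: documents are split on whitespace, weighted by length-normalized raw term frequency rather than tf-idf, and compared to class centroids by cosine similarity. The function names and toy documents are hypothetical and not part of the assignment.

```python
import math
from collections import Counter, defaultdict


def tf_vector(text):
    """Turn a document into a length-normalized term-frequency vector (a dict)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()} if norm else {}


def train_rocchio(docs, labels):
    """Compute one centroid per class: the mean of that class's document vectors."""
    sums = defaultdict(Counter)
    sizes = Counter(labels)
    for text, label in zip(docs, labels):
        for term, weight in tf_vector(text).items():
            sums[label][term] += weight
    return {
        label: {term: w / sizes[label] for term, w in vec.items()}
        for label, vec in sums.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict_rocchio(centroids, text):
    """Assign the class whose centroid is most similar to the document."""
    vec = tf_vector(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy training data, invented for illustration only.
train_docs = [
    "the engine and brakes of the car",
    "car dealers sell used cars",
    "the shuttle reached orbit",
    "nasa launched a rocket into orbit",
]
train_labels = ["rec.autos", "rec.autos", "sci.space", "sci.space"]

centroids = train_rocchio(train_docs, train_labels)
prediction = predict_rocchio(centroids, "a rocket in orbit")
print(prediction)
```

A kNN variant could reuse `tf_vector` and `cosine`, replacing the centroid comparison with a majority vote over the k most similar training documents.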

Phase I

scikit-learn provides a mature set of APIs for building models using regression, classification, and clustering techniques, and has been used extensively for prediction
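
One convenient property of the scikit-learn APIs is that classifiers share the same `fit`/`predict` interface, so several of them can be trained and compared in a loop. The sketch below illustrates this on a tiny invented corpus. The choice of classifiers (Naive Bayes, kNN, and `NearestCentroid`, which scikit-learn's documentation describes as the Rocchio classifier when used with tf-idf vectors), the tf-idf features, and the toy documents are all assumptions made for illustration, not the assignment's required setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier, NearestCentroid
from sklearn.pipeline import make_pipeline

# Toy corpus, invented for illustration (not the assignment dataset).
train_docs = [
    "car engine brakes",
    "car wheels dealer",
    "orbit rocket launch",
    "orbit shuttle nasa",
]
train_labels = ["rec.autos", "rec.autos", "sci.space", "sci.space"]
test_docs = ["car brakes", "rocket orbit"]

# Assumed set of classifiers; each one exposes the same fit/predict methods.
classifiers = {
    "Naive Bayes": MultinomialNB(),
    "kNN (k=1)": KNeighborsClassifier(n_neighbors=1),
    "Rocchio (nearest centroid)": NearestCentroid(),
}

predictions = {}
for name, clf in classifiers.items():
    # The pipeline turns raw text into tf-idf vectors, then applies the classifier.
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(train_docs, train_labels)
    predictions[name] = model.predict(test_docs).tolist()

for name, predicted in predictions.items():
    print(f"{name}: {predicted}")
```

Wrapping the vectorizer and classifier in one pipeline keeps the vocabulary learned from the training documents consistent when the test documents are transformed.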


Classification on Newsgroups

For this project, you will use a subset of the 20 Newsgroups dataset. The full dataset contains approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups, and has been used for experiments in text applications of machine learning techniques, such as text classification and text clustering. This assignment dataset contains a pre-processed subset of 1000 documents and a