Implementing Weka as a Platform for Data Analysis Using Machine Learning Algorithms
Essay by era kajal • November 17, 2018 • Research Paper
IMPLEMENTING WEKA AS A PLATFORM FOR DATA ANALYSIS USING MACHINE LEARNING ALGORITHMS
A Research Project
submitted in partial fulfillment
for the award of the Degree of Master of Science
in the Department of Information Systems
Professor: Prof. Song Xing, Assistant Professor
Submitted By: Era Singh Kajal, IS/IT Architectures (CIS 5100)
Department of Information Systems
College Of Business and Economics
California State University, Los Angeles
Dec-2017
Implementing WEKA as a platform for Data Analysis
using Machine Learning Algorithms
Abstract
In this paper, machine learning algorithms and artificial neural networks are applied to the instances of the breast cancer data set, using Weka as the platform. The data set was donated in 1988 and is one of the domains in the UCI Machine Learning Repository that has repeatedly appeared in the machine learning literature. The results of the various algorithms, as calculated in Weka, are compared.
KEYWORDS: machine learning algorithms, artificial neural networks, breast cancer data set, Weka, classification.
Introduction
Machine learning is a branch of artificial intelligence (AI) that allows computer systems to learn through algorithms without being explicitly programmed, and to improve their performance using sample data sets or past experience [1]. In this paper, I apply classification and analysis to the breast cancer data set. The Naïve Bayes classifier, K Star (an instance-based classifier), a decision tree, an SVM (support vector machine), and artificial neural networks (ANNs) are used to obtain the results.
An artificial neural network (ANN) is a nonlinear statistical data modeling tool that models complex relationships between inputs and outputs or finds patterns in data. Algorithms such as backpropagation use gradient descent to adjust the network parameters so that the network best fits the training data. ANN learning is robust to errors in the training data and has been applied successfully to a range of problems, for instance, interpreting visual scenes [2]. It has been used with proven success in real-world applications. In this paper, the machine learning algorithms mentioned above and ANNs are applied to medical data from breast cancer patients.
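The parameter-adjustment step can be illustrated with a minimal Python sketch, assuming a single sigmoid unit trained by batch gradient descent on a cross-entropy loss. This is the same kind of update that backpropagation applies layer by layer in a full network, shown here only for the simplest one-unit case; the toy AND data, learning rate, and epoch count are illustrative choices, not settings used in this paper or in Weka.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def output(weights, bias, x):
    """Activation of the sigmoid unit for input vector x."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_sigmoid_unit(samples, labels, lr=0.5, epochs=5000):
    """Batch gradient descent on the mean cross-entropy loss of one sigmoid unit."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(samples, labels):
            err = output(weights, bias, x) - y  # dLoss/dz for sigmoid + cross-entropy
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        # Move each parameter a small step against its averaged gradient.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict(weights, bias, x):
    """Threshold the unit's output at 0.5 to get a class label."""
    return 1 if output(weights, bias, x) >= 0.5 else 0

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 0, 1]  # logical AND: linearly separable, so one unit suffices
w, b = train_sigmoid_unit(X, y)
print([predict(w, b, x) for x in X])  # [0, 0, 0, 1]
```

A single unit can only draw a linear boundary; stacking such units into hidden layers, and propagating the error gradient back through them, is what lets a network learn nonlinear boundaries.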
Machine Learning Algorithms
Data mining is the process of analyzing large data sets and extracting meaningful information from them. It helps predict future trends and patterns, which supports business decision making. The main objective of data mining is to take data from a data set, such as the breast cancer data set, and convert it into a more meaningful form with the help of various algorithms. In this paper, five machine learning algorithms developed for data mining are used to analyze the results: the Naïve Bayes classifier, SVM (support vector machine), artificial neural networks (ANNs), decision tree, and K Star (instance-based classifier). The WEKA tool is used to load the breast cancer data [3] and run the algorithms, which makes it easy to compare the algorithms' results. As a result, it becomes clear which algorithms classify the breast cancer data set well.
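Weka reads data sets in its ARFF (Attribute-Relation File Format) text format, in which a header declares each attribute before the @data rows. The following minimal Python sketch shows how such a file can be turned into rows before algorithms are run. It is not Weka's own loader: it ignores quoting, sparse rows, and string/date attributes, and the toy relation below is hypothetical rather than the breast cancer data.

```python
def parse_arff(text):
    """Minimal ARFF reader for nominal and numeric attributes.
    Quoted names, sparse rows, and string/date attributes are NOT handled."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # blank line or comment
        if not in_data:
            keyword = line.split(None, 1)[0].lower()
            if keyword == "@relation":
                relation = line.split(None, 1)[1]
            elif keyword == "@attribute":
                _, name, spec = line.split(None, 2)
                if spec.startswith("{"):
                    # Nominal attribute: keep the declared set of values.
                    attributes.append((name, [v.strip() for v in spec.strip("{}").split(",")]))
                else:
                    attributes.append((name, None))  # numeric/real/integer
            elif keyword == "@data":
                in_data = True
            continue
        cells = [c.strip() for c in line.split(",")]
        if len(cells) != len(attributes):
            raise ValueError(f"expected {len(attributes)} values, got {len(cells)}: {line!r}")
        row = []
        for (name, values), cell in zip(attributes, cells):
            if cell == "?":
                row.append(None)  # '?' marks a missing value in ARFF
            elif values is None:
                row.append(float(cell))
            elif cell in values:
                row.append(cell)
            else:
                raise ValueError(f"{cell!r} is not a declared value of {name}")
        rows.append(row)
    return relation, attributes, rows

TOY = """% hypothetical toy data, not the UCI breast cancer set
@relation toy
@attribute size numeric
@attribute grade {low,high}
@attribute class {no,yes}
@data
1.5,low,no
?,high,yes
"""
rel, attrs, data = parse_arff(TOY)
print(rel, [a[0] for a in attrs], data)
# toy ['size', 'grade', 'class'] [[1.5, 'low', 'no'], [None, 'high', 'yes']]
```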
a. Naïve Bayes Classifier: Naïve Bayes is a statistical learning algorithm that applies a simplified form of Bayes' rule to compute the posterior probability of a category given the attribute values of an example. The "naïve" simplification is the assumption that attribute values are conditionally independent given the category, so the posterior is proportional to the prior probability of the category multiplied by the conditional probability of each attribute value given that category. The prior probabilities of the categories and the conditional probabilities of attribute values given each category are estimated from frequency counts in the training data. Naïve Bayes is a simple, fast learning algorithm that often performs as well as or better than more sophisticated methods. Figure 1 shows how a Naïve Bayes classifier is built. Bayesian classification is both a supervised learning method and a statistical approach to classification. It assumes an underlying probabilistic model, which allows uncertainty about the model to be captured in a principled way by determining the probabilities of the outcomes. It can be used for both diagnostic and predictive problems.
Figure 1. Building a Naïve Bayes Classifier
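To make the counting concrete, here is a minimal Python sketch of a Naïve Bayes classifier for nominal attributes. The add-one (Laplace) smoothing, the function names, and the four toy rows are illustrative assumptions; this is not Weka's NaiveBayes implementation, and the data are not from the breast cancer set.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Collect the frequency counts Naive Bayes needs from nominal training data."""
    class_counts = Counter(labels)        # for the priors P(c)
    value_counts = defaultdict(Counter)   # (attribute index, class) -> value -> count
    domains = defaultdict(set)            # attribute index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(i, label)][v] += 1
            domains[i].add(v)
    return class_counts, value_counts, domains

def posterior(model, row):
    """P(c | row) assuming attributes are conditionally independent given c,
    with add-one (Laplace) smoothing of each P(value | c)."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total  # prior P(c)
        for i, v in enumerate(row):
            score *= (value_counts[(i, c)][v] + 1) / (n_c + len(domains[i]))
        scores[c] = score
    evidence = sum(scores.values())  # normalizing constant
    return {c: s / evidence for c, s in scores.items()}

# Hypothetical toy data: two nominal attributes, two classes.
rows = [["high", "yes"], ["high", "no"], ["low", "no"], ["low", "no"]]
labels = ["pos", "pos", "neg", "neg"]
model = train_naive_bayes(rows, labels)
print({c: round(p, 3) for c, p in posterior(model, ["high", "no"]).items()})
# {'pos': 0.667, 'neg': 0.333}
```

Here "pos" scores 0.5 × 3/4 × 2/4 = 0.1875 and "neg" scores 0.5 × 1/4 × 3/4 = 0.09375; normalizing gives 2/3 and 1/3. Without smoothing, the unseen pair ("high", "neg") would force the "neg" score to zero.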
b. SVM (Support Vector Machine): SVMs are supervised learning models. Weka's SMO classifier implements John C. Platt's sequential minimal optimization algorithm for training a support vector classifier, using polynomial or RBF kernels. This implementation globally replaces all missing values and transforms nominal attributes into binary ones. By default, it also normalizes all attributes; in that case, the coefficients in the output are based on the normalized data, not the original data, which is important to keep in mind when interpreting the classifier.
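The remark about normalized coefficients can be made concrete with a small Python sketch. It assumes mean imputation for missing numeric values and min-max scaling of each attribute to [0, 1] (stated here as an assumption about what "normalize" means). The data and the weights are hypothetical stand-ins for the coefficients a trained linear SVM might report; the point is that a weight learned on normalized data must be divided by the attribute's range before it can be read on the original scale.

```python
def impute_mean(rows):
    """Globally replace missing numeric values (None) with that attribute's mean."""
    means = []
    for col in zip(*rows):
        present = [v for v in col if v is not None]
        means.append(sum(present) / len(present))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def column_ranges(rows):
    """(min, max) of each attribute; assumes every attribute is non-constant."""
    return [(min(col), max(col)) for col in zip(*rows)]

def normalize(row, ranges):
    """Min-max scale each attribute to [0, 1]."""
    return [(x - lo) / (hi - lo) for x, (lo, hi) in zip(row, ranges)]

def linear_score(row, weights, bias):
    """Decision value w . x + b of a linear classifier."""
    return sum(w * x for w, x in zip(weights, row)) + bias

def to_original_scale(weights, bias, ranges):
    """Rewrite f(x) = sum_j w_j * (x_j - lo_j) / (hi_j - lo_j) + b
    as a linear function of the raw, unnormalized attributes."""
    raw_w = [w / (hi - lo) for w, (lo, hi) in zip(weights, ranges)]
    raw_b = bias - sum(w * lo / (hi - lo) for w, (lo, hi) in zip(weights, ranges))
    return raw_w, raw_b

# Hypothetical data with one missing value, and hypothetical coefficients
# standing in for those a linear SVM trained on the normalized data might report.
rows = [[1.0, 10.0], [3.0, None], [5.0, 30.0]]
clean = impute_mean(rows)
ranges = column_ranges(clean)
norm_w, norm_b = [2.0, -1.0], 0.5
raw_w, raw_b = to_original_scale(norm_w, norm_b, ranges)
for r in clean:
    # Both forms give the same decision value; only the coefficients differ.
    print(linear_score(normalize(r, ranges), norm_w, norm_b), linear_score(r, raw_w, raw_b))
```

In this toy case the normalized weight 2.0 on the first attribute corresponds to 0.5 per original unit, because that attribute spans 4 units; reading the normalized coefficient directly would overstate its per-unit effect.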
c. Artificial Neural Networks (ANNs) Classifier: In this paper, the multilayer perceptron (MLP) classifier is used. An MLP is a class of feedforward artificial neural network consisting of at least three layers of nodes (an input layer, one or more hidden layers, and an output layer). Multilayer networks of the kind trained by the backpropagation algorithm are capable of expressing a rich variety of nonlinear decision surfaces. A typical multilayer network and its decision surface are depicted in Figure 2. In a feedforward network, information always moves in one direction, from the inputs toward the outputs; it never cycles backward.
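Why the hidden layer matters can be seen in a small Python sketch of a feedforward pass through a 2-2-1 network with step activations. The weights are hand-picked for illustration, not learned by backpropagation and not taken from Weka's MultilayerPerceptron (which uses sigmoid units); they make the network compute XOR, a function whose decision surface no single linear unit can represent.

```python
def step(z: float) -> int:
    """Threshold activation: 1 if the weighted sum is positive, else 0."""
    return 1 if z > 0 else 0

def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum plus bias, then step(), per unit."""
    return [step(sum(w * x for w, x in zip(ws, inputs)) + b) for ws, b in zip(weights, biases)]

# Hand-picked (not learned) weights for a 2-2-1 network computing XOR.
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [-0.5, -1.5]   # hidden unit 1 acts like OR, unit 2 like AND
OUTPUT_W = [[1.0, -1.0]]
OUTPUT_B = [-0.5]         # fires for "OR and not AND", i.e. XOR

def forward(x):
    """Feedforward pass: input -> hidden -> output, with no backward connections."""
    hidden = layer(x, HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, forward(x))
```

Each hidden unit draws one straight line in the input plane; the output unit combines the two half-planes into the band between them, which is the nonlinear decision surface a single unit cannot form.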
...
...