Implementing Weka as a Platform for Data Analysis Using Machine Learning Algorithms

Essay by era kajal • November 17, 2018 • Research Paper • 3,151 Words (13 Pages) • 1,326 Views


IMPLEMENTING WEKA AS A PLATFORM FOR DATA ANALYSIS USING MACHINE LEARNING ALGORITHMS

Research Project

submitted

in partial fulfillment

for the award of the Degree of

Master of Science

Department of Information Systems


Professor: Prof. Song Xing, Assistant Professor

Submitted By: Era Singh Kajal, IS/IT Architectures (CIS 5100)

Department of Information Systems

College Of Business and Economics

California State University, Los Angeles

Dec-2017

Implementing WEKA as a Platform for Data Analysis Using Machine Learning Algorithms

Abstract

In this paper, machine learning algorithms and artificial neural networks are applied to examples from the breast cancer data set, with Weka as the platform. The data set was donated in 1988. It is one of the domains provided by the UCI Machine Learning Repository and has appeared repeatedly in the machine learning literature. The algorithms are compared on the basis of the results computed in Weka.

KEYWORDS: machine learning algorithms, artificial neural networks, breast cancer data set, Weka, classification.

Introduction

Machine learning is a part of artificial intelligence (AI) that allows a computer system to learn through algorithms, without human intervention, and to optimize its performance using sample data sets or past experience [1]. In this paper, I apply classification and analysis processes to the breast cancer data set. A Naïve Bayes classifier, K Star (an instance-based classifier), a decision tree, an SVM (support vector machine), and artificial neural networks (ANNs) are used to obtain the results.

An artificial neural network (ANN) is a nonlinear statistical data modeling tool that models complicated relationships between inputs and outputs or finds patterns in data. Algorithms such as backpropagation use gradient descent to adjust the network parameters so that they best fit the training data. ANN learning reduces the error on the training data and has been applied successfully to different problems, for instance, interpreting visual demonstrations [2]. It has been used in real-world applications with proven success. In this paper, the machine learning algorithms mentioned above, including ANNs, are applied to medical data from breast cancer patients.
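The gradient descent idea can be illustrated with a minimal sketch. It assumes a single sigmoid unit trained on squared error, which is the simplest case of the update that backpropagation applies layer by layer. The function names, learning rate, epoch count, and toy data (logical OR) are illustrative assumptions and are not taken from Weka or from the breast cancer data set.

```python
import math

def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Output of one sigmoid unit with weights w and bias b."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def squared_error(w, b, samples):
    """Sum of 0.5 * (target - output)^2 over all samples."""
    return sum(0.5 * (t - predict(w, b, x)) ** 2 for x, t in samples)

def train_unit(samples, lr=0.5, epochs=2000):
    """Fit one sigmoid unit by per-sample gradient descent (assumed settings)."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            out = predict(w, b, x)
            # Gradient of the squared error with respect to the unit's net input.
            delta = (out - target) * out * (1.0 - out)
            # Step each parameter against its gradient.
            w = [wi - lr * delta * xi for wi, xi in zip(w, x)]
            b -= lr * delta
    return w, b

# Toy training set (logical OR), used only to show the error going down.
OR_DATA = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
initial_error = squared_error([0.0, 0.0], 0.0, OR_DATA)
w, b = train_unit(OR_DATA)
final_error = squared_error(w, b, OR_DATA)
```

Comparing `initial_error` with `final_error` shows the training error shrinking as the weights are adjusted, which is the behavior the paragraph above describes.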

Machine Learning Algorithms

Data mining is the process of analyzing large data sets and extracting meaning from the data. It supports the prediction of future trends and patterns, which helps businesses make decisions. The major objective of the data mining process is to take the data from a data set, such as the breast cancer data set, and convert it into a more meaningful form with the help of different algorithms. In this paper, machine learning algorithms developed for the data mining process are used. Five algorithms were chosen to analyze the results: the Naïve Bayes classifier, SVM (support vector machine), artificial neural networks (ANNs), a decision tree, and K Star (an instance-based classifier). The WEKA tool is used to load the breast cancer data [3] and run the algorithms, so the differences between the algorithms' results can be understood easily. In short, the comparison shows which algorithms classify the breast cancer data set well.

a. Naïve Bayes Classifier: Naïve Bayes is a statistical learning algorithm that applies a simplified form of Bayes' rule to compute the posterior probability of a category given the input attribute values of an example. Prior probabilities for categories, and probabilities of attribute values conditioned on categories, are estimated from frequency counts computed from the training data. Naïve Bayes is a simple and fast learning algorithm that often outperforms more sophisticated methods. Figure 1 shows how a Naïve Bayes classifier is built. Bayesian classification provides a supervised learning method as well as a statistical approach to classification. It assumes an underlying probabilistic model, which allows us to capture uncertainty about the model in a principled way by determining the probabilities of the outcomes. It provides solutions for both diagnostic and predictive problems.

[pic 2]

Figure 1. Building a Naïve Bayes Classifier
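The frequency-count estimation described above can be sketched as follows. This is a minimal illustration for nominal attributes, not Weka's implementation. The add-one (Laplace) smoothing, the function names, and the six toy rows with classes "A" and "B" are assumptions made for the example and are not drawn from the breast cancer data set.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Collect the frequency counts needed for priors and conditionals."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    values_seen = defaultdict(set)       # attribute index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen

def posterior(model, row):
    """Posterior probability of each class for one example."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n_cls in class_counts.items():
        p = n_cls / total  # prior P(class)
        for i, value in enumerate(row):
            # P(value | class) with add-one smoothing (an assumption here).
            p *= (value_counts[(i, cls)][value] + 1) / (n_cls + len(values_seen[i]))
        scores[cls] = p
    norm = sum(scores.values())
    return {cls: s / norm for cls, s in scores.items()}

# Invented toy data: two nominal attributes, two classes.
rows = [("small", "no"), ("small", "no"), ("small", "yes"),
        ("large", "yes"), ("large", "no"), ("large", "yes")]
labels = ["A", "A", "A", "B", "B", "B"]

model = train_naive_bayes(rows, labels)
post = posterior(model, ("large", "yes"))
predicted = max(post, key=post.get)
```

The "naïve" part is the product inside `posterior`: each attribute contributes its own conditional probability, as if the attributes were independent given the class.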

b. SVM (Support Vector Machine): SVMs are supervised learning models. The implementation used here applies John C. Platt's sequential minimal optimization algorithm to train a support vector classifier using polynomial or RBF kernels. This implementation globally replaces all missing values and transforms nominal attributes into binary ones. By default, it also normalizes all attributes. In that case, the coefficients in the output are based on the normalized data, not the original data. This is important to keep in mind when interpreting the classifier.
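The three preprocessing steps named above can be sketched in a few lines. This is a simplified illustration and not Weka's code. It assumes that missing values are filled with the column mean (numeric) or the most frequent value (nominal), that each nominal value becomes its own 0/1 indicator, and that normalization rescales a column to the range [0, 1]. The helper names and the toy columns are hypothetical.

```python
def replace_missing(column, numeric):
    """Fill None entries with the mean (numeric) or the mode (nominal)."""
    present = [v for v in column if v is not None]
    if numeric:
        fill = sum(present) / len(present)
    else:
        # Sorting first makes the choice deterministic if counts tie.
        fill = max(sorted(set(present)), key=present.count)
    return [fill if v is None else v for v in column]

def nominal_to_binary(column):
    """Turn one nominal column into one 0/1 indicator per distinct value."""
    categories = sorted(set(column))
    return [[1.0 if v == c else 0.0 for c in categories] for v in column]

def normalize(column):
    """Rescale a numeric column linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

# Invented toy columns with one missing value each.
ages = [30, None, 50, 40]
sides = ["left", "right", None, "left"]

ages_filled = replace_missing(ages, numeric=True)
ages_scaled = normalize(ages_filled)
sides_filled = replace_missing(sides, numeric=False)
sides_binary = nominal_to_binary(sides_filled)
```

Because the classifier is trained on columns like `ages_scaled` rather than `ages`, any learned coefficient refers to the rescaled values, which is why the output has to be read with the normalization in mind.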

c. Artificial Neural Networks (ANNs) Classifier: In this paper, a multilayer perceptron (MLP) classifier developed for the data mining process is used. An MLP is a class of feedforward artificial neural network that consists of at least three layers of nodes. Multilayer networks learned by the backpropagation algorithm are capable of expressing a rich variety of nonlinear decision surfaces. To illustrate, a typical multilayer network and its decision surface are depicted in Figure 2. In a feedforward network, information always moves in one direction; it never goes backward.
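The one-directional flow of a feedforward network can be made concrete with a minimal sketch of a forward pass. It assumes sigmoid activations and a three-layer network (input, one hidden layer, output). The weights below are hand-picked for illustration, not learned and not taken from Weka; they make the network compute XOR, a decision surface that no single linear unit can express.

```python
import math

def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """Compute one layer: each node sums its weighted inputs and applies sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]

def mlp_forward(inputs, layers):
    """Pass the input through each layer in order; nothing flows backward."""
    activation = inputs
    for weights, biases in layers:
        activation = layer_forward(activation, weights, biases)
    return activation

# Hand-picked weights (an assumption for illustration):
# hidden node 1 acts like OR, hidden node 2 like NAND, the output like AND.
XOR_NET = [
    ([[20.0, 20.0], [-20.0, -20.0]], [-10.0, 30.0]),  # hidden layer
    ([[20.0, 20.0]], [-30.0]),                         # output layer
]

outputs = [
    round(mlp_forward([x1, x2], XOR_NET)[0])
    for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]
]
```

Here `outputs` holds the rounded network output for the four input pairs in the order listed. Backpropagation would arrive at weights of this kind automatically by repeating gradient descent steps over the training data.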

...

