Airfare Prediction Model

Essay by Amrita Singh • July 20, 2016 • Coursework • 1,407 Words (6 Pages) • 2,430 Views


1. Marketing to Frequent Fliers:

The file EastWestAirlinesCluster.xls (available on the textbook website http://dataminingbook.com/) contains information on 4000 passengers who belong to an airline’s frequent flier program. For each passenger the data include information on their mileage history and on different ways they accrued or spent miles in the last year. The goal is to try to identify clusters of passengers that have similar characteristics for the purpose of targeting different segments for different types of mileage offers.

a) Apply hierarchical clustering with Euclidean distance and Ward’s method. Make sure to standardize the data first. How many clusters appear?

b) What would happen if the data were not standardized?

c) Compare the cluster centroids to characterize the different clusters and try to give each cluster a label.

d) To check the stability of the clusters, remove a random 5% of the data (by taking a random sample of 95% of the records), and repeat the analysis. Does the same picture emerge?

e) Use k-means clustering with the number of clusters that you found above in Part (a). Does the same picture emerge? If not, how does it contrast or validate the finding in Part (c) above?

f) Which cluster(s) would you target for offers, and what type of offers would you target to customers in that cluster? Include proper reasoning in support of your choice of cluster(s) and the corresponding offer(s).

2. Wine Data:

Step 1: Download the Wine data from the UCI machine learning repository (http://archive.ics.uci.edu/ml/datasets/Wine)

Step 2: Do a Principal Components Analysis (PCA) on the data. Please include (copy-paste) the relevant software outputs in your submission while answering the following questions.

a. Enumerate the insights you gathered during your PCA exercise. (Please do not clutter your report with too many insignificant insights, as it will dilute the value of your other significant findings.)

b. What are the social and business values of those insights, and how can the value of those insights be harnessed?

Step 3: Do a cluster analysis using (i) all chemical measurements and (ii) the two most significant PC scores. Please include (copy-paste) the relevant software outputs in your submission while answering the following questions.

c. Any more insights you come across during the clustering exercise?

d. Are there clearly separable clusters of wines? How many clusters did you go with? How are the clusters obtained in part (i) different from or similar to the clusters obtained in part (ii), qualitatively?

e. Could you suggest a subset of the chemical measurements that can separate wines more distinctly? How did you go about choosing that subset? How do the rest of the measurements that were not included while clustering vary across those clusters?

Question 1.

a) Apply hierarchical clustering with Euclidean distance and Ward’s method. Make sure to standardize the data first. How many clusters appear?

Solution.

Number of clusters: 3

Cluster I	13	16	2	17	10	14	15	18	5	20	19
Cluster II	3	12	21	1	8	9	4	16	22
Cluster III	1	23	6	11	24	25	30	27	29	28

[pic 1]

The dendrogram is the same whether the number of clusters is fixed at 3 or left unrestricted.
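The procedure in part (a) can be sketched in Python as follows. This is only an illustration under stated assumptions: the essay does not say which software was used, the data below are synthetic stand-ins rather than the EastWestAirlinesCluster.xls records, and the three columns are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import zscore

# Synthetic stand-in for the passenger table (assumption: the real file
# is not loaded here). Three groups, with columns on very different scales.
rng = np.random.default_rng(0)
centers = np.array([[20_000, 5, 1_000],
                    [80_000, 20, 4_000],
                    [150_000, 40, 7_000]], dtype=float)
spread = np.array([5_000, 2, 300], dtype=float)
X = np.vstack([c + spread * rng.standard_normal((50, 3)) for c in centers])

# Standardize each column to mean 0 and standard deviation 1.
Z = zscore(X, axis=0)

# Agglomerative clustering with Ward's method on Euclidean distances.
tree = linkage(Z, method="ward", metric="euclidean")

# Cut the tree into 3 clusters; labels run from 1 to 3.
labels = fcluster(tree, t=3, criterion="maxclust")
sizes = np.bincount(labels)[1:]
print(sizes)
```

The same `tree` can be passed to `scipy.cluster.hierarchy.dendrogram` to draw the dendrogram and judge the number of clusters visually.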

b) What would happen if the data were not standardized?

Solution: Balance, Bonus miles and Days since enrolled are measured on much larger scales than the other variables, so they would dominate the Euclidean distances and the clusters would be skewed towards those variables.

For example, the predicted clusters would be as follows:

[pic 2]
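A small numeric sketch shows why a large-scale variable dominates an unstandardized Euclidean distance. The two passengers and the column standard deviations below are made-up values chosen for illustration, not figures from the data set.

```python
import numpy as np

# Two hypothetical passengers (illustrative values only):
# columns are balance in miles and number of flight transactions.
a = np.array([50_000.0, 2.0])
b = np.array([51_000.0, 30.0])

# Share of the squared Euclidean distance contributed by each column.
raw_sq = (a - b) ** 2
raw_share = raw_sq / raw_sq.sum()

# Assumed column standard deviations (again purely illustrative).
std = np.array([100_000.0, 4.0])
std_sq = ((a - b) / std) ** 2
std_share = std_sq / std_sq.sum()

print("raw shares:         ", raw_share.round(4))
print("standardized shares:", std_share.round(4))
```

On the raw scale almost all of the distance comes from the balance column, even though the two passengers differ far more, relative to typical variation, in their flight transactions.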


c) Compare the cluster centroids to characterize the different clusters and try to give each cluster a label.

Solution.

Cluster 1	Less frequent fliers
Cluster 2	Frequent fliers
Cluster 3	Intermittent fliers: between clusters 1 and 2, hence the customer group to target for promotions

[pic 3]
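Cluster centroids are the per-cluster means of each variable. A minimal sketch of that computation is given here; the function name `cluster_centroids`, the tiny data matrix and its two columns are hypothetical and serve only to show the mechanics.

```python
import numpy as np

def cluster_centroids(X, labels):
    """Return {label: mean vector} for each cluster label."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    return {int(k): X[labels == k].mean(axis=0) for k in np.unique(labels)}

# Made-up example: columns are standardized balance and flight miles.
X = np.array([[-1.0, -0.8],
              [-0.6, -1.2],
              [1.5, 1.9],
              [1.1, 2.1],
              [0.1, 0.2]])
labels = np.array([1, 1, 2, 2, 3])

centroids = cluster_centroids(X, labels)
for k, c in centroids.items():
    print(k, c.round(2))
```

Comparing the rows of such a centroid table variable by variable is what supports labels like "frequent" or "less frequent" fliers.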

d) To check the stability of the clusters, remove a random 5% of the data (by taking a random sample of 95% of the records), and repeat the analysis. Does the same picture emerge?

Solution.

Part A

The structure of the dendrogram remained the same, but the assignment of records to clusters changed.

[pic 4]

Part B

Total	3999
Less 5%	3790
Removed	209
Count where cluster 1 = cluster 2	Count where sub-cluster 1 = sub-cluster 2
1368	274
3999	0
34%	IDs which remained constant after removing 5% of the data
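One way to run such a stability check is sketched below. It is an assumption-laden illustration, not the method used for the figures above: the data are synthetic, the helper names `ward_labels` and `agreement` are hypothetical, and cluster labels from the two runs are matched before counting, because cluster numbers are arbitrary and can swap between runs.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.optimize import linear_sum_assignment
from scipy.stats import zscore

def ward_labels(X, k):
    """Standardize X, run Ward's method, and cut the tree into k clusters."""
    tree = linkage(zscore(X, axis=0), method="ward")
    return fcluster(tree, t=k, criterion="maxclust")

def agreement(labels_a, labels_b, k):
    """Share of records in the same cluster after matching label names."""
    table = np.zeros((k, k), dtype=int)
    for i, j in zip(labels_a, labels_b):
        table[i - 1, j - 1] += 1
    # Pair up labels so that the total matched count is as large as possible.
    rows, cols = linear_sum_assignment(-table)
    return table[rows, cols].sum() / len(labels_a)

# Synthetic stand-in data with three well separated groups.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
X = np.vstack([c + rng.standard_normal((60, 2)) for c in centers])

k = 3
full = ward_labels(X, k)

# Keep a random 95% of the records and repeat the analysis.
keep = rng.choice(len(X), size=int(0.95 * len(X)), replace=False)
keep.sort()
sub = ward_labels(X[keep], k)

share = agreement(full[keep], sub, k)
print(f"{share:.0%} of retained records keep the same cluster")
```

Counting raw label equality without the matching step can understate stability when the same group of records simply receives a different cluster number in the second run.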

...
