Pam & Sue Case Study
Essay by Woxman • November 19, 2011 • Case Study • 1,907 Words (8 Pages) • 4,570 Views
1. How would you describe the type of location sites that are likely to have higher sales?
To describe the type of location sites that are likely to have higher sales I must build a multiple regression model, utilizing the stepwise function. To complete the model, I conduct a four step process.
Step 1:
To choose which variables to consider, I first look at the correlations between the independent variables and sales. To consider the correlation between sales and the various x-variables, I not only create scatter plots to look for outliers, but I also use the Pearson's Correlation Coefficient function in excel to determine the relationship (See Appendix Table 1 & 2). For a multiple regression to be an effective tool, each variable should have a linear relationship, either positive or negative.
Step 2:
To complete the regression, I only considered those variables that are significantly correlated with sales. Immediately, I am able to eliminate those independent, or x, variables that do not have a high correlation with sales. A stepwise regression is necessary to describe the type of location sites that are likely to have higher sales, excluding the variables: %inc50-100, %inc100+, medianhome, %1car, %tvs, %sch9-11, and perhard. Additionally, I exclude competitive type as independent variable, as this variable is a dummy variable based on the categorical data of the other independent variables, and therefore, may cause multicollinearity if one of the seven types is not excluded from the study to create a baseline. To simplify the analysis, I exclude competitive type from the list of independent variable, not only due to the high Pearson's Correlation Coefficient value, but also to ensure that the dummy variable does not impact the analysis, as both reasons would cause multicollinearity (the competitive type variable will be used in question two).
Step 3:
Next, I enter the appropriate data into the Regression Worksheet, included with the book. I exclude the data corresponding to those independent variables that will cause multicollinearity. The diagram below is the end result:
Function: Sales = 11952.685 + 0.004(pop) - 55.347(%own) + 304.772(%span) + 23.313(sqrft) + 104.378(%sch12+) - 158.047(%freez) + 105.814(%sch12) - 57.095(%aircon) - 81.547(%inc20-30)
Step 4:
To check these assumptions, I must create a graph of residuals versus fitted to ensure that the forecasting errors follow a normal distribution as compared to the entire dataset.
Now that I am confident that my analysis is accurate, I am able to interpret the equation to answer the question at hand. The type of location that is likely to have higher sales are those locations with a greater square footage of store space, where the customer base is described as: high population density, a lower percentage of home ownership, a higher percentage of Spanish speaking individuals, with a high percentage of individuals with twelve years or greater in education, with a low percentage of homeowners that use freezers or air conditioners, with a low percentage of the population composed of individuals whose family's income of twenty to thirty thousand dollars annually.
2. A group within the planning department had previously developed a subjective approach in which potential sites are classified according to an assessment of the "competitive type" of the trading zone. Below in Table A, the 7 "competitive types" are defined. How good is this classification method at predicting sales? How can you quantify this? Can you improve on this method?
In leveraging the same steps I have already completed above, we plug the dataset into the Regression Worksheet to compute the below function:
Function: Sales = 19200.047 - 1939.566(comtype)
This new classification method is not as strong in predicting sales as the original equation, as demonstrated by the lower R squared value, .435 in this model compared to .643 in the original model. In other words, variation in the competitive type can explain about 43.5% of the variation in sales, while the variation in the independent variables listed in problem one can explain about 64.3% of the variation in sales. Also, the 4103.48 shows that forecasts using this regression equation generally are around $4103.48 from the actual sales figures, a greater standard error than that of the previous model. To improve the model, we must combine the original model with the competitive type model.
Function: Sales = 15031.190 - 954.171(comtype) + 0.004(populat) - 26.332(%owners)) + 252.274(%spanishsp) + 9.406(sqrft) + 72.978(%sch12+) - 131.258(%freezer) + 82.840(%sch12) - 56.876(%aircond) - 44.824(%inc20-30)
The r-squared value is 0.701, and there still isn't a problem with multicollinearity (excluding the identical variables as previously excluded), which was validated by an additional residual analysis (see question five for more details). The above model proves the assertion that by adding the comtype variable to the original model we will improve the accuracy, demonstrated by the increased R squared value from .435 to .701.
3. Two sites, A and B, are currently under consideration for the next new store opening. Characteristics of the two sites are provided below in Table B. which site would you recommend? Justify your choice and give the best sales forecasts you can. You may use the subjective classifications from Question 2 along with any other variables you think will give the best forecast. Give some estimate of the accuracy of the forecasting method you use and any other limitations of the forecasting method.
To answer the question, I must plug the applicable information into the model from problem 2:
I would select Site A, with a forecasted sales of $1,195,902.55 compared to the forecasted sales of site B, $1,140,510.88. In considering the forecasted sales figures, one should note not only the standard error inherent with the model, but also the reason for selecting the model from problem two, as opposed to the model from question one. The standard error with the model from question one is 3041.10, which demonstrates the slight inaccuracy that the forecast may provide, as the standard error is the estimation of the standard deviation of a sample. The 3041.10 shows that forecasts using this regression equation generally are around $3041.10 from the actual sales figures, and almost all are within
...
...