CCST 9039 - Statistics Assignment
Essay by Maggie1277 • July 17, 2018 • Coursework
CCST 9039 Assignment 2
Name: Zhou Shujie
UID: 3035449308
- Original data used:
https://docs.google.com/spreadsheets/d/1wVDFMTISgefI6cRxt0ARKZswzpAt0KJ680XtrD0pBR0/edit?usp=sharing
(a):
Minimum temperature series in 2017:
Mean: 17.04516
Five-number summary:
The minimum temperatures in ascending order are as follows:
13.6 14.3 14.4 14.5 14.6 14.7 15.1 15.2 15.7 15.9 15.9 16.2 16.2 16.4 16.7 16.9 17.4 18.0 18.1 18.2 18.4 18.4 18.7 18.7 18.8 18.9 18.9 19.7 19.7 19.7 20.5
The minimum: 13.6
The first quartile: 15.45
The median: 16.9
The third quartile: 18.7
The maximum: 20.5
Minimum temperature series in 2018:
Mean: 14.11935
Five-number summary:
The minimum temperatures in ascending order are as follows:
7.8 7.9 8.9 8.9 9.5 10.5 10.5 11.3 12.0 12.1 12.6 13.0 14.0 14.8 15.6 15.7 15.9 15.9 15.9 16.0 16.1 16.3 16.6 16.8 17.1 17.2 17.2 17.4 17.9 18.1 18.2
The minimum: 7.8
The first quartile: 11.65
The median: 15.7
The third quartile: 16.7
The maximum: 18.2
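The figures in (a) can be reproduced with a short script. This is only a sketch: it assumes the quartiles are interpolated between neighbouring values (the "inclusive" convention, which matches the quartiles listed above), and the names `t2017`, `t2018` and `five_number_summary` are made up for illustration.

```python
from statistics import mean, quantiles

# January minimum temperatures (deg C), copied from the sorted lists in (a).
t2017 = [13.6, 14.3, 14.4, 14.5, 14.6, 14.7, 15.1, 15.2, 15.7, 15.9, 15.9,
         16.2, 16.2, 16.4, 16.7, 16.9, 17.4, 18.0, 18.1, 18.2, 18.4, 18.4,
         18.7, 18.7, 18.8, 18.9, 18.9, 19.7, 19.7, 19.7, 20.5]
t2018 = [7.8, 7.9, 8.9, 8.9, 9.5, 10.5, 10.5, 11.3, 12.0, 12.1, 12.6,
         13.0, 14.0, 14.8, 15.6, 15.7, 15.9, 15.9, 15.9, 16.0, 16.1, 16.3,
         16.6, 16.8, 17.1, 17.2, 17.2, 17.4, 17.9, 18.1, 18.2]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumption: quartiles are interpolated ("inclusive" method); other
    quartile conventions can give slightly different Q1 and Q3.
    """
    ordered = sorted(values)
    q1, med, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, med, q3, ordered[-1]


for year, temps in (("2017", t2017), ("2018", t2018)):
    print(year, "mean:", round(mean(temps), 5))
    print(year, "five-number summary:", five_number_summary(temps))
```

Other quartile conventions, such as taking the median of each half of the data, would change the first and third quartiles slightly but not the minimum, median or maximum.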
(b):
2017: range = 20.5 - 13.6 = 6.9; interquartile range = 18.7 - 15.45 = 3.25
2018: range = 18.2 - 7.8 = 10.4; interquartile range = 16.7 - 11.65 = 5.05
In both years the mean is closer to the median than to either quartile, which suggests a roughly symmetric, approximately normal distribution rather than a strongly skewed one. The range and interquartile range of 2017 are both smaller than those of 2018, showing that the minimum temperatures varied more in 2018. In addition, every statistic in (a) is lower for 2018 than for 2017, which means the minimum temperatures were generally lower in 2018.
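As a rough check of the two measures of spread in (b), the sketch below computes the range and interquartile range for both years. It assumes interpolated ("inclusive") quartiles; a convention that takes the median of each half of the data instead gives slightly different interquartile ranges. The helper name `spread` is hypothetical.

```python
from statistics import quantiles

# January minimum temperatures (deg C), copied from the sorted lists in (a).
t2017 = [13.6, 14.3, 14.4, 14.5, 14.6, 14.7, 15.1, 15.2, 15.7, 15.9, 15.9,
         16.2, 16.2, 16.4, 16.7, 16.9, 17.4, 18.0, 18.1, 18.2, 18.4, 18.4,
         18.7, 18.7, 18.8, 18.9, 18.9, 19.7, 19.7, 19.7, 20.5]
t2018 = [7.8, 7.9, 8.9, 8.9, 9.5, 10.5, 10.5, 11.3, 12.0, 12.1, 12.6,
         13.0, 14.0, 14.8, 15.6, 15.7, 15.9, 15.9, 15.9, 16.0, 16.1, 16.3,
         16.6, 16.8, 17.1, 17.2, 17.2, 17.4, 17.9, 18.1, 18.2]


def spread(values):
    """Return (range, interquartile range).

    Assumption: Q1 and Q3 are interpolated ("inclusive" method).
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return max(values) - min(values), q3 - q1


for year, temps in (("2017", t2017), ("2018", t2018)):
    rng, iqr = spread(temps)
    print(f"{year}: range = {rng:.2f}, interquartile range = {iqr:.2f}")
```

Whichever quartile convention is used, both measures are larger for 2018 than for 2017.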
(c):
[pic 1]
[pic 2]
These two histograms show the rough shape of the distribution of the minimum temperature records, as well as the range of minimum temperatures in January. An improper class width distorts the analysis of the distribution. If the class width is too small, some intervals will have a frequency of zero, which breaks up the graph and makes its shape harder to judge. On the other hand, if the class width is too large, the histogram will have too few intervals, giving a more blurred picture of the data distribution.
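The effect of class width can be illustrated by counting the 2017 values per class for two different widths. The widths and starting points below (0.5 from 13.5, and 2.0 from 13.0) are arbitrary choices for illustration, not necessarily those used in the histograms above, and `class_frequencies` is a hypothetical helper.

```python
# January 2017 minimum temperatures (deg C), copied from the sorted list in (a).
t2017 = [13.6, 14.3, 14.4, 14.5, 14.6, 14.7, 15.1, 15.2, 15.7, 15.9, 15.9,
         16.2, 16.2, 16.4, 16.7, 16.9, 17.4, 18.0, 18.1, 18.2, 18.4, 18.4,
         18.7, 18.7, 18.8, 18.9, 18.9, 19.7, 19.7, 19.7, 20.5]


def class_frequencies(values, start, width):
    """Count values in each class [start + k*width, start + (k+1)*width).

    Works in integer tenths of a degree so class boundaries are exact.
    Assumes every value is >= start and is given to one decimal place.
    """
    tenths = [round(v * 10) for v in values]
    lo, w = round(start * 10), round(width * 10)
    counts = [0] * ((max(tenths) - lo) // w + 1)
    for t in tenths:
        counts[(t - lo) // w] += 1
    return counts


# Illustrative class widths only (assumed, not taken from the original graphs).
for start, width in ((13.5, 0.5), (13.0, 2.0)):
    counts = class_frequencies(t2017, start, width)
    print(f"width {width} from {start}: {counts}, "
          f"empty classes: {counts.count(0)}")
```

With the narrow width several classes are empty, so the histogram has gaps; with the wide width only four classes remain and most of the detail is lost.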
(d):
Mean values:
Pros: the mean is a convenient and easily understood measure of the average level of a data series, and every value in the records is involved in the calculation.
Cons: when the records contain extremely large or small values, or the distribution is skewed, the mean is no longer representative of the data.
Five-number summary:
Pros: it is also easy to use and understand. Moreover, it roughly shows the distribution by locating the minimum, the first quartile, the median, the third quartile and the maximum.
Cons: it depends only on certain values in a series, so other changes in the data will not show up in the five-number summary, which limits its representativeness. For example, if the second-smallest values in 2017 and 2018 changed to 14.4 and 8.0 respectively, the summary would stay the same.
Histogram:
Pros: it visualizes the distribution and the differences in frequency within a data series, and it can be used in many situations.
Cons: the conclusions drawn can be influenced by the choice of class width, which makes it harder to use than the methods above. Besides, some information is lost, since a histogram shows only the frequency in each interval and not the exact values in the data series.
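The sensitivity points in (d) can be checked numerically on the 2017 series. The sketch below first applies the change mentioned above (second-smallest value from 14.3 to 14.4) and then a purely hypothetical extreme value (maximum raised to 30.5, which is not in the data). Interpolated quartiles are assumed, and the helper name is made up.

```python
from statistics import mean, median, quantiles

# January 2017 minimum temperatures (deg C), copied from the sorted list in (a).
t2017 = [13.6, 14.3, 14.4, 14.5, 14.6, 14.7, 15.1, 15.2, 15.7, 15.9, 15.9,
         16.2, 16.2, 16.4, 16.7, 16.9, 17.4, 18.0, 18.1, 18.2, 18.4, 18.4,
         18.7, 18.7, 18.8, 18.9, 18.9, 19.7, 19.7, 19.7, 20.5]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), assuming interpolated quartiles."""
    ordered = sorted(values)
    q1, med, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, med, q3, ordered[-1]


# Change from (d): the second-smallest value becomes 14.4 instead of 14.3.
changed = t2017.copy()
changed[1] = 14.4
print("summary unchanged:",
      five_number_summary(changed) == five_number_summary(t2017))
print("mean shifts by:", mean(changed) - mean(t2017))

# Hypothetical extreme value (NOT in the data): the maximum becomes 30.5.
extreme = t2017.copy()
extreme[-1] = 30.5
print("mean shifts by:", mean(extreme) - mean(t2017))
print("median shifts by:", median(extreme) - median(t2017))
```

The five-number summary does not register the small change at all, while the mean moves with every change and is pulled noticeably by the single extreme value.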
(e):
These time plots display not only the exact values but also the day-to-day changes in minimum temperature in January of both years. We can analyze and compare the volatility of the data from the time plots. Additionally, the time plots show the overall trend of the data.
[pic 3]
[pic 4]
(f):
For the 22nd in 2018:
When n = 1: prediction = 16.8
Prediction error = 17.1 - 16.8 = 0.3
When n = 5: prediction = (15.9 + 15.9 + 17.9 + 17.4 + 16.8)/5 = 16.78
Prediction error = 17.1 - 16.78 = 0.32
Using the same method, we get:
When n = 1, prediction errors are:
(22nd): 0.3 (23rd): 0.1 (24th): 0.6 (25th): 0.5 (26th): 0.5 (27th): 1.6 (28th): 1.4 (29th): 3.7 (30th): 0 (31st): 1.1
Mean = (0.3+0.1+0.6+0.5+0.5+1.6+1.4+3.7+0+1.1)/10 = 0.98
When n = 5, prediction errors are:
(22nd): 0.32 (23rd): 0.18 (24th): 0.68 (25th): 0.92 (26th): 1.16 (27th): 2.52 (28th): 3.3 (29th): 6.08 (30th): 4.54 (31st): 4.2
Mean = (0.32+0.18+0.68+0.92+1.16+2.52+3.3+6.08+4.54+4.2)/10 = 2.39
n = 1 gives the more accurate prediction on average, since its mean prediction error (0.98) is smaller than that for n = 5 (2.39).
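The prediction method in (f) can be sketched as follows. The code assumes the five temperatures in the n = 5 calculation are the 17th to 21st in chronological order, and that the listed errors are absolute values. The function name `forecast` is hypothetical.

```python
def forecast(history, n):
    """Predict the next value as the mean of the last n observations."""
    if not 1 <= n <= len(history):
        raise ValueError("n must be between 1 and len(history)")
    return sum(history[-n:]) / n


# Minimum temperatures for 17-21 January 2018, taken from the n = 5
# calculation in (f); chronological order is an assumption.
history = [15.9, 15.9, 17.9, 17.4, 16.8]
actual_22nd = 17.1

for n in (1, 5):
    prediction = forecast(history, n)
    error = abs(actual_22nd - prediction)
    print(f"n = {n}: prediction = {prediction:.2f}, error = {error:.2f}")

# Prediction errors for 22-31 January 2018, as listed in (f)
# (assumed to be absolute errors).
errors_n1 = [0.3, 0.1, 0.6, 0.5, 0.5, 1.6, 1.4, 3.7, 0, 1.1]
errors_n5 = [0.32, 0.18, 0.68, 0.92, 1.16, 2.52, 3.3, 6.08, 4.54, 4.2]

for n, errors in ((1, errors_n1), (5, errors_n5)):
    print(f"n = {n}: mean error = {sum(errors) / len(errors):.2f}")
```

A larger n smooths out day-to-day noise but reacts more slowly when the temperature changes quickly, which is consistent with the larger errors for n = 5 towards the end of the month.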
- Original data used:
https://docs.google.com/spreadsheets/d/1M4Yr1PqWX5LnxgcDoBbQU1CtAIJ1tup4Fw6Ww0nrGZA/edit?usp=sharing
(a):
0011 Hang Seng Bank Ltd. (Finance)
Date | Closing price | t-th day's return R_t |
2-Feb-18 | 184.1 | |
1-Feb-18 | 185.4 | 0.007061 |
31-Jan-18 | 186.2 | 0.004315 |
30-Jan-18 | 187.6 | 0.007519 |
29-Jan-18 | 187.2 | -0.00213 |
26-Jan-18 | 186 | -0.00641 |
25-Jan-18 | 186.5 | 0.002688 |
24-Jan-18 | 186.9 | 0.002145 |
23-Jan-18 | 188 | 0.005886 |
22-Jan-18 | 188.9 | 0.004787 |
19-Jan-18 | 189.5 | 0.003176 |
18-Jan-18 | 191 | 0.007916 |
17-Jan-18 | 190.8 | -0.00105 |
16-Jan-18 | 189.4 | -0.00734 |
15-Jan-18 | 190.6 | 0.006336 |
12-Jan-18 | 189.2 | -0.00735 |
11-Jan-18 | 190.5 | 0.006871 |
10-Jan-18 | 192.1 | 0.008399 |
9-Jan-18 | 191.8 | -0.00156 |
8-Jan-18 | 192.1 | 0.001564 |
5-Jan-18 | 194 | 0.009891 |
4-Jan-18 | 193.6 | -0.00206 |
3-Jan-18 | 194.1 | 0.002583 |
2-Jan-18 | 194.5 | 0.002061 |
29-Dec-17 | 194 | -0.00257 |
28-Dec-17 | 194.2 | 0.001031 |
27-Dec-17 | 192.5 | -0.00875 |
22-Dec-17 | 191.8 | -0.00364 |
21-Dec-17 | 190.8 | -0.00521 |
20-Dec-17 | 191 | 0.001048 |
19-Dec-17 | 188.7 | -0.01204 |
18-Dec-17 | 187.6 | -0.00583 |
15-Dec-17 | 185.7 | -0.01013 |
14-Dec-17 | 188.3 | 0.014001 |
13-Dec-17 | 190.4 | 0.011152 |
12-Dec-17 | 185.7 | -0.02468 |
11-Dec-17 | 187.7 | 0.01077 |
8-Dec-17 | 185.9 | -0.00959 |
7-Dec-17 | 188.1 | 0.011834 |
6-Dec-17 | 187.8 | -0.00159 |
5-Dec-17 | 188.1 | 0.001597 |
4-Dec-17 | 190.3 | 0.011696 |
...
...
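The return column can be reproduced from the closing prices. The formula is not stated above, so this is an inference from the numbers: each listed value matches the relative change of that row's closing price with respect to the row above it. Since the table runs from the most recent date backwards, the row above is the following trading day. The more common definition of a daily return measures the change from the previous trading day, which would give different values. The names below are hypothetical.

```python
# First rows of the table, in table order (most recent date first).
rows = [
    ("2-Feb-18", 184.1),
    ("1-Feb-18", 185.4),
    ("31-Jan-18", 186.2),
    ("30-Jan-18", 187.6),
    ("29-Jan-18", 187.2),
]


def relative_changes(prices):
    """Relative change of each price with respect to the one before it
    in the list: (current - previous) / previous."""
    return [(cur - prev) / prev for prev, cur in zip(prices, prices[1:])]


prices = [price for _, price in rows]

# Inferred convention: "previous" here means the row above in the table.
for (date, _), change in zip(rows[1:], relative_changes(prices)):
    print(f"{date}: {change:.6f}")
```

To obtain returns relative to the previous trading day instead, the price list would be reversed into chronological order before calling `relative_changes`.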