
ACCT648 Assignment: Data Analysis (R Programming)

Published: 2023-04-25 00:41:54
Source: 考而思 (Kaoersi)

Term 1, 2019/2020

ACCT648 Applied Statistics for Data Analysis

Assignment 3

Deadline of Submission: Upload your answer file in Word format to e-Learn on 6 November 2019 before 5 pm, and submit the hard copy during class on that day.



1. The owner of a moving company typically has his most experienced manager predict the total number of labor hours (Hours) that will be required to complete an upcoming move. This approach has proved useful in the past, but the owner has the business objective of developing a more accurate method of predicting labor hours. In a preliminary effort to provide a more accurate method, the owner has decided to use the number of cubic feet moved (Feet), the number of pieces of large furniture (Large), and whether there is an elevator in the apartment building (Elevator) as the independent variables, and has collected data for moves in which the origin and destination were within the borough of Manhattan in New York City and the travel time was an insignificant portion of the hours worked. The data are organized and stored in Moving2019.csv.


(a) Find the multiple regression equation L1 with all three main independent variables.

(b) Find the multiple regression equation L2 with all three main independent variables and the interaction effect of Feet and Elevator.

(c) Find the multiple regression equation L3 with all three main independent variables and the interaction effect of Large and Elevator.

(d) Find the multiple regression equation L4 with all three main independent variables and the interaction effect of Feet and Large.

(e) When comparing all four regression models (L1, L2, L3, and L4), explain why model L3 is the best model.

(f) Perform a residual analysis on model L3 and determine whether the regression assumptions are valid.

(g) Using model L3, construct a 95% prediction interval estimate for the labor hours for moving 420 cubic feet with 2 pieces of large furniture in an apartment building that does not have an elevator.

(h) Using model L3, construct a 95% confidence interval estimate for the average labor hours for moving 400 cubic feet with 3 pieces of large furniture in an apartment building that has an elevator.

(i) True or False: Under model L3, for a fixed value of cubic feet and at least one piece of large furniture, the total number of labor hours to move in a building with an elevator is on average less than the number of labor hours to move in a building without an elevator. Justify your answer.
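Models L1 to L4 differ only in which product column enters the design matrix. A minimal standard-library Python sketch of model L3 follows: it builds the design row 1, Feet, Large, Elevator, Large×Elevator and solves the normal equations. It assumes Elevator is coded 0 (no) / 1 (yes), which the question does not state, and the helper names design_row, ols, and elevator_effect are hypothetical. In practice a statistics package (for example R's lm) would be used, since it also reports the standard errors needed for parts (g) and (h).

```python
from typing import List, Sequence


def design_row(feet: float, large: float, elevator: int) -> List[float]:
    """One row of the L3 design matrix (assumption: Elevator coded 0/1)."""
    return [1.0, feet, large, float(elevator), large * elevator]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def elevator_effect(coef: Sequence[float], large: float) -> float:
    """Change in mean Hours from Elevator=0 to Elevator=1 at fixed Large (Feet cancels)."""
    return coef[3] + coef[4] * large
```

Under L3 the elevator effect at a given Large is b3 + b4·Large, and Feet drops out. Because it is linear in Large, a negative value at Large = 1 together with a non-positive interaction coefficient b4 would make it negative for every Large ≥ 1, which is the kind of argument part (i) asks for.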




2. Based on the data set given in Question 1:

(a) Fit a multiple regression equation to predict the total number of labor hours with all independent variables by using Forward Selection with the BIC criterion on the training set. Plot the number of variables versus BIC at each selection step.

(b) Fit a multiple regression equation to predict the total number of labor hours with all independent variables by using Best Subset Selection with the adjusted R² criterion on the training set. Plot the number of variables versus adjusted R² at each selection step.

(c) Use the 5-fold cross-validation approach to fit models L1, L2, L3, and L4, and determine which model is the best according to their associated cross-validation errors. (Note: use set.seed(1208))

(d) Use the Leave-One-Out cross-validation approach to fit models L1, L2, L3, and L4, and determine which model is the best according to their associated cross-validation errors. (Note: use set.seed(5623))
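Parts (c) and (d) rank models by cross-validation error. The sketch below assumes that error is the mean squared prediction error pooled over all held-out observations; setting k equal to the number of observations turns k-fold into Leave-One-Out. The `fit` callback interface and the `mean_model` baseline are hypothetical conventions for illustration, and Python's random module will not reproduce the folds that R's set.seed(1208) or set.seed(5623) produce, so numbers from this sketch would not match an R answer.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical interface: fit(train_indices) returns a predictor, index -> predicted y
FitFn = Callable[[List[int]], Callable[[int], float]]


def kfold_indices(n: int, k: int, seed: int) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_error(y: Sequence[float], fit: FitFn, k: int, seed: int = 0) -> float:
    """Cross-validated MSE pooled over all held-out points; k = len(y) gives LOOCV."""
    n = len(y)
    sse = 0.0
    for test in kfold_indices(n, k, seed):
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        predict = fit(train)  # refit on the training folds only
        sse += sum((y[i] - predict(i)) ** 2 for i in test)
    return sse / n


def mean_model(y: Sequence[float]) -> FitFn:
    """Baseline 'model' that predicts the training-fold mean, for illustration only."""
    def fit(train: List[int]) -> Callable[[int], float]:
        m = sum(y[i] for i in train) / len(train)
        return lambda i: m
    return fit
```

Each of L1 to L4 would be plugged in through its own `fit` function that refits that model's design matrix on the training indices; calling `cv_error(y, fit, k=5, seed=...)` and `cv_error(y, fit, k=len(y))` then gives the two comparisons.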


3. Suppose we collect data for a group of 130 students in a statistics class with two independent variables, X1 = average studying hours per week and X2 = GPA, and one dependent variable, Y = Pass (or Fail).


We fit a logistic regression model, $\log(\text{odds}) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$, where $\text{odds} = p/(1-p)$ and $p$ is the probability of passing, to predict whether a student will pass the course. The R output gives the estimated coefficients $\hat{\beta}_0 = -9.5447$, $\hat{\beta}_1 = 0.5709$, and $\hat{\beta}_2 = 1.0682$. The observations of the first five students are as follows:


Student   Y      X1 (hours/week)   X2 (GPA)
1         Pass    9.4              3.03
2         Pass   14.5              3.52
3         Pass   12.2              3.14
4         Fail    8.4              2.76
5         Fail   11.3              3.20


(a) Based on the estimated logistic regression model, predict the probability that a student who studies 11 hours per week on average and has a GPA of 3.40 will pass the course.

(b) At least how many hours per week would the student in part (a) need to study to have a predicted chance of passing the course greater than 70%?

(c) Find the deviance residuals of the first five observed students.

(d) Using the estimated logistic regression model with a threshold value of 0.55 for classifying a student as passing the course, determine whether the model makes an error in predicting each of the five students above. If there is an error, also determine what type of error it is.
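Parts (a) to (d) all follow from the inverse logit $p = 1/(1 + e^{-\eta})$ with $\eta = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2$ and the coefficients given above. Part (b) inverts it through $\log(p/(1-p))$, and part (c) uses the deviance residual $d_i = \operatorname{sign}(y_i - \hat{p}_i)\sqrt{-2[y_i \ln \hat{p}_i + (1 - y_i)\ln(1 - \hat{p}_i)]}$. A minimal Python sketch follows; treating Pass as the positive class (so an error is a false positive or false negative) and counting a probability exactly equal to 0.55 as Fail are assumptions.

```python
import math

# Coefficient estimates given in the question
B0, B1, B2 = -9.5447, 0.5709, 1.0682

# (Y, X1 = hours per week, X2 = GPA), with Y coded 1 = Pass, 0 = Fail
STUDENTS = [(1, 9.4, 3.03), (1, 14.5, 3.52), (1, 12.2, 3.14), (0, 8.4, 2.76), (0, 11.3, 3.20)]


def p_pass(x1: float, x2: float) -> float:
    """Inverse logit of the linear predictor."""
    eta = B0 + B1 * x1 + B2 * x2
    return 1.0 / (1.0 + math.exp(-eta))


def hours_needed(target: float, x2: float) -> float:
    """Solve p_pass(x1, x2) = target for x1; since B1 > 0, larger x1 gives a larger p."""
    logit = math.log(target / (1.0 - target))
    return (logit - B0 - B2 * x2) / B1


def deviance_residual(y: int, p: float) -> float:
    """sign(y - p) times sqrt(-2 * log-likelihood contribution of one observation)."""
    ll = y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return math.copysign(math.sqrt(-2.0 * ll), y - p)


def classify(p: float, threshold: float = 0.55) -> int:
    return 1 if p > threshold else 0  # assumption: p == threshold counts as Fail


def error_type(y: int, pred: int) -> str:
    if y == pred:
        return "correct"
    return "false positive" if pred == 1 else "false negative"  # Pass = positive class


if __name__ == "__main__":
    print(f"(a) P(pass | 11 h, GPA 3.40) = {p_pass(11, 3.40):.4f}")
    print(f"(b) hours needed for p > 0.70: more than {hours_needed(0.70, 3.40):.4f}")
    for k, (y, x1, x2) in enumerate(STUDENTS, start=1):
        p = p_pass(x1, x2)
        d = deviance_residual(y, p)
        print(f"student {k}: p = {p:.4f}, deviance residual = {d:.4f}, {error_type(y, classify(p))}")
```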




4. The stock prices of Singapore Telecommunications Limited (SingTel), code Z74.SI, and Singapore Airlines Limited (SIA), code C6L.SI, from 27 August 2018 to 29 July 2019 are stored in SingTelSIA2019.csv. Suppose a portfolio holds 8,000 shares of SingTel at a price of $3.34 per share and 5,000 shares of SIA at a price of $9.42 per share on 29 July 2019. The portfolio therefore has a value of $73,820 (8,000 × 3.34 + 5,000 × 9.42) on 29 July 2019.


(a) Based on the historical approach, without any distributional assumption, calculate the one-day 99% VaR for this portfolio on 29 July 2019.

(b) Without any distributional assumption, estimate the one-day 99% VaR for this portfolio on 29 July 2019 based on the bootstrap approach with 100,000 repetitions. (Note: use set.seed(5483))

(c) Obtain a 95% bootstrap percentile confidence interval for the one-day 99% VaR for this portfolio on 29 July 2019.
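The historical approach revalues today's holdings under each past day's relative price change and reads the VaR off the 99th percentile of the resulting losses; the bootstrap repeats that calculation on resamples of the loss scenarios. This Python sketch rests on stated assumptions: percentiles use linear interpolation (the rule of R's default quantile type 7), losses come from relative rather than absolute price changes, the bootstrap estimate is taken as the mean of the resampled VaRs, and random.Random(5483) will not reproduce R's set.seed(5483) draws. Reading SingTelSIA2019.csv is left out because its column layout is not given here.

```python
import math
import random
from typing import List, Sequence, Tuple


def quantile7(xs: Sequence[float], q: float) -> float:
    """Sample quantile by linear interpolation (R's default quantile type 7)."""
    s = sorted(xs)
    h = (len(s) - 1) * q
    lo = math.floor(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])


def scenario_losses(prices: Sequence[Sequence[float]], shares: Sequence[float]) -> List[float]:
    """prices[t][j]: closing price of asset j on day t, oldest first; last row is 'today'.
    Each past day's relative change cur/prev is applied to today's prices and holdings."""
    today = prices[-1]
    v0 = sum(s * p for s, p in zip(shares, today))
    losses = []
    for prev, cur in zip(prices[:-1], prices[1:]):
        v1 = sum(s * p * c / b for s, p, c, b in zip(shares, today, cur, prev))
        losses.append(v0 - v1)  # positive value = loss
    return losses


def historical_var(losses: Sequence[float], level: float = 0.99) -> float:
    return quantile7(losses, level)


def bootstrap_var(losses: Sequence[float], level: float = 0.99, reps: int = 100_000,
                  seed: int = 5483) -> Tuple[float, Tuple[float, float]]:
    """Mean of bootstrap VaRs and their 95% percentile interval."""
    rng = random.Random(seed)  # NOT equivalent to R's set.seed(5483)
    n = len(losses)
    stats = [quantile7(rng.choices(losses, k=n), level) for _ in range(reps)]
    return sum(stats) / reps, (quantile7(stats, 0.025), quantile7(stats, 0.975))
```

With the question's holdings, `shares` would be `[8000, 5000]` and each row of `prices` would hold the SingTel and SIA closing prices for one trading day. In pure Python, 100,000 repetitions take a while; the percentile interval for part (c) comes from the same `stats` list.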


5. The director of undergraduate studies at a college of business wants to predict whether students in a BBA program will graduate with an honors degree, using the independent variables high school grade point average (GPA), SAT score (SAT), gender (Gender), and local citizenship (Local). Data from a random sample of 90 students, organized and stored in BBA2019.csv, show that 46 successfully completed the program with honors degrees (coded as Yes) and 44 without honors degrees (coded as No) under the variable column Graduate.


(a) Develop a logistic regression model, L1, to predict the probability of successfully completing the BBA program with an honors degree, based on all independent variables.

(b) Develop another logistic regression model, L2, to predict the probability of successfully completing the BBA program with an honors degree, based on the SAT, Gender, and Local independent variables.

(c) Develop another logistic regression model, L3, to predict the probability of successfully completing the BBA program with an honors degree, based on the SAT and Local independent variables.

(d) Develop another logistic regression model, L4, to predict the probability of successfully completing the BBA program with an honors degree, based on the SAT independent variable.

(e) Explain why model L4 is the best model among the four models considered. At the 0.05 level of significance, is there evidence that logistic regression model L4 is a good-fitting model?

(f) Under model L4, predict the probability of successfully completing the BBA program with an honors degree for a male local citizen with a GPA of 3.45 and an SAT score of 1330.

(g) Find the confusion matrix of model L4 with a threshold value of 0.6 for classifying students as having successfully completed the BBA program with honors degrees.

(h) Find the sensitivity, specificity, and total error rate of model L4 with the threshold value 0.6.
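Parts (g) and (h) need only the actual Graduate labels and the fitted probabilities from model L4. The Python sketch below assumes Yes is the positive class and that a probability exactly equal to 0.6 is classified as No; the probabilities themselves would come from the fitted model (for example `predict(..., type = "response")` in R), not from this code, and the helper names are hypothetical.

```python
from typing import Dict, Sequence


def confusion(actual: Sequence[str], prob: Sequence[float],
              threshold: float = 0.6, positive: str = "Yes") -> Dict[str, int]:
    """Counts of true/false positives and negatives at the given threshold."""
    cm = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for a, p in zip(actual, prob):
        pred_pos = p > threshold  # assumption: p == threshold is classified as No
        if a == positive:
            cm["TP" if pred_pos else "FN"] += 1
        else:
            cm["FP" if pred_pos else "TN"] += 1
    return cm


def rates(cm: Dict[str, int]) -> Dict[str, float]:
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), error rate = (FP+FN)/n."""
    n = sum(cm.values())
    return {
        "sensitivity": cm["TP"] / (cm["TP"] + cm["FN"]),
        "specificity": cm["TN"] / (cm["TN"] + cm["FP"]),
        "error_rate": (cm["FP"] + cm["FN"]) / n,
    }
```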


  -END-


