Data Mining for Business Analytics: Concepts, Techniques, and Applications with XLMiner
3rd Edition
ISBN: 9781118729274
Author: Galit Shmueli, Peter C. Bruce, Nitin R. Patel
Publisher: WILEY
Students have asked these similar questions
What is the best way to decide how many epochs of training to perform?
- It is always obvious looking at the decision boundary when the model begins to overfit.
- None of the others.
- As soon as the value of the Testing dataset performance begins to decrease.
- As soon as the value of the Tuning dataset performance begins to decrease.
- As soon as the value of the Training dataset performance (accuracy, F1.) begins to decrease.
- As soon as the value of the Testing dataset loss begins to increase.
- As soon as the value of the Tuning dataset loss begins to increase.
- As soon as the value of the Training dataset loss begins to increase.
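The options above describe variants of early stopping, where training halts once a monitored metric on held-out data stops improving. The following is a minimal sketch of that idea, not a definitive implementation: the function name `early_stopping_epoch`, the `patience` parameter, and the loss values are hypothetical and chosen only for illustration.

```python
def early_stopping_epoch(tuning_losses, patience=1):
    """Return the 0-based epoch with the lowest tuning loss seen before
    the loss failed to improve for `patience` consecutive epochs."""
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(tuning_losses):
        if loss < best_loss:
            # New best tuning loss: remember it and reset the counter.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # stop training; keep the weights from best_epoch
    return best_epoch


# Hypothetical per-epoch losses on the tuning (validation) set.
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.58, 0.49]
print(early_stopping_epoch(losses, patience=1))
print(early_stopping_epoch(losses, patience=3))
```

A larger `patience` tolerates short-lived increases in the tuning loss, at the cost of training for more epochs before stopping.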
Suppose you are working as a data scientist for an online retailer, and you are tasked with optimizing the pricing strategy for a new product. The product is currently priced at $50, but you want to determine whether a higher or lower price would result in greater profit. You estimate that the demand for the product can be modeled using a normal distribution with a mean of 100 units per day and a standard deviation of 20 units per day. The cost of producing each unit is $30, and the retailer's profit is the difference between the revenue and the cost. The retailer incurs a fixed cost of $1000 per day for operating expenses, regardless of the number of units sold. Use simulation to answer the question.
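A minimal Monte Carlo sketch of the setup described above, using only the Python standard library. The question does not say how demand responds to a change in price, so this sketch only simulates the current $50 price; comparing other prices would require an additional demand-response assumption. The function name, the number of simulated days, the seed, and the clipping of negative demand draws at zero are all assumptions made for illustration.

```python
import random
import statistics


def simulate_daily_profit(price, unit_cost=30.0, fixed_cost=1000.0,
                          mean_demand=100.0, sd_demand=20.0,
                          n_days=100_000, seed=0):
    """Simulate daily profit when demand ~ Normal(mean_demand, sd_demand)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    profits = []
    for _ in range(n_days):
        # Assumption: negative draws are clipped, since demand cannot be negative.
        demand = max(0.0, rng.gauss(mean_demand, sd_demand))
        profits.append((price - unit_cost) * demand - fixed_cost)
    return statistics.mean(profits), statistics.stdev(profits)


mean_profit, sd_profit = simulate_daily_profit(price=50.0)
print(f"mean daily profit: {mean_profit:.1f}, std dev: {sd_profit:.1f}")
```

As a sanity check, the simulated mean should land close to the analytical expectation of (50 - 30) × 100 - 1000 = 1000 dollars per day.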
You decide to run a simpler model to predict churn, using only the variables tenure (in months) and TotalCharges (in US$). The output is given below. The AIC of this model is 4727.6 (in contrast to the AIC of 4240 for the full model). On the basis of this which model would be expected to give superior predictive performance?
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)
## (Intercept)   2.471e-01  5.360e-02   4.611 4.01e-06 ***
## tenure       -1.124e-01  5.816e-03 -19.334  < 2e-16 ***
## TotalCharges  8.236e-04  5.618e-05  14.660  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 5701.5 on 4921 degrees of freedom
## Residual deviance: 4721.6 on 4919 degrees of freedom
## AIC: 4727.6
Confusion Matrix (Training), Actual (Yes/No) vs. Predicted (Yes/No); cell values as extracted, cell positions not recoverable: 515, 345, 795, 3267
Second confusion matrix (title not recoverable), Actual (Yes/No) vs. Predicted (Yes/No); cell values as extracted: 220, 145, 339…
- The simpler model (with just tenure and TotalCharges)
- The full model (with all variables)
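The AIC figure in the R output above can be checked by hand. For a logistic regression on a binary (0/1) response, the residual deviance equals minus twice the log-likelihood, so AIC = residual deviance + 2k, where k is the number of estimated coefficients. The sketch below is an illustrative check under that assumption; the function name is hypothetical.

```python
def aic_from_deviance(residual_deviance, n_parameters):
    """AIC for a binary-response GLM, assuming deviance = -2 * log-likelihood."""
    return residual_deviance + 2 * n_parameters


# Simpler model: intercept, tenure, TotalCharges -> 3 coefficients.
simple_aic = aic_from_deviance(4721.6, 3)
full_aic = 4240.0  # AIC of the full model, as quoted in the question

# A lower AIC indicates the better expected trade-off of fit and complexity.
preferred = "full" if full_aic < simple_aic else "simple"
print(f"simple AIC = {simple_aic:.1f}, full AIC = {full_aic:.1f}, preferred: {preferred}")
```

The computed value reproduces the 4727.6 reported in the output, and the comparison follows the usual rule that the model with the lower AIC is preferred.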
Chapter 5 Solutions
Similar questions
- A large car fleet company asked you to help them forecast vehicle resale values. They purchase new vehicles, lease them for three years, and then sell them. Better forecasts of vehicle sales values would mean better control of profits; understanding what affects resale values may allow leasing and sales policies to be developed to maximise profits. At the time, the resale values were being forecast by a group of specialists. Unfortunately, they saw any statistical model as a threat to their jobs and were uncooperative in providing information. Nevertheless, the company provided a large amount of data on previous vehicles and their eventual resale values. 1.1 Describe the five steps of forecasting in the context of this project?
- Python Regression. Model 1: train MSE = 0.423, test MSE = 0.978. Model 2: train MSE = 0.572, test MSE = 0.644. Model 3: train MSE = 0.218, test MSE = 1.103. Based on this information, which of these models generalises the best to unseen data?
- What are the limitations of using the weighted scoring model?
- Electronic Spreadsheet Applications. Compare What-If Analysis using Trial and Error and Goal Seek to the given scenario: Let's say a student is enrolled in an online class at a learning institution for a semester. His overall average grade stands at 43% in the course (Term Grade is 45%, Midterm Grade is 65%, Class Participation is 62% and Final Exam is 0%). Unfortunately, he missed his Final Exam and was given 0%. However, he has the opportunity to redo his Final Exam and needs at least an overall average of 60% to pass the course. How can you use Trial and Error and Goal Seek to find out what is the lowest grade he needs on the Final Exam to pass the class? Which method worked best for you and why?
- In 2020, alumni collectively donated $11.37 billion to their colleges and universities, according to an annual report from the nonprofit Council for Aid to Education. That is an increase of 14.5 percent from the prior year, which the report attributes, at least in part, to a strong stock market. Some colleges and universities have a particularly high percentage of former students who make financial contributions. In that period, the 10 institutions with the highest percentages of undergraduate alumni donors boasted an average giving rate of 51.5 percent. Alumni donor count is used to calculate alumni participation by dividing it by the number of living alumni on record. For example, if your institution has 100,000 alumni of record (i.e., living with a good address) and 20,000 of them made a gift last year, your alumni participation rate is 20%. The importance of alumni participation cannot be understated; for example, U.S. News & World Report’s Best Colleges, the most prominent…
- Suppose, as manager of a chain of stores, you would like to use sales transactional data to analyze the effectiveness of your store's advertisements: In particular, you would like to study how specific factors influence the effectiveness of advertisements that announce a particular category of items on sale. The factors to study are: the region in which customers live, and the day-of-the-week, and time-of-the-day of the ads. Discuss how to design an efficient method to mine the transaction data sets and explain how multidimensional and multilevel mining methods can help you derive a good solution.
- A number of parents have volunteered their children to participate in a developmental study administered by a local child psychologist. The following chart summarizes the results of the psychologist's assessments. [Chart: Developmental Test Scores; x-axis: Age (years); y-axis: Score on Developmental Test] Which of the following statistical techniques can the psychologist use to determine the developmental score of a typical 4-year-old child despite the fact that no 4-year-old children participated in the study? Options: Regression; Clustering; Classification; Outlier/anomaly detection.
- Take a look at the confusion matrix below. How many values did the logistic regression model incorrectly predict as positive? Confusion matrix (rows are Actual Values, columns are Predicted Values): Actual No: Predicted No = 1112, Predicted Yes = 98; Actual Yes: Predicted No = 55, Predicted Yes = 150.
- Suppose you have to select one project partner from a set of four classmates, who have different GPAs. Assume you do not know any student’s GPA in advance but can get to know it after you have picked a student from that group. (a) Suppose you pick one of the four students at random and accept that student as your project partner. What is the probability that your partner is the one with the highest GPA? (b) Suppose you decide to reject the first student and to then accept the next student if and only if that student has a higher GPA. Note that you MUST have a partner, so if the first three are rejected by you, then you have to accept the fourth student. What is the probability that your partner will be the one with the highest GPA?
- Draw an ER model based on the following scenario. A travel agency has different bus stations at different locations. Each station has a unique name, location and buses. Each bus has a unique bus number, color and type. The bus takes routes between stations. Each route has a name, arrival time and departure time. A single bus can take routes to different stations. Some buses may not take any route on some days; however, a station must have a route each day. Each bus contains seats. Each seat has a unique number, row and seat type. There are three different types of seats: first class, standard and economy. Passengers can make reservations. A passenger has an id number, full name, age, gender and address. Each reservation will have a unique reservation number, fare, date and time. Each reservation can have any number of seats. The reservation is dependent on the passenger.
- The predictive performance of a model is the measure of how close the model’s prediction values are to the actual values. A close-to-ideal model would have the minimum error in the predicted and actual values. The validation set is used to assess the predictive ability of the model which has been trained using the training set. True / False
- You are interested in looking at how competition affects gas prices, where you think that if a gas station has more competitors nearby, it will tend to have lower prices. To test this, you decide to use a natural experiment in which a chain of gas stations ("Thrifty") suddenly closed. This closure occurred between June and October of 2012, and meant that gas stations that were previously close to a Thrifty station experienced less competition than before. On the other hand, those stations that were not close to a Thrifty station did not experience any change in competition. You decide to use a differences in differences design, in which gas stations located near a Thrifty station at the start of 2012 are the treatment group and those that were not located near a Thrifty at the start of 2012 are the control group. You look at how prices change before and after the exit of Thrifty. This is plotted in the below graph, where the solid line represents the average price of stations that were…
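For the Goal Seek question in the list above, the search a spreadsheet performs can be imitated with a simple bisection. This is a sketch under one stated assumption: the four components are equally weighted, which is consistent with the quoted overall average, since (45 + 65 + 62 + 0) / 4 = 43. The function names are hypothetical.

```python
def overall_average(final_exam, other_grades=(45.0, 65.0, 62.0)):
    """Overall average, assuming all four components are equally weighted."""
    grades = list(other_grades) + [final_exam]
    return sum(grades) / len(grades)


def goal_seek_final_exam(target, lo=0.0, hi=100.0, tol=1e-6):
    """Bisection search for the lowest final exam grade that reaches `target`.

    Works because overall_average increases as final_exam increases.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if overall_average(mid) < target:
            lo = mid  # grade too low: search the upper half
        else:
            hi = mid  # grade sufficient: try a lower one
    return hi


required = goal_seek_final_exam(60.0)
closed_form = 4 * 60.0 - (45.0 + 65.0 + 62.0)  # solve (172 + x) / 4 = 60 for x
print(f"goal seek: {required:.2f}, closed form: {closed_form:.2f}")
```

Trial and Error corresponds to plugging candidate grades into `overall_average` by hand, whereas Goal Seek automates the search; under the equal-weight assumption both arrive at a required final exam grade of 68%.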