QA-123-4407

School: University of the People
Course: 4407
Subject: Information Systems
Date: May 8, 2024
Uploaded by AgentPower13325

************************************** UNIT 1 *********************************************

******* DATA MINING ******

Q: Data Mining can be said to be a process designed to detect patterns in data sets.
A: TRUE

Q: The objective of (blank) is to identify valid, novel, potentially useful, and understandable correlations and patterns in existing data.
A: Data Mining

****** UNSUPERVISED *****

Q: In unsupervised learning, the learning algorithm must be trained using data attributes that have been paired with an outcome variable.
A: FALSE

Q: Which of the following is an example of an unsupervised learning algorithm?
A: K-Means

Q: Unsupervised learning involves building a statistical model for predicting or estimating an output based upon one or more inputs.
A: FALSE

******* SUPERVISED *******

Q: In a supervised learning model, Bias refers to the error that is introduced from the assumptions of the data analyst.
A: FALSE

Q: Regression analysis involves developing a model where one or more inputs are used to predict an output variable. Regression, in this context, represents what kind of learning?
A: Supervised learning

***** Machine Learning Types ******

Q: Which of the following is NOT a machine learning technique?
A: Linear Components Analytics

Q: A prediction outcome variable must be categorical.
A: FALSE
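To illustrate why K-Means counts as unsupervised, here is a minimal sketch that clusters one-dimensional points without any outcome variable. The function name `k_means_1d`, the sample points, and the starting centroids are assumptions made up for this illustration, not part of the course material.

```python
def k_means_1d(points, centroids, iterations=10):
    """Cluster 1-D points around the given starting centroids (no labels used)."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical data: two obvious groups, and no outcome column at all.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
print(centroids)
print(clusters)
```

The algorithm only ever sees the input values. A supervised method such as regression or classification would also need a known outcome for every training row.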

Q: Assuming that we have a data set that includes sales data for every customer over the course of several years and we wanted to use this data to predict future sales, which would be the most appropriate technique to investigate?
A: Regression

Q: Assume that you had a variety of data including medical history, diet, and heredity factors on individuals who developed cancer, and you wanted to use this data to determine whether a person is likely to develop cancer. Which technique would be the most promising to start with?
A: Classification

************************************** UNIT 2 *********************************************

Question: True or False: Information Retrieval or text analytics is NOT a form of data mining.
Answer: FALSE
****************************************
Question: NoSQL databases provide greater performance at the expense of availability.
Answer: TRUE
****************************************
Question: The snowflake schema differs from the star schema in that the tables holding the dimensional data are normalized.
Answer: TRUE
****************************************
Question: Map/Reduce refers to an optimized approach to process SQL queries.
Answer: FALSE
****************************************
Question: Which of the following is an example of a NoSQL Analytics database?
Answer: Cassandra
****************************************
Question: The term OLAP stands for?
Answer: Online Analytical Processing
****************************************
Question: What does ETL stand for?
Answer: Extract, Transform, Load
****************************************
Question: In a data warehouse, unidimensional data is stored in a star schema format.
Answer: FALSE
****************************************
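The Map/Reduce answer above is easier to remember with a concrete picture. Map/Reduce is a general programming model with map, group, and reduce steps, not a SQL optimizer. The following single-process word count is only a sketch of that model. The function names and sample documents are assumptions for illustration, and a real framework would distribute these steps across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical input documents.
documents = ["data mining finds patterns", "mining data at scale"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle(pairs))
print(word_counts)
```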

Question: Which of the following is NOT a statistical processing software package?
Answer: Vertica
****************************************
Question: A database where all of the values for a particular column are stored contiguously is called?
Answer: Column-oriented storage
****************************************
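As a rough picture of column-oriented storage, the sketch below converts a few row records into one contiguous list per column. The table and its field names are invented for illustration. Real column stores also add compression and on-disk layout details that are not shown here.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"customer": "A", "region": "East", "sales": 120},
    {"customer": "B", "region": "West", "sales": 80},
    {"customer": "C", "region": "East", "sales": 200},
]

# Column-oriented layout: all values of one column sit together in one list.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate over a single column only needs to read that column's values.
total_sales = sum(columns["sales"])
print(columns["sales"], total_sales)
```

This is why analytic queries that scan a few columns over many rows tend to suit the column-oriented layout.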

***************** UNIT 3 ********************

Question: You are given a data set with information from 1,000 high school students (of which the following is a part of the data) and asked to build a machine learning solution that can predict the success of a student in completing a college degree. Which technique would be the best to use?
Answer: Classification
*************************************
Question: As a new data scientist for the Moogle corporation, you are asked to develop an algorithm that can detect spam emails and deliver them into a spam folder instead of an inbox. After looking at a plot of the data it looks like the following. Which technique would you use for your algorithm?
Answer: Logistic Regression
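For the spam question above, logistic regression fits because it turns a weighted sum of inputs into a probability between 0 and 1, which is then thresholded into a class. The sketch below shows only that prediction step. The feature values, weights, and bias are hypothetical numbers chosen for illustration, not a trained model.

```python
import math


def sigmoid(z):
    """Logistic function: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_spam(features, weights, bias, threshold=0.5):
    """Return (probability, is_spam) for one email's feature values."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    probability = sigmoid(z)
    return probability, probability >= threshold


# Hypothetical features, e.g. counts of suspicious words and of links.
probability, is_spam = predict_spam([3, 1], weights=[1.2, 0.8], bias=-2.0)
print(round(probability, 3), is_spam)
```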

****************************************
Question: You have a dataset which produces the following plot and you need to create a predictive model. Which of the following techniques are you most likely to use?
Answer: Linear Regression
****************************************
Question: True or False: The following data plot represents data that is linearly separable?
Answer: FALSE
****************************************

Question: Assuming you have a linear model in which the value of m is 0.05 and the value of b is 10 that explains the relationship between income and credit extended. If income is 50,000, what credit will be extended?
Answer: 2510
****************************************
Question: True or False: Logistic regression can be used to predict a continuous variable.
Answer: FALSE
****************************************
Question: The following diagram represents which technique?
Answer: Curvilinear Regression
****************************************
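The linear-model answer above follows from y = m * x + b, with m as the slope and b as the intercept: 0.05 * 50,000 + 10 = 2,510. A small sketch of the same arithmetic, where the function name `predict_credit` is an assumption for illustration:

```python
def predict_credit(income, m=0.05, b=10):
    """Linear model y = m * x + b with the slope and intercept from the question."""
    return m * income + b


credit = predict_credit(50_000)
print(credit)  # approximately 2510
```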

Question: Assume that you have a data set which produces the following data plot. You wish to predict if a new case would be a ‘red’ case as opposed to a ‘blue’ case based upon the input attribute data. Which technique should you use?
Answer: Logistic Regression
***************************************************
Question: A regression model has an R² statistic of 0.95. This indicates that the regression model is NOT a good fit and does a poor job of predicting the outcome based upon the input variables.
Answer: FALSE
****************************************
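The R² answer above makes sense once the statistic is written out. R² = 1 - SS_res / SS_tot is the share of the outcome's variance that the model explains, so a value near 1 indicates a close fit. The sketch below computes it on a small made-up data set. The actual and predicted values are assumptions for illustration only.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical observations and model predictions that track them closely.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.9, 5.1]
score = r_squared(actual, predicted)
print(round(score, 3))
```

Here the residual sum of squares is 0.08 against a total of 10, so R² is about 0.992. That is a good fit, in line with the quiz answer that 0.95 does not indicate a poor model.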
