HW_1_Stewart_Ceara

School

Syracuse University

Course

707

Subject

Industrial Engineering

Date

Oct 30, 2023

Ceara Stewart
IST 707 Applied Machine Learning
HW 1

Task 1: review data mining concepts and tasks

1. Discuss whether each of the following activities is a data mining task.
   a. Dividing the customers of a company according to their gender
      i. No. The attribute used for the division is already recorded in the data, so this is a simple query and involves no prediction.
   b. Dividing the customers of a company according to their profitability
      i. No. This is a query that may require accounting calculations.
   c. Computing the total sales of a company
      i. No. This is summarization and is an accounting task.
   d. Sorting a student database based on student identification numbers
      i. No. This is a query.
   e. Predicting the outcomes of tossing a (fair) pair of dice
      i. No. This is a probability calculation.
   f. Predicting the future stock price of a company using historical records
      i. Yes. This uses historical data to predict future values.
   g. Monitoring the heart rate of a patient for abnormalities
      i. Yes. This looks for underlying patterns and uses anomaly detection.
   h. Monitoring seismic waves for earthquake activities
      i. Yes. This uses anomaly detection to find earthquake patterns.
   i. Extracting the frequencies of a sound wave
      i. No. This is data (signal) processing.

2. Suppose that you are employed as a data mining consultant for an Internet search engine company. Describe how data mining can help the company by giving specific examples of how techniques, such as clustering, classification, association rule mining, and anomaly detection can be applied.
   a. The search engine company I chose to discuss is Google. Clustering, classification, association rule mining, and anomaly detection can all be applied to help the company.
      For clustering, grouping similar web pages or search results based on content or user behavior can improve search result organization and provide users with more relevant and diverse results.
      For classification, categorizing search results into classes, such as news articles, images, videos, or products, enhances result filtering, making it easier for users to find the desired information quickly. Classification can also support sentiment analysis, spam detection, and the maintenance of search quality.
      For association rule mining, analyzing user behavior can uncover patterns and relationships between search queries, preferences, and click-through rates. This can enable personalization of search results, recommendation of related queries, and improved ad targeting.
      For anomaly detection, monitoring user interactions, click patterns, and network traffic helps detect click fraud, spam, security breaches, and unusual system behavior, protecting user privacy and maintaining the integrity of the search engine.
      By leveraging these techniques, Google can enhance search relevance, personalize user experiences, improve ad targeting, and ensure the security and integrity of its services.

3. For each of the following data sets, explain whether data privacy is an important issue.
   a. Census data collected from 1900-1950
      i. While the data is relatively old, privacy is still important, as U.S. census records are confidential for 72 years after collection. The data contains sensitive personal information, such as names, addresses, and demographic details. Protecting its privacy is important to avoid potential misuse and unauthorized access.
   b. IP addresses and visit times of web users who visit your website
      i. IP addresses can be considered personal information and can potentially be used to track and identify individuals. Privacy is important here, and the data must be safeguarded so that user privacy is respected.
   c. Images from Earth-orbiting satellites
      i. This is not an important privacy issue. Satellite imagery of this kind is publicly available; Google Maps, for example, uses photos that cover public spaces and addresses. It is still important to handle the data with care and to respect the privacy choices of individuals, such as owners of private property.
   d. Names and addresses of people from the telephone book
      i. This is a moderate concern. This information is publicly available but can still raise privacy concerns for individuals. It is important to respect people’s privacy, for example for those who opt out of public listings.
   e. Names and email addresses collected from the Web
      i. Data privacy is an important issue for names and email addresses collected from the Web. This information constitutes personal data and must be handled securely and in compliance with privacy regulations.
         Users expect their personal information to be protected, and it is essential to obtain proper consent, use secure storage methods, and provide transparent data handling practices to protect individuals’ privacy.

Task 2: practice your critical thinking and writing

The NY Times article is a criticism of Google Flu Trends, which was once celebrated as a prime example of the power of big-data analysis but is now facing scrutiny. Social scientists recently published an article in Science magazine revealing that the service consistently overestimated flu cases in the United States between 2011 and 2013. Despite algorithm updates, the service still overshot by approximately 30%. The authors argue that Google displayed “big data hubris” by relying solely on big data without considering traditional data collection and analysis. They suggest that combining Google Flu Trends with data from the Centers for Disease Control and Prevention (CDC) yields more accurate results. The criticism highlights the importance of examining the algorithms employed by private companies, which have far-reaching influence in areas such as public health.

The article defending Google Flu Trends, “In Defense of Google Flu Trends”, discusses the importance of combining the data with traditional monitoring methods. This combination would provide better results than Google Flu Trends on its own, which is criticized for being inaccurate. The article supports the view that Flu Trends was intended to supplement existing surveillance networks, not replace them. The creators of Flu Trends, Matt Mohebbi and Jeremy Ginsberg, worked closely with the CDC to ensure its usefulness as a complementary tool. The article emphasizes that while technology may not live up to unrealistic expectations, it can still offer significant value in epidemiology and research when properly understood and used.

After reading both articles, I find that both the criticism and the defense of Google Flu Trends make relevant points. I believe sufficient testing should have been completed before the results were released, to ensure that the models were not overpredicting flu trends. Problems such as data bias, lack of context, poor data quality and noise, and ethical and privacy concerns plague not only Google Flu Trends data but also future data products released for public scrutiny. Now that it has been released, I agree that presenting the results differently, for example by coupling them with traditional monitoring methods to yield more accurate flu estimates, is a sufficient way to offset the overprediction. This coupling, alongside a newfound understanding of how companies can do better, creates many benefits.
These benefits include early detection and prediction of trends, cost-effectiveness and scalability, the discovery of hidden patterns that can inform prevention measures, and better decision support and planning for the creators.

The Google Flu Trends controversy is a reminder that, when engaging with big data applications, it is crucial to adopt a critical mindset and to consider the limitations, biases, and ethical considerations that scrutiny of the models brings to light. Transparency, thorough validation, and interdisciplinary collaboration play vital roles, and must continue to do so, in harnessing the potential benefits of big data applications and ensuring that the issues surrounding Google Flu Trends do not recur in other big data applications.
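
As a rough illustration of the validation step argued for above, the sketch below measures how far a search-based estimate overshoots a reference series and how much a simple blend with a traditional estimate narrows the gap. Every number is a made-up illustration value, not real Flu Trends or CDC data, and the function names and the equal-weight blend are my own assumptions rather than the method used in either article.

```python
# Minimal validation sketch. All values are hypothetical illustration data.

def mean_relative_error(estimates, reference):
    """Average signed relative error; a positive result means overestimation."""
    if len(estimates) != len(reference):
        raise ValueError("estimates and reference must have the same length")
    errors = [(e - r) / r for e, r in zip(estimates, reference)]
    return sum(errors) / len(errors)


def blend(search_based, traditional, weight):
    """Weighted average of two estimates; weight is the share given to search_based."""
    return [weight * s + (1 - weight) * t
            for s, t in zip(search_based, traditional)]


reference = [100.0, 200.0, 400.0]     # hypothetical confirmed weekly case counts
search_based = [130.0, 260.0, 520.0]  # hypothetical search-based estimate (too high)
traditional = [90.0, 180.0, 360.0]    # hypothetical traditional estimate (too low)

search_error = mean_relative_error(search_based, reference)
blended_error = mean_relative_error(blend(search_based, traditional, 0.5), reference)

print(f"search-based only: {search_error:+.0%}")
print(f"blended estimate:  {blended_error:+.0%}")
```

With these invented numbers the search-based series overshoots by about 30% and the blend by about 10%. Real data would not behave this neatly, and the blend weight would need to be chosen and tested on held-out seasons.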