School: Massachusetts Institute of Technology
Course: CTL.
Subject: Industrial Engineering
Date: Jun 27, 2024
Uploaded by MinisterFlower14402
6. Problem 3: Influence of Hyper-parameters (Written Report)

Include your answers to all parts of Problem 3 in your written report. This problem is worth 16 points.

The hyper-parameter choices used in data analysis techniques can have a large impact on the inferences made. As you may have encountered, finding the best choice of a parameter such as the perplexity in t-SNE or the number of clusters can be an ambiguous problem. We will now investigate the sensitivity of your results to changes in these hyper-parameters, with the goal of understanding how your conclusions may vary depending on these choices.

1. (3 points) When we created the t-SNE plot in Problem 1, we ran t-SNE on the top 50 PCs of the data. But we could just as easily have chosen a different number of PCs to represent the data. Run t-SNE using 10, 50, 100, 250, and 500 PCs, and plot the resulting visualization for each. What do you observe as you increase the number of PCs used?

2. (13 points) Pick three hyper-parameters from the lists below and analyze how changing each of them affects the conclusions that can be drawn from the data. Three is the total number your report needs to analyze, and you must choose at least one from each of the two categories (A: visualization; B: clustering/feature selection), so the valid splits are a) two from Category A and one from Category B, or b) one from Category A and two from Category B. At minimum, evaluate the hyper-parameters individually, but you may also evaluate how joint changes in the hyper-parameters affect the results. You may use any of the datasets we have given you in this project. For visualization hyper-parameters, you may find it productive to augment your analysis with experiments on synthetic data, though we request that you use real data in at least one demonstration.
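A minimal sketch of the Part 1 sweep, assuming scikit-learn's `PCA` and `TSNE` and a samples-by-features matrix `X` with at least 500 rows and 500 columns (needed to extract 500 PCs). The names `tsne_on_top_pcs` and `plot_embeddings` are hypothetical helpers, not part of any code provided with the project. PCA is fitted once with the largest k and the score matrix is sliced, because with the exact SVD solver the first k score columns are the projection onto the top-k PCs; every t-SNE setting is held fixed so that only the number of PCs varies between panels.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE


def tsne_on_top_pcs(X, n_pcs_list=(10, 50, 100, 250, 500), perplexity=30.0, seed=0):
    """Run t-SNE on the top-k PC scores of X for each k in n_pcs_list.

    Returns a dict mapping k to an (n_samples, 2) embedding.
    """
    X = np.asarray(X, dtype=float)
    k_max = max(n_pcs_list)
    if k_max > min(X.shape):
        raise ValueError(f"cannot extract {k_max} PCs from data of shape {X.shape}")
    # Fit PCA once with the largest k. With the exact (full) SVD solver, the
    # first k columns of the scores equal the projection onto the top-k PCs.
    scores = PCA(n_components=k_max, svd_solver="full").fit_transform(X)
    embeddings = {}
    for k in n_pcs_list:
        # Hold every t-SNE setting fixed so only the number of PCs changes.
        tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
        embeddings[k] = tsne.fit_transform(scores[:, :k])
    return embeddings


def plot_embeddings(embeddings, colors=None):
    """Draw one scatter panel per PC count and return the matplotlib Figure."""
    import matplotlib.pyplot as plt  # imported here: only needed for plotting

    n = len(embeddings)
    fig, axes = plt.subplots(1, n, figsize=(4 * n, 4), squeeze=False)
    for ax, (k, emb) in zip(axes[0], embeddings.items()):
        ax.scatter(emb[:, 0], emb[:, 1], c=colors, s=5)
        ax.set_title(f"t-SNE on top {k} PCs")
        ax.set_xticks([])
        ax.set_yticks([])
    fig.tight_layout()
    return fig
```

For example, `plot_embeddings(tsne_on_top_pcs(X), colors=cluster_labels)` draws one panel per PC count; coloring every panel by the same labels (here a hypothetical `cluster_labels` array, such as the clusters from an earlier problem) makes it easier to see whether groups merge, split, or stay stable as more PCs are added. Fixing `random_state` removes one source of run-to-run variation, so differences between panels are easier to attribute to the number of PCs.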
Some possible choices of hyper-parameters are:

Category A (visualization):
- t-SNE perplexity
- t-SNE learning rate
- t-SNE early exaggeration
- t-SNE initialization
- t-SNE number of iterations/convergence tolerance

Category B (clustering/feature selection):
- Effect of the number of PCs chosen on clustering
- Type of clustering criterion used in hierarchical clustering (single linkage vs. Ward, for example)
- Number of clusters chosen for use in unsupervised feature selection, and how it affects the quality of the chosen features
- Magnitude of regularization and its relation to your feature selection (for example, does under- or over-regularizing the model lead to bad features being selected?)
- Type of regularization (L1, L2, elastic net) in the logistic regression step, and how the resulting selected features differ

For visualization hyper-parameters, provide substantial visualizations and explanations of how the parameter affects the image. For clustering/feature selection, provide visualizations and/or numerical results that demonstrate how different choices affect the downstream visualizations and feature selection quality. Provide adequate written explanations for each of these visualizations and numerical results.
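One way to structure Part 2 is to sweep each chosen hyper-parameter while holding the others fixed, and to record a number alongside each plot. The sketch below is an illustration under stated assumptions, not a reference solution: it uses scikit-learn, synthetic Gaussian blobs from `make_blobs` (so the true labels are known), t-SNE perplexity as the Category A example, and the hierarchical-clustering linkage criterion as the Category B example. The helper names `perplexity_sweep`, `linkage_sweep`, and `pairwise_agreement` are hypothetical. The silhouette score of each 2-D embedding against the known labels is one possible summary of how well a map separates the groups, and the adjusted Rand index (ARI) measures agreement between two partitions.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.metrics import adjusted_rand_score, silhouette_score


def perplexity_sweep(X, labels, perplexities=(5, 30, 100), seed=0):
    """Category A: embed X with t-SNE at each perplexity (other settings fixed)
    and score how well the 2-D map separates the given labels."""
    results = {}
    for p in perplexities:
        emb = TSNE(n_components=2, perplexity=p, init="pca", random_state=seed).fit_transform(X)
        results[p] = (emb, silhouette_score(emb, labels))
    return results


def linkage_sweep(X, n_clusters, linkages=("ward", "complete", "average", "single")):
    """Category B: cut a hierarchical clustering of X into n_clusters groups
    for each linkage criterion and return the cluster labels."""
    return {
        link: AgglomerativeClustering(n_clusters=n_clusters, linkage=link).fit_predict(X)
        for link in linkages
    }


def pairwise_agreement(clusterings):
    """ARI between every pair of clusterings (1.0 means identical partitions)."""
    keys = list(clusterings)
    return {
        (a, b): adjusted_rand_score(clusterings[a], clusterings[b])
        for i, a in enumerate(keys)
        for b in keys[i + 1:]
    }


if __name__ == "__main__":
    # Synthetic stand-in (an assumption, not the project data): 4 Gaussian
    # clusters in 50 dimensions, reduced to 10 PCs before clustering.
    X, y = make_blobs(n_samples=300, n_features=50, centers=4, random_state=0)
    X_pcs = PCA(n_components=10).fit_transform(X)
    for p, (_, s) in perplexity_sweep(X, y).items():
        print(f"perplexity={p}: silhouette={s:.3f}")
    clusterings = linkage_sweep(X_pcs, n_clusters=4)
    for link, lab in clusterings.items():
        print(f"{link}: ARI vs. true labels = {adjusted_rand_score(y, lab):.3f}")
    print(pairwise_agreement(clusterings))
```

Silhouette on a t-SNE map is only a rough proxy, since t-SNE does not preserve global distances, so it should accompany the plots rather than replace them. On real data without ground-truth labels, `pairwise_agreement` can still show which linkage choices lead to essentially the same partition and which ones change the conclusions.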