Updated Professional Data Engineer Dumps
School: Canada College (not endorsed by this school)
Course: CERTQUEEN
Subject: Information Systems
Date: May 12, 2024
Pages: 26
Uploaded by SargentRock14116 on coursehero.com
Professional Data Engineer
Exam Name: Google Certified Professional – Data Engineer
Full version: 331 Q&As
Some Professional Data Engineer exam dumps are shared below.
1. You work for a car manufacturer and have set up a data pipeline using Google Cloud
Pub/Sub to
capture anomalous sensor events. You are using a push subscription in Cloud Pub/Sub that
calls a custom HTTPS endpoint that you have created to take action on these anomalous events
as they occur. Your custom HTTPS endpoint keeps getting an inordinate amount of duplicate
messages.
What is the most likely cause of these duplicate messages?
A. The message body for the sensor event is too large.
B. Your custom endpoint has an out-of-date SSL certificate.
C. The Cloud Pub/Sub topic has too many messages published to it.
D. Your custom endpoint is not acknowledging messages within the acknowledgement
deadline.
Answer:
D
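With a push subscription, Pub/Sub treats any 2xx response as an acknowledgement; a slow or non-2xx response past the acknowledgement deadline causes redelivery, which appears as duplicate messages. A minimal sketch of a handler that parses the documented push envelope and acknowledges quickly (the handler function itself is a hypothetical example, not a Google API):

```python
# Sketch: decode a Pub/Sub push envelope and acknowledge fast.
# Envelope field names follow the documented push message format;
# the handler itself is a hypothetical example.
import base64
import json

def handle_push(raw_body: bytes) -> int:
    envelope = json.loads(raw_body)
    message = envelope["message"]
    data = base64.b64decode(message.get("data", "")).decode("utf-8")
    # Hand slow work (e.g. database writes) to a background worker,
    # then return 2xx immediately so Pub/Sub does not redeliver.
    print(f"received sensor event: {data}")
    return 204  # any 2xx status acknowledges the message

# Example push envelope, as Pub/Sub would POST it:
body = json.dumps({
    "message": {
        "data": base64.b64encode(b"temp_spike").decode("ascii"),
        "messageId": "1234",
    },
    "subscription": "projects/p/subscriptions/s",
}).encode("utf-8")
status = handle_push(body)
```

If processing cannot finish within the deadline, extending the acknowledgement deadline on the subscription also reduces duplicates.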
2. You are developing an application that uses a recommendation engine on Google Cloud.
Your solution should display new videos to customers based on past views. Your solution needs
to generate labels for the entities in videos that the customer has viewed. Your design must be
able to provide very fast filtering suggestions based on data from other customer preferences
on several TB of data.
What should you do?
A. Build and train a complex classification model with Spark MLlib to generate labels and filter
the results.
Deploy the models using Cloud Dataproc. Call the model from your application.
B. Build and train a classification model with Spark MLlib to generate labels. Build and train a
second
classification model with Spark MLlib to filter results to match customer preferences. Deploy the
models
using Cloud Dataproc. Call the models from your application.
C. Build an application that calls the Cloud Video Intelligence API to generate labels. Store data
in Cloud
Bigtable, and filter the predicted labels to match the user’s viewing history to generate
preferences.
D. Build an application that calls the Cloud Video Intelligence API to generate labels. Store data
in Cloud
SQL, and join and filter the predicted labels to match the user’s viewing history to generate
preferences.
Answer:
C
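The filtering step in option C amounts to matching labels generated by the Cloud Video Intelligence API against a user's viewing history. A toy sketch of that idea, with made-up video IDs and labels (in practice the labels would be stored in Bigtable rows keyed for fast lookups):

```python
# Hypothetical catalog: video ID -> labels produced by the
# Video Intelligence API (values here are invented).
video_labels = {
    "vid_001": {"car", "race", "night"},
    "vid_002": {"cooking", "kitchen"},
    "vid_003": {"car", "mountain", "drone"},
}

def suggest(viewed_ids, catalog):
    # Collect labels from everything the user has already watched...
    seen_labels = set()
    for vid in viewed_ids:
        seen_labels |= catalog.get(vid, set())
    # ...then rank unviewed videos by label overlap.
    scores = {
        vid: len(labels & seen_labels)
        for vid, labels in catalog.items()
        if vid not in viewed_ids
    }
    return sorted(scores, key=scores.get, reverse=True)

ranked = suggest({"vid_001"}, video_labels)  # vid_003 shares "car"
```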
3. You have data stored in BigQuery. The data in the BigQuery dataset must be highly
available. You need to define a storage, backup, and recovery strategy of this data that
minimizes cost.
How should you configure the BigQuery table?
A. Set the BigQuery dataset to be regional. In the event of an emergency, use a point-in-time
snapshot to recover the data.
B. Set the BigQuery dataset to be regional. Create a scheduled query to make copies of the
data to tables suffixed with the time of the backup. In the event of an emergency, use the
backup copy of the table.
C. Set the BigQuery dataset to be multi-regional. In the event of an emergency, use a point-in-
time snapshot to recover the data.
D. Set the BigQuery dataset to be multi-regional. Create a scheduled query to make copies of
the data to tables suffixed with the time of the backup. In the event of an emergency, use the
backup copy of the table.
Answer:
D
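The scheduled-query backup described in the options writes each copy to a table whose name carries the backup time, so any copy can be located later by its suffix. A small sketch of that naming convention (dataset and table names are hypothetical):

```python
# Generate a time-suffixed backup table name, as a scheduled query
# would when writing copies of the source table.
from datetime import datetime, timezone

def backup_table_name(base: str, when: datetime) -> str:
    return f"{base}_{when.strftime('%Y%m%d')}"

run_time = datetime(2024, 5, 12, tzinfo=timezone.utc)
name = backup_table_name("sales", run_time)
print(name)  # sales_20240512
```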
4. Your company is implementing a data warehouse using BigQuery, and you have been tasked
with designing the data model. You move your on-premises sales data warehouse with a star
data schema to BigQuery but notice performance issues when querying the data of the past 30
days.
Based on Google's recommended practices, what should you do to speed up the query without
increasing storage costs?
A. Denormalize the data
B. Shard the data by customer ID
C. Materialize the dimensional data in views
D. Partition the data by transaction date
Answer:
D
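Date partitioning (option D) helps because BigQuery prunes partitions outside the query's date filter, so a last-30-days query scans only a fraction of the table, at no extra storage cost. A toy model of the pruning idea in plain Python, with made-up row counts (this is not the BigQuery engine):

```python
# One "partition" per day, each holding some number of rows.
from datetime import date, timedelta

partitions = {date(2024, 1, 1) + timedelta(days=i): 1_000 for i in range(365)}

def rows_scanned(cutoff: date) -> int:
    # With partitioning, only partitions on/after the cutoff are read.
    return sum(rows for day, rows in partitions.items() if day >= cutoff)

full_scan = sum(partitions.values())       # every row without pruning
pruned = rows_scanned(date(2024, 12, 1))   # only the recent partitions
```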
5. You are part of a healthcare organization where data is organized and managed by
respective data owners in various storage services. As a result of this decentralized ecosystem,
discovering and managing data has become difficult.
You need to quickly identify and implement a cost-optimized solution to assist your organization
with the following:
• Data management and discovery
• Data lineage tracking
• Data quality validation
How should you build the solution?
A. Use BigLake to convert the current solution into a data lake architecture.
B. Build a new data discovery tool on Google Kubernetes Engine that helps with new source
onboarding and data lineage tracking.
C. Use BigQuery to track data lineage, and use Dataprep to manage data and perform data
quality validation.
D. Use Dataplex to manage data, track data lineage, and perform data quality validation.
Answer:
D
Explanation:
Dataplex is a Google Cloud service that provides a unified data fabric across data lakes and data warehouses. It enables data governance, management, and discovery across multiple data domains, zones, and assets, and it supports data lineage tracking, which shows the origin and transformation of data over time. Dataplex also integrates with Dataprep, a data preparation and quality tool that lets users clean, enrich, and transform data in a visual interface; Dataprep can monitor data quality and detect anomalies using machine learning. Dataplex is therefore the most suitable solution, as it meets all three requirements: data management and discovery, data lineage tracking, and data quality validation.
Reference: Dataplex overview
Automate data governance, extend your data fabric with Dataplex-BigLake integration
Dataprep documentation
6. You set up a streaming data insert into a Redis cluster via a Kafka cluster. Both clusters are
running on Compute Engine instances. You need to encrypt data at rest with encryption keys
that you can create, rotate, and destroy as needed.
What should you do?
A. Create a dedicated service account, and use encryption at rest to reference your data stored
in your
Compute Engine cluster instances as part of your API service calls.
B. Create encryption keys in Cloud Key Management Service. Use those keys to encrypt your
data in all of the Compute Engine cluster instances.
C. Create encryption keys locally. Upload your encryption keys to Cloud Key Management
Service.
Use those keys to encrypt your data in all of the Compute Engine cluster instances.
D. Create encryption keys in Cloud Key Management Service. Reference those keys in your
API service calls when accessing the data in your Compute Engine cluster instances.
Answer:
D
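The key lifecycle the question asks for (create, rotate, destroy as needed) maps to Cloud KMS operations. A command sketch, where every resource name and the rotation schedule are hypothetical examples:

```shell
# Create a key ring and a symmetric key with automatic rotation.
gcloud kms keyrings create pipeline-keys --location=us-central1
gcloud kms keys create redis-data-key \
    --keyring=pipeline-keys --location=us-central1 \
    --purpose=encryption \
    --rotation-period=90d \
    --next-rotation-time=2025-01-01T00:00:00Z

# Destroy a specific key version when it is no longer needed.
gcloud kms keys versions destroy 1 \
    --key=redis-data-key --keyring=pipeline-keys --location=us-central1
```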
7. The Dataflow SDKs have been recently transitioned into which Apache service?
A. Apache Spark
B. Apache Hadoop
C. Apache Kafka
D. Apache Beam
Answer:
D
Explanation:
The Dataflow SDKs have been transitioned to Apache Beam, per Google's announcement.
Reference: https://cloud.google.com/dataflow/docs/
8. You need to migrate a Redis database from an on-premises data center to a Memorystore for
Redis instance. You want to follow Google-recommended practices and perform the migration
for minimal cost, time, and effort.
What should you do?
A. Make a secondary instance of the Redis database on a Compute Engine instance, and then
perform a live cutover.
B. Write a shell script to migrate the Redis data, and create a new Memorystore for Redis
instance.
C. Create a Dataflow job to read the Redis database from the on-premises data center, and
write the data to a Memorystore for Redis instance.
D. Make an RDB backup of the Redis database, use the gsutil utility to copy the RDB file into a
Cloud Storage bucket, and then import the RDB file into the Memorystore for Redis instance.
Answer:
D
Explanation:
The import and export feature uses the native RDB snapshot feature of Redis to import data
into or export data out of a Memorystore for Redis instance. The use of the native RDB format
prevents lock-in and makes it very easy to move data within Google Cloud or outside of Google
Cloud. Import and export uses Cloud Storage buckets to store RDB files.
Reference: https://cloud.google.com/memorystore/docs/redis/import-export-overview
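The migration described in the answer can be sketched in two commands; the bucket and instance names below are hypothetical, and the command shapes follow the Memorystore import/export documentation:

```shell
# Copy the on-premises RDB snapshot to Cloud Storage...
gsutil cp backup.rdb gs://redis-migration-bucket/backup.rdb

# ...then import it into the Memorystore for Redis instance.
gcloud redis instances import gs://redis-migration-bucket/backup.rdb \
    my-memorystore-instance --region=us-central1
```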
9. Flowlogistic’s CEO wants to gain rapid insight into their customer base so his sales team can
be better informed in the field. This team is not very technical, so they’ve purchased a
visualization tool to simplify the creation of BigQuery reports. However, they’ve been
overwhelmed by all the data in the table, and are spending a lot of money on queries trying to
find the data they need. You want to solve their problem in the most cost-effective way.
What should you do?
A. Export the data into a Google Sheet for visualization.
B. Create an additional table with only the necessary columns.
C. Create a view on the table to present to the visualization tool.
D. Create identity and access management (IAM) roles on the appropriate columns, so only
they appear in a query.
Answer:
C
10. You have a data pipeline with a Dataflow job that aggregates and writes time series metrics
to Bigtable. You notice that data is slow to update in Bigtable. This data feeds a dashboard
used by thousands of users across the organization. You need to support additional concurrent
users and reduce the amount of time required to write the data.
What should you do? Choose 2 answers
A. Configure your Dataflow pipeline to use local execution.
B. Modify your Dataflow pipeline to use the Flatten transform before writing to Bigtable.
C. Modify your Dataflow pipeline to use the CoGroupByKey transform before writing to Bigtable.
D. Increase the maximum number of Dataflow workers by setting maxNumWorkers in
PipelineOptions.
E. Increase the number of nodes in the Bigtable cluster.
Answer:
D, E
Explanation:
https://cloud.google.com/bigtable/docs/performance#performance-write-throughput
https://cloud.google.com/dataflow/docs/reference/pipeline-options
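Adding Bigtable nodes (answer E) raises write throughput because throughput scales roughly linearly with node count. A back-of-the-envelope sizing sketch; the ~10,000 writes/sec/node figure is the rough documented baseline for 1 KB rows on SSD clusters, so treat it as an estimate, not a guarantee:

```python
# Approximate per-node write throughput (1 KB rows, SSD cluster).
WRITES_PER_NODE = 10_000

def nodes_needed(target_writes_per_sec: int) -> int:
    # Round up so the cluster meets (not just approaches) the target.
    return -(-target_writes_per_sec // WRITES_PER_NODE)

current = nodes_needed(25_000)  # e.g. 25k writes/sec needs 3 nodes
```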
11. Your startup has a web application that currently serves customers out of a single region in
Asia. You are targeting funding that will allow your startup to serve customers globally. Your
current goal is to optimize for cost, and your post-funding goal is to optimize for global presence
and performance. You must use a native JDBC driver.
What should you do?
A. Use Cloud Spanner to configure a single region instance initially, and then configure multi-
region Cloud Spanner instances after securing funding.
B. Use a Cloud SQL for PostgreSQL highly available instance first, and Bigtable with US.