Updated Professional Data Engineer Dumps
School: Canada College (not endorsed by this school)
Course: CERTQUEEN
Subject: Information Systems
Date: May 12, 2024
Pages: 26
Uploaded by SargentRock14116 on coursehero.com
Professional Data Engineer
Exam Name: Google Certified Professional – Data Engineer
Full version: 331 Q&As
Some Professional Data Engineer exam dumps are shared below.
1. You work for a car manufacturer and have set up a data pipeline using Google Cloud
Pub/Sub to
capture anomalous sensor events. You are using a push subscription in Cloud Pub/Sub that
calls a custom HTTPS endpoint that you have created to take action on these anomalous events
as they occur. Your custom HTTPS endpoint keeps getting an inordinate amount of duplicate
messages.
What is the most likely cause of these duplicate messages?
A. The message body for the sensor event is too large.
B. Your custom endpoint has an out-of-date SSL certificate.
C. The Cloud Pub/Sub topic has too many messages published to it.
D. Your custom endpoint is not acknowledging messages within the acknowledgement
deadline.
Answer:
D
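With a push subscription, Pub/Sub treats any 2xx response as an acknowledgement; a slow or non-2xx response past the acknowledgement deadline causes redelivery, which appears as duplicate messages. A minimal sketch of a handler that parses the documented push envelope and acknowledges quickly (the handler function itself is a hypothetical example, not a Google API):

```python
# Sketch: decode a Pub/Sub push envelope and acknowledge fast.
# Envelope field names follow the documented push message format;
# the handler itself is a hypothetical example.
import base64
import json

def handle_push(raw_body: bytes) -> int:
    envelope = json.loads(raw_body)
    message = envelope["message"]
    data = base64.b64decode(message.get("data", "")).decode("utf-8")
    # Hand slow work (e.g. database writes) to a background worker,
    # then return 2xx immediately so Pub/Sub does not redeliver.
    print(f"received sensor event: {data}")
    return 204  # any 2xx status acknowledges the message

# Example push envelope, as Pub/Sub would POST it:
body = json.dumps({
    "message": {
        "data": base64.b64encode(b"temp_spike").decode("ascii"),
        "messageId": "1234",
    },
    "subscription": "projects/p/subscriptions/s",
}).encode("utf-8")
status = handle_push(body)
```

If processing cannot finish within the deadline, extending the acknowledgement deadline on the subscription also reduces duplicates.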
2. You are developing an application that uses a recommendation engine on Google Cloud.
Your solution should display new videos to customers based on past views. Your solution needs
to generate labels for the entities in videos that the customer has viewed. Your design must be
able to provide very fast filtering suggestions based on data from other customer preferences
on several TB of data.
What should you do?
A. Build and train a complex classification model with Spark MLlib to generate labels and filter
the results.
Deploy the models using Cloud Dataproc. Call the model from your application.
B. Build and train a classification model with Spark MLlib to generate labels. Build and train a
second
classification model with Spark MLlib to filter results to match customer preferences. Deploy the
models
using Cloud Dataproc. Call the models from your application.
C. Build an application that calls the Cloud Video Intelligence API to generate labels. Store data
in Cloud
Bigtable, and filter the predicted labels to match the user’s viewing history to generate
preferences.
D. Build an application that calls the Cloud Video Intelligence API to generate labels. Store data
in Cloud
SQL, and join and filter the predicted labels to match the user’s viewing history to generate
preferences.
Answer:
C
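The filtering step in option C amounts to matching labels generated by the Cloud Video Intelligence API against a user's viewing history. A toy sketch of that idea, with made-up video IDs and labels (in practice the labels would be stored in Bigtable rows keyed for fast lookups):

```python
# Hypothetical catalog: video ID -> labels produced by the
# Video Intelligence API (values here are invented).
video_labels = {
    "vid_001": {"car", "race", "night"},
    "vid_002": {"cooking", "kitchen"},
    "vid_003": {"car", "mountain", "drone"},
}

def suggest(viewed_ids, catalog):
    # Collect labels from everything the user has already watched...
    seen_labels = set()
    for vid in viewed_ids:
        seen_labels |= catalog.get(vid, set())
    # ...then rank unviewed videos by label overlap.
    scores = {
        vid: len(labels & seen_labels)
        for vid, labels in catalog.items()
        if vid not in viewed_ids
    }
    return sorted(scores, key=scores.get, reverse=True)

ranked = suggest({"vid_001"}, video_labels)  # vid_003 shares "car"
```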
3. You have data stored in BigQuery. The data in the BigQuery dataset must be highly
available. You need to define a storage, backup, and recovery strategy of this data that
minimizes cost.
How should you configure the BigQuery table?
A. Set the BigQuery dataset to be regional. In the event of an emergency, use a point-in-time
snapshot to recover the data.
B. Set the BigQuery dataset to be regional. Create a scheduled query to make copies of the
data to tables suffixed with the time of the backup. In the event of an emergency, use the
backup copy of the table.
C. Set the BigQuery dataset to be multi-regional. In the event of an emergency, use a point-in-
time snapshot to recover the data.
D. Set the BigQuery dataset to be multi-regional. Create a scheduled query to make copies of
the data to tables suffixed with the time of the backup. In the event of an emergency, use the
backup copy of the table.
Answer:
D
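The scheduled-query backup described in the options writes each copy to a table whose name carries the backup time, so any copy can be located later by its suffix. A small sketch of that naming convention (dataset and table names are hypothetical):

```python
# Generate a time-suffixed backup table name, as a scheduled query
# would when writing copies of the source table.
from datetime import datetime, timezone

def backup_table_name(base: str, when: datetime) -> str:
    return f"{base}_{when.strftime('%Y%m%d')}"

run_time = datetime(2024, 5, 12, tzinfo=timezone.utc)
name = backup_table_name("sales", run_time)
print(name)  # sales_20240512
```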
4. Your company is implementing a data warehouse using BigQuery, and you have been tasked
with designing the data model. You move your on-premises sales data warehouse with a star
data schema to BigQuery but notice performance issues when querying the data of the past 30
days.
Based on Google's recommended practices, what should you do to speed up the query without
increasing storage costs?
A. Denormalize the data
B. Shard the data by customer ID
C. Materialize the dimensional data in views
D. Partition the data by transaction date
Answer:
D
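Date partitioning (option D) helps because BigQuery prunes partitions outside the query's date filter, so a last-30-days query scans only a fraction of the table, at no extra storage cost. A toy model of the pruning idea in plain Python, with made-up row counts (this is not the BigQuery engine):

```python
# One "partition" per day, each holding some number of rows.
from datetime import date, timedelta

partitions = {date(2024, 1, 1) + timedelta(days=i): 1_000 for i in range(365)}

def rows_scanned(cutoff: date) -> int:
    # With partitioning, only partitions on/after the cutoff are read.
    return sum(rows for day, rows in partitions.items() if day >= cutoff)

full_scan = sum(partitions.values())       # every row without pruning
pruned = rows_scanned(date(2024, 12, 1))   # only the recent partitions
```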
5. You are part of a healthcare organization where data is organized and managed by
respective data owners in various storage services. As a result of this decentralized ecosystem,
discovering and managing data has become difficult.
You need to quickly identify and implement a cost-optimized solution to assist your organization
with the following:
• Data management and discovery
• Data lineage tracking
• Data quality validation
How should you build the solution?
A. Use BigLake to convert the current solution into a data lake architecture.
B. Build a new data discovery tool on Google Kubernetes Engine that helps with new source
onboarding and data lineage tracking.
C. Use BigQuery to track data lineage, and use Dataprep to manage data and perform data
quality validation.
D. Use Dataplex to manage data, track data lineage, and perform data quality validation.
Answer:
D
Explanation:
Dataplex is a Google Cloud service that provides a unified data fabric across data lakes and data warehouses. It enables data governance, management, and discovery across multiple data domains, zones, and assets, and it supports data lineage tracking, which shows the origin and transformation of data over time. Dataplex also integrates with Dataprep, a data preparation and quality tool that lets users clean, enrich, and transform data in a visual interface; Dataprep can monitor data quality and detect anomalies using machine learning. Dataplex is therefore the most suitable solution, as it meets all three requirements: data management and discovery, data lineage tracking, and data quality validation.
Reference: Dataplex overview
Automate data governance, extend your data fabric with Dataplex-BigLake integration
Dataprep documentation
6. You set up a streaming data insert into a Redis cluster via a Kafka cluster. Both clusters are
running on Compute Engine instances. You need to encrypt data at rest with encryption keys
that you can create, rotate, and destroy as needed.
What should you do?
A. Create a dedicated service account, and use encryption at rest to reference your data stored
in your
Compute Engine cluster instances as part of your API service calls.
B. Create encryption keys in Cloud Key Management Service. Use those keys to encrypt your
data in all of the Compute Engine cluster instances.
C. Create encryption keys locally. Upload your encryption keys to Cloud Key Management
Service.
Use those keys to encrypt your data in all of the Compute Engine cluster instances.
D. Create encryption keys in Cloud Key Management Service. Reference those keys in your
API service calls when accessing the data in your Compute Engine cluster instances.
Answer:
D
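The key lifecycle the question asks for (create, rotate, destroy as needed) maps to Cloud KMS operations. A command sketch, where every resource name and the rotation schedule are hypothetical examples:

```shell
# Create a key ring and a symmetric key with automatic rotation.
gcloud kms keyrings create pipeline-keys --location=us-central1
gcloud kms keys create redis-data-key \
    --keyring=pipeline-keys --location=us-central1 \
    --purpose=encryption \
    --rotation-period=90d \
    --next-rotation-time=2025-01-01T00:00:00Z

# Destroy a specific key version when it is no longer needed.
gcloud kms keys versions destroy 1 \
    --key=redis-data-key --keyring=pipeline-keys --location=us-central1
```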
7. The Dataflow SDKs have been recently transitioned into which Apache service?
A. Apache Spark
B. Apache Hadoop
C. Apache Kafka
D. Apache Beam
Answer:
D
Explanation:
The Dataflow SDKs have been transitioned to Apache Beam, per Google's announcement.
Reference: https://cloud.google.com/dataflow/docs/
8. You need to migrate a Redis database from an on-premises data center to a Memorystore for
Redis instance. You want to follow Google-recommended practices and perform the migration
for minimal cost, time, and effort.
What should you do?
A. Make a secondary instance of the Redis database on a Compute Engine instance, and then
perform a live cutover.
B. Write a shell script to migrate the Redis data, and create a new Memorystore for Redis
instance.
C. Create a Dataflow job to read the Redis database from the on-premises data center, and
write the data to a Memorystore for Redis instance.
D. Make an RDB backup of the Redis database, use the gsutil utility to copy the RDB file into a
Cloud Storage bucket, and then import the RDB file into the Memorystore for Redis instance.
Answer:
D
Explanation:
The import and export feature uses the native RDB snapshot feature of Redis to import data
into or export data out of a Memorystore for Redis instance. The use of the native RDB format
prevents lock-in and makes it very easy to move data within Google Cloud or outside of Google
Cloud. Import and export uses Cloud Storage buckets to store RDB files.
Reference: https://cloud.google.com/memorystore/docs/redis/import-export-overview
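The migration described in the answer can be sketched in two commands; the bucket and instance names below are hypothetical, and the command shapes follow the Memorystore import/export documentation:

```shell
# Copy the on-premises RDB snapshot to Cloud Storage...
gsutil cp backup.rdb gs://redis-migration-bucket/backup.rdb

# ...then import it into the Memorystore for Redis instance.
gcloud redis instances import gs://redis-migration-bucket/backup.rdb \
    my-memorystore-instance --region=us-central1
```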
9. Flowlogistic’s CEO wants to gain rapid insight into their customer base so his sales team can
be better informed in the field. This team is not very technical, so they’ve purchased a
visualization tool to simplify the creation of BigQuery reports. However, they’ve been
overwhelmed by all the data in the table, and are spending a lot of money on queries trying to
find the data they need. You want to solve their problem in the most cost-effective way.
What should you do?
A. Export the data into a Google Sheet for visualization.
B. Create an additional table with only the necessary columns.
C. Create a view on the table to present to the visualization tool.
D. Create identity and access management (IAM) roles on the appropriate columns, so only
they appear in a query.
Answer:
C
10. You have a data pipeline with a Dataflow job that aggregates and writes time series metrics
to Bigtable. You notice that data is slow to update in Bigtable. This data feeds a dashboard
used by thousands of users across the organization. You need to support additional concurrent
users and reduce the amount of time required to write the data.
What should you do? Choose 2 answers
A. Configure your Dataflow pipeline to use local execution.
B. Modify your Dataflow pipeline to use the Flatten transform before writing to Bigtable.
C. Modify your Dataflow pipeline to use the CoGroupByKey transform before writing to Bigtable.
D. Increase the maximum number of Dataflow workers by setting maxNumWorkers in
PipelineOptions.
E. Increase the number of nodes in the Bigtable cluster.
Answer:
D, E
Explanation:
https://cloud.google.com/bigtable/docs/performance#performance-write-throughput
https://cloud.google.com/dataflow/docs/reference/pipeline-options
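Adding Bigtable nodes (answer E) raises write throughput because throughput scales roughly linearly with node count. A back-of-the-envelope sizing sketch; the ~10,000 writes/sec/node figure is the rough documented baseline for 1 KB rows on SSD clusters, so treat it as an estimate, not a guarantee:

```python
# Approximate per-node write throughput (1 KB rows, SSD cluster).
WRITES_PER_NODE = 10_000

def nodes_needed(target_writes_per_sec: int) -> int:
    # Round up so the cluster meets (not just approaches) the target.
    return -(-target_writes_per_sec // WRITES_PER_NODE)

current = nodes_needed(25_000)  # e.g. 25k writes/sec needs 3 nodes
```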
11. Your startup has a web application that currently serves customers out of a single region in
Asia. You are targeting funding that will allow your startup to serve customers globally. Your
current goal is to optimize for cost, and your post-funding goal is to optimize for global presence
and performance. You must use a native JDBC driver.
What should you do?
A. Use Cloud Spanner to configure a single region instance initially, and then configure multi-
region Cloud Spanner instances after securing funding.
B. Use a Cloud SQL for PostgreSQL highly available instance first, and Bigtable with US.