Instant Access Google.Professional-Data-Engineer.v2023-03-14.q177 Actual Practice Test Engine for Free (Page 16)

Question 71

You decided to use Cloud Datastore to ingest vehicle telemetry data in real time. You want to build a storage system that will account for the long-term data growth, while keeping the costs low. You also want to create snapshots of the data periodically, so that you can make a point-in-time (PIT) recovery, or clone a copy of the data for Cloud Datastore in a different environment. You want to archive these snapshots for a long time.
Which two methods can accomplish this? (Choose two.)

A. Write an application that uses Cloud Datastore client libraries to read all the entities. Format the exported data into a JSON file. Apply compression before storing the data in Cloud Source Repositories.
B. Use managed export, and store the data in a Cloud Storage bucket using Nearline or Coldline class.
C. Use managed export, and then import to Cloud Datastore in a separate project under a unique namespace reserved for that export.
D. Write an application that uses Cloud Datastore client libraries to read all the entities. Treat each entity as a BigQuery table row via BigQuery streaming insert. Assign an export timestamp for each export, and attach it as an extra column for each row. Make sure that the BigQuery table is partitioned using the export timestamp column.
E. Use managed export, and then import the data into a BigQuery table created just for that export, and delete temporary export files.

Question 72

You are implementing several batch jobs that must be executed on a schedule. These jobs have many interdependent steps that must be executed in a specific order. Portions of the jobs involve executing shell scripts, running Hadoop jobs, and running queries in BigQuery. The jobs are expected to run for many minutes up to several hours. If the steps fail, they must be retried a fixed number of times. Which service should you use to manage the execution of these jobs?

A. Cloud Dataflow
B. Cloud Functions
C. Cloud Scheduler
D. Cloud Composer
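
The requirements in Question 72 (interdependent steps run in a fixed order, each retried a fixed number of times on failure) can be sketched in plain Python. This is only a toy illustration of the orchestration pattern, not the API of any Google Cloud service; the step names, the `run_with_retries` and `run_pipeline` helpers, and the simulated failure are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_with_retries(step, max_retries):
    """Run step(); on failure, retry up to max_retries additional times."""
    for attempt in range(max_retries + 1):
        try:
            return step()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted, surface the failure


def run_pipeline(steps, dependencies, max_retries=2):
    """steps: name -> callable; dependencies: name -> set of prerequisite names."""
    graph = {name: dependencies.get(name, set()) for name in steps}
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        run_with_retries(steps[name], max_retries)
    return order


# Hypothetical three-step job: shell script -> Hadoop job -> BigQuery query.
calls = []
attempts = {"n": 0}


def flaky_hadoop_job():
    """Simulated step that fails twice before succeeding."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient failure")
    calls.append("hadoop_job")


steps = {
    "shell_script": lambda: calls.append("shell_script"),
    "hadoop_job": flaky_hadoop_job,
    "bigquery_query": lambda: calls.append("bigquery_query"),
}
dependencies = {
    "hadoop_job": {"shell_script"},
    "bigquery_query": {"hadoop_job"},
}

order = run_pipeline(steps, dependencies, max_retries=2)
```

A managed workflow orchestrator provides the same two ingredients, a dependency graph and a per-step retry policy, along with scheduling and monitoring that this sketch leaves out.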

Question 73

You are building a model to make clothing recommendations. You know a user's fashion preference is likely to change over time, so you build a data pipeline to stream new data back to the model as it becomes available.
How should you use this data to train the model?

A. Continuously retrain the model on a combination of existing data and the new data.
B. Train on the existing data while using the new data as your test set.
C. Train on the new data while using the existing data as your test set.
D. Continuously retrain the model on just the new data.

Question 74

You have a query that filters a BigQuery table using a WHERE clause on timestamp and ID columns. By using bq query --dry_run you learn that the query triggers a full scan of the table, even though the filters on timestamp and ID select a tiny fraction of the overall data. You want to reduce the amount of data scanned by BigQuery with minimal changes to existing SQL queries. What should you do?

A. Create a separate table for each ID.
B. Use the LIMIT keyword to reduce the number of rows returned.
C. Recreate the table with a partitioning column and clustering column.
D. Use the bq query --maximum_bytes_billed flag to restrict the number of bytes billed.
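
The scanning behaviour behind Question 74 can be illustrated with a toy model in Python. It is not BigQuery's actual storage engine: the row layout, the block size, and the helper names are assumptions made for illustration. The sketch counts how many rows must be read for the same timestamp-and-ID filter when the data has no layout hints, and when it is grouped by day and sorted by ID.

```python
from collections import defaultdict
from datetime import date

# Hypothetical data: 30 days x 100 IDs, one (day, id) row each.
rows = [(date(2023, 3, d), i) for d in range(1, 31) for i in range(100)]


def scan_unpartitioned(rows, day, wanted_id):
    """Every row is read, because nothing indicates where matches live."""
    scanned = len(rows)
    matches = [r for r in rows if r == (day, wanted_id)]
    return scanned, matches


def build_partitioned_clustered(rows, block_size=10):
    """Group rows by day (partition), sort by id (cluster), cut into blocks."""
    partitions = defaultdict(list)
    for day, row_id in rows:
        partitions[day].append((day, row_id))
    table = {}
    for day, part in partitions.items():
        part.sort(key=lambda r: r[1])
        table[day] = [part[i:i + block_size]
                      for i in range(0, len(part), block_size)]
    return table


def scan_partitioned_clustered(table, day, wanted_id):
    """Read only the matching partition, and only blocks whose id range fits."""
    scanned, matches = 0, []
    for block in table.get(day, []):
        lowest, highest = block[0][1], block[-1][1]
        if lowest <= wanted_id <= highest:  # block-level min/max pruning
            scanned += len(block)
            matches += [r for r in block if r[1] == wanted_id]
    return scanned, matches


target_day, target_id = date(2023, 3, 15), 42
full_scanned, full_matches = scan_unpartitioned(rows, target_day, target_id)
table = build_partitioned_clustered(rows)
pruned_scanned, pruned_matches = scan_partitioned_clustered(
    table, target_day, target_id)
```

In this model the filter itself is unchanged between the two scans; only the physical layout of the data differs.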

Question 75

You need to create a new transaction table in Cloud Spanner that stores product sales data. You are deciding what to use as a primary key. From a performance perspective, which strategy should you choose?

A. The current epoch time
B. A concatenation of the product name and the current epoch time
C. A random universally unique identifier number (version 4 UUID)
D. The original order identification number from the sales system, which is a monotonically increasing integer
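
The performance concern in Question 75 is how new keys spread across a key space that is divided into contiguous ranges. The sketch below is a simplified model, assuming a fixed number of equal-width ranges (`NUM_SPLITS` is hypothetical; Cloud Spanner actually splits ranges dynamically by load and size). It counts which range each of 1,000 new writes lands in for a monotonically increasing integer key and for a version 4 UUID key.

```python
import uuid
from collections import Counter

NUM_SPLITS = 4  # hypothetical number of contiguous key ranges


def split_for_key(key_int, key_space, num_splits=NUM_SPLITS):
    """Map an integer key to the contiguous range (split) that contains it."""
    return min(key_int * num_splits // key_space, num_splits - 1)


def simulate_sequential(n_writes, start, key_space):
    """New keys continue from `start`, as with an ever-increasing order number."""
    return Counter(split_for_key(start + i, key_space) for i in range(n_writes))


def simulate_uuid4(n_writes):
    """New keys are random 128-bit version 4 UUIDs."""
    key_space = 1 << 128
    return Counter(split_for_key(uuid.uuid4().int, key_space)
                   for _ in range(n_writes))


sequential_load = simulate_sequential(1000, start=999_000, key_space=1_000_000)
uuid_load = simulate_uuid4(1000)

print("sequential keys per split:", dict(sequential_load))
print("UUIDv4 keys per split:    ", dict(uuid_load))
```

In this model, all sequential writes land in the last range, while the random keys are distributed across every range.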
