Instant Access Google.Professional-Data-Engineer.v2023-03-14.q177 Actual Practice Test Engine for Free (Page 19)

Question 86

You want to automate execution of a multi-step data pipeline running on Google Cloud. The pipeline includes Cloud Dataproc and Cloud Dataflow jobs that have multiple dependencies on each other. You want to use managed services where possible, and the pipeline will run every day. Which tool should you use?

A. Cloud Composer
B. Workflow Templates on Cloud Dataproc
C. Cloud Scheduler
D. cron

Question 87

Your company produces 20,000 files every hour. Each data file is formatted as a comma-separated values
(CSV) file that is less than 4 KB. All files must be ingested on Google Cloud Platform before they can be
processed. Your company site has a 200 ms latency to Google Cloud, and your Internet connection
bandwidth is limited to 50 Mbps. You currently deploy a secure FTP (SFTP) server on a virtual machine in
Google Compute Engine as the data ingestion point. A local SFTP client runs on a dedicated machine to
transmit the CSV files as is. The goal is to make reports with data from the previous day available to the
executives by 10:00 a.m. each day. This design is barely able to keep up with the current volume, even
though the bandwidth utilization is rather low.
You are told that due to seasonality, your company expects the number of files to double for the next three
months. Which two actions should you take? (Choose two.)

A. Introduce data compression for each file to increase the rate of file transfer.
B. Create an S3-compatible storage endpoint in your network, and use Google Cloud Storage Transfer
Service to transfer on-premises data to the designated storage bucket.
C. Redesign the data ingestion process to use the gsutil tool to send the CSV files to a storage bucket in
parallel.
D. Contact your internet service provider (ISP) to increase your maximum bandwidth to at least 100 Mbps.
E. Assemble 1,000 files into a tape archive (TAR) file. Transmit the TAR files instead, and disassemble
the CSV files in the cloud upon receiving them.
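
The figures in Question 87 can be checked with a short calculation, and the bundling described in option E can be sketched with Python's standard `tarfile` module. The following is a minimal sketch, not a definitive implementation: it assumes every file is the full 4 KB (an upper bound), treats 1 KB as 1,000 bytes, and uses a hypothetical `round_trips_per_file` parameter because the question does not say how many round trips SFTP needs per file. All function names are illustrative.

```python
import io
import tarfile

# Figures taken from the question text.
FILES_PER_HOUR = 20_000
FILE_SIZE_BYTES = 4 * 1000           # assumption: each file is the full 4 KB
LINK_BITS_PER_SECOND = 50 * 1_000_000
LATENCY_MS = 200


def link_utilization(files_per_hour):
    """Fraction of the 50 Mbps link used by the raw CSV payload alone."""
    bits_per_second = files_per_hour * FILE_SIZE_BYTES * 8 / 3600
    return bits_per_second / LINK_BITS_PER_SECOND


def sequential_wait_seconds(files_per_hour, round_trips_per_file=1):
    """Seconds spent waiting on latency when files are sent one at a time.

    round_trips_per_file is a hypothetical parameter; the real per-file
    cost of the SFTP client is not stated in the question.
    """
    return files_per_hour * round_trips_per_file * LATENCY_MS / 1000


def bundle(files):
    """Pack a {name: bytes} mapping into a single in-memory TAR archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def unbundle(blob):
    """Unpack a TAR archive back into a {name: bytes} mapping."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as archive:
        return {
            member.name: archive.extractfile(member).read()
            for member in archive.getmembers()
            if member.isfile()
        }


if __name__ == "__main__":
    print(f"link utilization: {link_utilization(FILES_PER_HOUR):.2%}")
    print(f"latency wait per hour of files: {sequential_wait_seconds(FILES_PER_HOUR)} s")
    sample = {f"file_{i}.csv": b"id,value\n1,2\n" for i in range(1000)}
    assert unbundle(bundle(sample)) == sample
```

Under these assumptions the payload uses well under 1% of the 50 Mbps link, while waiting one 200 ms round trip per file adds up to 4,000 seconds for an hour's worth of files sent one at a time.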

Question 88

You are developing a software application using Google's Dataflow SDK, and want to use conditionals, for loops, and other complex programming structures to create a branching pipeline. Which component will be used for the data processing operation?

A. PCollection
B. Transform
C. Pipeline
D. Sink API

Question 89

You work for an economic consulting firm that helps companies identify economic trends as they happen. As part of your analysis, you use Google BigQuery to correlate customer data with the average prices of the 100 most common goods sold, including bread, gasoline, milk, and others. The average prices of these goods are updated every 30 minutes. You want to make sure this data stays up to date so you can combine it with other data in BigQuery as cheaply as possible. What should you do?

A. Store and update the data in a regional Google Cloud Storage bucket and create a federated data source in BigQuery.
B. Store the data in Google Cloud Datastore. Use Google Cloud Dataflow to query BigQuery and combine the data programmatically with the data stored in Cloud Datastore.
C. Load the data every 30 minutes into a new partitioned table in BigQuery.
D. Store the data in a file in a regional Google Cloud Storage bucket. Use Cloud Dataflow to query BigQuery and combine the data programmatically with the data stored in Google Cloud Storage.

Question 90

You are implementing security best practices on your data pipeline. Currently, you are manually executing
jobs as the Project Owner. You want to automate these jobs by taking nightly batch files containing non-
public information from Google Cloud Storage, processing them with a Spark Scala job on a Google Cloud
Dataproc cluster, and depositing the results into Google BigQuery.
How should you securely run this workload?

A. Grant the Project Owner role to a service account, and run the job with it.
B. Restrict the Google Cloud Storage bucket so only you can see the files.
C. Use a user account with the Project Viewer role on the Cloud Dataproc cluster to read the batch files
and write to BigQuery.
D. Use a service account with the ability to read the batch files and to write to BigQuery.
