Amazon.DAS-C01.v2024-02-15.q109 Practice Test (Page 6)

Question 21

A company analyzes historical data and needs to query data that is stored in Amazon S3. New data is generated daily as .csv files that are stored in Amazon S3. The company's analysts are using Amazon Athena to perform SQL queries against a recent subset of the overall data. The amount of data that is ingested into Amazon S3 has increased substantially over time, and the query latency has also increased.
Which solutions could the company implement to improve query performance? (Choose two.)

A. Use Athena to extract the data and store it in Apache Parquet format on a daily basis. Query the extracted data.
B. Run a daily AWS Glue ETL job to compress the data files by using the .lzo format. Query the compressed data.
C. Use MySQL Workbench on an Amazon EC2 instance, and connect to Athena by using a JDBC or ODBC connector. Run the query from MySQL Workbench instead of Athena directly.
D. Run a daily AWS Glue ETL job to convert the data files to Apache Parquet and to partition the converted files. Create a periodic AWS Glue crawler to automatically crawl the partitioned data on a daily basis.
E. Run a daily AWS Glue ETL job to compress the data files by using the .gzip format. Query the compressed data.
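
Background for the partitioning options: when data is laid out in date partitions, a query that filters on the partition column only needs to read the matching prefixes. The sketch below models that pruning step in plain Python, assuming Hive-style dt=YYYY-MM-DD prefixes. The bucket name example-bucket, the table prefix, and the helpers partition_prefix and prune are hypothetical; in practice Athena does the pruning itself using the partitions registered in the catalog, not in client code.

```python
from datetime import date, timedelta

# Hypothetical bucket and table prefix, used only for illustration.
TABLE_ROOT = "s3://example-bucket/sales"


def partition_prefix(day: date) -> str:
    """Return the Hive-style partition prefix (dt=YYYY-MM-DD) for one day."""
    return f"{TABLE_ROOT}/dt={day.isoformat()}/"


def prune(prefixes: list[str], start: date, end: date) -> list[str]:
    """Keep only partitions whose dt value falls inside [start, end].

    This mimics what a query engine can do when the WHERE clause filters
    on the partition column: prefixes outside the range are never read.
    """
    kept = []
    for prefix in prefixes:
        # The partition value is the text between "dt=" and the trailing "/".
        value = prefix.rsplit("dt=", 1)[1].rstrip("/")
        if start <= date.fromisoformat(value) <= end:
            kept.append(prefix)
    return kept


if __name__ == "__main__":
    first = date(2023, 1, 1)
    all_days = [partition_prefix(first + timedelta(days=i)) for i in range(400)]
    recent = prune(all_days, date(2024, 1, 29), date(2024, 2, 4))
    print(f"{len(recent)} of {len(all_days)} partitions scanned")
```

A query over the most recent week touches only seven daily prefixes, no matter how many older partitions accumulate.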

Question 22

A company is migrating from an on-premises Apache Hadoop cluster to an Amazon EMR cluster. The cluster runs only during business hours. Due to a company requirement to avoid intraday cluster failures, the EMR cluster must be highly available. When the cluster is terminated at the end of each business day, the data must persist.
Which configurations would enable the EMR cluster to meet these requirements? (Choose three.)

A. EMR File System (EMRFS) for storage
B. Hadoop Distributed File System (HDFS) for storage
C. AWS Glue Data Catalog as the metastore for Apache Hive
D. MySQL database on the master node as the metastore for Apache Hive
E. Multiple master nodes in a single Availability Zone
F. Multiple master nodes in multiple Availability Zones
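
Background for the metastore and master-node options: the sketch below shows how those settings are commonly expressed in an EMR RunJobFlow-style request, built as a plain Python dictionary so it runs without AWS credentials. The hive-site property is the one EMR uses to point Hive at the AWS Glue Data Catalog; the cluster name, instance types, core-node count, and log bucket are hypothetical, and a real request needs more fields (release label, service roles, applications).

```python
import json

# Hive classification that points the metastore client at the AWS Glue Data
# Catalog instead of a MySQL database on the master node.
HIVE_GLUE_METASTORE = {
    "Classification": "hive-site",
    "Properties": {
        "hive.metastore.client.factory.class":
            "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
    },
}

# Hypothetical instance types. InstanceCount 3 on the MASTER group requests a
# multi-master cluster; EMR launches all nodes of one cluster in the same
# Availability Zone.
INSTANCE_GROUPS = [
    {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 3},
    {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
]


def cluster_spec(log_uri: str) -> dict:
    """Assemble a partial request body in the shape used by EMR RunJobFlow."""
    return {
        "Name": "business-hours-cluster",  # hypothetical name
        "LogUri": log_uri,
        "Configurations": [HIVE_GLUE_METASTORE],
        "Instances": {"InstanceGroups": INSTANCE_GROUPS},
    }


if __name__ == "__main__":
    print(json.dumps(cluster_spec("s3://example-bucket/emr-logs/"), indent=2))
```

Keeping the metastore outside the cluster means table definitions survive when the cluster is terminated each evening.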

Question 23

A company launched a service that produces millions of messages every day and uses Amazon Kinesis Data Streams as the streaming service.
The company uses the Kinesis SDK to write data to Kinesis Data Streams. A few months after launch, a data analyst found that write performance is significantly reduced. The data analyst investigated the metrics and determined that Kinesis is throttling the write requests. The data analyst wants to address this issue without significant changes to the architecture.
Which actions should the data analyst take to resolve this issue? (Choose two.)

A. Increase the Kinesis Data Streams retention period to reduce throttling.
B. Replace the Kinesis API-based data ingestion mechanism with Kinesis Agent.
C. Increase the number of shards in the stream using the UpdateShardCount API.
D. Choose partition keys in a way that results in a uniform record distribution across shards.
E. Customize the application code to include retry logic to improve performance.
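
Background for the throttling options: Kinesis Data Streams maps each partition key to a 128-bit integer with an MD5 hash, and each shard owns a contiguous range of that hash space. A provisioned shard accepts up to 1,000 records or 1 MB of writes per second. The sketch below models both facts in plain Python, assuming the hash space is split into equal ranges (as it is for a newly created stream); the helper names and the device-N keys are hypothetical.

```python
import hashlib
import math
from collections import Counter

MAX_HASH_KEY = 2**128 - 1

# Per-shard write limits for Kinesis Data Streams (provisioned mode).
RECORDS_PER_SEC_PER_SHARD = 1000
BYTES_PER_SEC_PER_SHARD = 1024 * 1024


def hash_key(partition_key: str) -> int:
    """Map a partition key to a 128-bit integer with MD5."""
    return int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)


def shard_for(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard owning the key's hash, assuming the
    hash space is divided into equal contiguous ranges."""
    range_size = (MAX_HASH_KEY + 1) // shard_count
    return min(hash_key(partition_key) // range_size, shard_count - 1)


def shards_needed(records_per_sec: float, bytes_per_sec: float) -> int:
    """Minimum shard count for a write load spread evenly across shards."""
    return max(
        math.ceil(records_per_sec / RECORDS_PER_SEC_PER_SHARD),
        math.ceil(bytes_per_sec / BYTES_PER_SEC_PER_SHARD),
        1,
    )


if __name__ == "__main__":
    shards = 4
    # One constant key sends every record to the same shard (a hot shard).
    skewed = Counter(shard_for("device-42", shards) for _ in range(10_000))
    # Many distinct keys spread records roughly evenly over all shards.
    uniform = Counter(shard_for(f"device-{i}", shards) for i in range(10_000))
    print("constant key:", dict(skewed))
    print("distinct keys:", dict(uniform))
    print("shards for 3500 rec/s at 1 KB:", shards_needed(3500, 3500 * 1024))
```

The record-count and byte-rate limits apply per shard, so a skewed key distribution can throttle one shard while the others sit mostly idle.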

Question 24

A company has a business unit that uploads .csv files to an Amazon S3 bucket. The company's data platform team has set up an AWS Glue crawler to perform discovery and create tables and schemas. An AWS Glue job writes processed data from the created tables to an Amazon Redshift database. The AWS Glue job handles column mapping and creates the Amazon Redshift table appropriately. When the AWS Glue job is rerun for any reason during the same day, duplicate records are introduced into the Amazon Redshift table.
Which solution will update the Redshift table without duplicates when jobs are rerun?

A. Modify the AWS Glue job to copy the rows into a staging table. Add SQL commands to replace the existing rows in the main table as postactions in the DynamicFrameWriter class.
B. Load the previously inserted data into a MySQL database in the AWS Glue job. Perform an upsert operation in MySQL, and copy the results to the Amazon Redshift table.
C. Use Apache Spark's DataFrame dropDuplicates() API to eliminate duplicates and then write the data to Amazon Redshift.
D. Use the AWS Glue ResolveChoice built-in transform to select the most recent value of the column.
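
Background for the staging-table option: a load lands in a staging table, then the main table's matching rows are deleted and replaced from staging, so rerunning the same batch leaves one copy of each row. In AWS Glue, statements like these are typically passed as preactions and postactions strings in the Redshift connection options. The sketch below simulates the pattern with the standard-library sqlite3 module standing in for Amazon Redshift; the table names, columns, and helper run_job are hypothetical, and SQLite uses a subquery where Redshift would use DELETE ... USING.

```python
import sqlite3

# Statements run before the load: recreate an empty staging table that has
# the same columns as the main table.
PREACTIONS = [
    "DROP TABLE IF EXISTS sales_staging",
    "CREATE TABLE sales_staging AS SELECT * FROM sales WHERE 1 = 0",
]
# Statements run after the load: replace matching rows, then clean up.
POSTACTIONS = [
    "DELETE FROM sales WHERE id IN (SELECT id FROM sales_staging)",
    "INSERT INTO sales SELECT * FROM sales_staging",
    "DROP TABLE sales_staging",
]


def run_job(conn: sqlite3.Connection, rows: list[tuple[int, str, float]]) -> None:
    """Simulate one job run: load into staging, then merge into the main table."""
    with conn:  # commits on success, rolls back on error
        for stmt in PREACTIONS:
            conn.execute(stmt)
        conn.executemany("INSERT INTO sales_staging VALUES (?, ?, ?)", rows)
        for stmt in POSTACTIONS:
            conn.execute(stmt)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # No primary key, so a plain INSERT on rerun would create duplicates.
    conn.execute("CREATE TABLE sales (id INTEGER, region TEXT, amount REAL)")
    batch = [(1, "east", 10.0), (2, "west", 20.0)]
    run_job(conn, batch)
    run_job(conn, batch)  # rerun of the same day's job
    print(conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0])
```

Because the delete and insert are keyed on id, a rerun also picks up corrected values for rows that were already loaded.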

Question 25

A real estate company has a mission-critical application using Apache HBase in Amazon EMR. Amazon EMR is configured with a single master node. The company has over 5 TB of data stored in the Hadoop Distributed File System (HDFS). The company wants a cost-effective solution to make its HBase data highly available.
Which architectural pattern meets the company's requirements?

A. Use Spot Instances for core and task nodes and a Reserved Instance for the EMR master node. Configure the EMR cluster with multiple master nodes. Schedule automated snapshots using Amazon EventBridge.
B. Store the data on an EMR File System (EMRFS) instead of HDFS. Enable EMRFS consistent view. Create an EMR HBase cluster with multiple master nodes. Point the HBase root directory to an Amazon S3 bucket.
C. Store the data on an EMR File System (EMRFS) instead of HDFS and enable EMRFS consistent view. Run two separate EMR clusters in two different Availability Zones. Point both clusters to the same HBase root directory in the same Amazon S3 bucket.
D. Store the data on an EMR File System (EMRFS) instead of HDFS and enable EMRFS consistent view. Create a primary EMR HBase cluster with multiple master nodes. Create a secondary EMR HBase read-replica cluster in a separate Availability Zone. Point both clusters to the same HBase root directory in the same Amazon S3 bucket.
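
Background for the HBase-on-S3 options: EMR can run HBase with its root directory in Amazon S3, and a second cluster can attach to the same root directory as a read replica. The sketch below builds the configuration classifications for the two clusters as plain Python data. The bucket and prefix are hypothetical, and the property names (hbase.rootdir, hbase.emr.storageMode, hbase.emr.readreplica.enabled) should be checked against the EMR release guide for the release in use.

```python
import json

# Hypothetical bucket/prefix for the shared HBase root directory.
HBASE_ROOT = "s3://example-bucket/hbase-root"


def hbase_on_s3_configurations(read_replica: bool) -> list[dict]:
    """Return EMR configuration classifications for HBase on Amazon S3.

    The primary cluster and a read-replica cluster point at the same
    hbase.rootdir; only the replica sets hbase.emr.readreplica.enabled.
    """
    hbase_props = {"hbase.emr.storageMode": "s3"}
    if read_replica:
        hbase_props["hbase.emr.readreplica.enabled"] = "true"
    return [
        {"Classification": "hbase-site", "Properties": {"hbase.rootdir": HBASE_ROOT}},
        {"Classification": "hbase", "Properties": hbase_props},
    ]


if __name__ == "__main__":
    print("primary:", json.dumps(hbase_on_s3_configurations(read_replica=False), indent=2))
    print("replica:", json.dumps(hbase_on_s3_configurations(read_replica=True), indent=2))
```

Since the data lives in S3 rather than on cluster-local HDFS, the replica cluster needs no copy of its own.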
