Instant Access Amazon.AWS-Certified-Machine-Learning-Specialty.v2022-02-07.q122 Actual Practice Test Engine for Free (Page 23)

Question 106

A Data Scientist is developing a machine learning model to predict future patient outcomes based on information collected about each patient and their treatment plans. The model should output a continuous value as its prediction. The data available includes labeled outcomes for a set of 4,000 patients. The study was conducted on a group of individuals over the age of 65 who have a particular disease that is known to worsen with age.
Initial models have performed poorly. While reviewing the underlying data, the Data Scientist notices that, out of 4,000 patient observations, there are 450 where the patient age has been input as 0. The other features for these observations appear normal compared to the rest of the sample population.
How should the Data Scientist correct this issue?

A. Use k-means clustering to handle missing features.
B. Drop all records from the dataset where age has been set to 0.
C. Replace the age field value for records with a value of 0 with the mean or median value from the dataset.
D. Drop the age feature from the dataset and train the model using the rest of the features.
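
For illustration, the median replacement described in option C might look like the sketch below. The function name `impute_zero_ages` and the sample values are hypothetical, and the sketch assumes the median is computed only over the valid (non-zero) ages so that the bad entries do not skew it.

```python
from statistics import median

def impute_zero_ages(ages):
    """Replace ages recorded as 0 with the median of the valid (non-zero) ages."""
    valid = [a for a in ages if a != 0]
    if not valid:
        # Nothing to compute a median from; return the data unchanged.
        return list(ages)
    fill = median(valid)
    return [fill if a == 0 else a for a in ages]

# Hypothetical sample: two records have age entered as 0.
sample_ages = [70, 0, 82, 66, 0, 75]
imputed = impute_zero_ages(sample_ages)
```

The records themselves are kept, so the other features that appear normal remain available for training.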

Question 107

A data scientist needs to identify fraudulent user accounts for a company's ecommerce platform. The company wants the ability to determine if a newly created account is associated with a previously known fraudulent user. The data scientist is using AWS Glue to cleanse the company's application logs during ingestion.
Which strategy will allow the data scientist to identify fraudulent accounts?

A. Search for duplicate accounts in the AWS Glue Data Catalog.
B. Create a FindMatches machine learning transform in AWS Glue.
C. Execute the built-in FindDuplicates Amazon Athena query.
D. Create an AWS Glue crawler to infer duplicate accounts in the source data.

Question 108

A company is using Amazon Textract to extract textual data from thousands of scanned text-heavy legal documents daily. The company uses this information to process loan applications automatically. Some of the documents fail business validation and are returned to human reviewers, who investigate the errors. This activity increases the time to process the loan applications.
What should the company do to reduce the processing time of loan applications?

A. Configure Amazon Textract to route low-confidence predictions to Amazon SageMaker Ground Truth. Perform a manual review on those words before performing a business validation.
B. Use an Amazon Textract synchronous operation instead of an asynchronous operation.
C. Use Amazon Rekognition's feature to detect text in an image to extract the data from scanned images. Use this information to process the loan applications.
D. Configure Amazon Textract to route low-confidence predictions to Amazon Augmented AI (Amazon A2I). Perform a manual review on those words before performing a business validation.

Question 109

A Machine Learning Specialist is assigned to a Fraud Detection team and must tune an XGBoost model, which is working appropriately for test data. However, with unknown data, it is not working as expected. The existing parameters are provided as follows.

Which parameter tuning guidelines should the Specialist follow to avoid overfitting?

A. Increase the max_depth parameter value.
B. Lower the max_depth parameter value.
C. Lower the min_child_weight parameter value.
D. Update the objective to binary:logistic.

Question 110

A company ingests machine learning (ML) data from web advertising clicks into an Amazon S3 data lake. Click data is added to an Amazon Kinesis data stream by using the Kinesis Producer Library (KPL). The data is loaded into the S3 data lake from the data stream by using an Amazon Kinesis Data Firehose delivery stream.
As the data volume increases, an ML specialist notices that the rate of data ingested into Amazon S3 is relatively constant. There also is an increasing backlog of data for Kinesis Data Streams and Kinesis Data Firehose to ingest.
Which next step is MOST likely to improve the data ingestion rate into Amazon S3?

A. Increase the number of S3 prefixes for the delivery stream to write to.
B. Decrease the retention period for the data stream.
C. Increase the number of shards for the data stream.
D. Add more consumers using the Kinesis Client Library (KCL).
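
As background for the shard option, a rough capacity estimate can be sketched as follows. It assumes a provisioned-mode stream, where each shard accepts writes of up to 1 MB per second or 1,000 records per second; the function name `shards_needed` and the traffic figures are hypothetical.

```python
import math

# Per-shard write limits for a provisioned-mode Kinesis data stream.
SHARD_WRITE_MB_PER_SEC = 1.0
SHARD_WRITE_RECORDS_PER_SEC = 1000

def shards_needed(mb_per_sec, records_per_sec):
    """Estimate the minimum shard count for a given write load."""
    by_bytes = math.ceil(mb_per_sec / SHARD_WRITE_MB_PER_SEC)
    by_records = math.ceil(records_per_sec / SHARD_WRITE_RECORDS_PER_SEC)
    # Whichever limit is hit first determines the shard count.
    return max(1, by_bytes, by_records)

# Hypothetical click traffic: 4.5 MB/s spread over 3,000 records/s.
estimate = shards_needed(4.5, 3000)
```

When the incoming load exceeds what the current shards accept, producers are throttled and a backlog builds, which matches the constant ingestion rate described in the question.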
