Instant Access Amazon.AWS-Certified-Machine-Learning-Specialty.v2026-04-30.q178 Actual Practice Test Engine for Free (Page 13)

Question 56

A machine learning specialist stores IoT soil sensor data in an Amazon DynamoDB table and stores weather event data as JSON files in Amazon S3. The dataset in DynamoDB is 10 GB in size and the dataset in Amazon S3 is 5 GB in size. The specialist wants to train a model on this data to help predict soil moisture levels as a function of weather events using Amazon SageMaker.
Which solution will accomplish the necessary transformation to train the Amazon SageMaker model with the LEAST amount of administrative overhead?

A.Launch an Amazon EMR cluster. Create an Apache Hive external table for the DynamoDB table and S3 data. Join the Hive tables and write the results out to Amazon S3.
B.Crawl the data using AWS Glue crawlers. Write an AWS Glue ETL job that merges the two tables and writes the output to an Amazon Redshift cluster.
C.Enable Amazon DynamoDB Streams on the sensor table. Write an AWS Lambda function that consumes the stream and appends the results to the existing weather files in Amazon S3.
D.Crawl the data using AWS Glue crawlers. Write an AWS Glue ETL job that merges the two tables and writes the output in CSV format to Amazon S3.

Question 57

A company's Machine Learning Specialist needs to improve the training speed of a time-series forecasting model using TensorFlow. The training is currently implemented on a single-GPU machine and takes approximately 23 hours to complete. The training needs to be run daily.
The model accuracy is acceptable, but the company anticipates a continuous increase in the size of the training data and a need to update the model on an hourly, rather than a daily, basis. The company also wants to minimize coding effort and infrastructure changes. What should the Machine Learning Specialist do to the training solution to allow it to scale for future demand?

A.Switch to using a built-in AWS SageMaker DeepAR model. Parallelize the training to as many machines as needed to achieve the business goals.
B.Move the training to Amazon EMR and distribute the workload to as many machines as needed to achieve the business goals.
C.Change the TensorFlow code to implement a Horovod distributed framework supported by Amazon SageMaker. Parallelize the training to as many machines as needed to achieve the business goals.
D.Do not change the TensorFlow code. Change the machine to one with a more powerful GPU to speed up the training.

Question 58

Which of the following metrics should a Machine Learning Specialist generally use to compare/evaluate machine learning classification models against each other?

A.Recall
B.Misclassification rate
C.Mean absolute percentage error (MAPE)
D.Area Under the ROC Curve (AUC)

Question 59

A Data Scientist is developing a machine learning model to classify whether a financial transaction is fraudulent. The labeled data available for training consists of 100,000 non-fraudulent observations and 1,000 fraudulent observations.
The Data Scientist applies the XGBoost algorithm to the data, resulting in the following confusion matrix when the trained model is applied to a previously unseen validation dataset. The accuracy of the model is 99.1%, but the Data Scientist needs to reduce the number of false negatives.

Which combination of steps should the Data Scientist take to reduce the number of false negative predictions by the model? (Choose two.)

A.Change the XGBoost eval_metric parameter to optimize based on Root Mean Square Error (RMSE).
B.Increase the XGBoost scale_pos_weight parameter to adjust the balance of positive and negative weights.
C.Increase the XGBoost max_depth parameter because the model is currently underfitting the data.
D.Change the XGBoost eval_metric parameter to optimize based on Area Under the ROC Curve (AUC).
E.Decrease the XGBoost max_depth parameter because the model is currently overfitting the data.

Correct Answer: B,D

The Data Scientist should increase the XGBoost scale_pos_weight parameter to adjust the balance of positive and negative weights and change the XGBoost eval_metric parameter to optimize based on Area Under the ROC Curve (AUC). This will help reduce the number of false negative predictions by the model.
The scale_pos_weight parameter controls the balance of positive and negative weights in the XGBoost algorithm. It is useful for imbalanced classification problems, such as fraud detection, where the number of positive examples (fraudulent transactions) is much smaller than the number of negative examples (non-fraudulent transactions). By increasing the scale_pos_weight parameter, the Data Scientist can assign more weight to the positive class and make the model more sensitive to detecting fraudulent transactions.
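As a rough illustration of the class-weighting idea above, the sketch below derives a starting value for scale_pos_weight from the class counts given in the question, using the common negative-to-positive ratio heuristic. The params dictionary is hypothetical: only scale_pos_weight and eval_metric come from the question, and the objective is an assumption. It also computes the accuracy of a trivial model that never predicts fraud on data with the same class ratio, which shows why a 99.1% accuracy says little here.

```python
# Class counts taken from the question text.
n_negative = 100_000  # non-fraudulent observations
n_positive = 1_000    # fraudulent observations

# Common starting heuristic: ratio of negative to positive examples.
# This is a starting point to tune, not a guaranteed optimum.
scale_pos_weight = n_negative / n_positive

# Hypothetical hyperparameter dictionary for illustration only.
# The objective is an assumption; it is not stated in the question.
params = {
    "objective": "binary:logistic",
    "scale_pos_weight": scale_pos_weight,
    "eval_metric": "auc",
}

# Baseline: a model that always predicts "not fraud" on data
# with the same class ratio as the labeled training set.
baseline_accuracy = n_negative / (n_negative + n_positive)

print(scale_pos_weight)             # 100.0
print(round(baseline_accuracy, 4))  # 0.9901
```

The always-negative baseline already reaches about 99.0% accuracy while catching no fraud at all, so accuracy alone cannot show whether false negatives are under control.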
The eval_metric parameter specifies the metric used to evaluate the model on validation data during training. For binary classification, older XGBoost releases default to the error rate (the fraction of incorrect predictions), while newer releases default to log loss. The error rate is a poor metric for imbalanced classification problems because it weighs every error equally and ignores the cost of different types of errors. In fraud detection, a false negative (failing to detect a fraudulent transaction) is more costly than a false positive (flagging a non-fraudulent transaction as fraudulent). The Data Scientist should therefore use a metric that reflects the trade-off between the true positive rate (TPR) and the false positive rate (FPR), such as the Area Under the ROC Curve (AUC). The AUC measures how well the model separates the positive and negative classes across all classification thresholds. A higher AUC means that the model can achieve a higher TPR at a given FPR, which is desirable for fraud detection.
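To make the threshold independence of AUC concrete, here is a minimal pure-Python sketch. The labels and scores are invented toy values, not data from the question, and the helper names roc_auc and tpr_fpr are hypothetical. AUC is computed as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, while TPR and FPR are computed at two different thresholds.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def tpr_fpr(labels, scores, threshold):
    """True positive rate and false positive rate at a given threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    return tp / n_pos, fp / n_neg


# Invented toy data: 1 = fraudulent, 0 = non-fraudulent.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]

auc = roc_auc(labels, scores)                  # one value for all thresholds
tpr_hi, fpr_hi = tpr_fpr(labels, scores, 0.5)  # stricter threshold
tpr_lo, fpr_lo = tpr_fpr(labels, scores, 0.25) # looser threshold

print(auc)
print(tpr_hi, fpr_hi)
print(tpr_lo, fpr_lo)
```

In this toy example, lowering the threshold from 0.5 to 0.25 removes the one false negative (TPR rises from 2/3 to 1.0) at the cost of one extra false positive (FPR rises from 0.2 to 0.4), while the AUC stays the same because it depends only on how the scores rank the two classes.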

Question 60

A Machine Learning Specialist prepared the following graph displaying the results of k-means for k = [1:10].

Considering the graph, what is a reasonable selection for the optimal choice of k?

A.7
B.1
C.10
D.4
