Instant Access Google.Professional-Machine-Learning-Engineer.v2023-02-17.q116 Actual Practice Test Engine for Free (Page 17)

Question 76

A company wants to predict the sale prices of houses based on available historical sales data. The target variable in the company's dataset is the sale price. The features include parameters such as the lot size, living area measurements, non-living area measurements, number of bedrooms, number of bathrooms, year built, and postal code. The company wants to use multi-variable linear regression to predict house sale prices.
Which step should a machine learning specialist take to remove features that are irrelevant for the analysis and reduce the model's complexity?

A.Plot a histogram of the features and compute their standard deviation. Remove features with high variance.
B.Plot a histogram of the features and compute their standard deviation. Remove features with low variance.
C.Build a heatmap showing the correlation of the dataset against itself. Remove features with low mutual correlation scores.
D.Run a correlation check of all features against the target variable. Remove features with low target variable correlation scores.
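
For reference, the correlation check against the target variable described in option D can be sketched in plain Python. This is a minimal illustration, not a prescribed method: the toy data, the feature names, and the 0.3 cutoff are all assumptions made up for the example. Pearson correlation only measures linear association, which matches the linear regression setting in the question.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)

def rank_by_target_correlation(features, target):
    """Return (name, r) pairs sorted by absolute correlation with the target."""
    scores = {name: pearson(values, target) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical toy data (not from the question).
sale_price = [200, 250, 300, 350, 400]
features = {
    "living_area": [100, 120, 150, 170, 200],
    "bedrooms": [2, 3, 3, 4, 4],
    "house_number_parity": [1, 0, 1, 0, 1],
}

THRESHOLD = 0.3  # assumed cutoff; choose it per dataset
ranking = rank_by_target_correlation(features, sale_price)
kept = [name for name, r in ranking if abs(r) >= THRESHOLD]

for name, r in ranking:
    print(f"{name}: r = {r:.3f}")
print("kept:", kept)
```

A numeric correlation is not meaningful for categorical features such as the postal code; those would need encoding or a different relevance test.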

Question 77

A Machine Learning Specialist must build out a process to query a dataset on Amazon S3 using Amazon Athena. The dataset contains more than 800,000 records stored as plaintext CSV files. Each record contains 200 columns and is approximately 1.5 MB in size. Most queries will span 5 to 10 columns only.
How should the Machine Learning Specialist transform the dataset to minimize query runtime?

A.Convert the records to Apache Parquet format.
B.Convert the records to JSON format.
C.Convert the records to GZIP CSV format.
D.Convert the records to XML format.
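
The figures in the question allow a rough back-of-the-envelope estimate of how much data a query has to scan in a row-oriented layout (CSV, JSON, XML, gzipped CSV) versus a columnar layout such as Parquet. The sketch below assumes every column is the same size and ignores compression and file metadata, so the results are orders of magnitude only.

```python
RECORDS = 800_000        # from the question ("more than 800,000")
RECORD_SIZE_MB = 1.5     # from the question
TOTAL_COLUMNS = 200      # from the question

def scanned_mb(columns_read, columnar):
    """Estimate MB scanned by one query.

    Row-oriented files must be read in full; a columnar file lets the
    engine read only the requested columns (assumed equal in size).
    """
    total_mb = RECORDS * RECORD_SIZE_MB
    if not columnar:
        return total_mb
    return total_mb * columns_read / TOTAL_COLUMNS

for cols in (5, 10):
    row_mb = scanned_mb(cols, columnar=False)
    col_mb = scanned_mb(cols, columnar=True)
    print(f"{cols} columns: row-oriented {row_mb:,.0f} MB, columnar {col_mb:,.0f} MB")
```

Under these assumptions a query touching 5 to 10 columns reads roughly 2.5% to 5% of the dataset in a columnar format.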

Question 78

A Machine Learning Specialist kicks off a hyperparameter tuning job for a tree-based ensemble model using Amazon SageMaker with Area Under the ROC Curve (AUC) as the objective metric. This workflow will eventually be deployed in a pipeline that retrains and tunes hyperparameters each night to model click-through on data that goes stale every 24 hours.
With the goal of decreasing the amount of time it takes to train these models, and ultimately to decrease costs, the Specialist wants to reconfigure the input hyperparameter range(s).
Which visualization will accomplish this?

A.A scatter plot showing the performance of the objective metric over each training iteration.
B.A histogram showing whether the most important input feature is Gaussian.
C.A scatter plot with points colored by target variable that uses t-Distributed Stochastic Neighbor Embedding (t-SNE) to visualize the large number of input variables in an easier-to-read dimension.
D.A scatter plot showing the correlation between maximum tree depth and the objective metric.
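
The relationship that option D would plot, maximum tree depth against the objective metric, can also be summarized as text. The sketch below is a stand-in for the scatter plot, not SageMaker output: the trial results and the 0.01 AUC tolerance are hypothetical values chosen for illustration.

```python
from collections import defaultdict

# Hypothetical (max_depth, validation AUC) pairs from past tuning trials.
trials = [
    (2, 0.71), (2, 0.72),
    (4, 0.80), (4, 0.81),
    (6, 0.84), (6, 0.85),
    (8, 0.85), (8, 0.84),
    (10, 0.85), (10, 0.84),
]

def mean_auc_by_depth(trials):
    """Average the objective metric for each max_depth value."""
    grouped = defaultdict(list)
    for depth, auc in trials:
        grouped[depth].append(auc)
    return {depth: sum(aucs) / len(aucs) for depth, aucs in sorted(grouped.items())}

def smallest_depth_near_best(means, tolerance=0.01):
    """Smallest depth whose mean AUC is within `tolerance` of the best mean AUC."""
    best = max(means.values())
    return min(depth for depth, auc in means.items() if best - auc <= tolerance)

means = mean_auc_by_depth(trials)
for depth, auc in means.items():
    print(f"max_depth={depth}: mean AUC {auc:.3f}")
print("suggested upper bound for max_depth:", smallest_depth_near_best(means))
```

If the metric plateaus beyond some depth, as in this made-up data, the tuning range can be capped there so that deeper, slower-to-train trees are no longer searched.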

Question 79

A company uses a long short-term memory (LSTM) model to evaluate the risk factors of a particular energy sector. The model reviews multi-page text documents to analyze each sentence of the text and categorize it as either a potential risk or no risk. The model is not performing well, even though the Data Scientist has experimented with many different network structures and tuned the corresponding hyperparameters.
Which approach will provide the MAXIMUM performance boost?

A.Reduce the learning rate and run the training process until the training loss stops decreasing.
B.Use gated recurrent units (GRUs) instead of LSTM and run the training process until the validation loss stops decreasing.
C.Initialize the words by term frequency-inverse document frequency (TF-IDF) vectors pretrained on a large collection of news articles related to the energy sector.
D.Initialize the words by word2vec embeddings pretrained on a large collection of news articles related to the energy sector.

Question 80

One of your models is trained using data provided by a third-party data broker. The data broker does not reliably notify you of formatting changes in the data. You want to make your model training pipeline more robust to issues like this. What should you do?

A.Use TensorFlow Data Validation to detect and flag schema anomalies.
B.Use custom TensorFlow functions at the start of your model training to detect and flag known formatting errors.
C.Use TensorFlow Transform to create a preprocessing component that will normalize data to the expected distribution, and replace values that don't match the schema with 0.
D.Use tf.math to analyze the data, compute summary statistics, and flag statistical anomalies.
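
TensorFlow Data Validation, named in option A, works by inferring a schema from reference data and reporting anomalies when new data deviates from it. The plain-Python sketch below illustrates that idea only. It does not use the TFDV API; the function names and sample rows are hypothetical.

```python
def infer_schema(rows):
    """Record, per column, the set of Python type names seen in reference data."""
    schema = {}
    for row in rows:
        for name, value in row.items():
            schema.setdefault(name, set()).add(type(value).__name__)
    return schema

def find_anomalies(schema, rows):
    """Return (row_index, column, reason) for every deviation from the schema."""
    anomalies = []
    for i, row in enumerate(rows):
        for name in sorted(schema.keys() - row.keys()):
            anomalies.append((i, name, "missing column"))
        for name, value in row.items():
            if name not in schema:
                anomalies.append((i, name, "unexpected column"))
            elif type(value).__name__ not in schema[name]:
                anomalies.append((i, name, "unexpected type " + type(value).__name__))
    return anomalies

# Hypothetical reference batch and a new batch with formatting changes.
reference = [{"price": 10.0, "zip": "94107"}, {"price": 12.5, "zip": "10001"}]
new_batch = [{"price": "10,00", "zip": "94107"}, {"price": 12.5}]

schema = infer_schema(reference)
anomalies = find_anomalies(schema, new_batch)
for anomaly in anomalies:
    print(anomaly)
```

A check like this flags unannounced format changes before training starts, instead of silently coercing or zero-filling the affected values.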
