Microsoft Azure Test 4

Welcome to your Microsoft Azure Test-4

Q1. Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are analyzing a numerical dataset which contains missing values in several columns.
You must clean the missing values using an appropriate operation without affecting the dimensionality of the feature set.
You need to analyze a full dataset to include all values.
Solution: Calculate the column median value and use the median value as the replacement for any missing value in the column.
Does the solution meet the goal?

Select 1 option(s):

Yes
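
Example: the operation described in the solution can be sketched in pandas as follows. This is only an illustration; the DataFrame and its column names are hypothetical and are not part of the question.

```python
import pandas as pd

# Hypothetical numeric dataset with gaps; column names are illustrative only.
df = pd.DataFrame({
    "age": [25.0, None, 40.0, 31.0],
    "income": [52000.0, 61000.0, None, 48000.0],
})

# Replace each missing value with the median of its own column.
# No rows or columns are dropped, so the shape of the data is unchanged.
filled = df.fillna(df.median())

print(filled)
print(filled.shape == df.shape)
```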

Q2. You are developing a hands-on workshop to introduce Docker for Windows to attendees.

You need to ensure that workshop attendees can install Docker on their devices.
Which two prerequisite components should attendees install on the devices? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.

Select 2 option(s):

Microsoft Hardware-Assisted Virtualization Detection Tool

BIOS-enabled virtualization

VirtualBox

Kitematic

Windows 10 64-bit Professional

Q3. You plan to build a team data science environment. Data for training models in machine learning pipelines will be over 20 GB in size.

You have the following requirements:
• Models must be built using Caffe2 or Chainer frameworks.
• Data scientists must be able to use a data science environment to build the machine learning pipelines and train models on their personal devices in both connected and disconnected network environments.
• Personal devices must support updating machine learning pipelines when connected to a network.
You need to select a data science environment.
Which environment should you use?

Select 1 option(s):

Azure Machine Learning Studio

Azure Databricks

Azure Kubernetes Service (AKS)

Azure Machine Learning Service

Q4. HOTSPOT –

You are performing sentiment analysis using a CSV file that includes 12,000 customer reviews written in a short sentence format. You add the CSV file to Azure Machine Learning Studio and configure it as the starting point dataset of an experiment. You add the Extract N-Gram Features from Text module to the experiment to extract key phrases from the customer review column in the dataset.

You must create a new n-gram dictionary from the customer review text and set the maximum n-gram size to trigrams.

What should you select? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Hot Area:

Select 1 option(s):

Vocabulary mode: Merge N-Grams size: 4,000

Vocabulary mode: Create N-Grams size: 3

Vocabulary mode: Update N-Grams size: 12,000
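
Example: to make "maximum n-gram size of trigrams" concrete, here is a minimal sketch of n-gram extraction in plain Python. It is not the Studio module's implementation; the function name, the whitespace tokenization, and the sample review are assumptions made for illustration.

```python
def extract_ngrams(text, max_n=3):
    """Return all n-grams of size 1..max_n from whitespace-separated tokens."""
    tokens = text.lower().split()
    ngrams = []
    for n in range(1, max_n + 1):
        # Slide a window of n tokens across the sentence.
        for i in range(len(tokens) - n + 1):
            ngrams.append(" ".join(tokens[i:i + n]))
    return ngrams


# Hypothetical short review.
review = "Great product fast delivery"
ngrams = extract_ngrams(review, max_n=3)
print(ngrams)
```

With a maximum size of 3, the result contains unigrams, bigrams, and trigrams, but no four-word sequences.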

Q5. You are creating a machine learning model. You have a dataset that contains null rows.

You need to use the Clean Missing Data module in Azure Machine Learning Studio to identify and resolve the null and missing data in the dataset.
Which parameter should you use?

Select 1 option(s):

Hot Deck

Remove entire row

Replace with mode

Remove entire column

Custom substitution value

Replace with mean

Q6. You plan to create a speech recognition deep learning model.

The model must support the latest version of Python.
You need to recommend a deep learning framework for speech recognition to include in the Data Science Virtual Machine (DSVM).
What should you recommend?

Select 1 option(s):

Scikit-learn

Weka

TensorFlow

Rattle

Q7. DRAG DROP –

You configure a Deep Learning Virtual Machine for Windows.
You need to recommend tools and frameworks to perform the following:

• Build deep neural network (DNN) models
• Perform interactive data exploration and visualization

Which tools and frameworks should you recommend? To answer, drag the appropriate tools to the correct tasks. Each tool may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.

NOTE: Each correct selection is worth one point.

Select and Place:

Select 1 option(s):

Microsoft Cognitive Toolkit Vowpal Wabbit

Vowpal Wabbit PowerBI Desktop

Azure Data Factory PowerBI Desktop

Q8. HOTSPOT –

You are retrieving data from a large datastore by using Azure Machine Learning Studio.
You must create a subset of the data for testing purposes using a random sampling seed based on the system clock.
You add the Partition and Sample module to your experiment.
You need to select the properties for the module.

Which values should you select? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Hot Area:

Select 1 option(s):

Assign to Folds time.clock()

Pick Fold utcNow()

Sampling 0

Head 1

Q9. You use Azure Machine Learning Studio to build a machine learning experiment.

You need to divide data into two distinct datasets.
Which module should you use?

Select 1 option(s):

Partition and Sample

Tune Model Hyperparameters

Load Trained Model

Assign Data to Clusters

Q10. You are solving a classification task.

The dataset is imbalanced.
You need to select an Azure Machine Learning Studio module to improve the classification accuracy.
Which module should you use?

Select 1 option(s):

Synthetic Minority Oversampling Technique (SMOTE)

Filter Based Feature Selection

Permutation Feature Importance

Fisher Linear Discriminant Analysis

Q11. Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are a data scientist using Azure Machine Learning Studio.
You need to normalize values to produce an output column into bins to predict a target column.
Solution: Apply a Quantiles binning mode with a PQuantile normalization.
Does the solution meet the goal?

Select 1 option(s):

Yes
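
Example: the difference between quantile binning and equal-width binning can be sketched with pandas. This is an analogy using `pd.qcut` and `pd.cut`, not the Studio binning module itself, and the sample values are made up.

```python
import pandas as pd

# Hypothetical skewed values: one large value stretches the range.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 100])

# Quantile binning: edges are chosen so each bin holds about the same number of rows.
quantile_bins = pd.qcut(values, q=4, labels=False)

# Equal-width binning: edges are evenly spaced across the value range.
width_bins = pd.cut(values, bins=4, labels=False)

print(quantile_bins.tolist())
print(width_bins.tolist())
```

On skewed data the quantile bins stay balanced, while equal-width bins put most rows into the first bin.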

Q12. You train a model and register it in your Azure Machine Learning workspace. You are ready to deploy the model as a real-time web service.

You deploy the model to an Azure Kubernetes Service (AKS) inference cluster, but the deployment fails because an error occurs when the service runs the entry script that is associated with the model deployment.
You need to debug the error by iteratively modifying the code and reloading the service, without requiring a re-deployment of the service for each code update.
What should you do?

Select 1 option(s):

Modify the AKS service deployment configuration to enable application insights and re-deploy to AKS.

Create an Azure Container Instances (ACI) web service deployment configuration and deploy the model on ACI.

Add a breakpoint to the first line of the entry script and redeploy the service to AKS.

Register a new version of the model and update the entry script to load the new version of the model from its registered path.

Create a local web service deployment configuration and deploy the model to a local Docker container.

Q13. HOTSPOT –

You have a Python data frame named salesData in the following format:

The data frame must be unpivoted to a long data format as follows:

You need to use the pandas.melt() function in Python to perform the transformation.

How should you complete the code segment? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Hot Area:

Select 1 option(s):

dataFrame Shop ['2017','2018']

pandas shop 'year'

year value ['year']
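
Example: a minimal sketch of unpivoting with `pandas.melt()`. The original table is not shown above, so the frame below is a hypothetical stand-in with a `shop` column and one column per year; the values are invented.

```python
import pandas as pd

# Hypothetical stand-in for salesData (the real values are not shown in the question).
salesData = pd.DataFrame({
    "shop": ["Shop X", "Shop Y"],
    "2017": [34, 65],
    "2018": [25, 76],
})

# id_vars stays fixed on every row; value_vars are the columns that get unpivoted.
long_format = pd.melt(
    salesData,
    id_vars="shop",
    value_vars=["2017", "2018"],
    var_name="year",
    value_name="value",
)

print(long_format)
```

Each wide row becomes one long row per year column, so two shops and two years yield four rows.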

Q14. DRAG DROP –

You are analyzing a raw dataset that requires cleaning.

You must perform transformations and manipulations by using Azure Machine Learning Studio.

You need to identify the correct modules to perform the transformations.

Which modules should you choose? To answer, drag the appropriate modules to the correct scenarios. Each module may be used once, more than once, or not at all.

You may need to drag the split bar between panes or scroll to view content.

NOTE: Each correct selection is worth one point.

Select and Place:

Select 1 option(s):

Threshold Filter SMOTE Convert to Indicator Values Remove Duplicate Rows

Clean Missing Data SMOTE Convert to Indicator Values Remove Duplicate Rows

Convert to Indicator Values Remove Duplicate Rows Clean Missing Data SMOTE

Q15. You are developing deep learning models to analyze semi-structured, unstructured, and structured data types.

You have the following data available for model building:
• Video recordings of sporting events
• Transcripts of radio commentary about events
• Logs from related social media feeds captured during sporting events
You need to select an environment for creating the model.
Which environment should you use?

Select 1 option(s):

Azure Machine Learning Studio

Azure Data Lake Analytics

Azure HDInsight with Spark MLlib

Azure Cognitive Services

Q16. You are analyzing a dataset containing historical data from a local taxi company. You are developing a regression model.

You must predict the fare of a taxi trip.
You need to select performance metrics to correctly evaluate the regression model.
Which two metrics can you use? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.

Select 2 option(s):

an R-Squared value close to 1

an F1 score that is high

a Root Mean Square Error value that is high

an R-Squared value close to 0

a Root Mean Square Error value that is low

an F1 score that is low
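
Example: a small sketch of how Root Mean Square Error and R-Squared are computed for a regression model. The fares and predictions are invented numbers, and the helper names are illustrative.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error: typical size of the prediction error."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: share of variance explained by the model."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical taxi fares and model predictions.
fares = [10.0, 20.0, 30.0, 40.0]
predictions = [12.0, 18.0, 33.0, 39.0]

print(round(rmse(fares, predictions), 4))
print(round(r_squared(fares, predictions), 4))
```

RMSE is expressed in the units of the label (here, fare), while R-Squared is unitless. F1 is defined for classification outputs and is not computed here.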

Q17. DRAG DROP –

You are creating an experiment by using Azure Machine Learning Studio.

You must divide the data into four subsets for evaluation. There is a high degree of missing values in the data. You must prepare the data for analysis.

You need to select appropriate methods for producing the experiment.

Which three modules should you run in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.

NOTE: More than one order of answer choices is correct. You will receive credit for any of the correct orders you select.

Select and Place:

Select 1 option(s):

Feature Hashing Clean Missing Data Partition and Sample

Missing Values Scrubber Clean Missing Data Partition and Sample

Import Data Clean Missing Data Partition and Sample

Q18. HOTSPOT –

You create an experiment in Azure Machine Learning Studio. You add a training dataset that contains 10,000 rows. The first 9,000 rows represent class 0 (90 percent).

The remaining 1,000 rows represent class 1 (10 percent).

The training set is imbalanced between the two classes. You must increase the number of training examples for class 1 to 4,000 by using 5 data rows. You add the Synthetic Minority Oversampling Technique (SMOTE) module to the experiment.

You need to configure the module.

Which values should you use? To answer, select the appropriate options in the dialog box in the answer area.

NOTE: Each correct selection is worth one point.

Hot Area:

Select 1 option(s):

3000 5

300 5

1 4000

4000 5
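
Example: the arithmetic behind an oversampling percentage can be sketched as below. This assumes the percentage expresses how many synthetic rows are added relative to the existing minority rows (so 100 would double them); the function name is hypothetical, and the assumption should be checked against the module documentation.

```python
def oversampling_percentage(current_minority_rows, target_minority_rows):
    """Percentage of new rows to add, relative to the current minority count.

    Assumes 0 means "add nothing" and 100 means "double the minority class".
    """
    added_rows = target_minority_rows - current_minority_rows
    return 100 * added_rows / current_minority_rows


# 1,000 existing class 1 rows, 4,000 wanted after oversampling.
percentage = oversampling_percentage(1000, 4000)
print(percentage)
```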

Q19. Your team is building a data engineering and data science development environment.

The environment must support the following requirements:
• support Python and Scala
• compose data storage, movement, and processing services into automated data pipelines
• the same tool should be used for the orchestration of both data engineering and data science
• support workload isolation and interactive workloads
• enable scaling across a cluster of machines
You need to create the environment.
What should you do?

Select 1 option(s):

Build the environment in Azure Databricks and use Azure Data Factory for orchestration.

Build the environment in Apache Hive for HDInsight and use Azure Data Factory for orchestration.

Build the environment in Azure Databricks and use Azure Container Instances for orchestration.

Build the environment in Apache Spark for HDInsight and use Azure Container Instances for orchestration.

Q20. DRAG DROP –

You are building an intelligent solution using machine learning models.

The environment must support the following requirements:

• Data scientists must build notebooks in a cloud environment.
• Data scientists must use automatic feature engineering and model building in machine learning pipelines.
• Notebooks must be deployed to retrain using Spark instances with dynamic worker allocation.
• Notebooks must be exportable to be version controlled locally.

You need to create the environment.

Which four actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.

Select and Place:

Select 1 option(s):

Create an Azure Databricks cluster Install Microsoft Machine Learning for Apache Spark Create and execute the Zeppelin notebooks on the cluster When the cluster is ready, export Zeppelin notebooks to a local environment

Create an Azure HDInsight cluster to include the Apache Spark MLlib library Install Microsoft Machine Learning for Apache Spark Create and execute the Zeppelin notebooks on the cluster When the cluster is ready, export Zeppelin notebooks to a local environment

Q21. You are moving a large dataset from Azure Machine Learning Studio to a Weka environment.

You need to format the data for the Weka environment.
Which module should you use?

Select 1 option(s):

Convert to CSV

Convert to SVMLight

Convert to Dataset

Convert to ARFF

Q22. You are solving a classification task.

You must evaluate your model on a limited data sample by using k-fold cross-validation. You start by configuring a k parameter as the number of splits.
You need to configure the k parameter for the cross-validation.
Which value should you use?

Select 1 option(s):

k=0.5

k=1

k=5

k=0.01
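
Example: a minimal sketch of how k-fold cross-validation partitions a dataset, written in plain Python. The helper name is hypothetical and the rows are not shuffled, which a real evaluation would normally do first.

```python
def k_fold_indices(n_samples, k):
    """Split row indices 0..n_samples-1 into k (train, test) pairs."""
    # k counts folds, so it has to be a whole number of at least 2.
    if not isinstance(k, int) or k < 2:
        raise ValueError("k must be an integer of at least 2")
    if k > n_samples:
        raise ValueError("k cannot exceed the number of samples")

    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]

    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size

    splits = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train_idx, test_idx))
    return splits


splits = k_fold_indices(n_samples=10, k=5)
for train_idx, test_idx in splits:
    print(len(train_idx), test_idx)
```

Every row lands in exactly one test fold, and each model is trained on the remaining k-1 folds.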

Q23. Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are a data scientist using Azure Machine Learning Studio.
You need to normalize values to produce an output column into bins to predict a target column.
Solution: Apply an Equal Width with Custom Start and Stop binning mode.
Does the solution meet the goal?

Select 1 option(s):

Yes

Q24. You must store data in Azure Blob Storage to support Azure Machine Learning.

You need to transfer the data into Azure Blob Storage.
What are three possible ways to achieve the goal? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.

Select 3 option(s):

Python script

AzCopy

Azure Storage Explorer

Bulk Copy Program (BCP)

Bulk Insert SQL Query

Q25. HOTSPOT –

You are working on a classification task. You have a dataset indicating whether a student would like to play soccer and associated attributes. The dataset includes the following columns:

You need to classify variables by type.

Which variable should you add to each category? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Hot Area:

Select 1 option(s):

PrevExamMarks, Height, Weight IsPlaySoccer

Gender, IsPlaySoccer PrevExamMarks, Height, Weight

IsPlaySoccer Gender, IsPlaySoccer

Q26. You are implementing a machine learning model to predict stock prices.

The model uses a PostgreSQL database and requires GPU processing.
You need to create a virtual machine that is pre-configured with the required tools.
What should you do?

Select 1 option(s):

Create a Data Science Virtual Machine (DSVM) Windows edition.

Create a Deep Learning Virtual Machine (DLVM) Windows edition.

Create a Geo AI Data Science Virtual Machine (Geo-DSVM) Windows edition.

Create a Deep Learning Virtual Machine (DLVM) Linux edition.

Q27. You are a data scientist working for a bank and have used Azure ML to train and register a machine learning model that predicts whether a customer is likely to repay a loan.

You want to understand how your model is making selections and must be sure that the model does not violate government regulations such as denying loans based on where an applicant lives.
You need to determine the extent to which each feature in the customer data is influencing predictions.

What should you do?

Select 1 option(s):

Score the model against some test data with known label values and use the results to calculate a confusion matrix.

Enable data drift monitoring for the model and its training dataset.

Use the Hyperdrive library to test the model with multiple hyperparameter values.

Add tags to the model registration indicating the names of the features in the training dataset.

Use the interpretability package to generate an explainer for the model.

Q28. You plan to use a Deep Learning Virtual Machine (DLVM) to train deep learning models using Compute Unified Device Architecture (CUDA) computations.

You need to configure the DLVM to support CUDA.

What should you implement?

Select 1 option(s):

Computer Processing Unit (CPU) speed increase by using overclocking

High Random Access Memory (RAM) configuration

Graphic Processing Unit (GPU)

Solid State Drives (SSD)

Intel Software Guard Extensions (Intel SGX) technology

Q29. You plan to use a Data Science Virtual Machine (DSVM) with the open source deep learning frameworks Caffe2 and PyTorch.

You need to select a pre-configured DSVM to support the frameworks.
What should you create?

Select 1 option(s):

Geo AI Data Science Virtual Machine with ArcGIS

Data Science Virtual Machine for Windows 2012

Data Science Virtual Machine for Windows 2016

Data Science Virtual Machine for Linux (Ubuntu)

Data Science Virtual Machine for Linux (CentOS)

Q30. You are creating a classification model for a banking company to identify possible instances of credit card fraud. You plan to create the model in Azure Machine Learning by using automated machine learning.
The training dataset that you are using is highly unbalanced.
You need to evaluate the classification model.
Which primary metric should you use?

Select 1 option(s):

accuracy

normalized_mean_absolute_error

AUC_weighted

spearman_correlation

normalized_root_mean_squared_error

Q31. You are developing a data science workspace that uses an Azure Machine Learning service.

You need to select a compute target to deploy the workspace.
What should you use?

Select 1 option(s):

Azure Databricks

Azure Container Service

Azure Data Lake Analytics

Apache Spark for HDInsight

Q32. You are performing feature engineering on a dataset.

You must add a feature named CityName and populate the column value with the text London.
You need to add the new feature to the dataset.
Which Azure Machine Learning Studio module should you use?

Select 1 option(s):

Edit Metadata

Filter Based Feature Selection

Latent Dirichlet Allocation

Execute Python Script

Q33. Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are creating a new experiment in Azure Machine Learning Studio.
One class has a much smaller number of observations than the other classes in the training set.
You need to select an appropriate data sampling strategy to compensate for the class imbalance.
Solution: You use the Stratified split for the sampling mode.
Does the solution meet the goal?

Select 1 option(s):

Yes

Q34. You are building a regression model for estimating the number of calls during an event.

You need to determine whether the feature values achieve the conditions to build a Poisson regression model.
Which two conditions must the feature set contain? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.

Select 2 option(s):

The label data must be whole numbers.

The label data can be positive or negative.

The label data must be a positive value.

The label data must be non-discrete.

The label data must be a negative value.

Q35. HOTSPOT –

You are creating a machine learning model in Python. The provided dataset contains several numerical columns and one text column. The text column represents a product’s category. The product category will always be one of the following:

• Bikes
• Cars
• Vans
• Boats

You are building a regression model using the scikit-learn Python package.

You need to transform the text data to be compatible with the scikit-learn Python package.

How should you complete the code segment? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Hot Area:

Select 1 option(s):

numpy as df transpose[ProductCategoryMapping]

pandas as df transpose[ProductCategoryMapping]
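
Example: the code segment itself is not reproduced above, so the following is only a sketch of one common way to turn a fixed set of text categories into numbers with pandas. The DataFrame contents, the `Price` column, and the integer codes in the mapping are assumptions.

```python
import pandas as pd

# Hypothetical dataset with one text column and one numeric column.
dataset = pd.DataFrame({
    "ProductCategory": ["Bikes", "Cars", "Vans", "Boats", "Cars"],
    "Price": [500.0, 20000.0, 30000.0, 15000.0, 18000.0],
})

# Illustrative mapping from category text to an integer code.
ProductCategoryMapping = {"Bikes": 1, "Cars": 2, "Vans": 3, "Boats": 4}

# Series.map looks up each value in the dictionary.
dataset["ProductCategory"] = dataset["ProductCategory"].map(ProductCategoryMapping)

print(dataset)
```

Integer codes imply an ordering between categories; one-hot encoding (for example with `pd.get_dummies`) is an alternative when that ordering is not wanted.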

Q36. Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are analyzing a numerical dataset which contains missing values in several columns.
You must clean the missing values using an appropriate operation without affecting the dimensionality of the feature set.
You need to analyze a full dataset to include all values.
Solution: Remove the entire column that contains the missing data point.
Does the solution meet the goal?

Select 1 option(s):

Yes

Q37. Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are creating a new experiment in Azure Machine Learning Studio.
One class has a much smaller number of observations than the other classes in the training set.
You need to select an appropriate data sampling strategy to compensate for the class imbalance.
Solution: You use the Principal Components Analysis (PCA) sampling mode.
Does the solution meet the goal?

Select 1 option(s):

Yes

Q38. You have a comma-separated values (CSV) file containing data from which you want to train a classification model.

You are using the Automated Machine Learning interface in Azure Machine Learning studio to train the classification model. You set the task type to Classification.
You need to ensure that the Automated Machine Learning process evaluates only linear models.
What should you do?

Select 1 option(s):

Clear the option to enable deep learning.

Set the task type to Regression

Set the Exit criterion option to a metric score threshold.

Add all algorithms other than linear ones to the blocked algorithms list.

Clear the option to perform automatic featurization.

Q39. You are creating a machine learning model.

You need to identify outliers in the data.
Which two visualizations can you use? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.

Select 2 option(s):

ROC curve

Random forest diagram

Box plot

Scatter plot

Venn diagram
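
Example: a box plot flags points that fall beyond its whiskers, which are commonly drawn at 1.5 times the interquartile range. The sketch below applies that rule numerically with the standard library; the data and helper name are made up, and plotting libraries may compute quartiles slightly differently.

```python
import statistics


def iqr_outliers(data, whisker=1.5):
    """Return values outside [Q1 - whisker*IQR, Q3 + whisker*IQR]."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower, upper = q1 - whisker * iqr, q3 + whisker * iqr
    return [x for x in data if x < lower or x > upper]


# Hypothetical measurements with one extreme value.
measurements = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(measurements)
print(outliers)
```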

Q40. You create a multi-class image classification deep learning model that uses a set of labeled images. You create a script file named train.py that uses the PyTorch 1.3 framework to train the model.
You must run the script by using an estimator. The code must not require any additional Python libraries to be installed in the environment for the estimator. The time required for model training must be minimized.
You need to define the estimator that will be used to run the script.
Which estimator type should you use?

Select 1 option(s):

TensorFlow

PyTorch

Estimator

SKLearn

Q41. You are analyzing a dataset by using Azure Machine Learning Studio.

You need to generate a statistical summary that contains the p-value and the unique count for each feature column.
Which two modules can you use? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.

Select 2 option(s):

Compute Linear Correlation

Convert to Indicator Values

Summarize Data

Execute Python Script

Export Count Table

Q42. You plan to deliver a hands-on workshop to several students. The workshop will focus on creating data visualizations using Python. Each student will use a device that has internet access.

Student devices are not configured for Python development. Students do not have administrator access to install software on their devices. Azure subscriptions are not available for students.
You need to ensure that students can run Python-based data visualization code.
Which Azure tool should you use?

Select 1 option(s):

Azure Notebooks

Azure Machine Learning Service

Anaconda Data Science Platform

Azure Batch AI

Q43. You are performing a filter-based feature selection for a dataset to build a multi-class classifier by using Azure Machine Learning Studio.

The dataset contains categorical features that are highly correlated to the output label column.
You need to select the appropriate feature scoring statistical method to identify the key predictors.
Which method should you use?

Select 1 option(s):

Kendall correlation

Pearson correlation

Chi-squared

Spearman correlation

Q44. Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are using Azure Machine Learning Studio to perform feature engineering on a dataset.
You need to normalize values to produce a feature column grouped into bins.
Solution: Apply an Entropy Minimum Description Length (MDL) binning mode.
Does the solution meet the goal?

Select 1 option(s):

Yes

Q45. HOTSPOT –

You plan to preprocess text from CSV files. You load the Azure Machine Learning Studio default stop words list.

You need to configure the Preprocess Text module to meet the following requirements:

• Ensure that multiple related words are reduced to a single canonical form.
• Remove pipe characters from text.
• Remove words to optimize information retrieval.

Which three options should you select? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Hot Area:

Select 1 option(s):

Detect sentences Remove numbers Remove special characters

Remove stop words Lemmatization Remove special characters

Detect sentences Lemmatization Remove special characters

Q46. Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are creating a new experiment in Azure Machine Learning Studio.
One class has a much smaller number of observations than the other classes in the training set.
You need to select an appropriate data sampling strategy to compensate for the class imbalance.
Solution: You use the Scale and Reduce sampling mode.
Does the solution meet the goal?

Select 1 option(s):

Yes

Q47. Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are analyzing a numerical dataset which contains missing values in several columns.
You must clean the missing values using an appropriate operation without affecting the dimensionality of the feature set.
You need to analyze a full dataset to include all values.
Solution: Replace each missing value using the Multiple Imputation by Chained Equations (MICE) method.
Does the solution meet the goal?

Select 1 option(s):

Yes

Q48. Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are a data scientist using Azure Machine Learning Studio.
You need to normalize values to produce an output column into bins to predict a target column.
Solution: Apply a Quantiles normalization with a QuantileIndex normalization.
Does the solution meet the goal?

Select 1 option(s):

Yes

Q49. Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are creating a new experiment in Azure Machine Learning Studio.
One class has a much smaller number of observations than the other classes in the training set.
You need to select an appropriate data sampling strategy to compensate for the class imbalance.
Solution: You use the Synthetic Minority Oversampling Technique (SMOTE) sampling mode.
Does the solution meet the goal?

Select 1 option(s):

Yes

Q50. You are solving a classification task.

k=0.5

k=10

k=1

k=0.9

Q51. HOTSPOT –

You have a dataset created for multiclass classification tasks that contains a normalized numerical feature set with 10,000 data points and 150 features.

You use 75 percent of the data points for training and 25 percent for testing. You are using the scikit-learn machine learning library in Python. You use X to denote the feature set and Y to denote class labels.

You create the following Python data frames:

You need to apply the Principal Component Analysis (PCA) method to reduce the dimensionality of the feature set to 10 features in both training and testing sets.

How should you complete the code segment? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Hot Area:

Select 1 option(s):

pca transform(x_test) PCA(n_components = 10)

PCA(n_components = 10) pca transform(x_test)

transform(x_test) PCA(n_components = 10) pca
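
As background for this scenario, the usual scikit-learn pattern is to fit PCA on the training split only and then reuse the fitted object to project the test split. The sketch below assumes a smaller random stand-in dataset (1,000 rows instead of 10,000) because the original data frames are not shown; the variable names are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

# Random stand-in for the feature set X (150 features) and labels Y.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 150))
Y = rng.integers(0, 3, size=1000)

# 75 percent of the rows for training, 25 percent for testing.
x_train, x_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0
)

# Learn the 10 principal components from the training data only.
pca = PCA(n_components=10)
x_train_reduced = pca.fit_transform(x_train)

# Project the test data with the components learned above (no refitting).
x_test_reduced = pca.transform(x_test)
```

Fitting on the training split alone keeps information from the test split out of the learned projection.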

Q52. Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are analyzing a numerical dataset which contains missing values in several columns.
You must clean the missing values using an appropriate operation without affecting the dimensionality of the feature set.
You need to analyze a full dataset to include all values.
Solution: Use the Last Observation Carried Forward (LOCF) method to impute the missing data points.
Does the solution meet the goal?

Select 1 option(s):

Yes
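
To make the LOCF operation concrete, here is a minimal standard-library sketch that assumes missing values are represented as `None` and that the rows are ordered; the function name and sample readings are hypothetical.

```python
def locf(values):
    """Replace each missing value (None) with the most recent observed value."""
    filled = []
    last_seen = None
    for value in values:
        if value is not None:
            last_seen = value
        # A leading gap stays None because nothing has been observed yet.
        filled.append(last_seen)
    return filled


readings = [None, 3.0, None, None, 5.5, None]
filled = locf(readings)
```

The output has the same length as the input, so the column count and row count are unchanged. Note that a gap at the start of the series cannot be filled this way.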

Q53. HOTSPOT –

You create a binary classification model to predict whether a person has a disease.

You need to detect possible classification errors.

Which error type should you choose for each description? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Hot Area:

Select 1 option(s):

True Positives False Positives False Positives True Negatives

True Negatives True Negatives True Negatives True Negatives

True Positives True Negatives False Positives False Positives

True Positives True Positives False Positives False Positives

Q54. You are creating a new experiment in Azure Machine Learning Studio. You have a small dataset that has missing values in many columns. The data does not require the application of predictors for each column. You plan to use the Clean Missing Data module.

You need to select a data cleaning method.
Which method should you use?

Select 1 option(s):

Normalization

Synthetic Minority Oversampling Technique (SMOTE)

Replace using Probabilistic PCA

Replace using MICE

Q55. You are evaluating a completed binary classification machine learning model.

You need to use the precision as the evaluation metric.
Which visualization should you use?

Select 1 option(s):

Violin plot

Box plot

Binary classification confusion matrix

Gradient descent
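
As a reminder of how precision is defined, the sketch below counts the four outcome types of a binary classifier and derives precision as TP / (TP + FP). It uses only the standard library; the labels and predictions are made-up illustration data.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


# Hypothetical ground truth and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)

# Precision: of everything predicted positive, the share that was truly positive.
precision = tp / (tp + fp)
```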

Q56. You use Azure Machine Learning Studio to build a machine learning experiment.

You need to divide data into two distinct datasets.
Which module should you use?

Select 1 option(s):

Assign Data to Clusters

Load Trained Model

Group Data into Bins

Split Data

Q57. DRAG DROP –

You have a dataset that contains over 150 features. You use the dataset to train a Support Vector Machine (SVM) binary classifier.

You need to use the Permutation Feature Importance module in Azure Machine Learning Studio to compute a set of feature importance scores for the dataset.

In which order should you perform the actions? To answer, move all actions from the list of actions to the answer area and arrange them in the correct order.

Select and Place:

Select 1 option(s):

Add a dataset to the experiment Add a Two-Class Support Vector Machine module to initialize the SVM classifier Add a Split Data module to create training and test datasets Add a Permutation Feature Importance module and connect the trained model and test dataset Set the Metric for measuring performance property to Classification - Accuracy and then run the experiment.

Add a Split Data module to create training and test datasets Add a Permutation Feature Importance module and connect the trained model and test dataset Set the Metric for measuring performance property to Classification - Accuracy and then run the experiment. Add a Two-Class Support Vector Machine module to initialize the SVM classifier Add a dataset to the experiment

Add a Two-Class Support Vector Machine module to initialize the SVM classifier Add a dataset to the experiment Add a Split Data module to create training and test datasets Add a Permutation Feature Importance module and connect the trained model and test dataset Set the Metric for measuring performance property to Classification - Accuracy and then run the experiment.
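
For comparison outside the Studio designer, a similar workflow can be sketched with scikit-learn's `permutation_importance`. This is an analogue under stated assumptions, not the Studio module itself: the data is a synthetic stand-in with 8 features rather than 150, and the linear kernel is an arbitrary choice.

```python
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic binary classification data standing in for the real dataset.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=3, random_state=0
)

# Hold out a test set so importance is measured on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Train the SVM classifier on the training split.
model = SVC(kernel="linear").fit(X_train, y_train)

# Shuffle each feature in turn and record the drop in accuracy.
result = permutation_importance(
    model, X_test, y_test, scoring="accuracy", n_repeats=5, random_state=0
)

# Feature indices ordered from most to least important.
ranking = result.importances_mean.argsort()[::-1]
```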

Q58. You are evaluating a completed binary classification machine learning model.

You need to use the precision as the evaluation metric.
Which visualization should you use?

Select 1 option(s):

Receiver Operating Characteristic (ROC) curve

Violin plot

Gradient descent

Scatter plot

Q59. HOTSPOT –

You are using the Azure Machine Learning Service to automate hyperparameter exploration of your neural network classification model.

You must define the hyperparameter space to automatically tune hyperparameters using random sampling according to the following requirements:

- The learning rate must be selected from a normal distribution with a mean value of 10 and a standard deviation of 3.
- Batch size must be 16, 32, or 64.
- Keep probability must be a value selected from a uniform distribution in the range 0.05 to 0.1.

You need to use the param_sampling method of the Python API for the Azure Machine Learning Service.

How should you complete the code segment? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Hot Area:

Select 1 option(s):

uniform(10, 3) choice(16, 32, 64) uniform(0.05, 0.1)

uniform(0.05, 0.1) choice(16, 32, 64) uniform(0.05, 0.1)

normal(10, 3) choice(16, 32, 64) uniform(0.05, 0.1)
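
To illustrate what random sampling over such a search space means, the sketch below draws a few trial configurations using Python's standard `random` module. It is a stand-in for the concept only and does not use the Azure Machine Learning SDK; the function name and dictionary keys are hypothetical.

```python
import random


def sample_hyperparameters(rng):
    """Draw one configuration from the three distributions in the requirements."""
    return {
        # Normal distribution with mean 10 and standard deviation 3.
        "learning_rate": rng.gauss(10, 3),
        # One value picked from a discrete set.
        "batch_size": rng.choice([16, 32, 64]),
        # Uniform distribution between 0.05 and 0.1.
        "keep_probability": rng.uniform(0.05, 0.1),
    }


rng = random.Random(42)
trials = [sample_hyperparameters(rng) for _ in range(5)]
```

Each trial is drawn independently, which is what distinguishes random sampling from an exhaustive grid over the same space.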

Q61. HOTSPOT –

You are performing feature scaling by using the scikit-learn Python library for the x1, x2, and x3 features.

Original and scaled data is shown in the following image.

Use the drop-down menus to select the answer choice that answers each question based on the information presented in the graphic.

NOTE: Each correct selection is worth one point.

Hot Area:

Select 1 option(s):

MinMaxScaler MinMaxScaler Normalizer

Normalizer MinMaxScaler Normalizer

StandardScaler MinMaxScaler Normalizer
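
Since the referenced image is not available here, the following sketch shows how three scikit-learn scalers behave on a small made-up matrix, which may help when recognizing them from a plot. The data values are arbitrary.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, Normalizer, StandardScaler

# Made-up data: rows are samples, columns are the features x1, x2, x3.
data = np.array([
    [1.0, 200.0, 3.0],
    [2.0, 300.0, 1.0],
    [3.0, 400.0, 2.0],
    [4.0, 500.0, 6.0],
])

# StandardScaler: each column ends up with mean 0 and unit variance.
standardized = StandardScaler().fit_transform(data)

# MinMaxScaler: each column is rescaled to the range [0, 1].
min_maxed = MinMaxScaler().fit_transform(data)

# Normalizer: each row is rescaled to unit Euclidean (L2) length.
normalized = Normalizer().fit_transform(data)
```

The first two work column by column, while `Normalizer` works row by row, so its output looks quite different on a per-feature plot.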

Q60. HOTSPOT –

You have a feature set containing the following numerical features: X, Y, and Z.

The Pearson correlation coefficient (r-value) of X, Y, and Z features is shown in the following image:

Use the drop-down menus to select the answer choice that answers each question based on the information presented in the graphic.

NOTE: Each correct selection is worth one point.

Hot Area:

Select 1 option(s):

0.859122 a positive linear relationship

-0.106276 no linear relationship

1 no linear relationship
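
For readers who want to see how an r-value is obtained, the sketch below computes the Pearson correlation coefficient, the standard measure of linear association, from its definition using only the standard library. The sample series are invented and are not the X, Y, and Z features from the image.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]     # rises exactly in step with x
y_down = [10, 8, 6, 4, 2]   # falls exactly in step with x

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
```

Values near +1 or -1 indicate a strong positive or negative linear relationship, and values near 0 indicate little or no linear relationship.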

Q62. You are creating a new Azure Machine Learning pipeline using the designer.

The pipeline must train a model using data in a comma-separated values (CSV) file that is published on a website. You have not created a dataset for this file.
You need to ingest the data from the CSV file into the designer pipeline using the minimal administrative effort.
Which module should you add to the pipeline in Designer?

Select 1 option(s):

Import Data

Enter Data Manually

Dataset

Convert to CSV

Q63. HOTSPOT –

You are analyzing the asymmetry in a statistical distribution.

The following image contains two density curves that show the probability distribution of two datasets.

Use the drop-down menus to select the answer choice that answers each question based on the information presented in the graphic.

NOTE: Each correct selection is worth one point.

Hot Area:

Select 1 option(s):

Negative skew Positive skew

Positive skew Negative skew

Negative skew Negative skew
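
The direction of skew can also be checked numerically. The sketch below computes the moment-based skewness coefficient (third central moment divided by the 1.5 power of the second) with the standard library; the two sample lists are invented and do not correspond to the curves in the image.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Long tail to the right: most values are small, one is large.
right_tailed = [1, 1, 2, 2, 2, 3, 3, 10]
# Mirror image: long tail to the left.
left_tailed = [-v for v in right_tailed]

skew_right = skewness(right_tailed)
skew_left = skewness(left_tailed)
```

A positive coefficient corresponds to a longer right tail (positive skew) and a negative coefficient to a longer left tail (negative skew).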

Q64. You create a binary classification model.

You need to evaluate the model performance.
Which two metrics can you use? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.

Select 2 option(s):

relative absolute error

coefficient of determination

accuracy

mean absolute error

precision

Q65. You are building a binary classification model by using a supplied training set.

The training set is imbalanced between two classes.
You need to resolve the data imbalance.
What are three possible ways to achieve this goal? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.

Select 3 option(s):

Penalize the classification

Generate synthetic samples in the minority class

Use accuracy as the evaluation metric of the model

Resample the dataset using undersampling or oversampling

Normalize the training feature set