

Databricks Certified Associate Developer for Apache Spark 3.0 Exam Questions and Answers

Question 4

Which of the following code blocks returns a single-row DataFrame that only has a column corr which shows the Pearson correlation coefficient between columns predError and value in DataFrame transactionsDf?

Options:

A.

transactionsDf.select(corr(["predError", "value"]).alias("corr")).first()

B.

transactionsDf.select(corr(col("predError"), col("value")).alias("corr")).first()

C.

transactionsDf.select(corr(predError, value).alias("corr"))

D.

transactionsDf.select(corr(col("predError"), col("value")).alias("corr"))

(Correct)

E.

transactionsDf.select(corr("predError", "value"))
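
For context, a minimal sketch of the pattern behind the option marked correct (D), assuming an active SparkSession named spark and that transactionsDf exists with numeric columns predError and value:

from pyspark.sql.functions import corr, col

# select() with an aggregation expression yields a single-row DataFrame;
# alias() names the resulting column "corr"
corr_df = transactionsDf.select(corr(col("predError"), col("value")).alias("corr"))
corr_df.show()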

Question 5

The code block shown below should write DataFrame transactionsDf to disk at path csvPath as a single CSV file, using tabs (\t characters) as separators between columns, expressing missing values as string n/a, and omitting a header row with column names. Choose the answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__.write.__2__(__3__, "\t").__4__.__5__(csvPath)

Options:

A.

1. coalesce(1)

2. option

3. "sep"

4. option("header", True)

5. path

B.

1. coalesce(1)

2. option

3. "colsep"

4. option("nullValue", "n/a")

5. path

C.

1. repartition(1)

2. option

3. "sep"

4. option("nullValue", "n/a")

5. csv

(Correct)

D.

1. csv

2. option

3. "sep"

4. option("emptyValue", "n/a")

5. path

E.

1. repartition(1)

2. mode

3. "sep"

4. mode("nullValue", "n/a")

5. csv
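
For context, a minimal sketch of the filled-in code from the option marked correct (C), assuming transactionsDf exists and csvPath points to a writable location:

# repartition(1) collapses the data to a single partition so only one CSV part file is written;
# "sep" sets the tab separator, "nullValue" renders missing values as n/a,
# and csv() writes without a header by default
transactionsDf.repartition(1) \
    .write.option("sep", "\t") \
    .option("nullValue", "n/a") \
    .csv(csvPath)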

Question 6

The code block displayed below contains an error. The code block should use the Python method find_most_freq_letter to find the letter that appears most frequently in column itemName of DataFrame itemsDf and return it in a new column most_frequent_letter. Find the error.

Code block:

1. find_most_freq_letter_udf = udf(find_most_freq_letter)

2. itemsDf.withColumn("most_frequent_letter", find_most_freq_letter("itemName"))

Options:

A.

Spark is not using the UDF method correctly.

B.

The UDF method is not registered correctly, since the return type is missing.

C.

The "itemName" expression should be wrapped in col().

D.

UDFs do not exist in PySpark.

E.

Spark is not adding a column.
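
As general background (not an answer key), the usual way a plain Python function is wrapped and applied as a UDF looks like the sketch below; the body of find_most_freq_letter is a hypothetical stand-in, and itemsDf is assumed to exist with a string column itemName:

from collections import Counter
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def find_most_freq_letter(name):
    # hypothetical stand-in body, for illustration only
    letters = [c.lower() for c in (name or "") if c.isalpha()]
    return Counter(letters).most_common(1)[0][0] if letters else None

find_most_freq_letter_udf = udf(find_most_freq_letter, StringType())

# the wrapped UDF object, not the plain Python function, is what gets applied to the column
itemsDf.withColumn("most_frequent_letter", find_most_freq_letter_udf("itemName"))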

Question 7

Which of the following code blocks applies the Python function to_limit on column predError in table transactionsDf, returning a DataFrame with columns transactionId and result?

Options:

A.

1.spark.udf.register("LIMIT_FCN", to_limit)

2.spark.sql("SELECT transactionId, LIMIT_FCN(predError) AS result FROM transactionsDf")

(Correct)

B.

1.spark.udf.register("LIMIT_FCN", to_limit)

2.spark.sql("SELECT transactionId, LIMIT_FCN(predError) FROM transactionsDf AS result")

C.

1.spark.udf.register("LIMIT_FCN", to_limit)

2.spark.sql("SELECT transactionId, to_limit(predError) AS result FROM transactionsDf")

D.

spark.sql("SELECT transactionId, udf(to_limit(predError)) AS result FROM transactionsDf")

E.

1.spark.udf.register(to_limit, "LIMIT_FCN")

2.spark.sql("SELECT transactionId, LIMIT_FCN(predError) AS result FROM transactionsDf")
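
For context, a runnable sketch of the pattern marked correct (A), assuming to_limit is an existing Python function and transactionsDf is a DataFrame (registering it as a temporary view makes it queryable from SQL):

transactionsDf.createOrReplaceTempView("transactionsDf")   # expose the DataFrame to Spark SQL
spark.udf.register("LIMIT_FCN", to_limit)                  # SQL function name first, then the Python function
result = spark.sql("SELECT transactionId, LIMIT_FCN(predError) AS result FROM transactionsDf")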

Question 8

Which of the following describes the characteristics of accumulators?

Options:

A.

Accumulators are used to pass around lookup tables across the cluster.

B.

All accumulators used in a Spark application are listed in the Spark UI.

C.

Accumulators can be instantiated directly via the accumulator(n) method of the pyspark.RDD module.

D.

Accumulators are immutable.

E.

If an action including an accumulator fails during execution and Spark manages to restart the action and complete it successfully, only the successful attempt will be counted in the accumulator.
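
As background, PySpark accumulators are created from the SparkContext and updated inside tasks, with the driver reading the final value; a minimal sketch assuming an active SparkSession named spark:

sc = spark.sparkContext
acc = sc.accumulator(0)                                    # created via SparkContext.accumulator()
sc.parallelize([1, 2, 3, 4]).foreach(lambda x: acc.add(x)) # tasks add to the accumulator
print(acc.value)                                           # the driver sees the accumulated total (10)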

Question 9

Which of the following code blocks performs an inner join between DataFrame itemsDf and DataFrame transactionsDf, using columns itemId and transactionId as join keys, respectively?

Options:

A.

itemsDf.join(transactionsDf, "inner", itemsDf.itemId == transactionsDf.transactionId)

B.

itemsDf.join(transactionsDf, itemId == transactionId)

C.

itemsDf.join(transactionsDf, itemsDf.itemId == transactionsDf.transactionId, "inner")

D.

itemsDf.join(transactionsDf, "itemsDf.itemId == transactionsDf.transactionId", "inner")

E.

itemsDf.join(transactionsDf, col(itemsDf.itemId) == col(transactionsDf.transactionId))
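
As background, DataFrame.join takes the other DataFrame first, then the join condition as a Column expression, then the join type string; a minimal sketch assuming both DataFrames exist:

joined = itemsDf.join(
    transactionsDf,
    itemsDf.itemId == transactionsDf.transactionId,   # condition compares the two key columns
    "inner",                                           # join type comes third
)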

Question 10

The code block displayed below contains an error. The code block is intended to write DataFrame transactionsDf to disk as a parquet file in location /FileStore/transactions_split, using column storeId as key for partitioning. Find the error.

Code block:

transactionsDf.write.format("parquet").partitionOn("storeId").save("/FileStore/transactions_split")

Options:

A.

The format("parquet") expression is inappropriate to use here, "parquet" should be passed as first argument to the save() operator and "/FileStore/transactions_split" as the second argument.

B.

Partitioning data by storeId is possible with the partitionBy expression, so partitionOn should be replaced by partitionBy.

C.

Partitioning data by storeId is possible with the bucketBy expression, so partitionOn should be replaced by bucketBy.

D.

partitionOn("storeId") should be called before the write operation.

E.

The format("parquet") expression should be removed and instead, the information should be added to the write expression like so: write("parquet").
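
As background, the DataFrameWriter method that writes one subdirectory per key value is partitionBy; a corrected sketch of the intended write, assuming transactionsDf exists and the target path is writable:

transactionsDf.write.format("parquet") \
    .partitionBy("storeId") \
    .save("/FileStore/transactions_split")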

Question 11

Which of the following code blocks returns only rows from DataFrame transactionsDf in which values in column productId are unique?

Options:

A.

transactionsDf.distinct("productId")

B.

transactionsDf.dropDuplicates(subset=["productId"])

C.

transactionsDf.drop_duplicates(subset="productId")

D.

transactionsDf.unique("productId")

E.

transactionsDf.dropDuplicates(subset="productId")
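
As background, dropDuplicates (and its snake_case alias drop_duplicates) expects the subset parameter to be a list of column names; a minimal sketch assuming transactionsDf exists:

# keeps one row per distinct productId value
deduped = transactionsDf.dropDuplicates(subset=["productId"])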

Question 12

The code block shown below should add column transactionDateForm to DataFrame transactionsDf. The column should express the unix-format timestamps in column transactionDate as strings like Apr 26 (Sunday). Choose the answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__(__2__, from_unixtime(__3__, __4__))

Options:

A.

1. withColumn

2. "transactionDateForm"

3. "MMM d (EEEE)"

4. "transactionDate"

B.

1. select

2. "transactionDate"

3. "transactionDateForm"

4. "MMM d (EEEE)"

C.

1. withColumn

2. "transactionDateForm"

3. "transactionDate"

4. "MMM d (EEEE)"

D.

1. withColumn

2. "transactionDateForm"

3. "transactionDate"

4. "MM d (EEE)"

E.

1. withColumnRenamed

2. "transactionDate"

3. "transactionDateForm"

4. "MM d (EEE)"
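
As background, withColumn takes the new column's name first, and from_unixtime takes the source column first and the format pattern second; a minimal sketch assuming transactionsDf has a unix-timestamp column transactionDate:

from pyspark.sql.functions import from_unixtime

transactionsDf.withColumn(
    "transactionDateForm",
    from_unixtime("transactionDate", "MMM d (EEEE)"),   # renders e.g. Apr 26 (Sunday)
)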

Question 13

Which of the following code blocks returns all unique values from columns value and productId in DataFrame transactionsDf as a one-column DataFrame?

Options:

A.

tranactionsDf.select('value').join(transactionsDf.select('productId'), col('value')==col('productId'), 'outer')

B.

transactionsDf.select(col('value'), col('productId')).agg({'*': 'count'})

C.

transactionsDf.select('value', 'productId').distinct()

D.

transactionsDf.select('value').union(transactionsDf.select('productId')).distinct()

E.

transactionsDf.agg({'value': 'collect_set', 'productId': 'collect_set'})
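
As background, union() stacks two one-column projections by position and distinct() then removes repeated values; a minimal sketch assuming transactionsDf exists:

unique_values = (transactionsDf.select("value")
                 .union(transactionsDf.select("productId"))
                 .distinct())   # one column holding every distinct value from both source columns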

Question 14

Which of the following code blocks reads the parquet file stored at filePath into DataFrame itemsDf, using a valid schema for the sample of itemsDf shown below?

Sample of itemsDf:

1.+------+-----------------------------+-------------------+

2.|itemId|attributes |supplier |

3.+------+-----------------------------+-------------------+

4.|1 |[blue, winter, cozy] |Sports Company Inc.|

5.|2 |[red, summer, fresh, cooling]|YetiX |

6.|3 |[green, summer, travel] |Sports Company Inc.|

7.+------+-----------------------------+-------------------+

Options:

A.

1.itemsDfSchema = StructType([

2. StructField("itemId", IntegerType()),

3. StructField("attributes", StringType()),

4. StructField("supplier", StringType())])

5.

6.itemsDf = spark.read.schema(itemsDfSchema).parquet(filePath)

B.

1.itemsDfSchema = StructType([

2. StructField("itemId", IntegerType),

3. StructField("attributes", ArrayType(StringType)),

4. StructField("supplier", StringType)])

5.

6.itemsDf = spark.read.schema(itemsDfSchema).parquet(filePath)

C.

1.itemsDf = spark.read.schema('itemId integer, attributes , supplier string').parquet(filePath)

D.

1.itemsDfSchema = StructType([

2. StructField("itemId", IntegerType()),

3. StructField("attributes", ArrayType(StringType())),

4. StructField("supplier", StringType())])

5.

6.itemsDf = spark.read.schema(itemsDfSchema).parquet(filePath)

E.

1.itemsDfSchema = StructType([

2. StructField("itemId", IntegerType()),

3. StructField("attributes", ArrayType([StringType()])),

4. StructField("supplier", StringType())])

5.

6.itemsDf = spark.read(schema=itemsDfSchema).parquet(filePath)
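
As background, a schema matching the sample (an integer, an array of strings, and a string) is built from instantiated type objects, with the array column wrapped in ArrayType; a minimal sketch assuming filePath points at a parquet file with this layout:

from pyspark.sql.types import StructType, StructField, IntegerType, ArrayType, StringType

itemsDfSchema = StructType([
    StructField("itemId", IntegerType()),
    StructField("attributes", ArrayType(StringType())),   # array of strings
    StructField("supplier", StringType()),
])
itemsDf = spark.read.schema(itemsDfSchema).parquet(filePath)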

Question 15

Which of the following code blocks reads JSON file imports.json into a DataFrame?

Options:

A.

spark.read().mode("json").path("/FileStore/imports.json")

B.

spark.read.format("json").path("/FileStore/imports.json")

C.

spark.read("json", "/FileStore/imports.json")

D.

spark.read.json("/FileStore/imports.json")

E.

spark.read().json("/FileStore/imports.json")
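
As background, spark.read is a property (no parentheses) that returns a DataFrameReader, and its json() method loads the file directly; a minimal sketch assuming the file exists at that path:

importsDf = spark.read.json("/FileStore/imports.json")
# equivalent long form: spark.read.format("json").load("/FileStore/imports.json")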

Question 16

Which of the following is the deepest level in Spark's execution hierarchy?

Options:

A.

Job

B.

Task

C.

Executor

D.

Slot

E.

Stage

Question 17

Which of the following describes the difference between client and cluster execution modes?

Options:

A.

In cluster mode, the driver runs on the worker nodes, while the client mode runs the driver on the client machine.

B.

In cluster mode, the driver runs on the edge node, while the client mode runs the driver in a worker node.

C.

In cluster mode, each node will launch its own executor, while in client mode, executors will exclusively run on the client machine.

D.

In client mode, the cluster manager runs on the same host as the driver, while in cluster mode, the cluster manager runs on a separate node.

E.

In cluster mode, the driver runs on the master node, while in client mode, the driver runs on a virtual machine in the cloud.

Question 18

Which of the following code blocks returns the number of unique values in column storeId of DataFrame transactionsDf?

Options:

A.

transactionsDf.select("storeId").dropDuplicates().count()

B.

transactionsDf.select(count("storeId")).dropDuplicates()

C.

transactionsDf.select(distinct("storeId")).count()

D.

transactionsDf.dropDuplicates().agg(count("storeId"))

E.

transactionsDf.distinct().select("storeId").count()
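
As background, one common way to count the distinct values of a single column is to project it, drop duplicate rows, then count; a minimal sketch assuming transactionsDf exists:

transactionsDf.select("storeId").dropDuplicates().count()
# an aggregation with countDistinct("storeId") is another common route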

Question 19

Which of the following code blocks creates a new 6-column DataFrame by appending the rows of the 6-column DataFrame yesterdayTransactionsDf to the rows of the 6-column DataFrame todayTransactionsDf, ignoring that both DataFrames have different column names?

Options:

A.

union(todayTransactionsDf, yesterdayTransactionsDf)

B.

todayTransactionsDf.unionByName(yesterdayTransactionsDf, allowMissingColumns=True)

C.

todayTransactionsDf.unionByName(yesterdayTransactionsDf)

D.

todayTransactionsDf.concat(yesterdayTransactionsDf)

E.

todayTransactionsDf.union(yesterdayTransactionsDf)
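
As background, union() appends rows purely by column position, so differing column names are ignored and the result keeps the caller's names, while unionByName() matches columns by name; a minimal sketch assuming both 6-column DataFrames exist:

combined = todayTransactionsDf.union(yesterdayTransactionsDf)   # positional append; column names come from todayTransactionsDf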

Question 20

Which of the following statements about Spark's execution hierarchy is correct?

Options:

A.

In Spark's execution hierarchy, a job may reach over multiple stage boundaries.

B.

In Spark's execution hierarchy, manifests are one layer above jobs.

C.

In Spark's execution hierarchy, a stage comprises multiple jobs.

D.

In Spark's execution hierarchy, executors are the smallest unit.

E.

In Spark's execution hierarchy, tasks are one layer above slots.

Question 21

Which of the following code blocks reorders the values inside the arrays in column attributes of DataFrame itemsDf from last to first one in the alphabet?

1.+------+-----------------------------+-------------------+

2.|itemId|attributes |supplier |

3.+------+-----------------------------+-------------------+

4.|1 |[blue, winter, cozy] |Sports Company Inc.|

5.|2 |[red, summer, fresh, cooling]|YetiX |

6.|3 |[green, summer, travel] |Sports Company Inc.|

7.+------+-----------------------------+-------------------+

Options:

A.

itemsDf.withColumn('attributes', sort_array(col('attributes').desc()))

B.

itemsDf.withColumn('attributes', sort_array(desc('attributes')))

C.

itemsDf.withColumn('attributes', sort(col('attributes'), asc=False))

D.

itemsDf.withColumn("attributes", sort_array("attributes", asc=False))

E.

itemsDf.select(sort_array("attributes"))
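
As background, sort_array() orders the elements inside each array value and takes an asc flag; a minimal sketch assuming itemsDf has an array column attributes:

from pyspark.sql.functions import sort_array

# asc=False sorts each array from last to first in the alphabet
itemsDf.withColumn("attributes", sort_array("attributes", asc=False))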

Question 22

The code block displayed below contains an error. The code block should configure Spark to split data in 20 parts when exchanging data between executors for joins or aggregations. Find the error.

Code block:

spark.conf.set(spark.sql.shuffle.partitions, 20)

Options:

A.

The code block uses the wrong command for setting an option.

B.

The code block sets the wrong option.

C.

The code block expresses the option incorrectly.

D.

The code block sets the incorrect number of parts.

E.

The code block is missing a parameter.
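
As background, spark.conf.set() expects the configuration key as a string; a corrected sketch of the intended setting:

spark.conf.set("spark.sql.shuffle.partitions", 20)   # 20 shuffle partitions for joins and aggregations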

Question 23

Which of the following code blocks returns a DataFrame with approximately 1,000 rows from the 10,000-row DataFrame itemsDf, without any duplicates, returning the same rows even if the code block is run twice?

Options:

A.

itemsDf.sampleBy("row", fractions={0: 0.1}, seed=82371)

B.

itemsDf.sample(fraction=0.1, seed=87238)

C.

itemsDf.sample(fraction=1000, seed=98263)

D.

itemsDf.sample(withReplacement=True, fraction=0.1, seed=23536)

E.

itemsDf.sample(fraction=0.1)
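
As background, DataFrame.sample() takes a fraction between 0 and 1, withReplacement defaults to False (so no duplicate rows are drawn), and a fixed seed makes the draw reproducible across runs; a minimal sketch assuming itemsDf exists:

sampled = itemsDf.sample(fraction=0.1, seed=87238)   # roughly 1,000 of 10,000 rows, deterministic for a fixed seed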

Question 24

Which of the following code blocks reads in the JSON file stored at filePath as a DataFrame?

Options:

A.

spark.read.json(filePath)

B.

spark.read.path(filePath, source="json")

C.

spark.read().path(filePath)

D.

spark.read().json(filePath)

E.

spark.read.path(filePath)

Question 25

The code block displayed below contains an error. The code block should count the number of rows that have a predError of either 3 or 6. Find the error.

Code block:

transactionsDf.filter(col('predError').in([3, 6])).count()

Options:

A.

The number of rows cannot be determined with the count() operator.

B.

Instead of filter, the select method should be used.

C.

The method used on column predError is incorrect.

D.

Instead of a list, the values need to be passed as single arguments to the in operator.

E.

Numbers 3 and 6 need to be passed as string variables.
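
As background, the Column method for membership tests in PySpark is isin(); a corrected sketch of the intended count, assuming transactionsDf exists:

from pyspark.sql.functions import col

transactionsDf.filter(col("predError").isin([3, 6])).count()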

Question 26

The code block displayed below contains at least one error. The code block should return a DataFrame with only one column, result. That column should include all values in column value from DataFrame transactionsDf raised to the power of 5, and a null value for rows in which there is no value in column value. Find the error(s).

Code block:

1.from pyspark.sql.functions import udf

2.from pyspark.sql import types as T

3.

4.transactionsDf.createOrReplaceTempView('transactions')

5.

6.def pow_5(x):

7. return x**5

8.

9.spark.udf.register(pow_5, 'power_5_udf', T.LongType())

10.spark.sql('SELECT power_5_udf(value) FROM transactions')

Options:

A.

The pow_5 method is unable to handle empty values in column value and the name of the column in the returned DataFrame is not result.

B.

The returned DataFrame includes multiple columns instead of just one column.

C.

The pow_5 method is unable to handle empty values in column value, the name of the column in the returned DataFrame is not result, and the SparkSession cannot access the transactionsDf DataFrame.

D.

The pow_5 method is unable to handle empty values in column value, the name of the column in the returned DataFrame is not result, and the Spark driver does not call the UDF function appropriately.

E.

The pow_5 method is unable to handle empty values in column value, the UDF function is not registered properly with the Spark driver, and the name of the column in the returned DataFrame is not result.
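
As background (not an answer key), spark.udf.register() takes the SQL function name first, then the Python function, then the return type; a corrected sketch of the intended pipeline, assuming transactionsDf exists:

from pyspark.sql import types as T

def pow_5(x):
    return x ** 5 if x is not None else None   # guard against rows with no value

transactionsDf.createOrReplaceTempView("transactions")
spark.udf.register("power_5_udf", pow_5, T.LongType())   # name first, then the function, then the return type
spark.sql("SELECT power_5_udf(value) AS result FROM transactions")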

Question 27

Which of the following code blocks returns a 2-column DataFrame that shows the distinct values in column productId and the number of rows with that productId in DataFrame transactionsDf?

Options:

A.

transactionsDf.count("productId").distinct()

B.

transactionsDf.groupBy("productId").agg(col("value").count())

C.

transactionsDf.count("productId")

D.

transactionsDf.groupBy("productId").count()

E.

transactionsDf.groupBy("productId").select(count("value"))
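
As background, grouping by a column and calling count() yields exactly two columns, the grouping key and count; a minimal sketch assuming transactionsDf exists:

transactionsDf.groupBy("productId").count()   # columns: productId, count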

Exam Name: Databricks Certified Associate Developer for Apache Spark 3.0 Exam
Last Update: Nov 19, 2024
Questions: 180
