Free Databricks (Databricks-Certified-Professional-Data-Engineer) Certification Sample Questions with Online Practice Test [Q94-Q114]



Databricks-Certified-Professional-Data-Engineer Certification Study Guide: Pass Databricks-Certified-Professional-Data-Engineer Fast

Databricks Certified Professional Data Engineer (DCPDE) is a certification program designed to validate the skills and knowledge of data professionals on the Databricks platform. The certification is aimed at professionals who design, build, and maintain data processing systems using Apache Spark and Databricks, and it demonstrates a comprehensive understanding of the platform and the ability to design and implement data processing solutions with Spark.

 

QUESTION 94
You are asked to set up two tasks in a Databricks job: the first task runs a notebook that downloads data from a remote system, and the second task is a DLT pipeline that processes this data. How do you plan to configure this in the Jobs UI?
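In the Jobs UI this maps to one job with two tasks, where the second task's type is a Delta Live Tables pipeline that depends on the first. As a hedged illustration, an equivalent Jobs API 2.1 payload might look like the sketch below; the notebook path and pipeline ID are hypothetical placeholders.

# Hypothetical Jobs API 2.1 payload: a notebook task followed by a
# dependent DLT pipeline task (mirrors choosing "Notebook" and
# "Delta Live Tables pipeline" as task types in the Jobs UI)
job_config = {
    "name": "ingest_and_process",
    "tasks": [
        {
            "task_key": "download_data",
            "notebook_task": {"notebook_path": "/Repos/ingest/download_data"},
        },
        {
            "task_key": "process_with_dlt",
            "depends_on": [{"task_key": "download_data"}],
            "pipeline_task": {"pipeline_id": "<your-dlt-pipeline-id>"},
        },
    ],
}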

 
 
 
 
 

QUESTION 95
Which of the following describes how Databricks Repos can help facilitate CI/CD workflows on the Databricks Lakehouse Platform?

 
 
 
 
 

QUESTION 96
A data engineering team is in the process of converting their existing data pipeline to utilize Auto Loader for
incremental processing in the ingestion of JSON files. One data engineer comes across the following code
block in the Auto Loader documentation:
streaming_df = (spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", schemaLocation)
    .load(sourcePath))
Assuming that schemaLocation and sourcePath have been set correctly, which of the following changes does
the data engineer need to make to convert this code block to use Auto Loader to ingest the data?
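Note that format("cloudFiles") is itself the Auto Loader source. For context, a complete ingestion would typically pair this read with a writeStream; a minimal sketch, assuming a placeholder checkpoint path and target table (availableNow requires a recent runtime):

# Hypothetical completion of the pipeline; paths and table name are placeholders
(streaming_df.writeStream
    .option("checkpointLocation", "dbfs:/tmp/ingest/_checkpoint")  # track stream progress
    .trigger(availableNow=True)  # process all available files, then stop
    .table("bronze_events"))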

 
 
 
 
 

QUESTION 97
The data analyst team put together queries that identify items that are out of stock based on orders and replenishment, but when they run them all together for the final output, they noticed it takes a really long time. You were asked to investigate why the queries run slowly and identify steps to improve performance. On inspection, you found that all the queries run sequentially on a SQL endpoint cluster. Which of the following steps can be taken to resolve the issue?
Here is an example query:
-- Get order summary
create or replace table orders_summary
as
select product_id, sum(order_count) order_count
from
  (
    select product_id, order_count from orders_instore
    union all
    select product_id, order_count from orders_online
  )
group by product_id;

-- Get supply summary
create or replace table supply_summary
as
select product_id, sum(supply_count) supply_count
from supply
group by product_id;

-- Get on-hand stock based on orders summary and supply summary
with stock_cte
as (
  select nvl(s.product_id, o.product_id) as product_id,
         nvl(supply_count, 0) - nvl(order_count, 0) as on_hand
  from supply_summary s
  full outer join orders_summary o
    on s.product_id = o.product_id
)
select *
from stock_cte
where on_hand = 0;
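Scaling the endpoint or running independent statements in parallel are the usual levers, but one complementary remediation, sketched below under the assumption that the intermediate tables are not needed elsewhere, is to collapse the sequential statements into a single query so the engine can plan and execute them in one pass:

# Sketch: combine the three sequential statements into one query using CTEs
out_of_stock = spark.sql("""
    with orders_summary as (
        select product_id, sum(order_count) as order_count
        from (
            select product_id, order_count from orders_instore
            union all
            select product_id, order_count from orders_online
        )
        group by product_id
    ),
    supply_summary as (
        select product_id, sum(supply_count) as supply_count
        from supply
        group by product_id
    )
    select nvl(s.product_id, o.product_id) as product_id,
           nvl(supply_count, 0) - nvl(order_count, 0) as on_hand
    from supply_summary s
    full outer join orders_summary o
      on s.product_id = o.product_id
    where nvl(supply_count, 0) - nvl(order_count, 0) = 0
""")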

 
 
 
 
 

QUESTION 98
You would like to build a Spark streaming process that reads from a Kafka queue and writes to a Delta table every 15 minutes. What is the correct trigger option?
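For context, Structured Streaming expresses this as a processing-time trigger; a minimal sketch, where the broker address, topic, paths, and table name are placeholders:

(spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "host1:9092")  # placeholder broker
    .option("subscribe", "orders")                    # placeholder topic
    .load()
    .writeStream
    .format("delta")
    .option("checkpointLocation", "dbfs:/tmp/kafka/_checkpoint")
    .trigger(processingTime="15 minutes")  # fire a micro-batch every 15 minutes
    .table("kafka_orders"))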

 
 
 
 
 

QUESTION 99
A new data engineer has started at a company. The data engineer has recently been added to the company’s
Databricks workspace as [email protected]. The data engineer needs to be able to query the table
sales in the database retail. The new data engineer has already been granted USAGE on the database retail.
Which of the following commands can be used to grant the appropriate permissions to the new data engineer?
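For reference, granting read access on a single table is one GRANT statement; the sketch below uses a placeholder user name, since the address in the question is obfuscated:

# USAGE on the database is already granted; SELECT on the table is what remains
spark.sql("GRANT SELECT ON TABLE retail.sales TO `new.engineer@company.com`")  # placeholder user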

 
 
 
 
 

QUESTION 100
Which of the following tools provides Data Access Control, Access Audit, Data Lineage, and Data Discovery?

 
 
 
 
 

QUESTION 101
A Delta Live Tables pipeline includes two datasets defined using STREAMING LIVE TABLE.
Three datasets are defined against Delta Lake table sources using LIVE TABLE. The pipeline is configured to
run in Development mode using the Triggered Pipeline Mode.
Assuming previously unprocessed data exists and all definitions are valid, what is the expected outcome after
clicking Start to update the pipeline?

 
 
 
 
 

QUESTION 102
You notice the job cluster is taking 6 to 8 minutes to start, which is delaying your job from finishing on time. What steps can you take to reduce the cluster startup time?
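Cluster pools are the standard way to cut startup time, since idle pooled instances skip cloud provisioning. A hedged sketch of a job cluster spec that references a pool; the runtime version, size, and pool ID are placeholders:

# Hypothetical new_cluster spec for a job, attached to a pre-created pool
new_cluster = {
    "spark_version": "13.3.x-scala2.12",   # placeholder runtime
    "num_workers": 2,
    "instance_pool_id": "pool-1234-abcd",  # idle instances here skip cloud provisioning
}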

 
 
 
 
 

QUESTION 103
A junior data engineer needs to create a Spark SQL table my_table for which Spark manages both the data and
the metadata. The metadata and data should also be stored in the Databricks Filesystem (DBFS).
Which of the following commands should a senior data engineer share with the junior data engineer to
complete this task?
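A managed table is produced by omitting an explicit LOCATION, so both data and metadata live under the workspace's DBFS root; a minimal sketch with hypothetical columns:

# No LOCATION clause -> Spark manages both data and metadata (a managed table)
spark.sql("CREATE TABLE my_table (id INT, value STRING) USING DELTA")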

 
 
 
 
 

QUESTION 104
What type of table is created when you issue the SQL DDL command CREATE TABLE sales (id int, units int)?
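You can verify what that DDL produced by inspecting the table's metadata; on Databricks the defaults make it a managed Delta table, which the sketch below would confirm:

spark.sql("CREATE TABLE sales (id int, units int)")
# The 'Type' row shows MANAGED and 'Provider' shows delta (the Databricks default)
spark.sql("DESCRIBE TABLE EXTENDED sales").show(truncate=False)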

 
 
 
 
 

QUESTION 105
How does the Lakehouse replace the dependency on using data lakes and data warehouses in a data and analytics solution?

 
 
 
 
 

QUESTION 106
How do you check the location of an existing schema in Delta Lake?
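One way to surface a schema's storage location is the DESCRIBE SCHEMA command; a minimal sketch with a placeholder schema name:

# The output includes a 'Location' row with the schema's storage path
spark.sql("DESCRIBE SCHEMA EXTENDED my_schema").show(truncate=False)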

 
 
 
 

QUESTION 107
How do you create a Delta Live Tables pipeline and deploy it using the DLT UI?
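A DLT pipeline is authored as notebook code and then registered via Workflows > Delta Live Tables > Create Pipeline in the UI; a minimal notebook sketch, where the dataset name and source path are hypothetical:

import dlt

@dlt.table(comment="Raw orders ingested from cloud storage")  # placeholder dataset
def orders_bronze():
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("dbfs:/mnt/raw/orders/"))  # placeholder path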

 
 
 
 
 

QUESTION 108
You are currently working on creating a Spark streaming process that reads and writes a one-time micro-batch and also rewrites the existing target table. Fill in the blanks to complete the command below successfully.
spark.table("source_table")
  .writeStream
  .option("____", "dbfs:/location/silver")
  .outputMode("____")
  .trigger(Once=____)
  .table("target_table")
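For reference, one plausible completion is sketched below; note that in PySpark the trigger keyword is lowercase once, and the checkpoint path is a placeholder from the question:

(spark.table("source_table")
    .writeStream
    .option("checkpointLocation", "dbfs:/location/silver")  # checkpoint for stream progress
    .outputMode("complete")  # rewrite the whole target table each run
    .trigger(once=True)      # process all available data in one micro-batch, then stop
    .table("target_table"))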

 
 
 
 
 

QUESTION 109
What is the purpose of the bronze layer in a Multi-hop architecture?

 
 
 
 
 

QUESTION 110
A data engineering team has been using a Databricks SQL query to monitor the performance of an ELT job.
The ELT job is triggered by a specific number of input records being ready to process. The Databricks SQL
query returns the number of minutes since the job’s most recent runtime.
Which of the following approaches can enable the data engineering team to be notified if the ELT job has not
been run in an hour?

 
 
 
 
 

QUESTION 111
Which of the following commands can be used to drop a managed Delta table and the underlying files in storage?
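For a managed Delta table, a plain DROP TABLE removes both the metastore entry and the underlying data files; a one-line sketch with a placeholder table name:

# Managed table: DROP TABLE deletes the metadata and the data files in storage
spark.sql("DROP TABLE sales")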

 
 
 
 
 

QUESTION 112
At the end of the inventory process, a file gets uploaded to cloud object storage. You are asked to build a process to ingest this data incrementally; the schema of the file is expected to change over time, and the ingestion process should handle these changes automatically. Which of the following methods can be used?
Below is the Auto Loader command to load the data; fill in the blanks for successful execution of the code.
spark.readStream
  .format("cloudFiles")
  .option("_______", "csv")
  .option("_______", "dbfs:/location/checkpoint/")
  .load(data_source)
  .writeStream
  .option("_______", "dbfs:/location/checkpoint/")
  .option("_______", "true")
  .table(table_name)
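One plausible completion, assuming the blanks map to Auto Loader's format and schema-location options plus the writer's checkpoint and schema-merge options (paths come from the question):

(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")  # source file format
    .option("cloudFiles.schemaLocation", "dbfs:/location/checkpoint/")  # where the inferred schema is tracked
    .load(data_source)
    .writeStream
    .option("checkpointLocation", "dbfs:/location/checkpoint/")  # stream progress checkpoint
    .option("mergeSchema", "true")  # evolve the table schema as new columns appear
    .table(table_name))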

 
 
 
 
 

QUESTION 113
You are working on a process to load external CSV files into a Delta table by leveraging the COPY INTO command, but after running the command a second time, no data was loaded into the table. Why is that?
COPY INTO table_name
FROM 'dbfs:/mnt/raw/*.csv'
FILEFORMAT = CSV
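This behavior is by design: COPY INTO is idempotent and skips files it has already loaded. If previously loaded files need to be re-ingested, Databricks supports a force copy option; a sketch:

# COPY INTO skips already-loaded files; 'force' re-ingests them
spark.sql("""
  COPY INTO table_name
  FROM 'dbfs:/mnt/raw/*.csv'
  FILEFORMAT = CSV
  COPY_OPTIONS ('force' = 'true')
""")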

 
 
 
 
 

QUESTION 114
What is the purpose of the silver layer in a Multi-hop architecture?

 
 
 
 
 

Get Perfect Results with Premium Databricks-Certified-Professional-Data-Engineer Dumps Updated 220 Questions: https://www.passtestking.com/Databricks/Databricks-Certified-Professional-Data-Engineer-practice-exam-dumps.html
