A data scientist at a financial institution is tasked with creating a dataset for a targeted marketing campaign. The goal is to identify all customers who have made at least one transaction in the past year. The data scientist has two primary dataframes: customers, which contains all customer profiles (customer_id, name, join_date), and transactions, which contains all transactions from the last year (transaction_id, customer_id, amount). The transactions dataframe may contain records for customers who are no longer in the main customers dataframe due to account closure, and the customers dataframe contains many customers who have not transacted in the last year. Which type of join on the customer_id key should the data scientist use to generate a list that exclusively includes customers with transaction data?
A FULL OUTER JOIN between customers and transactions.
The correct answer is to use an INNER JOIN. An INNER JOIN returns only the records that have matching values in both tables based on the specified key. In this scenario, it will create a dataset containing only the customers who appear in both the customers and transactions dataframes, which directly satisfies the requirement of identifying customers who have made a transaction. Because the join returns one row per matching transaction, a customer with several transactions appears several times; deduplicating on customer_id yields the final customer list.
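As a rough illustration, the inner join might look like this in pandas. The column names come from the question; the sample rows, the `matched` and `active_customers` names, and customer_id 9 standing in for a closed account are all made up for the sketch.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Ana", "Ben", "Cy"],
    "join_date": pd.to_datetime(["2020-01-05", "2021-06-10", "2022-03-15"]),
})
transactions = pd.DataFrame({
    "transaction_id": [100, 101, 102, 103],
    "customer_id": [1, 1, 3, 9],  # 9 stands in for a closed account
    "amount": [25.0, 40.0, 10.0, 99.0],
})

# An inner join keeps only customer_id values present in both dataframes.
matched = customers.merge(transactions, on="customer_id", how="inner")

# The join yields one row per matching transaction, so customer 1 appears
# twice; drop duplicates to get one row per transacting customer.
active_customers = matched[["customer_id", "name", "join_date"]].drop_duplicates()
active_ids = sorted(active_customers["customer_id"].tolist())
print(active_ids)
```

Customer 2 (no transactions) and customer_id 9 (no profile) both drop out, which is the behavior the question asks for.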
A LEFT JOIN is incorrect because it would return all records from the customers dataframe and the matched records from the transactions dataframe. Customers who have not made a transaction would be included in the result with NULL values for the transaction columns, failing to create the exclusive list required.
A RIGHT JOIN (with customers as the left table) is incorrect because it would return all records from the transactions dataframe. This would include transactions from customers who have closed their accounts and are no longer in the customers dataframe, producing rows with NULL values for the customer profile columns.
A FULL OUTER JOIN is incorrect because it returns all records from both tables, pairing rows where the key matches and filling in NULL values where it does not. This would create the largest and least relevant dataset by including all customers (even those with no transactions) and all transactions (even those whose customers are no longer in the customers dataframe), requiring significant subsequent filtering.
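To see the four options side by side, a small sketch along these lines can help. It assumes pandas, with customers as the left dataframe; the sample rows and the `row_counts` name are hypothetical. The `indicator=True` argument adds a `_merge` column that marks each row as `both`, `left_only`, or `right_only`.

```python
import pandas as pd

# Hypothetical sample data; customer 2 has no transactions and
# customer_id 9 has transactions but no profile (closed account).
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Ana", "Ben", "Cy"],
})
transactions = pd.DataFrame({
    "transaction_id": [100, 101, 102, 103],
    "customer_id": [1, 1, 3, 9],
    "amount": [25.0, 40.0, 10.0, 99.0],
})

row_counts = {}
for how in ["inner", "left", "right", "outer"]:
    merged = customers.merge(
        transactions, on="customer_id", how=how, indicator=True
    )
    row_counts[how] = len(merged)
    # Show how many rows matched on both sides versus only one side.
    print(how, len(merged), merged["_merge"].value_counts().to_dict())
```

With this sample data, only the inner join produces rows that are all marked `both`; the left join adds the non-transacting customer, the right join adds the orphaned transaction, and the outer join adds both.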