A data-engineering team is preparing an HVAC sensor data set for a machine-learning model. The CSV contains two relevant columns:
temperature_reading (float)
temp_unit (string with values "F", "C", or "K")
Before applying any scaling or imputation, the team must standardize every observation to Celsius so that downstream statistics (mean, variance, and distance-based metrics) are meaningful. Which transformation logic satisfies this requirement?
If temp_unit == "F": new_val = (temperature_reading - 32) * 5 / 9; If temp_unit == "K": new_val = temperature_reading - 273.15; Otherwise leave the value unchanged.
If temp_unit == "F": new_val = (temperature_reading - 32) * 9 / 5; If temp_unit == "K": new_val = temperature_reading + 273.15; Otherwise leave the value unchanged.
If temp_unit == "F": new_val = (temperature_reading + 32) * 5 / 9; If temp_unit == "K": new_val = temperature_reading - 273.15; Otherwise leave the value unchanged.
Delete all rows where temp_unit is "F" or "K" and proceed with scaling only the existing Celsius records.
The correct transformation logic uses the scientifically accepted, linear formulas to standardize all temperature readings to Celsius. This preserves the intervals between temperature measurements, which is crucial for downstream statistical analysis.
Fahrenheit to Celsius: The correct formula is C = (F - 32) * 5 / 9. An incorrect transformation might erroneously add 32 instead of subtracting it, or use the reciprocal slope (9 / 5), which is used for Celsius to Fahrenheit conversion.
Kelvin to Celsius: The correct formula is C = K - 273.15. An incorrect transformation might add 273.15, which is the formula for converting Celsius to Kelvin.
Another incorrect approach involves deleting rows that are not already in Celsius. This is poor data handling practice as it results in the loss of valuable data; the goal of data cleaning in this scenario is to standardize the data, not to filter it.
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
Why is it important to standardize temperature readings to Celsius before scaling?
Open an interactive chat with Bash
What is the difference between Fahrenheit to Celsius and Celsius to Fahrenheit conversions?
Open an interactive chat with Bash
Why should rows with different temperature units not be deleted during preprocessing?