On a production Linux server, a Python 3.12 script that parses a multi-gigabyte log file is repeatedly killed by the OOM killer. A simplified version of the offending function is:
def load_logs(path):
    with open(path) as f:
        return [line.strip() for line in f]
The rest of the program processes each line only once and never needs the whole list in memory. Which modification will most directly prevent the excessive memory usage while keeping the code idiomatic and easy to maintain?
Keep the function as is but slice the list into 1 000-line chunks before processing.
Add import gc and call gc.collect() immediately after process(logs) to force garbage collection.
Open the file in binary mode to reduce per-line memory: with open(path, 'rb') as f:
Rewrite the function as a generator:
def load_logs(path):
    with open(path) as f:
        for line in f:
            yield line.strip()
The original function builds the entire list in memory, so the process size grows with the size of the log file. Converting the function to a generator that yields one line at a time streams the data: only one line at a time needs to be held in memory during processing. Calling gc.collect() cannot reclaim a list that is still referenced, changing to binary mode does not stop the list from being built, and slicing the list into chunks still requires holding the full list first. A generator function therefore addresses the root cause.
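As a minimal sketch of how the streaming version is consumed (the path and the process_line helper below are hypothetical placeholders, not part of the original program), iterating over the generator pulls one stripped line at a time, so memory use stays roughly constant no matter how large the file is:

def load_logs(path):
    # Yield one stripped line at a time instead of building a list.
    with open(path) as f:
        for line in f:
            yield line.strip()

def process_line(line):
    # Stand-in for whatever per-line work the real program does.
    pass

# The for loop drives the generator; each line is discarded once processed.
for line in load_logs("/var/log/app.log"):
    process_line(line)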