A data scientist must design a deep-learning model that can ingest a variable-length sequence of hourly power-grid sensor readings and forecast the next hour's demand. After discovering that a feed-forward network ignores temporal context, the scientist proposes using a recurrent neural network (RNN) instead. Which architectural property of an RNN specifically enables it to learn dependencies across the entire sequence without introducing a separate set of weights for every time step?
It carries a hidden-state vector forward and reuses the same weight matrices at every time step, allowing information from earlier inputs to influence later outputs.
It applies convolutional kernels that slide over the sequence to detect local n-gram patterns regardless of sequence length.
It adds positional encodings to embeddings so self-attention can model order without any form of recurrence.
It trains a unique set of weight matrices for each time step so temporal order is preserved explicitly.
An RNN contains a recurrent unit that maintains a hidden-state vector. At each time step, the network updates this hidden state using the current input and the same weight matrices that were used at previous steps. Because these parameters are shared across the entire sequence, the model can propagate information forward in time, capturing short- and long-range temporal dependencies while keeping the parameter count fixed.
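A minimal NumPy sketch may make the weight sharing concrete. The layer sizes, parameter names (W_xh, W_hh, b_h), and tanh activation below are illustrative assumptions, not details given in the question:

```python
import numpy as np

# Illustrative sketch of an RNN forward pass with shared weights.
# Dimensions and parameter names are assumptions for this example.
rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 8          # e.g., 4 sensor readings per hour

# The SAME three parameter tensors are reused at every time step.
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden -> hidden
b_h = np.zeros(hidden_dim)

def rnn_forward(sequence):
    """Run the recurrence h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    h = np.zeros(hidden_dim)          # initial hidden state
    for x_t in sequence:              # works for any sequence length
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
    return h                          # final state summarizes the whole sequence

# Sequences of different lengths flow through identical parameters.
short_seq = rng.normal(size=(6, input_dim))    # 6 hours of readings
long_seq = rng.normal(size=(48, input_dim))    # 48 hours of readings
print(rnn_forward(short_seq).shape, rnn_forward(long_seq).shape)
```

Note that both the 6-step and the 48-step sequences pass through the same weight matrices; only the number of recurrence iterations changes, so the parameter count is independent of sequence length.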
The other options describe mechanisms not responsible for an RNN's ability to handle sequential dependencies:
Convolutional filters sliding over a sequence characterize CNNs, which capture local patterns within a fixed receptive field but do not maintain a hidden memory across arbitrary sequence lengths.
Training a distinct set of weight matrices for every time step would dramatically increase the parameter count and tie the model to a fixed maximum sequence length, which is the opposite of how RNNs work; the sketch after this list puts rough numbers on the difference.
Positional encodings with self-attention are used in transformer models, which eliminate recurrence rather than rely on it.
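To see why per-step weights are impractical, here is a back-of-the-envelope comparison using the same hypothetical layer sizes as the sketch above:

```python
# Hypothetical sizes: 4 inputs, 8 hidden units, 48-hour training sequences.
input_dim, hidden_dim, seq_len = 4, 8, 48

# Shared recurrent weights: one W_xh, one W_hh, one bias, reused every step.
shared = hidden_dim * input_dim + hidden_dim * hidden_dim + hidden_dim

# A distinct weight set per time step multiplies that cost by the length.
per_step = seq_len * shared

print(f"shared weights: {shared} parameters")    # 104
print(f"per-time-step:  {per_step} parameters")  # 4992, and it grows with length
```

The per-step design also cannot process a sequence longer than the 48 steps it was built for, whereas the shared-weight recurrence handles any length.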