A data-science team must deploy an automatic summarization service for customer-support incident tickets. Business and compliance rules require that every summary be no longer than 30 words and paraphrase the ticket rather than copy its sentences, to avoid exposing sensitive customer information. The team has about 100,000 English ticket-to-summary pairs produced by analysts for supervised learning, and the company can run models on a GPU cluster. Within the next quarter, the summaries must also work for Spanish and French with minimal additional annotation effort. Finally, during system acceptance, reviewers will value semantic equivalence with the analyst summaries more than exact n-gram overlap.
Which combination of modeling approach and primary automatic evaluation metric best satisfies these requirements?
Apply the unsupervised TextRank algorithm to extract top-ranked sentences and evaluate with ROUGE-1.
Train a skip-gram Word2Vec model to identify key phrases and measure precision-recall of the extracted phrases.
Use a simple Lead-3 extraction baseline and report the compression ratio of tokens as the main metric.
Fine-tune a pre-trained multilingual encoder-decoder Transformer (e.g., mBART/mT5) for abstractive summarization and evaluate with BERTScore.
A multilingual encoder-decoder Transformer such as mBART or mT5 can be fine-tuned on the existing English ticket-summary pairs and then transferred to Spanish and French with limited or even zero additional labels, meeting the cross-language requirement. Because it generates summaries token by token rather than selecting existing sentences, it can paraphrase content instead of copying whole sentences, satisfying the compliance rule.
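As a rough sketch of this approach using the Hugging Face transformers library (the model checkpoint, "summarize:" prefix, ticket text, and token budget are illustrative assumptions; a real deployment would fine-tune on the 100,000 ticket-summary pairs before serving):

```python
# Hedged sketch: summarization with a multilingual encoder-decoder Transformer.
# "google/mt5-small" and the prompt prefix are assumptions for illustration;
# without fine-tuning on the ticket data, outputs will not be useful summaries.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")

ticket = ("Customer reports a duplicate charge on their May invoice and "
          "asks for a refund to the original payment method.")
inputs = tokenizer("summarize: " + ticket, return_tensors="pt", truncation=True)

# Cap generation length so decoded summaries stay near the 30-word limit;
# ~45 subword tokens is a rough proxy for 30 words in mT5's vocabulary.
ids = model.generate(**inputs, max_new_tokens=45, num_beams=4)
summary = tokenizer.decode(ids[0], skip_special_tokens=True)
print(summary)
```

Because mT5's tokenizer and encoder are shared across 100+ languages, the same fine-tuned checkpoint can be applied to Spanish and French tickets with little or no extra labeled data.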
For evaluation, BERTScore compares summaries in contextual embedding space rather than by n-gram overlap, so it rewards semantic equivalence even when the wording changes, which is exactly what reviewers care about.
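BERTScore's core mechanism is greedy token matching in embedding space: precision matches each candidate token to its most similar reference token, recall does the reverse, and the two combine into an F1. A toy illustration, using hand-crafted 2-d vectors in place of real contextual BERT embeddings (all words and vectors here are assumptions; the real metric comes from the bert-score package):

```python
# Toy BERTScore-style F1: greedy cosine matching over token embeddings.
# Hand-made vectors stand in for BERT's contextual embeddings.
import math

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def toy_bertscore_f1(cand, ref, emb):
    # precision: each candidate token matched to its closest reference token
    p = sum(max(cos(emb[c], emb[r]) for r in ref) for c in cand) / len(cand)
    # recall: each reference token matched to its closest candidate token
    r = sum(max(cos(emb[t], emb[c]) for c in cand) for t in ref) / len(ref)
    return 2 * p * r / (p + r)

# synonyms get nearby vectors, unrelated words get distant ones (assumed values)
emb = {
    "refund": (1.0, 0.1), "reimbursement": (0.95, 0.15),
    "issued": (0.2, 1.0), "sent": (0.25, 0.95),
    "cancel": (-1.0, 0.3),
}
ref = ["refund", "issued"]
paraphrase = ["reimbursement", "sent"]   # zero word overlap with ref
unrelated = ["cancel", "sent"]

print(toy_bertscore_f1(paraphrase, ref, emb))  # close to 1.0: embeddings align
print(toy_bertscore_f1(unrelated, ref, emb))   # clearly lower
```

The paraphrase scores near 1.0 despite sharing no surface tokens with the reference, which is precisely the behavior an n-gram metric cannot reproduce.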
The other options fail at least one constraint:
TextRank and Lead-3 are extractive and likely copy sensitive sentences verbatim, and ROUGE-1 or compression ratio emphasize surface overlap, not meaning.
Word2Vec key-phrase extraction neither produces coherent 30-word prose nor offers an accepted summary-quality metric.
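The surface-overlap problem with ROUGE-1 is easy to demonstrate. A minimal unigram-overlap F1, sketched in plain Python rather than the official rouge-score package (the example sentences are invented for illustration):

```python
# Minimal ROUGE-1 F1 (unigram overlap) to show why overlap metrics
# reward verbatim copying and punish paraphrase.
from collections import Counter

def rouge1_f1(candidate, reference):
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    p = overlap / sum(cand.values())
    r = overlap / sum(ref.values())
    return 2 * p * r / (p + r)

analyst = "customer was charged twice and a refund was issued"
copied = "customer was charged twice and a refund was issued"   # verbatim extract
paraphrase = "the client received a reimbursement after a duplicate charge"

print(rouge1_f1(copied, analyst))      # 1.0: copying maximizes ROUGE-1
print(rouge1_f1(paraphrase, analyst))  # low despite equivalent meaning
```

An extractive system that copies the sensitive sentence verbatim gets a perfect ROUGE-1, while a compliant paraphrase is penalized, the opposite of what the acceptance criteria reward.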