The Re-Invention Of Subtitling. A New Technical Approach To The International Market
“Ideally, there would be a film running the full translation (of the play) at the speed of the actors’ delivery. That would require a complicated installation. As it stands, subtitling is clearly inadequate.” (Elsa Triolet, Chroniques Théâtrale (1948-1951), Paris, Gallimard, 1981)
Theatre subtitling is often considered independent of the performance, providing non-native speaking spectators minimal assistance in understanding performance dialogue.
For a long time, captioning in theatres was considered an accessory. Twenty-five years ago, translations of shows in foreign languages were reduced to a summary of the performance in the program, or an oral translation delivered in a dull voice to the viewer through an earpiece or headphones. Times are changing and the ‘titlists’, as they are called, have become essential to the international dissemination of theatre. Subtitling is part of the overall economy of the performing arts. It is an economic sector based on a chain of tasks that range from translation and theatrical adaptation to staging, scenography and technical management.
In June 2016, we flew to Craiova with Serge Rangoni (General Manager – Théâtre de Liège) to meet with Alexandru Boureanu (General Manager, Teatrul Marin Sorescu – Craiova) and Adi Manescu (General Manager, INCESA – Craiova) to discuss potential technological solutions to enhance the theatre experience for non-native speakers. The discussion quickly focused on surtitles and the bad experiences caused by captions out of sync with the play, gaps in the translation, the limited number of available languages, etc. Together we reviewed the whole chain of tasks in the theatre subtitling process to identify the areas for improvement.
Translating, dividing, and topping are the three main steps in subtitling. The translation is done beforehand using the original text. This is the major task of the subtitling process. We are still far from being able to entrust theatrical translation to a machine or an algorithm, even though these are improving daily for everyday language. Dividing the text into individual subtitles requires human experience and a detailed knowledge of the rhythm of the play. Topping is the final step, in which the ‘topper’ manually synchronizes the display of the subtitles on the available screens by pushing a button again and again throughout the performance.
At this stage, we thought about improving the experience by developing new displays. Many of these solutions (augmented reality glasses, individual seat screens, mobile applications for smartphones) are in the development stage, but the problem is that they still depend on the manual topping process. We therefore concluded that technology can improve the surtitle experience by managing the whole process of prompt alignment for more accurate synchronization (either on the main screens above the stage or on a mobile application to be developed). The challenge was to find an automatic prompt alignment system that can synchronize the subtitles with the live performance. We saw only two technical options. The first is based on visual recognition of the play, matching the movement onstage and the scenography with the text displayed. The second is based on a voice recognition system. The overall idea was to develop a system able to understand the state of the play, to hear what the actors are saying onstage and to turn it into automatic instructions for software that displays the right prompt at the right time. It is not as simple as it seems!
With INCESA on board, we had the best partner for the software development. Back in Belgium, we contacted MULTITEL, a Belgian research center based in Mons, specializing in, among other things, voice recognition in noisy environments. The first meeting with Jean-Yves Parfait (Research Engineer/Team Leader – MULTITEL) and Alexandre Sokolow (Research Engineer – MULTITEL) was quite encouraging. First, they pointed out that we are in a niche market that depends heavily on a specific environment (the theatre), so we are miles away from competing with GAFAM in natural language processing for everyday life. The performing arts sector is specific and based on a cultural and human experience (translation is one of the most important parts of it), so it is possible to develop something truly innovative. As voice becomes a hot topic in the technology sector, we could probably benefit from research in this area and from the existing databases to strengthen our solution. In this context, an effective fast alignment device can be developed under specific conditions. The main challenge was to create a robust voice acquisition process capable of coping with the noisy environment of the theatre.
In the meantime, we developed a co-production process with Transquinquennal, a Belgian theatre company formed by Miguel Decleire, Stephane Olivier and Bernard Breuse. The company has a project, Idiomatic, a polyidiomatic (multilingual) show for five actors from five different countries. Each actor speaks two languages: their mother tongue and a foreign language that is native for one of the other actors. The cast includes a German (Georg Peetz), a Slovenian (Andrej Zalesjak), a Belgian (Anna Galy), a Norwegian (Elisabeth Sand) and a Romanian (George-Albert Costea). The ‘machine’ developed for the surtitle project can be the missing link between the actors and a super artificial intelligence that can help people communicate without speaking the same language.
Every partner (artists, engineers, theatre producers, writers, and computer science researchers) participated in the first research meeting in December 2016. This meeting resulted in a research architecture based on a five-step development:
Block 1: Audio Acquisition and Audio Processing (Filtering)
Acquiring a clean sound for processing in the recognition engine
- Acquisition and direct filtering with close-talk microphones
- Acquisition with far-talk microphones
- Network of microphones – beamforming
Block 2: Prompt Recognition
Speech recognition after signal processing
Block 3: Prompt Alignment
Selection and display of the right prompt
Block 4: Global Programming
Interfacing of the different blocks
Block 5: Test and Validation
The first challenge was to develop a clean acquisition setup that can function in the theatre venue environment. The engineers developed a first protocol based on four realistic setups and four “noisy situations”.
Acquisition setups:
- Close-talk microphone
- Far-talk omnidirectional microphone
- Far-talk shotgun microphone
- Far-talk microphone array (spherical array with 16 microphones from the CEDIA research center, University of Liège)
Noisy situations:
- Audience noise
- Loudspeakers diffusing ambient sound (music, effects, etc.)
- Close interferences (other actors speaking, music, etc.)
We developed eight acquisition scenarios, each tested independently with different speech recognition engines and noise suppression algorithms. After wide-ranging test sessions, the engineers concluded that close-talk (Madonna) microphones gave a very high word recognition rate (WRR > 95%) and that adaptive filtering brought the final performance close to noise-free conditions (WRR of about 98%). The microphone array, although a good option for the future, was not efficient enough in this protocol because it was not adapted to our specific theatrical environment.
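To make the WRR figures concrete, here is a minimal sketch of how such a rate can be computed from a reference line and the recognizer's output. The article does not say which definition the engineers used; this sketch assumes one common convention, WRR = 1 − (word-level edit distance / number of reference words), and the function names are ours, chosen for illustration only.

```python
def word_edit_distance(reference, hypothesis):
    """Levenshtein distance between two word lists."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # reference word missing from the output
                current[j - 1] + 1,      # extra word in the output
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def word_recognition_rate(reference_text, hypothesis_text):
    """Assumed definition: 1 - edit distance / number of reference words."""
    reference = reference_text.lower().split()
    hypothesis = hypothesis_text.lower().split()
    if not reference:
        raise ValueError("reference must contain at least one word")
    return 1.0 - word_edit_distance(reference, hypothesis) / len(reference)
```

For example, if the actor says "to be or not to be" and the recognizer returns "to be or not be", one of six words is lost and the function returns 5/6, roughly 83%.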
On this basis, the second challenge was to synchronize the speech recognition results in a prompt alignment system.
The method used was ‘term frequency-inverse document frequency’ (TF-IDF), in which each output of the speech recognizer is treated as a query text to be classified among several documents (the handcrafted prompts). For each query, TF-IDF yields a likelihood score for every prompt. In real time, the system analyzes small pieces of speech and scans the prompt database to identify the right subtitle to display.
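The matching idea can be illustrated with a small, self-contained sketch. It is not the project's code: the tokenizer, the smoothed IDF formula, the cosine similarity used as a stand-in for the likelihood, the `PromptIndex` class and the three example prompts are all assumptions made for illustration.

```python
import math
from collections import Counter

PUNCTUATION = ".,;:!?'\"()"

# Hypothetical prompts; a real show would load its own subtitle file.
EXAMPLE_PROMPTS = [
    "To be, or not to be, that is the question.",
    "Whether 'tis nobler in the mind to suffer",
    "The slings and arrows of outrageous fortune",
]


def tokenize(text):
    """Lowercase words with surrounding punctuation stripped."""
    words = (raw.strip(PUNCTUATION).lower() for raw in text.split())
    return [w for w in words if w]


class PromptIndex:
    """TF-IDF vectors for a list of prompts (the 'documents')."""

    def __init__(self, prompts):
        self.prompts = prompts
        tokenized = [tokenize(p) for p in prompts]
        n = len(prompts)
        doc_freq = Counter(w for tokens in tokenized for w in set(tokens))
        # Smoothed IDF: words found in fewer prompts weigh more.
        self.idf = {w: math.log((1 + n) / (1 + df)) + 1.0
                    for w, df in doc_freq.items()}
        self.vectors = [self._vector(tokens) for tokens in tokenized]

    def _vector(self, tokens):
        """Unit-length TF-IDF vector; words unknown to the index are ignored."""
        counts = Counter(t for t in tokens if t in self.idf)
        vec = {w: c * self.idf[w] for w, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        return {w: v / norm for w, v in vec.items()} if norm else {}

    def scores(self, query):
        """Cosine similarity between the query and every prompt."""
        q = self._vector(tokenize(query))
        return [sum(q[w] * d.get(w, 0.0) for w in q) for d in self.vectors]

    def best_match(self, query):
        """Index of the highest-scoring prompt."""
        s = self.scores(query)
        return max(range(len(s)), key=s.__getitem__)
```

With these prompts, `PromptIndex(EXAMPLE_PROMPTS).best_match("slings and arrows outrageous")` returns 2. A misrecognized query such as "weather tis nobler in the mine" still lands on prompt 1, because the words that were recognized correctly carry the score.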
The likelihood alone is not sufficient to properly align a query with its relevant document. We also have to take into account the order in which the prompts follow one another, through their transition probabilities. In other words, we defined a window of possibilities around the current prompt (for example, 10 prompts before and 10 prompts after) to limit the number of prompts to scan and the risk of confusing two prompts. With this method, the prompt alignment algorithm proved to be robust even when the word recognition rate drops.
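A rough sketch of this windowing step is given below. The window of 10 prompts comes from the text; everything else is assumed for illustration, including the shape of the transition weighting (a geometric decay that favours staying on the current prompt or moving to the next one), the `min_score` threshold and the function names.

```python
def transition_weight(offset, decay=0.7):
    """Hypothetical transition prior.

    Staying on the current prompt (offset 0) or moving to the next one
    (offset 1) gets full weight; larger jumps, forward or backward,
    are penalized geometrically.
    """
    steps = offset - 1 if offset > 0 else -offset
    return decay ** steps


def align(current_index, scores, window=10, decay=0.7, min_score=0.05):
    """Pick the next prompt to display.

    `scores` holds one likelihood-like score per prompt for the latest
    piece of speech. Only prompts within `window` positions of the current
    one are considered, and each score is weighted by the transition prior.
    If nothing scores at least `min_score`, the display does not move.
    """
    lo = max(0, current_index - window)
    hi = min(len(scores) - 1, current_index + window)
    best_index, best_value = current_index, 0.0
    for candidate in range(lo, hi + 1):
        value = scores[candidate] * transition_weight(candidate - current_index, decay)
        if value > best_value:
            best_index, best_value = candidate, value
    return best_index if best_value >= min_score else current_index
```

The window is what protects against repeated lines: if a sentence spoken at prompt 5 also appears at prompt 25, a system currently displaying prompt 4 never considers prompt 25, however high its score.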
This is where we are in our exploration of the ‘re-invention of subtitling’ in theatre. We have a path to success with close-talk microphones, and the first full-scale test, carried out in May 2018, produced very good results with more than 95% perfect alignment. Furthermore, the integration of ‘the machine’ in Idiomatic (premiered in Craiova in May 2018 and in Oslo in June 2018) gave us highly useful feedback for the development of the final product.
The re-invention of subtitling is on its way and we are still on track to improve the first prototype with a more robust architecture, a frictionless user experience (by implementing beamforming technology) and, above all, a new full-scale experience and use case.
To be continued.
This article originally appeared in European Theatre Lab on September 6th, 2018 and has been reposted with permission.