This is my official video presentation for IROS 2020.
The 2020 International Conference on Intelligent Robots and Systems (IROS) will be virtual. It was planned to be in Las Vegas, Nevada, from October 25-29. While this was unfortunately expected, I understand the need to reduce large gatherings, as the pandemic is still happening here. I wish our government, and private citizens, could look around the world and see where things are going right regarding COVID-19; for example, Taiwan is having 10,000-person concerts and has all of seven recorded deaths as of today, while the United States still has heavy restrictions on in-person gatherings with well over 200,000 deaths (here’s the source I’ve been checking to track this information).
For IROS 2020, I am presenting a paper on robot fabric manipulation, done in collaboration with wonderful colleagues from Berkeley and Honda Research Institute. IROS 2020 asked us to create a 15-minute video for each paper, and my final product is shown above and also available on my YouTube channel. This is by far the longest pre-recorded video I have ever made for a conference. I believe it’s also my first video with audio. Normally, my research videos are just a handful of minutes long, and if I need to clarify things in the video, I add text (subtitles) manually in the iMovie application. For my IROS video, however, I wanted to make the video longer with audio, but I also knew I needed a more scalable way to add subtitles, which would be necessary for me to completely understand the video if I were to re-watch it many years later. I also wanted to add subtitles and to make them unavoidably visible to encourage other researchers to add subtitles to their videos.
Here is the backstory of how I made this video.
First, as part of my research that turned into this paper, I had many short video clips of a robot manipulating fabric in iMovie on my MacBook Pro laptop. I started a fresh iMovie file, and picked the robot videos that I wanted to include.
Then, I created a new Google Slides and a new Google Doc. In the Google Slides file, I created the slides that I wanted to show in the final video. These slides were mostly copied and pasted from earlier, internal research presentations, and reformatted to a consistent font and size style.
In the Google Doc, I wrote down my entire transcript, which turned out to be slightly over four pages. I then practiced my audio by stating what I wrote on the transcript, peppered with my usual enthusiasm. I also tried to avoid talking too fast. I used the Voice Memos app on my iPhone to record audio. I made multiple audio files, each about one minute long. This made it simpler to redo any audio (which I had to do frequently) since I only had to redo small portions instead of the entire video’s audio.
Once I felt like the slides were ready, and that they aligned well with the audio, I put each slide and audio file into iMovie, carefully adjusting the time ranges to align them, and to make sure the video did not exceed the 15-minute limit. I made further edits and improvements to the video after getting feedback from my colleagues. When I was sufficiently satisfied with the result, I exported it as an .mp4 video file.
But what about adding subtitles?
iMovie contains functionality for adding subtitles, but the process is manual and highly cumbersome. After some research, I found this video tutorial which demonstrates how to use Kapwing to add subtitles. Kapwing is entirely web-based, so there’s no need to download it locally – I can upload videos to their website and edit in a web browser.
I can add subtitles to Kapwing by uploading audio files, and Kapwing will use automatic speech recognition to generate an initial draft, which I then fine-tune. Here is the interface for adding subtitles:
I paid 20 USD for a monthly subscription so that I could create a longer video, and followed the tutorial mentioned earlier to add subtitles. Eventually, I got my 15-minute video, which just barely fit under the 50MB file limit as mandated by IROS. I uploaded it to the conference, as well as to YouTube, which is the one at the top of this post.
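To see why a 15-minute video only barely fits under a 50MB limit, it helps to work out the average bitrate that such a limit allows. The sketch below is a back-of-the-envelope calculation, not something from the export process itself; the function name is hypothetical, and it assumes 1 MB means 10^6 bytes (the conference may have counted differently).

```python
def max_total_bitrate_kbps(size_limit_mb: float, duration_s: float) -> float:
    """Largest average bitrate (kilobits per second) that keeps a file
    of the given duration under size_limit_mb.

    Assumption: 1 MB = 1,000,000 bytes. Video and audio share this budget.
    """
    size_bits = size_limit_mb * 1_000_000 * 8
    return size_bits / duration_s / 1000


# 50 MB limit, 15-minute video
budget = max_total_bitrate_kbps(50, 15 * 60)
print(f"{budget:.0f} kbps")  # prints "444 kbps"
```

Roughly 444 kbps for video and audio combined is a tight budget, which is consistent with the file only just fitting under the limit.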
I am happy with the final video product. That said, the process of adding subtitles was not ideal:
The automatic speech recognition for producing an initial guess at the subtitles is … bad. I mean, really bad. I guess it got less than 5% of my audio correct, so in practice I was adding all of my subtitles by manually copying and pasting from my Google Doc. To put things in perspective, Google Meet (my go-to video conferencing tool these days) handles my audio far better, with subtitles that are of remarkably high quality.
The interface for subtitles is also cumbersome to use, though to be fair, it’s an improvement over iMovie. As shown in the screenshot above, when re-editing a video, it doesn’t seem to preserve the ordering of the subtitles (notice how my first line in the video is listed second above). Furthermore, when editing and then clicking “Done”, I sometimes saw subtitles with incorrect sizes, so I had to re-edit the video … only to see a few subtitles disappear each time I did this. There also did not seem to be a way to change the subtitle size for all subtitles simultaneously. My solution was to forget about saving in progress, and to painstakingly go through each subtitle to change the size by manually clicking via a drop-down menu.
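Since the transcript already existed as text, one more scriptable route (not the process used for this video) would be to generate a standard SubRip (.srt) subtitle file from the transcript lines and their timings, then load it into a tool that accepts .srt files. The sketch below is a minimal illustration under that assumption; the function names and the example cues are hypothetical, and the timings would still have to be measured by hand.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms_total = round(seconds * 1000)
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """Build SRT text from (start_seconds, end_seconds, text) tuples.

    Each cue becomes a numbered block: index, time range, subtitle text.
    Blocks are separated by a blank line, as the SRT format expects.
    """
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)


# Hypothetical cues; real ones would come from the transcript and the video timeline.
example_cues = [
    (0.0, 2.5, "Hello, and welcome to my presentation."),
    (2.5, 6.0, "This work is on robot fabric manipulation."),
]
print(to_srt(example_cues))
```

Keeping the subtitles in a plain text file like this also makes it easy to fix a typo or retime a line without clicking through each subtitle in a web interface.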
I hope this was useful! It is likely that future conferences will continue to be virtual in some way. For example, I am attempting to submit several papers to ICRA 2021, which will be in Xi’an, China, next summer. The website says ICRA 2021 will be a hybrid event with a mix of virtual and in-person events, but I would bet that many travel restrictions will still be in place, particularly for researchers from the United States. For that, and several other reasons, I am almost certainly going to be a virtual attendee, so I may need to revisit these instructions when making additional video recordings.
As always, thank you for reading, stay safe, and wear a mask.