Problem discussion

On video summaries

Jun 25, 2024

Mohammed Kheezar Hayat, CEO

Product

What summaries can and cannot do

When we run ads on Google Search, Google tells us which search terms the users who clicked on the ad had searched for. And it seems that people who ‘stumble’ upon Afterword through Google Search are mostly searching for one thing: ‘video summarizer’.

Beyond this, 40% of our summaries are of YouTube videos. While we would like to think that our users are all summarising scientific papers and long books, it turns out most of you just really like long videos, especially podcasts. Just apparently not enough to listen to the whole thing!

Now, if you are on Afterword and reading this blog, the terms ‘video summarisation’ or ‘video summarizer’ probably do not seem unusual to you. You have quite likely used this feature, or are already making a mental note to do so.

But that term, as we understand it, has probably only recently started to make the kind of sense it does now. Even in 2015, most people would have understood ‘video summary’ as a video that summarises some other, more ‘serious’ medium, like a book. Until recently, videos on the internet, particularly on YouTube, hadn’t broken into the mainstream as anything other than entertainment. And after all, who summarises music videos?

Outside of lectures, the spoken word as a reference, as a source of knowledge, is a recent phenomenon. We now treat conversations between experts (of which YouTube is full) almost like lectures, and proper lectures are also very popular there. Platforms like YouTube obviously have an incentive to create FOMO and keep you watching. Since you have more interests than time, you need some form of shortening or skimming; hence the need for video summaries.

There is a problem, though. Video lectures and podcasts are often devoid of any context. Unlike a live lecture, there is no course structure surrounding them. If it is a conversation-style podcast, there is a risk that you mistake the participants’ agreement and understanding for your own. Humans process the content of a conversation very differently when they are mere spectators to it. You can become aware of ideas through a screen, but it’s much easier to really ‘get’ things when they are explained in a personal conversation. When we talk to people directly, it is far easier to fit new ideas into our existing intellectual and emotional maps.

An intelligent and honest conversation, especially with a person whose experiences are at the ‘edge’ of your own, is a magical thing. It can have an impact that is, in my opinion, worth a dozen books. Ever found yourself saying “Thanks, I needed to hear that”, even when you already knew whatever ‘that’ was? That’s it.

But when the conversation does not actually involve you, these advantages of the spoken word can disappear. And I am not sure whether generated summaries, when they are similarly devoid of context and standing on their own, can be of much help. Summarisation is not a substitute for understanding. There is no getting around that.

Summarisation can, however, make the conditions for understanding more likely to arise. In our opinion, it is best to use AI summarisation as an arranger rather than an explainer: it should merely arrange ideas, not pick them, a subtle difference. Judgments have to be made (any compression must make them), but they should be easily examinable. Ultimately, there is no shortcut past engaging with the video itself.

But the landscape can be clarified by AI.

So summaries are a good place to start. In fact, they are an excellent start, as they serve many purposes: a decider, a starter, a reference. You can build a knowledge map out of summaries. Hence our pitch to users as the best summariser out there: not necessarily the best summaries as such, but the best overall experience around them. And we have had feedback to that effect. People do watch after summarising, rather than instead of it. So you, dear users, do in fact listen to the podcasts, but after reading the summaries!

Should we even be calling them summaries in that case? Aren’t they more like introductions? Or tables of contents, or indexes [link to other post]? Or maybe they are like maps that you consult before venturing on a trip?

If these are maps (we like that metaphor), then all kinds of interesting possibilities open up. And many UX issues with the currently dominant chat paradigm (such as speed, or users having to find the right question to ask) appear to resolve elegantly.