Closed Captioning – a software driven approach
Introduction: Web pages, images and videos are created, uploaded and shared
everywhere through online sites and services run by well-known companies.
However, voice as a medium of expression, content and publishing has not found
the same popularity. This writeup briefly describes a software approach to
making speech content relevant to web searches via transcription and closed
captioning.
Description:
Data that is captured as speech or voice is usually stored as an audio file,
which is harder to search than text for much the same reasons that images are
harder to search than text. A text representation of the content, whether audio
or video, in the form of metadata is far more helpful because it conforms to the
traditional format required for software-based search. Native interpretation of
image and voice files requires pattern recognition and speech recognition
software respectively, which are time consuming, costly and prone to failure.
This makes it difficult to do at the same time as the data becomes available.
Consequently, some out-of-band processing is involved to add relevant text data
to such difficult representations of data.

When voice is streamed over the internet as part of video, closed captioning
packets can be overlaid on the data and rendered on the user's viewing device.
This textual content can also be archived and searched in just the same way as
we search web pages or documents on a user's computer. The same holds true for
the output of transcription and captioning
services. Captioning tools such as Camtasia, Captionate, Express Scribe,
MAGpie2 and Overstream offer a variety of features to help with the textual
representation of voice. They can be used to create transcripts or caption
streams. There are also fully managed services that handle every stage of
processing: taking your video, making a transcript, adding closed captions and
returning the file to you. Transcripts can be created manually, with tools such
as Express Scribe or WriteNaturally, or with voice recognition software such as
that offered by consumer-oriented digital companies. These can then be merged
in as captions using tools such as Camtasia. YouTube also had a feature to add
closed captions to videos.

However, the data from captions or transcripts can be just as usefully mined as
metadata of the video or audio files. A background speech-to-text conversion
service can automatically add such desirable content to existing collections
and archives, some of which, in the form of speech, songs, narrations and
commentaries, are considered golden classics to this day. Moreover, it is the
ability to generate this data automatically for all new content as it becomes
available that makes it appealing to combine with searches to produce relevant
results.
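As a rough illustration of how caption text can be treated as searchable metadata, the sketch below builds a small inverted index over caption cues. The Cue and CaptionIndex classes and their members are hypothetical names introduced here for illustration; no particular captioning tool or speech recognition API is assumed, and the cue text is taken as already transcribed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical caption cue: the media file, the time the text appears, and the text itself.
public sealed class Cue
{
    public string MediaId { get; }
    public TimeSpan Start { get; }
    public string Text { get; }

    public Cue(string mediaId, TimeSpan start, string text)
    {
        MediaId = mediaId;
        Start = start;
        Text = text;
    }
}

// Minimal inverted index: lowercase word -> cues containing that word.
public sealed class CaptionIndex
{
    private static readonly char[] Separators =
        { ' ', '\t', '\n', '\r', ',', '.', '!', '?', ';', ':', '"' };

    private readonly Dictionary<string, List<Cue>> byWord =
        new Dictionary<string, List<Cue>>();

    // Splits text into distinct lowercase words, dropping basic punctuation.
    private static IEnumerable<string> Tokenize(string text)
    {
        return text.ToLowerInvariant()
                   .Split(Separators, StringSplitOptions.RemoveEmptyEntries)
                   .Distinct();
    }

    public void Add(Cue cue)
    {
        foreach (string word in Tokenize(cue.Text))
        {
            List<Cue> cues;
            if (!byWord.TryGetValue(word, out cues))
            {
                cues = new List<Cue>();
                byWord[word] = cues;
            }
            cues.Add(cue);
        }
    }

    // Returns the cues that contain every word of the query (a plain AND search).
    public List<Cue> Search(string query)
    {
        List<Cue> result = null;
        foreach (string word in Tokenize(query))
        {
            List<Cue> cues;
            if (!byWord.TryGetValue(word, out cues))
            {
                return new List<Cue>();
            }
            result = result == null
                ? new List<Cue>(cues)
                : result.Intersect(cues).ToList();
        }
        return result ?? new List<Cue>();
    }
}
```

A background service could call Add for each cue it derives from new or archived media, and a search front end could call Search with part of a lyric or phrase to get back the media file and the offset at which it was spoken. Ranking, stemming and persistence are deliberately left out of this sketch.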
Conclusion: While speech recognition software is widely understood to be error
prone, an offline service that automates the derivation of transcripts and
captions from existing and new content can prove valuable. Scenarios such as
finding a song to listen to by searching for part of its lyrics, whether or not
it is available in video format, are an appealing way to enhance the results of
any web search. Similarly, governments may find it useful to convert English
conversations over mobile devices to text for analysis and data mining. Also,
consumer electronics such as Alexa or Siri that interact with users may keep
track of these conversations for analysis related to personalization-based
marketing. These therefore seem like the tip of an iceberg of data.
#codingexercise
Given a geometric progression starting with 1 as the first term, and values of r, S and p where:
r = common ratio of the progression
S = sum of the first N terms modulo p
p = a prime number
Find N
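// Returns the smallest N in 1..p such that (1 + r + ... + r^(N-1)) mod p == S,
// or -1 if no such N exists.
static int GetN(int r, int S, int p)
{
    long ratio = ((r % p) + p) % p; // r reduced into the range 0..p-1
    long term = 1 % p;              // r^k mod p, starting at k = 0
    long sum = 0;                   // running sum of the first k+1 terms, mod p
    for (int k = 0; k < p; k++)
    {
        sum = (sum + term) % p;
        if (sum == S)
        {
            return k + 1;
        }
        // Advance to r^(k+1) mod p without calling Math.Pow, whose double
        // result loses precision and overflows a long for large k.
        term = (term * ratio) % p;
    }
    return -1;
}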
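For reference, the sum that the loop tracks can also be written in closed form. This is the standard geometric series identity taken modulo p, shown here as a sketch; it assumes p is prime, so that r - 1 has a multiplicative inverse whenever r is not congruent to 1.

```latex
\begin{aligned}
S &\equiv \sum_{k=0}^{N-1} r^{k} \equiv (r^{N}-1)\,(r-1)^{-1} \pmod{p} && \text{if } r \not\equiv 1 \pmod{p},\\
S &\equiv N \pmod{p} && \text{if } r \equiv 1 \pmod{p}.
\end{aligned}
```

When r is not congruent to 0 or 1, Fermat's little theorem gives r^(p-1) ≡ 1 (mod p), so the partial sums repeat with a period that divides p - 1. When r ≡ 1 they repeat with period p. Trying N = 1 through p, as the loop does, is therefore enough to find a solution if one exists.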