Cluster computing

Sunday, June 25, 2017

Closed Captioning – a software driven approach

Introduction: Web pages, images and videos are created, uploaded and shared everywhere through online sites and services run by well-known companies. Voice, however, has not found the same popularity as a medium for expression, content and publishing. This writeup briefly describes a software approach to making speech content relevant to web searches via transcription and closed captioning.

Description:

Data captured as speech or voice is usually stored as an audio file, which is harder to search than text for much the same reasons that images are. A text representation of audio or video content, kept as metadata, is far more helpful because it conforms to the format that software-based search traditionally expects. Interpreting image and voice files natively requires pattern recognition and speech recognition software respectively, both of which are time consuming, costly and prone to failure. This makes it difficult to do the interpretation at the moment the data becomes available. Consequently, some out-of-band processing is needed to attach relevant text to these hard-to-search forms of data.

When voice is streamed over the internet as part of a video, closed captioning packets can be overlaid on the data and rendered on the user's viewing device. This textual content can also be archived and searched in the same way that we search web pages or the documents on a user's computer. The same holds true for the output of transcription and captioning services.

Captioning tools such as Camtasia, Captionate, Express Scribe, MAGpie2 and Overstream offer a variety of features to help with the textual representation of voice. They can be used to create transcripts or caption streams. There are also fully managed services that handle every stage of processing: taking your video, making a transcript, adding closed captions and returning the file to you. Transcripts can be created manually, with tools such as Express Scribe or WriteNaturally, or with voice recognition software such as that offered by consumer-oriented digital companies. The transcripts can then be merged in as captions using tools such as Camtasia. YouTube also offers a feature to add closed captions to videos. Beyond display, the data from captions or transcripts can be mined just as usefully as metadata of the video or audio files.

A background speech-to-text conversion service can automatically add this kind of content to existing collections and archives, some of which (speeches, songs, narrations and commentaries) are considered classics to this day. Moreover, the ability to generate this data automatically for all new content as it becomes available is what makes it appealing to combine with search to produce relevant results.

Conclusion: Although speech recognition software is widely understood to be error prone, an offline service that automates the derivation of transcripts and captions from existing and new content can prove valuable. Scenarios such as finding a song by searching for part of its lyrics, whether or not the song is available in video format, are an appealing way to enhance the results of any web search. Similarly, governments may find it useful to transcribe English conversations held over mobile devices for text analysis and data mining. Consumer electronics that interact with users, such as Alexa or Siri, may also keep track of their conversations with humans for analysis related to personalization-based marketing. These examples seem to be only the tip of an iceberg of data.

#codingexercise
Given a geometric progression starting with 1 as the first term, and values of r, S and p where:
r = common ratio of the progression
S = sum of the first N terms modulo p
p = a prime number
Find N
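As a quick check of the problem statement, here is a worked example with illustrative values (r = 2 and p = 7 are chosen here and are not part of the original exercise). The partial sums modulo p are:

\[
\begin{aligned}
N = 1:&\quad 1 \equiv 1 \pmod{7} \\
N = 2:&\quad 1 + 2 = 3 \equiv 3 \pmod{7} \\
N = 3:&\quad 1 + 2 + 4 = 7 \equiv 0 \pmod{7}
\end{aligned}
\]

So S = 3 gives N = 2 and S = 0 gives N = 3. Because 2^3 = 8 ≡ 1 (mod 7), the partial sums then cycle through 1, 3, 0 again, which means a value such as S = 5 has no solution for these inputs. More generally, the partial sums modulo p repeat with a period of at most p, so checking the first p partial sums is enough to find the smallest N or to conclude that none exists.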
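// Returns the smallest N >= 1 such that (1 + r + ... + r^(N-1)) % p == S,
// or -1 if no such N exists.
static int GetN(int r, int S, int p)
{
    int res = -1;
    long sum = 0;
    // Reduce the ratio modulo p so the running product below cannot overflow a long.
    long ratio = ((r % p) + p) % p;
    // Current term r^k mod p, starting from r^0. Keeping a running product avoids
    // Math.Pow, whose double result loses precision (and overflows) for large k.
    long term = 1 % p;
    // The partial sums modulo p repeat with a period of at most p,
    // so p iterations cover every reachable value.
    for (int k = 0; k < p; k++)
    {
        sum = (sum + term) % p;
        if (sum == S)
        {
            res = k + 1;
            break;
        }
        term = (term * ratio) % p;
    }
    return res;
}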
