Monday, July 10, 2017

Content Databases

In the writeup, we described the storage requirements of the text summarization service. We said that this is equivalent to using a cloud-based NoSQL document store, because our summaries are not files but JSON documents that we generate and maintain for all users of our service and intend to use for analysis. We also said that the original documents from which the summaries were created would be made available via document libraries such as SharePoint, OneDrive or Google Drive. When users upload a document to the summarization service for processing, it could be stored the same way as in, say, SharePoint, which is backed by Microsoft SQL Server. SharePoint uses an HTTP routing mechanism and integrated Windows authentication.

SharePoint maintains multiple databases: system databases, which include configuration, administration and content-related data; search service databases; user profile databases; and many other miscellaneous data stores. Among the system databases, the configuration database contains data about all SharePoint databases, web services, sites, applications, solutions, packages and templates, as well as application and farm settings specific to SharePoint Server, default quotas and blocked file types. Content databases are separate from configuration. One specific content database is earmarked for the Central Administration web site. The other content databases store all the site content, including documents, libraries, web part properties, audit logs, applications, user names, rights and Project Server data. The size of a content database is usually kept under 200 GB, but sizes up to 1 TB are also feasible.

The search service databases include the search service application configuration and the access control list for the crawl. The crawl database stores the state of the crawled data and the crawl history. The link database stores the information extracted by the content processing component, together with the click-through information. Crawl databases are typically scaled out for every twenty million items crawled. It might be relevant to note that the crawl database is read-heavy whereas the link database is write-heavy.

The user profile service databases can scale both up and out because they store and manage users and their social information. These databases also include social tagging information, that is, the notes created by users along with their respective URLs; its size is determined by the number of ratings created and used. The synchronization database is also a user profile database and is used when profile data is being synchronized with directory services such as Active Directory. Its size is determined by the number of users and groups.

Miscellaneous services include those that store app licenses and permissions, SharePoint and Access apps, external content types and related objects, managed metadata and syndicated content types, temporary objects and persisted user comments and settings, account names and passwords, pending and completed translations, data refresh schedules, state information from InfoPath forms, web parts and charts, features and settings information for hosted customers, usage and health data collection, and document conversions and updates.

The tasks and databases associated with content management indicate how much planning the summarization service requires. It might therefore help to use the content-management service as a layer below the summarization service so that the storage is unified. At cloud scale, we plan for such stores in cloud databases or use a Bigtable and file storage based solution.
Courtesy: MSDN
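The sizing figures quoted above (roughly one crawl database per twenty million crawled items, and content databases usually kept under 200 GB) can be turned into a rough capacity estimate. The following is only a sketch: the class name SearchSizing, the method names and the choice to treat 200 GB as a simple divisor are assumptions made for illustration, not part of SharePoint or of the summarization service.

```csharp
using System;

// Hypothetical helper for back-of-the-envelope planning; not a SharePoint API.
public static class SearchSizing
{
    // Rule of thumb from the text: one crawl database per twenty million crawled items.
    private const long ItemsPerCrawlDatabase = 20000000;

    // Number of crawl databases suggested for the given item count (at least one).
    public static long CrawlDatabasesFor(long crawledItems)
    {
        if (crawledItems < 0)
            throw new ArgumentOutOfRangeException("crawledItems");
        // Integer ceiling division: round up to the next whole database.
        long count = (crawledItems + ItemsPerCrawlDatabase - 1) / ItemsPerCrawlDatabase;
        return Math.Max(1, count);
    }

    // Number of content databases needed if each is kept under maxGbPerDatabase.
    // The 200 GB default is the usual guideline mentioned in the text (assumed here as a cap).
    public static int ContentDatabasesFor(double totalContentGb, double maxGbPerDatabase = 200.0)
    {
        if (totalContentGb < 0 || maxGbPerDatabase <= 0)
            throw new ArgumentOutOfRangeException();
        return Math.Max(1, (int)Math.Ceiling(totalContentGb / maxGbPerDatabase));
    }
}
```

For instance, under these assumptions 45 million crawled items would call for three crawl databases, and 750 GB of content would call for four content databases at the 200 GB guideline.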
#codingexercise
Check whether the nth bit from the last (counting from 1 at the least significant bit) is set in the binary representation of a given number
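// Returns true if the bit at position pos (1-based, counted from the least significant bit) is set.
bool IsSet(int number, int pos)
{
    // Convert (namespace System) drops leading zeros; negative numbers yield 32 two's-complement digits.
    var result = Convert.ToString(number, 2);
    // Positions below 1 are invalid; positions beyond the string are leading zeros.
    if (pos < 1 || pos > result.Length)
        return false;
    return result[result.Length - pos] == '1';
}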
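
The same check can also be done without building a string, by shifting the wanted bit down to the lowest position and masking it. This is a sketch of an alternative rather than a replacement for the function above; the names BitCheck and IsSetByShift are made up for illustration, and positions are assumed to be 1-based from the least significant bit of a 32-bit int.

```csharp
// Hypothetical shift-based variant of the exercise.
public static class BitCheck
{
    public static bool IsSetByShift(int number, int pos)
    {
        // An int has 32 bits; anything outside 1..32 cannot be set.
        if (pos < 1 || pos > 32)
            return false;
        // Move the requested bit to position 1, then test it with a mask of 1.
        return ((number >> (pos - 1)) & 1) == 1;
    }
}
```

For 10 (binary 1010), this returns false, true, false, true for positions 1 through 4, and false for position 5.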
