Content Databases
In the writeup, we described the storage requirements of the text
summarization service. We said that this is equivalent to using a cloud-based
NoSQL document store, because our summaries are not files but JSON documents
that we generate and maintain for all users of our service and intend to use
for analysis. We also said that the original documents from which the
summaries were created would be made available via document libraries such as
SharePoint, OneDrive, or Google Drive.

When users upload a document to the summarization service for processing, it
could be stored the same way as in, say, SharePoint, which is backed by
Microsoft SQL Server. SharePoint uses an HTTP routing mechanism and Integrated
Windows Authentication.

SharePoint
maintains multiple databases: system databases, which hold configuration,
administration, and content-related data; search service databases; user
profile databases; and many other miscellaneous data stores.

The SharePoint system databases include the configuration database, which
contains data about all SharePoint databases, web services, sites,
applications, solutions, packages, and templates, along with application and
farm settings specific to SharePoint Server, default quotas, and blocked file
types. Content
databases are separate from the configuration database. One specific content
database is earmarked for the Central Administration web site. The other
content databases store all the site content, including documents, libraries,
web part properties, audit logs, applications, user names, rights, and Project
Server data. Usually the size of a content database is kept under 200 GB, but
sizes up to 1 TB are also feasible.

The Search service databases include the search service application
configuration and the access control list for the crawl. The crawl database
stores the state of the crawled data and the crawl history. Crawl databases
are typically scaled out by adding one for every twenty million items crawled.
The link database stores the information extracted by the content processing
component along with click-through information. It is worth noting that the
crawl database is read-heavy whereas the link database is write-heavy.

The user profile service databases can scale both up and out because they
store and manage users and their social information. These
databases also include social tagging information, that is, the notes created
by users along with their respective URLs. The size of this data is determined
by the number of ratings created and used. The synchronization database is
also a user profile database; it is used when profile data is being
synchronized with directory services such as Active Directory. Its size is
determined by the number of users and groups.

Miscellaneous services include those that store app
licenses and permissions; SharePoint and Access apps; external content types
and related objects; managed metadata and syndicated content types; temporary
objects and persisted user comments and settings; account names and passwords;
pending and completed translations; data refresh schedules; state information
from InfoPath forms, web parts, and charts; features and settings information
for hosted customers; usage and health data collection; and document
conversions and updates.

The tasks associated with content management, and the databases behind them,
indicate the planning required for the summarization service. It might
therefore help if the content management service is used as a layer below the
summarization service so that storage is unified. At cloud scale, we plan for
such stores in cloud databases or use a Bigtable and file storage based
solution.
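As a rough illustration of the sizing guidance above, the following sketch
estimates how many content and crawl databases a store would need. The 200 GB
and twenty million item thresholds come from the text; the class name, the
method names, and the sample workload are hypothetical, and real planning
would weigh more factors than these two numbers.

```csharp
using System;

static class CapacitySketch
{
    // Thresholds taken from the guidance in the text above.
    const long ContentDbTargetGb = 200;       // preferred upper bound per content database
    const long ItemsPerCrawlDb = 20_000_000;  // crawled items served by one crawl database

    // Ceiling division for non-negative totals; always returns at least one database.
    static long DatabasesNeeded(long total, long perDatabase)
    {
        if (total < 0) throw new ArgumentOutOfRangeException(nameof(total));
        return Math.Max(1, (total + perDatabase - 1) / perDatabase);
    }

    public static long ContentDatabases(long totalContentGb) =>
        DatabasesNeeded(totalContentGb, ContentDbTargetGb);

    public static long CrawlDatabases(long crawledItems) =>
        DatabasesNeeded(crawledItems, ItemsPerCrawlDb);

    static void Main()
    {
        // Hypothetical workload: 1,500 GB of uploaded documents, 45 million crawled items.
        Console.WriteLine(ContentDatabases(1500));      // 8
        Console.WriteLine(CrawlDatabases(45_000_000));  // 3
    }
}
```

Rounding up rather than down keeps each content database under the preferred
size instead of slightly over it.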
Courtesy: MSDN
#codingexercise
Check if the nth bit from last is set in the binary representation of a given number
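// pos is 1-based, counted from the least significant (last) bit.
bool IsSet(int number, int pos)
{
    // Binary digits without leading zeros; negative numbers yield 32 digits.
    var result = Convert.ToString(number, 2);
    // Positions outside the string are treated as unset.
    if (pos < 1 || pos > result.Length)
        return false;
    return result[result.Length - pos] == '1';
}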
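The same check can also be written with bit operations instead of a string
conversion. This is a sketch of an alternative rather than the approach used
above; the name IsSetBitwise and the choice to report positions outside 1..32
as unset are assumptions.

```csharp
using System;

static class BitCheck
{
    // pos is 1-based from the least significant bit, as in the exercise above.
    public static bool IsSetBitwise(int number, int pos)
    {
        // An int has 32 bits; anything outside that range is reported as unset.
        if (pos < 1 || pos > 32)
            return false;
        // Shift the wanted bit into the lowest position and mask it.
        return ((number >> (pos - 1)) & 1) == 1;
    }

    static void Main()
    {
        Console.WriteLine(IsSetBitwise(10, 2));  // True: 10 is 1010 in binary
        Console.WriteLine(IsSetBitwise(10, 1));  // False
        Console.WriteLine(IsSetBitwise(-1, 32)); // True: sign bit in two's complement
    }
}
```

This avoids allocating a string for every call, which may matter if the check
runs in a tight loop.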