Cluster computing: knowledge discovery

Monday, June 10, 2013

knowledge discovery

Knowledge extraction requires either the reuse of existing formal knowledge (identifiers or ontologies) or the generation of a schema from the source data. It is similar to NLP and ETL, but it also involves representing the results in a format such as RDF, the Resource Description Framework. RDF is a way of modeling information (metadata), originally about web resources. The RDF data model is built from subject-predicate-object expressions, called triples. RDF itself is an abstract model with several serialization formats, all of which are machine readable and machine interpretable. A collection of RDF statements intrinsically represents a labeled, directed multigraph. As such, an RDF-based data model can be persisted in a relational store as a table of triples or in a similar layout. The opposite direction, mapping relational databases to RDF, is also common because it makes existing relational data available to the semantic web.
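
As a rough illustration of the triple model described above, here is a minimal sketch using only the Python standard library. The `ex:` identifiers, the `triples` table layout, and the helper names are assumptions made for this example, not part of any RDF library.

```python
import sqlite3
from collections import defaultdict

# Hypothetical example data: each statement is (subject, predicate, object).
TRIPLES = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:worksWith", "ex:bob"),  # a second edge between the same two nodes
    ("ex:bob", "ex:name", '"Bob"'),
]

def build_graph(triples):
    """View the triples as a labeled directed multigraph:
    subject -> list of (predicate label, object) edges."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return dict(graph)

def store_triples(conn, triples):
    """Persist the triples in a three-column relational table (assumed layout)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS triples (subject TEXT, predicate TEXT, object TEXT)"
    )
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    conn.commit()

def objects_of(conn, subject, predicate):
    """Look up all objects for a given subject and predicate."""
    rows = conn.execute(
        "SELECT object FROM triples WHERE subject = ? AND predicate = ? ORDER BY object",
        (subject, predicate),
    )
    return [r[0] for r in rows]
```

The two edges from `ex:alice` to `ex:bob` are what make this a multigraph rather than a simple graph: the same pair of nodes can be linked by more than one labeled edge.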
Information extraction uses traditional ETL methods, which transform the source data into structured formats. Knowledge extraction can be categorized based on:
1) source: whether the input is text, a database, XML, CSV, etc.
2) exposition: how the extracted knowledge is made available, such as an ontology file or a semantic database.
3) synchronization: whether the extraction runs once or multiple times, and how the result is kept in sync with the source.
4) reuse of vocabularies: whether the tool can reuse existing vocabularies, for example by mapping table columns to existing resources.
5) automation: the degree to which the extraction is automatic, assisted, or manual.
6) domain ontology requirement: whether the tool needs a pre-existing ontology to map to.
A mapping from RDB tables or views to RDF entities converts each column to a predicate, each column value to an object, and each row key to a subject, so that each row becomes a collection of triples with a common subject.
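
The row-to-triples conversion can be sketched in a few lines. This is only an illustration: the function name, the `table/key` and `table#column` naming scheme, and the choice to skip NULL values and the key column itself are assumptions of this sketch.

```python
def row_to_triples(table, key_column, row):
    """Turn one relational row (a dict of column -> value) into triples.

    The row key becomes the subject, each remaining column a predicate,
    and each column value an object. NULL (None) values produce no triple.
    """
    subject = f"{table}/{row[key_column]}"
    return [
        (subject, f"{table}#{column}", value)
        for column, value in row.items()
        if column != key_column and value is not None
    ]
```

For example, calling `row_to_triples("employee", "id", {"id": 7, "name": "Ada"})` yields a single triple whose subject is `employee/7`, so every triple produced from one row shares that subject.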
In more detail, such a direct mapping involves creating an RDFS class for each table, converting all primary keys and foreign keys into IRIs, assigning a predicate IRI to each column, and assigning an rdf:type predicate to each row that links it to the RDFS class IRI corresponding to its table.
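
The steps in the preceding paragraph (a class per table, key IRIs, column predicates, and an rdf:type triple per row) can be sketched as follows. The base IRI `http://example.org/`, the IRI naming scheme, and the `foreign_keys` argument (a dict from column name to referenced table) are assumptions for illustration; real mapping tools define their own conventions.

```python
BASE = "http://example.org/"  # hypothetical base IRI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"

def row_iri(table, key):
    """Build an IRI for a row from its table name and key value (assumed scheme)."""
    return f"{BASE}{table}/{key}"

def direct_map(table, primary_key, foreign_keys, rows):
    """Map rows of one table to RDF triples.

    foreign_keys maps a column name to the table it references, so that
    foreign key values become IRIs of the referenced rows.
    """
    class_iri = f"{BASE}{table}"
    triples = [(class_iri, RDF_TYPE, RDFS_CLASS)]  # one RDFS class per table
    for row in rows:
        subject = row_iri(table, row[primary_key])  # primary key -> IRI
        triples.append((subject, RDF_TYPE, class_iri))  # link row to its class
        for column, value in row.items():
            if column == primary_key or value is None:
                continue
            predicate = f"{BASE}{table}#{column}"  # one predicate IRI per column
            if column in foreign_keys:
                obj = row_iri(foreign_keys[column], value)  # foreign key -> IRI
            else:
                obj = value  # plain column value stays a literal
            triples.append((subject, predicate, obj))
    return triples
```

Turning foreign keys into IRIs rather than literals is what lets the resulting triples link rows across tables, which is the point of publishing the data as a graph.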
