Cluster computing

Saturday, December 28, 2013

We now look at unstructured data and the data warehouse. Combining structured and unstructured data is like integrating two different worlds. CRM is an example: it holds structured data about customer demographics and unstructured data in the form of documents and communications from the customer.
Text is common to both worlds, and with it we can match the communications to the customer. However, text is susceptible to misspellings, mismatched context, identical names, nicknames, incomplete names and word stems.
Using stop words and stemming, we can do a probabilistic match, in which all the data that intersects is used to make the match. The strength of the match can be depicted on a numeric scale.
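
As a rough illustration of such a probabilistic match, here is a minimal Python sketch. The stop-word list, the crude suffix-stripping `stem` function and the choice of Jaccard overlap as the numeric scale are all assumptions made for this example, not something the approach prescribes; a real system would likely use a proper stemmer and a tuned scoring function.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "for", "in", "on", "is", "with", "from"}


def stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def terms(text):
    """Lowercase, tokenize, drop stop words, and stem what remains."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {stem(t) for t in tokens if t not in STOP_WORDS}


def match_strength(left, right):
    """Jaccard overlap of the two term sets: 0.0 is no match, 1.0 is a full match."""
    a, b = terms(left), terms(right)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

For instance, `match_strength("John Smith billing complaint", "complaint from J. Smith")` shares two of five distinct terms, so the match lands partway up the scale rather than at either end.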
A themed match is another way to make a match: the unstructured data is organized by theme, such as sales, marketing, human resources, engineering, accounting or distribution. Once a collection of industry-recognized themes has been gathered, the unstructured data is passed against all the themes for a tally.
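
The tally can be sketched in a few lines of Python. The theme vocabularies in `THEMES` are made up for illustration, and counting raw word hits is only one simple way to score a document against a theme.

```python
import re
from collections import Counter

# Hypothetical theme vocabularies; real ones would come from an industry taxonomy.
THEMES = {
    "sales": {"order", "quote", "discount", "customer"},
    "accounting": {"invoice", "ledger", "payment", "audit"},
    "human resources": {"hiring", "payroll", "benefits", "leave"},
}


def theme_tally(text):
    """Count, for each theme, how many tokens of the text are theme words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    tally = Counter()
    for theme, words in THEMES.items():
        tally[theme] = sum(1 for t in tokens if t in words)
    return tally
```

Calling `theme_tally(...).most_common(1)` then gives the theme with the most hits for a document, which is one plausible way to file it.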
Linkages between themes and theme words can be established via a raw match of the data.
Another way to link two documents is based on the metadata in the structured environment.
A two-tiered data warehouse is typically used to accommodate unstructured data in the data warehouse.
One tier is the structured data and the other tier is the unstructured data.
Unstructured data can be further subdivided, for example into communications and documents.
The relationship between communications and unstructured data is formed on the basis of identifiers.
Visualization of unstructured data is very different from that of structured data, which is more a matter of business intelligence. For unstructured data, we create a self-organizing map in which clusters are depicted.
Data volume is an issue with every warehouse, whether it holds unstructured or structured data.
The two environments are fitted together based on identifiers. In the structured environment we keep both metadata and data pertaining to the unstructured environment. The metadata has information about the repository. The records have information about the data, the identifier and the close identifier. Close identifiers are those where there is a good indication that a match has been made.
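
One way to picture these records is the following Python sketch. The class names, field names and the `documents_for` helper are hypothetical, chosen only to mirror the description above: repository metadata on one side, and records carrying the data, an identifier and close identifiers on the other.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RepositoryMetadata:
    """Hypothetical metadata kept about an unstructured repository."""
    name: str
    location: str


@dataclass
class UnstructuredRecord:
    """Hypothetical record kept in the structured environment."""
    data: str                          # the document or a pointer to it
    identifier: Optional[str] = None   # confirmed link to a structured key
    close_identifiers: List[str] = field(default_factory=list)  # probable links


def documents_for(customer_id, records):
    """Split the records linked to a customer into exact and close matches."""
    exact = [r.data for r in records if r.identifier == customer_id]
    close = [
        r.data
        for r in records
        if r.identifier != customer_id and customer_id in r.close_identifiers
    ]
    return exact, close
```

Keeping the close matches separate from the exact ones lets a downstream query decide how much uncertainty it is willing to accept.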
