Sunday, February 7, 2021

Social graph

This essay is about data mining on social network data. Applications such as Facebook, WhatsApp, Twitter and Instagram are centered on the personal connections around an individual. These social graphs are rich in information that can be mined with well-known data mining algorithms for purposes such as user recommendations, advertising and marketing.

Data mining algorithms have traditionally been well suited to relational databases and tabular formats. Graphs have their own databases, where relationships are described by edges between nodes. The nature of the data does not change. Its representation and query language change, but it has been possible to standardize the query language over a diverse set of data stores such as relational stores, big data stores, snowflake-schema warehouses and graph databases. With this assumption, we proceed to list the use case scenarios for data mining over social network data.

1. Classification algorithms – Groups of individuals united by some common purpose or campaign have always formed organically on social networking platforms. A classification algorithm applied here builds a statistics table of the interactions these individuals have had over time and collects them into a vector for each individual, with some chosen metrics as features. The groups are then learned from these vectors. This provides insight that an individual may be too pigeon-holed to see but that the software can surface through classification.

2. Regression algorithms – Almost any demographic data pertaining to individuals on a social graph is likely to form a scatter plot. One of the main advantages of linear regression is prediction with time as the independent variable. When many factors contribute to the occurrence of a data point, a linear regression gives an immediate way to predict where the next occurrence may happen.

3. Segmentation algorithms – A segmentation algorithm divides data into groups or clusters that have similar properties. Population segmentation on a social graph provides interesting insights into how individuals might react to campaigns.

4. Sequence analysis algorithms – The difference between classification algorithms and sequence algorithms is that the latter focus on paths in sequences. A sequence algorithm does not even need to know the meaning of the constituents of the sequence. It just has to encode the sequence to a context and use the corresponding decoder to generate an output sequence. Chatbots use this to create responses to an individual's chat messages. The relays between individuals can be studied similarly.

5. Outlier mining algorithms – Since not everyone is a conformist, there are bound to be fringe groups and outliers, and identifying them is by itself of interest to various agencies. This calls for outlier detection algorithms to find them.

6. Decision tree – Perhaps the most used data mining model is the decision tree, simply because it is easy to visualize and study: it forms branches based on decision splits of the user community. Well-trained models can easily be used to predict the label associated with newcomers.

7. Time-series algorithm – Perhaps the most anticipated information from a social graph is how things change over time. Using historical data to predict the outcome of a variable falls within this category of analysis.
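
As a rough illustration of items 1, 2 and 5 above, here is a minimal Python sketch using only the standard library. It is not a definitive implementation: the interaction log, the user names, the three interaction kinds and the helper names (feature_vectors, fit_line, outliers) are all made up for this example. It builds the per-individual statistics table, fits a least-squares line with time as the independent variable, and flags values that lie far from the mean. The two-standard-deviation cutoff is an arbitrary assumption.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical interaction log: (source user, target user, kind, day index).
INTERACTIONS = [
    ("ann", "bob", "message", 1),
    ("ann", "cat", "like", 1),
    ("bob", "ann", "message", 2),
    ("ann", "bob", "message", 3),
    ("cat", "ann", "share", 3),
    ("cat", "bob", "share", 4),
]

KINDS = ("message", "like", "share")  # assumed feature set, one column per kind


def feature_vectors(interactions, kinds=KINDS):
    """Count each individual's outgoing interactions per kind.

    The result is the statistics table from item 1: one vector per
    individual, ready to be handed to a classifier or clustering step.
    """
    table = defaultdict(lambda: [0] * len(kinds))
    for source, _target, kind, _day in interactions:
        table[source][kinds.index(kind)] += 1
    return dict(table)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def outliers(values, threshold=2.0):
    """Return the keys whose value is more than `threshold` population
    standard deviations away from the mean (a simple z-score rule)."""
    data = list(values.values())
    mu, sigma = mean(data), pstdev(data)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return sorted(k for k, v in values.items() if abs(v - mu) / sigma > threshold)


# Item 1: per-individual feature vectors.
vectors = feature_vectors(INTERACTIONS)

# Item 2: interactions per day, with the day index as the independent variable.
per_day = defaultdict(int)
for _source, _target, _kind, day in INTERACTIONS:
    per_day[day] += 1
days = sorted(per_day)
slope, intercept = fit_line(days, [per_day[d] for d in days])
next_day_estimate = slope * (days[-1] + 1) + intercept

# Item 5: individuals whose total activity is far from the average.
activity = {user: sum(vector) for user, vector in vectors.items()}
unusual = outliers(activity)
```

With a real social graph, the same vectors would usually be passed to a library classifier or clustering routine rather than inspected by hand, and a log this small is far too short for the fitted line or the outlier rule to mean much.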

