Today we continue our discussion on the detector distribution algorithm. If there are N intelligent disks, then they are clustered into k categories as follows:
Step 1: Initialize each of the N disks as its own cluster, with each of the N centers pointing to the disk itself. Set K = N.
Step 2: Stop the execution if K has reached the target of k clusters.
Step 3: Merge the two clusters with the shortest distance between them into a new cluster and adjust its center.
Step 4: Decrement K and repeat from step 2 with this new cluster.
The shortest distance between clusters is computed as the fraction of the number of detectors stored in the top layer over K. These detectors are randomly selected, separately from the K categories, and stored in the top access control module. The rest are stored in the lower access control module.
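The merge loop in steps 1 through 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm's reference implementation: each disk is represented by a hypothetical numeric feature vector, the distance between clusters is taken to be the Euclidean distance between their centers, and the adjusted center is the mean of the member disks. The name cluster_disks and its parameters are made up for this example, and the distance function is left pluggable so a different measure can be swapped in.

```python
from itertools import combinations
from math import dist


def cluster_disks(points, k, distance=dist):
    """Merge N single-disk clusters until only k remain.

    `points` holds one feature vector per disk (an assumption made for
    this sketch); `distance` compares two cluster centers.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of disks")
    # Step 1: every disk starts as its own cluster, centered on itself.
    clusters = [{"members": [i], "center": tuple(p)} for i, p in enumerate(points)]
    # Step 2: stop once only k clusters remain.
    while len(clusters) > k:
        # Step 3: pick the pair of clusters whose centers are closest.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: distance(
                clusters[pair[0]]["center"], clusters[pair[1]]["center"]
            ),
        )
        members = clusters[a]["members"] + clusters[b]["members"]
        # Adjust the center: mean of the member disks' feature vectors.
        center = tuple(
            sum(points[i][d] for i in members) / len(members)
            for d in range(len(points[0]))
        )
        # Step 4: replace the two clusters with the merged one and repeat.
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append({"members": members, "center": center})
    return clusters
```

Calling cluster_disks(points, 3) on five disks returns three dictionaries, each listing the indices of its member disks and the cluster center. The pairwise search is quadratic in the number of clusters per merge, which is fine for a sketch but would need a priority queue or similar for a large N.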
Let us take a closer look at the access control function distribution. It is carried out not only by the metadata server but also by the intelligent disks, thereby avoiding a single point of failure and a performance bottleneck. However, the number-type detectors are generated only by the metadata server. This helps with the accuracy of access request inspection. The detectors are distributed between the top and lower access control modules, and the two layers can inspect access requests in parallel. The lower layer stores a smaller number of detectors so that it doesn't get in the way of I/O.
During detector generation, all substrings in the legal access requests are extracted once, and each is converted to a single integer value. The numerical values are then indexed with a B-Tree. This avoids converting the same substring again when it appears in different access requests and improves the detector selection algorithm. The numerical interval found this way avoids the mistake of judging a legal access request as an abnormal one. Anything that lies outside this numerical range is not a detector or a valid access request.
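The substring-to-integer step can be illustrated with a small Python sketch. Several things here are assumptions made for the example rather than details from the algorithm itself: access requests are encoded as binary strings, substrings have a fixed length, and a sorted list searched with bisect stands in for the B-Tree. The function names build_self_index, is_legal_value, and in_range are hypothetical.

```python
from bisect import bisect_left


def build_self_index(legal_requests, length):
    """Return the sorted integer values of all distinct substrings.

    Each distinct substring of the given length is converted once,
    even if it occurs in several legal access requests.
    """
    seen = {}  # substring -> integer value
    for request in legal_requests:
        for start in range(len(request) - length + 1):
            sub = request[start:start + length]
            if sub not in seen:
                seen[sub] = int(sub, 2)  # binary string -> integer
    # A sorted list plays the role of the B-Tree index in this sketch.
    return sorted(seen.values())


def is_legal_value(index, value):
    """Binary-search the sorted index for an exact match."""
    pos = bisect_left(index, value)
    return pos < len(index) and index[pos] == value


def in_range(index, value):
    """Check whether a value falls inside the interval covered by the index."""
    return bool(index) and index[0] <= value <= index[-1]
```

For example, build_self_index(["0110", "1101"], 3) converts "011", "110", and "101" once each, even though "110" occurs in both requests. Lookups then compare integers rather than strings, which is the saving the paragraph above describes.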
The move from comparing binary strings to comparing numerical values reduces the inspection overhead considerably. Further, with clustering, the detectors are distributed in a way that improves the accuracy of the inspection without adding overhead.