We were discussing the use of Object Storage to stash state for each worker from applications services and clusters and its implementation in the form of distributed services over object storage. The nodes in a storage pool assigned to the VDC may have a fully qualified name and public IP address. Although these names and ip address are not shared with anyone, they serve to represent the physical location of the fragments of an object. Generally, an object is written across three such nodes. The storage engine gets a request to write an object. It writes the object to one chunk but the chunk may be physically located on three separate nodes. The writes to these three nodes may even happen in parallel. The object location index of one chunk and the disk locations corresponding to the chunk are also artifacts that need to be written. For this purpose, also, three separate nodes may be chosen and the location information may be written. The storage engine records the disk locations of the chunk in a chunk location index and the disk locations corresponding to the chunk to three different disks/nodes. The index locations are chosen independently from the object chunk locations. Therefore, we already have a mechanism to store locations. When these locations have representations for the node and the site, a copy of an object served over the web has a physical internal location. Even when they are geo-replicated, the object and the location information will be updated together. The tracking of site-specific locations for an object is a matter of merely maintaining a registry of locations just the same way as we look up the chunks for an object. We just need more information on the location part of the object and the replication group automatically takes care of keeping locations and objects updated as they are copied.
No comments:
Post a Comment