This may not be relevant to the previous blog post, but it is one of the things I came across today, so I am blogging about it here. The log manager in a DBMS records the changes committed to the database, helps with the rollback of aborted changes, and supports recovery from system failure. The standard way to do this is called Write Ahead Logging (WAL). The WAL protocol states three rules:
1) Each change to the database should generate a log record.
2) Database log records should be flushed in order.
3) The log record must be flushed before the commit request for the change (or transaction) completes.
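The three rules above can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions: the class name `WriteAheadLog`, the `stable` list standing in for the log device, and the record layout are all hypothetical and not taken from any real DBMS.

```python
class WriteAheadLog:
    """Toy in-memory log; `stable` stands in for the log device (an assumption)."""

    def __init__(self):
        self.buffer = []   # log records not yet flushed
        self.stable = []   # log records flushed to the simulated log device
        self.next_lsn = 1

    def append(self, txn_id, kind, payload=None):
        # Rule 1: every change to the database produces a log record.
        record = {"lsn": self.next_lsn, "txn": txn_id, "kind": kind, "payload": payload}
        self.next_lsn += 1
        self.buffer.append(record)
        return record["lsn"]

    def flush(self, up_to_lsn):
        # Rule 2: records reach the log device strictly in LSN order.
        while self.buffer and self.buffer[0]["lsn"] <= up_to_lsn:
            self.stable.append(self.buffer.pop(0))

    def commit(self, txn_id):
        # Rule 3: the commit record is flushed before commit returns.
        lsn = self.append(txn_id, "commit")
        self.flush(lsn)
        return lsn


log = WriteAheadLog()
log.append("T1", "update", {"page": 7, "before": "a", "after": "b"})
log.append("T2", "update", {"page": 9, "before": "x", "after": "y"})
commit_lsn = log.commit("T1")
flushed_lsns = [r["lsn"] for r in log.stable]
```

Note that committing T1 also pushes T2's earlier record to the log device, because flushing in order means nothing can be skipped.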
These rules of Write Ahead Logging are not restricted to databases; file systems use them as well. Logging becomes challenging when high performance is desired. To enable a so-called fast path that boosts performance, database systems use a "direct, steal / not-force" mode. This mode states that
1) the data objects are updated in place
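2) unpinned buffer pool frames can be "stolen", that is, written back to disk even if they hold uncommitted data
3) buffer pool pages need not be "forced" (flushed) to disk before a commit request returns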
The log manager then has to handle both undoing the flushes of stolen pages from aborted transactions and redoing the changes to not-forced pages of committed transactions.
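To make the undo and redo duties concrete, here is a minimal sketch of a recovery routine. It is a simplification under my own assumptions: the `recover` function and the record fields are hypothetical, both redo and undo use stored before/after values, and a real system would also compare page LSNs to skip work that is already on disk.

```python
def recover(disk_pages, log_records):
    """Redo every logged update, then undo updates of uncommitted transactions."""
    pages = dict(disk_pages)
    committed = {r["txn"] for r in log_records if r["kind"] == "commit"}

    # Redo pass: reapply all updates in log order, so changes that were
    # never forced to disk (not-force) are restored.
    for r in log_records:
        if r["kind"] == "update":
            pages[r["page"]] = r["after"]

    # Undo pass: walk the log backwards and restore the old values for
    # transactions that never committed (their pages may have been stolen).
    for r in reversed(log_records):
        if r["kind"] == "update" and r["txn"] not in committed:
            pages[r["page"]] = r["before"]
    return pages


log_records = [
    {"lsn": 1, "txn": "T1", "kind": "update", "page": "A", "before": 10, "after": 11},
    {"lsn": 2, "txn": "T2", "kind": "update", "page": "B", "before": 20, "after": 21},
    {"lsn": 3, "txn": "T1", "kind": "commit"},
]
# On-disk state at the crash: T1's committed change to A was not forced,
# while T2's uncommitted change to B was stolen and written out.
disk_at_crash = {"A": 10, "B": 21}
recovered = recover(disk_at_crash, log_records)
```

After recovery, page A carries T1's committed value and page B is back to its value from before T2 touched it.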
Another challenge is to keep the log records small. As an optimization, the log manager could log logical operations instead of physical ones, but redo and undo then become onerous. In practice, physical operations are logged to support redo, while logical operations are logged to support undo. During recovery, the log is replayed to reach the state at the time of the crash, and the transactions still in flight at that point are then rolled back. Log records are written with increasing log sequence numbers (LSNs).
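A small sketch may help show what a record with physical redo information and logical undo information could look like. Everything here is an assumption for illustration: the `LogRecord` fields, the `"subtract"` operation, and the account balance example are mine, not a real log format.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    lsn: int
    txn: str
    key: str
    after_image: int   # physical redo information: the exact value to write
    undo_op: tuple     # logical undo information: (operation, argument)


def redo(store, rec):
    # Physical redo: overwrite the value with the logged after-image.
    store[rec.key] = rec.after_image


def undo(store, rec):
    # Logical undo: apply the inverse operation to the current value.
    op, arg = rec.undo_op
    if op == "subtract":
        store[rec.key] -= arg
    else:
        raise ValueError(f"unknown undo operation: {op}")


store = {"balance": 100}
r1 = LogRecord(1, "T1", "balance", 150, ("subtract", 50))  # T1 adds 50
r2 = LogRecord(2, "T2", "balance", 175, ("subtract", 25))  # T2 adds 25
for rec in (r1, r2):
    redo(store, rec)       # replay history in LSN order
undo(store, r1)            # roll back T1 only
balance_after_undo = store["balance"]
```

In this toy case the logical undo of T1 leaves T2's later addition in place, whereas restoring T1's old value of 100 would have wiped it out.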
To optimize crash recovery, not all of the history is replayed. The start point for recovery is the older of two log records: 1) the one describing the change to the oldest dirty page in the buffer pool and 2) the one describing the start of the oldest transaction in the system. The LSN of this start point is called the recovery LSN, and it is computed at what are called checkpoints. Since replaying the log from the recovery LSN could take some time, frequent checkpoints are suggested. Another way to make this efficient is to write out the buffer pool pages asynchronously instead of synchronously. Logging and recovery cover not only data pages but also a variety of other information related to internal data structures.
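The choice of the recovery start point can be sketched as a minimum over two tables. This assumes LSNs only grow, so "oldest" means "smallest LSN"; the function name `recovery_lsn` and the two dictionaries are hypothetical bookkeeping structures that I made up for the example.

```python
def recovery_lsn(dirty_pages, active_txns, end_of_log_lsn):
    """Pick the LSN at which recovery should start.

    dirty_pages: page id -> LSN of the change that first dirtied the page
    active_txns: txn id  -> LSN of the transaction's first log record
    end_of_log_lsn: fallback when nothing is dirty and nothing is active
    """
    candidates = list(dirty_pages.values()) + list(active_txns.values())
    return min(candidates, default=end_of_log_lsn)


# State captured at a checkpoint (made-up numbers).
dirty_pages = {"P3": 120, "P8": 95}
active_txns = {"T7": 110, "T9": 130}
start = recovery_lsn(dirty_pages, active_txns, end_of_log_lsn=140)
idle_start = recovery_lsn({}, {}, end_of_log_lsn=140)
```

Here recovery would start at LSN 95, the change that dirtied page P8, and everything before it in the log can be ignored.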