Monday, May 13, 2024

This is a summary of the book “Practical Fairness: Achieving Fair and Secure Data Models,” written by Aileen Nielsen and published by O’Reilly in 2020. The author, a software engineer and attorney, examines various kinds of fairness and how both training data and algorithms can promote them. Machine learning developers and MLOps practitioners can benefit from this discussion, and, as an O’Reilly book, it comes with Python examples. Fairness in this book is about who gets what and how that is decided. Fair results start with fair data, and an all-round effort is needed to increase fairness at every stage of the process. Both privacy and fairness are vulnerable to attacks. Product design should also be fair and make a place for fair models. Industry standards and regulations can demand fairness from the market in all the relevant products.

Fairness in technology is crucial for ensuring that users receive fair treatment and that technology is used responsibly. Software developers must distinguish equity from equality, and security from privacy, if they want to avoid legal trouble and consumer backlash. People tend to prefer equity over equality, because equity implies that people should not be treated differently merely for belonging to a certain group. However, equity is not straightforward to achieve, and even well-chosen privacy metrics can be undercut by human error.


To ensure fairness, machine learning models should start with fair data: data that is high quality, suited to the model's intended purpose, and correctly labeled. Technology is neither inherently good nor bad, but data quality can suffer from biased sampling and incomplete records. A fairness mandate can stimulate ideas in mathematics, computer science, and law, yet it cannot guarantee fairness in every respect.
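
The book's examples are in Python; in that spirit, the sketch below shows one simple way to check for biased sampling before training. It is only an illustration under assumed names: the "gender" column, the reference shares, and the tolerance are hypothetical, not taken from the book. It compares each group's share of the training data against a reference share for the population the model is meant to serve.

```python
import pandas as pd

# Hypothetical reference shares for a protected attribute ("gender"),
# e.g. drawn from census data for the population the model will serve.
REFERENCE = {"female": 0.51, "male": 0.49}

def sampling_bias_report(df, column, reference, tolerance=0.05):
    """Compare group shares in the training data with reference shares
    and flag any group whose share deviates by more than `tolerance`."""
    observed = df[column].value_counts(normalize=True)
    report = {}
    for group, expected in reference.items():
        actual = float(observed.get(group, 0.0))
        report[group] = {
            "expected": expected,
            "observed": round(actual, 3),
            "flagged": abs(actual - expected) > tolerance,
        }
    return report

# Toy training set that under-represents one group.
train = pd.DataFrame({"gender": ["male"] * 70 + ["female"] * 30})
print(sampling_bias_report(train, "gender", REFERENCE))
```

A gap flagged by such a check does not prove the model will be unfair, but it is a cheap early signal that the data may not suit its intended purpose.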

Data models can be trained for fairness at several points in their development. Pre-processing the training data is the most flexible and powerful option, because it offers the most opportunities to improve downstream fairness metrics. Techniques include deleting parts of the data that could be exploited to discriminate against people, such as gender, or attaching weights to different records about a person. However, enforcing individual fairness can produce unfairness for a group, so techniques such as learned fair representations and optimized pre-processing balance the two. Adversarial de-biasing takes another route: a second model tries to infer the protected attribute from the first model's output, and the first model is trained until that adversary fails, pushing it toward non-discriminatory outcomes.
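
As a rough illustration of the pre-processing ideas above (a sketch, not the book's own code), the snippet below shows the two simplest techniques: dropping a protected column outright, and attaching per-row weights, in the spirit of Kamiran and Calders' reweighing scheme, so that group membership and outcome become statistically independent in the weighted data. The column names "gender" and "hired" are hypothetical.

```python
import pandas as pd

def drop_protected(df, protected):
    """Crudest pre-processing step: remove protected columns outright.
    Proxies for those columns may remain, so this alone rarely suffices."""
    return df.drop(columns=protected)

def reweigh(df, group_col, label_col):
    """Give each row a weight so that group and label become independent in
    the weighted data: weight = P(group) * P(label) / P(group, label)."""
    n = len(df)
    p_group = df[group_col].value_counts(normalize=True)
    p_label = df[label_col].value_counts(normalize=True)
    p_joint = df.groupby([group_col, label_col]).size() / n
    return df.apply(
        lambda row: p_group[row[group_col]] * p_label[row[label_col]]
        / p_joint[(row[group_col], row[label_col])],
        axis=1,
    )

# Toy data with hypothetical columns.
df = pd.DataFrame({
    "gender": ["m", "m", "m", "f", "f", "f"],
    "hired":  [1,   1,   0,   1,   0,   0],
})
df["weight"] = reweigh(df, "gender", "hired")
print(df)
```

The resulting weights can then be handed to most scikit-learn estimators through the `sample_weight` argument of `fit`, so the model trains as if each group-outcome combination were fairly represented.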


Sometimes neither pre-processing the data nor training the model for fairness is possible or permitted. In that case, the model's output can be post-processed to make it fairer, which also provides transparency. To gauge whether a model generates fair outcomes, audit it, either as a black box (examining only inputs and outputs) or as a white box (examining its internals as well). Interpretable models, or black-box models that explain the basis of their decisions, help avoid arbitrary decisions. Privacy and fairness remain vulnerable to attack: modern techniques may undercut anonymization, and new concepts keep emerging, such as the "right to be forgotten."
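
A black-box audit needs nothing more than the model's predictions, the true outcomes, and group membership. The sketch below is an illustration under assumed names rather than the book's auditing code: it compares each group's selection rate and true-positive rate, two common group-fairness measures, without touching the model's internals.

```python
import numpy as np

def black_box_audit(y_true, y_pred, group):
    """Audit a model from its outputs alone: compare selection rates and
    true-positive rates across groups."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    report = {}
    for g in np.unique(group):
        mask = group == g
        selection_rate = y_pred[mask].mean()          # P(prediction = 1 | group)
        positives = mask & (y_true == 1)
        tpr = y_pred[positives].mean() if positives.any() else float("nan")
        report[g] = {"selection_rate": round(float(selection_rate), 3),
                     "true_positive_rate": round(float(tpr), 3)}
    return report

# Toy audit: predictions from some opaque model, plus group labels.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
group  = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(black_box_audit(y_true, y_pred, group))
```

Large gaps between groups in either rate are a signal to investigate, whether by post-processing the outputs or by revisiting the data and training.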

Privacy is an evolving legal norm, and machine learning models are vulnerable to attacks that aim to subvert their output. These include evasion attacks, in which attackers feed the model inputs crafted to make it err, and poisoning attacks, in which attackers corrupt the training data so that the model malfunctions or classifies certain data the way they want.

Fair models should be integrated into fair products that satisfy customer expectations and do not harm the people who contributed data. Companies should also consider how their products could be misused, and should not roll out updates too frequently. Even a product that works well can have fairness problems if it works better for some users than for others. Without the right laws, the market will not force companies to deliver fairness in their products. The EU's General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) are two major laws governing data use; they give citizens rights to data portability, erasure, and correction of personal data.
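
To make the evasion-versus-poisoning distinction described above concrete, here is a deliberately crude sketch against a toy scikit-learn classifier. The synthetic data and the attack strategies (nudging a test point against the decision direction, and injecting mislabeled training points) are simplified stand-ins for the real attacks the book discusses.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Clean toy data: two well-separated classes in 2-D.
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
clean_model = LogisticRegression().fit(X, y)

# Evasion attack: at prediction time, nudge a single input against the
# model's decision direction until its label flips.
x = np.array([[2.0, 2.0]])                  # correctly classified as 1
x_adv = x.copy()
while clean_model.predict(x_adv)[0] == 1:
    x_adv -= 0.5 * clean_model.coef_
print("evasion:", clean_model.predict(x)[0], "->", clean_model.predict(x_adv)[0])

# Poisoning attack: inject training points that sit in the class-1 region
# but carry label 0, so the retrained model misclassifies most of class 1.
X_poison = rng.normal(2, 0.5, (150, 2))
y_poison = np.zeros(150, dtype=int)
poisoned_model = LogisticRegression().fit(
    np.vstack([X, X_poison]), np.concatenate([y, y_poison])
)
print("clean accuracy:   ", clean_model.score(X, y))
print("poisoned accuracy:", poisoned_model.score(X, y))
```

The two attacks target different stages: evasion manipulates inputs after deployment, while poisoning corrupts the data before training, which is why defenses for one do not automatically cover the other.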

The GDPR and CCPA have prompted organizations to take privacy, transparency, and accountability seriously. The GDPR also gives EU citizens the right not to be subject to significant decisions made solely by algorithms. In the US, comparable federal laws regulating algorithm use have not passed, though some states, such as California, set rules for chatbots, requiring that users be told when they are not communicating with a human. As machine learning advances, both the technology and the laws governing fairness will continue to evolve.

