Data quality is a long-standing issue in enterprise data management. A 2013 Gartner report surveyed a wide range of companies and found that data quality issues cost them an average of $14 million a year. Bad records result not only from human error but also from software defects and malicious activity that corrupt the database. Cleaning an existing database is expensive and messy, but not impossible. Blockchain adds safety and security against changes to historical transactions through its immutability. That same immutability backfires when incorrect data is present on the blockchain. Removing bad data becomes an insurmountable hill if proper mechanisms are not implemented from day one.
Example Case Issue
Take, for example, a Hyperledger Fabric-based consortium of steel suppliers and consumers that collaborate on supply chain management. The consumers and suppliers want to keep the terms of each deal private between customer and supplier. At the same time, it is more cost-effective for the steel market to operate on a common blockchain for auditability and trust. A given consumer and its supplier can create a channel and share data only with the designated parties, adding logistics and transportation providers as needed. This combines the advantages of blockchain with the security of a private channel.
Because Fabric is currently only Crash Fault Tolerant (CFT), issues arise when a single node malfunctions and starts adding erroneous or unwanted data to the blockchain. In this scenario the data cannot be removed and may corrupt the chain. The immutability of the chain becomes a double-edged sword: it is not conducive to the business and not in line with traditional IT data management practices.
This issue is not limited to consensus mechanisms that are only CFT. Even with the added security of Byzantine fault tolerant (BFT) consensus, these issues can arise. Whereas Bitcoin and Ethereum Classic rely on a preponderance of mining nodes to manage consensus and thus transaction integrity, private blockchains run with a significantly smaller network of nodes to remain cost-feasible. Roughly 67% (more than two-thirds) of the nodes in such a network need to agree to maintain a valid chain, so in a small network only a handful of compromised nodes is enough to put consensus at risk.
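The two-thirds figure can be made concrete with a little arithmetic. The sketch below assumes the classic BFT bound that a network of n nodes tolerates at most f faulty nodes when n >= 3f + 1, and it reads the 67% figure as "strictly more than two-thirds of the nodes". The function names are illustrative and do not come from Fabric or any other platform.

```python
def max_byzantine_faults(n: int) -> int:
    """Largest f satisfying n >= 3f + 1 (classic BFT bound, assumed here)."""
    if n < 1:
        raise ValueError("a network needs at least one node")
    return (n - 1) // 3


def quorum_size(n: int) -> int:
    """Smallest vote count strictly greater than two-thirds of n."""
    if n < 1:
        raise ValueError("a network needs at least one node")
    return (2 * n) // 3 + 1


if __name__ == "__main__":
    # Compare a few small, consortium-sized networks.
    for nodes in (4, 7, 10):
        print(nodes, max_byzantine_faults(nodes), quorum_size(nodes))
```

Under these assumptions a four-node channel tolerates a single faulty node and a ten-node channel tolerates three, which illustrates why small private networks leave little margin.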
On our example supply chain blockchain, if a compromised node successfully commits illegitimate transactions, the entire chain of record is called into question. Here again, the benefits of blockchain become unusable and even damaging. The incident not only invalidates the blockchain but also requires a full rollback to a point before the unwanted events occurred.
The bottom line is that there must be a tool for dealing with data issues on the blockchain. The dilemma is that making it easier to modify the blockchain after a node malfunction also makes it easier for malicious attackers to modify existing legitimate transactions. That would defeat the core value of using a blockchain for immutability and improved trust. At the same time, an immutable record of bad data does not fulfill the core value of blockchain either, and it adds a barrier to enterprise adoption.
Examples of remediation measures include:
- To balance the need for immutability and integrity with the need to resolve data issues, a governance mechanism built on currently available blockchain capabilities should be established. For example, a proposal system can require that any proposed “correction” to blockchain data go through a minimum mandatory review period of 30 days. This gives the participating members of a blockchain the ability to vote on proposed corrections while minimizing the opportunity to use the correction mechanism for malicious attacks.
- Another option is to allow modification of records based on consensus. To enable this, node operators would be required to keep a redundant “clean” set of records as a backup. Removing a record is a difficult choice because the chain would need to be rehashed. Even after rehashing, there are additional complications: the new “clean” blockchain would have to replace the existing one, which leads to downtime. This downtime could be minimized by recording new transactions on a “temp” channel with the same configuration while the old chain is replaced, then migrating those transactions after restoration.
- Another consideration is the nature of the malicious activity. Sizable hacks have historically involved a perpetrator who hid in the systems for days, if not months. For higher-valued data, more sophisticated attackers work over longer periods to avoid detection, so a 30-day review period for a proposal to reverse a transaction may not be long enough. One way to address this is to tie the review period to the value of the transaction being reversed. Lower-valued transactions can have a review period of 30 days and substantially higher ones can have a minimum of 20 days.
For a public company, which needs to disclose financial results to shareholders and undergo an audit, 20 days might be too long a wait. In such scenarios, the 20 days can be reduced to 10 days or fewer if the CFO and CEO of all participating companies sign off. This escalation feature can be used to reverse unapproved transactions for private companies too.
Governance of such edits can be conducted through a smart contract, which helps ensure high availability of the chain. Distributing the private key across multiple managers reduces the possibility of a fraudulent proposal to amend the chain.
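The review-and-vote flow described above can be sketched in plain Python as a rough model of what such a governance contract might enforce. This is not Fabric chaincode. The class name, the two-thirds approval threshold and the member identifiers are assumptions made for illustration. Only the 30-day review period and the reduction to 10 days on executive sign-off come from the text.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class CorrectionProposal:
    """Hypothetical model of a proposal to correct a record on the chain."""
    record_id: str
    reason: str
    opened_at: datetime
    review_days: int = 30  # mandatory review period from the text
    votes: dict[str, bool] = field(default_factory=dict)

    def vote(self, member: str, approve: bool) -> None:
        # A later vote from the same member overwrites the earlier one.
        self.votes[member] = approve

    def review_ends_at(self) -> datetime:
        return self.opened_at + timedelta(days=self.review_days)

    def escalate(self, signoffs: set[str], required_signers: set[str],
                 reduced_days: int = 10) -> bool:
        """Shorten the review only if every required officer has signed off."""
        if required_signers and required_signers <= signoffs:
            self.review_days = min(self.review_days, reduced_days)
            return True
        return False

    def can_apply(self, now: datetime, members: set[str]) -> bool:
        """True once the review period is over and approvals exceed two-thirds.

        The two-thirds threshold is an assumption, not a rule from the text.
        """
        if now < self.review_ends_at():
            return False
        approvals = sum(1 for m in members if self.votes.get(m) is True)
        return 3 * approvals > 2 * len(members)
```

Keeping the review deadline and the vote count as separate checks means a unanimous vote still cannot apply a correction early. Only the explicit escalation path shortens the wait.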
As prevention is better than cure, robust mechanisms can be implemented to reduce fraudulent transactions. Blockchain provides speedy settlement, but not all transactions need to be settled in a matter of minutes, let alone seconds. Transactions that do not require immediate settlement can include additional authentication, use of side chains for transaction validation, and slower settlement to allow for additional review.
Examples of preventative measures include:
- The Committee of Sponsoring Organizations (COSO) has a framework meant to deter internal fraud and external hacks, which can cause great loss for any organization. The risk assessment section of its integrated framework for internal controls suggests segregation of duties. On a private chain this can be achieved using a weighted key, and it can be implemented in a more complex hierarchical structure for higher denominations.
- Each transaction can be tied to a document trail that includes the Payment Voucher, Purchase Order, Receiving Report and Invoice. This is the existing recommended means of reconciling transactions in case of disputes. It adds an additional source for reconciliation, which helps discourage fraud attempts.
- Because channels in a private blockchain ecosystem have a limited number of participants, they are more susceptible to hacks and become soft targets. To prevent issues with invalid transactions, a method like Bitcoin’s Lightning Network can be implemented. This method creates an initial transaction on the main chain, and ongoing transactions are conducted privately. For private chains, a hash of the channel state can be added to the main network at regular intervals. This adds a checkpoint for channels while retaining the safety of channels.
- Another way of managing blocks would be to create a minimum of two chains: one that performs the transactions and another that manages smart contract execution. A third, optional chain can be used to manage identity. As most transactions do not require instant settlement, they can be held as pending transactions, much like a market sell order, which anyone can place but which needs a buyer to be accepted. This reduces the possibility of bad blocks. The time under review for very large transactions can be adjusted accordingly.
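The channel checkpoint idea from the list above can be illustrated with a short sketch. It assumes the channel state can be serialized as JSON and uses SHA-256 from the Python standard library. The `MainChainAnchors` class is a hypothetical in-memory stand-in for the main network, not an API of Fabric or the Lightning Network.

```python
import hashlib
import json


def channel_state_hash(state: dict) -> str:
    """Hash a canonical JSON encoding so key order does not change the digest."""
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class MainChainAnchors:
    """Hypothetical append-only log standing in for the main chain."""

    def __init__(self) -> None:
        self._anchors: list[tuple[str, int, str]] = []

    def anchor(self, channel_id: str, height: int, state: dict) -> str:
        """Record only the digest; the private state never leaves the channel."""
        digest = channel_state_hash(state)
        self._anchors.append((channel_id, height, digest))
        return digest

    def verify(self, channel_id: str, height: int, state: dict) -> bool:
        """Check a claimed channel state against the recorded checkpoint."""
        return (channel_id, height, channel_state_hash(state)) in self._anchors


if __name__ == "__main__":
    main_chain = MainChainAnchors()
    state = {"order": "PO-1001", "tons": 40, "status": "shipped"}
    main_chain.anchor("steel-deal-1", 12, state)
    tampered = dict(state, tons=400)
    print(main_chain.verify("steel-deal-1", 12, state))
    print(main_chain.verify("steel-deal-1", 12, tampered))
```

Because only the digest is published, the other consortium members can later confirm that a channel's history was not rewritten without ever seeing the deal terms.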
- Ted Friedman and Saul Judah, “The State of Data Quality: Current Practices and Evolving Trends,” Gartner, 2013.