Saturday, February 20, 2021

 The file layout for scripts imported from database 

 

Introduction: The article on exporting scripts from database talked about the business justification but not about the implementation. Databases lay out data in the form of tables, rows and columns. That article referred to the data being written out to files but omitted file layout. This article touches on that topic. 


DescriptionWhen the text columns is consistent across all tables, it tends to have a reliable pattern. This data can be reorganized in a table that is suitable for staging before the data is exported to files. Creating staging tables also helps background jobs because they can leverage the organization to help with the pause and resume for their long running activities.  


Some text columns could be small, others could be large or they might both be there in a table. They can even become large-text. Scripts do not necessarily have a size limit and not all text columns store scripts. There are no patterns to interpret from the contents of a column.  


If the content varies a lot, it can still be saved as independent files just so long as we can selectively filter out the scripts from the relational tables. The order in which they are retrieved can also be determined by the order in which they are listed in the tables. Each table can then correspond to a folder. The folders can be independent just like the tables but folders can also be nested to determine the layout for tables and columns. Identifiers for the files and the folders can have a name or preferably a multi-part identifier that comprises not only of the sequence number but also of the name of the column or the table, if not the content of the cell pointed by the row number and column. 


Some metadata can also be packed along with the data so that the entries can be corroborated upon unpacking. Checksums or digests help in this regard and the listing of entries or their sizes can also be included. 

The data in the file layout must be compressed. This automatically prepares the list of files that are included. This packing is also efficient in terms of storage and portability. 


Conclusion: The scripts from the database for a customer in say a multi-tenant cloud database can be extracted as files and then checked into a source control repository by simply overlaying over existing scripts by virtue of its file layout. This helps with determining the differences, it’s history and versioning which reduces the cost of investigations pertaining to this form of data in the customer’s instance. 

 

No comments:

Post a Comment