Saturday, July 10, 2021

 The NuGet package resolution:

Introduction: This article introduces NuGet, an essential tool for modern development platforms. It is the mechanism through which developers create, share, and consume useful code, since the compiled code (the DLLs) is bundled together with its package information. A NuGet package is a single zip file with a .nupkg extension that contains the code as well as the package manifest. Packages can be shared and published to nuget.org, the global, central, public repository, and the public repository can be complemented with a private one. A package can be uploaded to either a public or a private host; when it is downloaded, the corresponding source is specified. The flow of packages between creators, hosts, and consumers is discussed next.
The central repository has over 100,000 unique packages, and developers download them every day. With such a large collection, package browsing, lookup, identification, versioning, and compatibility must be clearly called out, in both the public and the private repository. When a package is downloaded, it goes to the local cache on the developer’s system. When the application is built, these dependencies can be copied to the target folder where the compiled code is dropped. This allows the assemblies to be found and loaded locally, which isolates applications on the same host that reference those packages from one another. This isolation of application dependencies is one way for an application to make sure that it will work on any host.
A frequently encountered requirement in package consumption is that the application must always use the same set of mutually compatible package versions. When the dependencies are updated to their next versions, they sometimes do not work well together.
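As a minimal sketch of keeping an application on a known set of versions, the dependencies can be pinned in an SDK-style project file. The packages chosen here (Newtonsoft.Json and Serilog) and their version numbers are only illustrative assumptions and are not taken from this article.

```xml
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>netcoreapp3.1</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
    <!-- Square brackets request exactly this version and nothing else. -->
    <PackageReference Include="Newtonsoft.Json" Version="[13.0.1]" />

    <!-- A range: 2.10.0 or higher, but below the next major version. -->
    <PackageReference Include="Serilog" Version="[2.10.0,3.0.0)" />
  </ItemGroup>

</Project>
```

A bare version such as Version="13.0.1" is treated by NuGet as a minimum version rather than an exact one, so the bracket notation is the stricter choice when the same versions must be used on every build.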
For the application to continue working, it must maintain compatibility between its dependencies; this can be set once and then maintained as the packages are updated. The initial assembly compatibility is ironed out at build time, in the form of compilation failures and their resolutions. Subsequent package updates must always move to higher versions than the ones the application was initially built against. If a version increment breaks compatibility, the application can remain at the current version of the assembly and wait for a subsequent version that fixes the break. The dependencies must be declared clearly in the project file that contains and builds the source. Version compatibility can be made more deterministic with the help of the versions associated with those packages and their substitution policies.
Packages get updated for many reasons: the publisher may fix defects it has found, or revisions may be recommended on the basis of a common vulnerabilities database. Code and binary analysis tools help produce these update recommendations. Different versions of a package can coexist side by side on the same host as long as they are located in different folders. Package versioning and automatic redirection of versions are techniques that help when some or all of an application's packages are updated. The manifest of package dependencies with their versions, together with the binding redirects in the application configuration file, lets this compatibility be maintained across revisions so that the application continues to compile and run on any host.
Certain assemblies are part and parcel of the runtime that is required to execute the application.
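The binding redirects mentioned above live in the application configuration file of a .NET Framework application. The following is a sketch only: it assumes the application uses Newtonsoft.Json and wants every older assembly version that a dependency was compiled against to resolve to the one actually deployed. The package and the version numbers are illustrative assumptions.

```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <!-- Identifies the assembly whose version requests are redirected. -->
        <assemblyIdentity name="Newtonsoft.Json"
                          publicKeyToken="30ad4fe6b2a6aeed"
                          culture="neutral" />
        <!-- Any request in the old range is satisfied by the new version. -->
        <bindingRedirect oldVersion="0.0.0.0-13.0.0.0" newVersion="13.0.0.0" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>
```

The redirect uses the assembly version, which can differ from the package version shown in the project file, so both should be checked when a package is updated.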
System assemblies that belong to the runtime are resolved in the same way as application dependencies when they are specified by targets, except that their location is different from the local NuGet package cache. The system assemblies are bundled into target frameworks such as .NET Core and .NET Framework. A target framework with a moniker such as 'net48' (.NET Framework) provides an exhaustive collection of system assemblies, so that applications can expect all the features they might need. A target framework with a moniker such as 'netcoreapp3.1' (.NET Core) is more lightweight and geared towards the portability of applications and the latest features of the runtime. With fewer assemblies, it has a smaller footprint.
When dependencies are not found, the package sources from which to download them must be provided. Once the packages are downloaded, they are resolved and loaded. As discussed earlier, both external and internal package sources might be required in the development environment to resolve different assemblies. Certain organizations restrict the package sources to just one; otherwise the point of origin of a package cannot be guaranteed, and spurious packages might be introduced into the build and the application binaries. They provide a workaround for reaching other package sources via proxying, but they rely on a single authoritative source to control overall package sourcing. Transparency about the source a package is resolved from is a security consideration rather than a functional one, and is a matter of policy.
Once the assemblies are downloaded, a different set of security considerations comes into play. The package cache on the local host must be safeguarded just like any other data asset. A developer can choose to clean the cache and download everything again to improve package health and hygiene during rebuilds. Downloading the packages again is called restore, and it is available as an option during builds.
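The single-source policy described above can be expressed in a nuget.config file placed next to the solution. This is a sketch under assumptions: the feed name and the URL https://pkgs.example.com/nuget/v3/index.json are hypothetical placeholders for an organization's internal feed, which would itself proxy any public packages it allows.

```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <packageSources>
    <!-- Drop every source inherited from machine-wide or user-level configuration. -->
    <clear />
    <!-- Hypothetical internal feed acting as the single authoritative source. -->
    <add key="internal-feed" value="https://pkgs.example.com/nuget/v3/index.json" />
  </packageSources>
</configuration>
```

With a file like this in place, cleaning the local caches with `dotnet nuget locals all --clear` and then running `dotnet restore` downloads every package again from that one source.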
Since the destination of the packages on the local host where the code is built and run is always the file system, there might be issues locating them even after the dependencies are downloaded. The resolution of the path from which assemblies are found and loaded must also be made deterministic. There are two ways to go about this: one is to specify the path of the assembly explicitly, and the other is to register it in the global assembly cache, which is unique to every host and also maintains system assemblies. Registering an assembly in the global assembly cache provides a way to resolve it independently of its location on the file system, but it interferes with the application isolation policy.
Finally, the assemblies loaded into the runtime might still be incorrect, and some troubleshooting might be needed. This is not always easy, but there are techniques that help. The assembly loading process can be made more transparent by having the app domain log to the console how it finds and loads an assembly. Standard events can be bound, and the state of the app domain can be inspected. If the name and version of an assembly are not enough to identify it, the assembly can also be queried for its types by a technique known as reflection.
The resolution of assembly locations and the order in which assemblies are loaded can be made transparent with the help of the assembly binding log viewer. The log viewer writes to a well-known location, which can also be customized. The level of logging can be set through registry settings on the host. Because assembly loading is such a common occurrence, the logs tend to grow very large very quickly, so logging should be turned on only for the duration of an investigation and turned off afterwards.
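The first of the two options above, naming a specific path for an assembly, can be sketched in a project file as follows. The assembly name Contoso.Utilities and the relative path ..\libs are hypothetical and stand in for whatever local assembly the application needs.

```xml
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <TargetFramework>net48</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
    <!-- Hypothetical assembly referenced by an explicit path instead of a package. -->
    <Reference Include="Contoso.Utilities">
      <HintPath>..\libs\Contoso.Utilities.dll</HintPath>
      <!-- Copy the assembly to the output folder so that it is loaded locally. -->
      <Private>true</Private>
    </Reference>
  </ItemGroup>

</Project>
```

Copying the assembly next to the application keeps it out of the global assembly cache, so the isolation between applications on the same host is preserved.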
With all these techniques, a developer can not only consume packages from well-known sources but also publish packages for others to use in a more deterministic manner.

