Thursday, September 13, 2018

We were discussing the suitability of object storage for deep learning. There are several advantages. The analysis can be run on all the data at once, and object storage is among the largest stores available. Cloud services are elastic and can pull in as many resources as needed. Because it runs as the backend, the processing is done once for all clients. Performance increases dramatically when the computations run as close to the data as possible, and such compute- and data-intensive operations are hardly required on the frontend. Moreover, when compute and storage are both elastic, optimization becomes possible because the workloads can be studied, cached, and replayed. Complex queries can already be reduced to a few primitives, leaving the implementation of higher-order query operators to users.
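As a rough illustration of reducing queries to a few primitives, here is a minimal sketch in Python. The names select, project, aggregate, and group_count are hypothetical and chosen only for this example; they do not come from any particular query engine. The idea is that a user can compose a higher-order operator (a grouped count) from the three primitives alone.

```python
from functools import reduce

# Three hypothetical primitives that a query layer might expose.
def select(rows, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]

def project(rows, fn):
    """Transform every row with fn."""
    return [fn(row) for row in rows]

def aggregate(rows, fn, initial):
    """Fold all rows into a single value."""
    return reduce(fn, rows, initial)

# A higher-order operator written by the user on top of the primitives.
def group_count(rows, key):
    """Count rows per key using only project and aggregate."""
    def step(counts, k):
        counts[k] = counts.get(k, 0) + 1
        return counts
    return aggregate(project(rows, key), step, {})

rows = [
    {"label": "cat", "size": 3},
    {"label": "dog", "size": 5},
    {"label": "cat", "size": 7},
]
large = select(rows, lambda r: r["size"] > 4)
counts = group_count(rows, lambda r: r["label"])
```

Keeping the primitive set small is what lets the backend optimize, cache, and replay the work, while users remain free to layer their own operators on top.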
The use of user-defined operators and computations to perform the work associated with the data is well known in querying. Such custom operators enable intensive and involved queries to be written. They have resulted in stored logic such as stored procedures, which are written in a variety of languages. With the advent of machine learning and data mining algorithms, query tools have added support for new languages and packages, as well as algorithms that are now available out of the box and shipped with the respective tools.
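One small, concrete instance of a user-defined operator is a scalar function registered with SQLite through Python's standard sqlite3 module. This is only a sketch to show the pattern; the table samples and the function log_scale are made up for the example, and a production database would use its own stored-procedure mechanism.

```python
import math
import sqlite3

def log_scale(value):
    """User-defined computation: log(1 + value)."""
    return math.log1p(value)

conn = sqlite3.connect(":memory:")
# Register the Python function so SQL queries can call it by name.
conn.create_function("log_scale", 1, log_scale)

conn.execute("CREATE TABLE samples (id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?)",
    [(1, 0.0), (2, 9.0), (3, 99.0)],
)

# The custom operator is now usable like any built-in SQL function.
scaled = conn.execute(
    "SELECT id, log_scale(value) FROM samples ORDER BY id"
).fetchall()
```

The computation runs inside the query engine, next to the rows, rather than after the data has been shipped to the client.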
If the query language allowed implicit extraction, transformation, and piping of data, it would become even more interactive. Previously, temporary data was held in temporary databases, temporary tables, or memory, but there was no way to offload it to the cloud as S3 files or blobs, which would make the query language more purposeful as an interactive language. Object storage serves this purpose very well and enables user-oriented, interactive, at-scale data ETL and operations via ad hoc queries. Perhaps the interactive IDE or browser for the query language may make use of cloud storage in the future.
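To make the offloading idea concrete, here is a minimal sketch that writes an intermediate query result to an object store as a CSV blob and reads it back later. InMemoryObjectStore is a hypothetical stand-in for an S3-like bucket so that the example runs without any cloud account; offload_query, load_object, the events table, and the key name are likewise assumptions made for illustration.

```python
import csv
import io
import sqlite3

class InMemoryObjectStore:
    """Hypothetical stand-in for an S3-like bucket: keys map to byte blobs."""
    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

def offload_query(conn, sql, store, key):
    """Run a query and store its result set as a CSV object under key."""
    cursor = conn.execute(sql)
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([column[0] for column in cursor.description])  # header
    writer.writerows(cursor)
    store.put(key, buffer.getvalue().encode("utf-8"))

def load_object(store, key):
    """Read a CSV object back as a list of dicts (values come back as strings)."""
    text = store.get(key).decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user TEXT, clicks INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("ann", 3), ("bob", 5), ("ann", 4)],
)

store = InMemoryObjectStore()
offload_query(
    conn,
    "SELECT user, SUM(clicks) AS total FROM events GROUP BY user ORDER BY user",
    store,
    "tmp/clicks_by_user.csv",
)
rows = load_object(store, "tmp/clicks_by_user.csv")
```

With a real object store, the put and get calls would be replaced by the store's own client, and the stored object could then feed the next ad hoc query in the session.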
