Thursday, October 18, 2018

We were discussing full-text search on object storage. When users want to search, security goes out the window. The Lucene index documents are not secured via Access Control Lists. They don't even record any information about the user in the fields of the document. The user really wants to cover the entire haystack without getting bogged down by disparate collections and the need to repeat the query against different indexes.
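To make the "entire haystack" idea concrete, here is a minimal sketch of a single index shared by all buckets whose documents carry only location and content fields, with no owner or ACL field. It is a toy inverted index in plain Python, not Lucene's API; the class name `SingleIndex` and the `bucket`/`key`/`text` field names are hypothetical, chosen only for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class SingleIndex:
    """Hypothetical toy inverted index shared by every bucket.

    Documents store only where the object lives and what it says;
    no user, owner, or ACL field is recorded.
    """

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of (bucket, key)
        self.docs = {}                    # (bucket, key) -> stored fields

    def add(self, bucket, key, text):
        doc_id = (bucket, key)
        # Only location and content are stored; nothing about the user.
        self.docs[doc_id] = {"bucket": bucket, "key": key, "text": text}
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return every (bucket, key) containing all query terms, across all buckets."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)


index = SingleIndex()
index.add("finance", "q3/report.txt", "Quarterly revenue report")
index.add("hr", "salaries.txt", "Salary bands and revenue targets")
index.add("public", "readme.txt", "Welcome to the public bucket")

# One query covers every bucket; the hr object matches just like the others.
print(index.search("revenue"))
# → [('finance', 'q3/report.txt'), ('hr', 'salaries.txt')]
```

Because nothing in the stored fields identifies a user, there is nothing for the query to filter on, which is exactly why the search spans every collection at once.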
Even the documents in the object storage are not hidden from the indexer. Although S3 supports attaching access control descriptions to objects, those secure the objects from other users, not from the system. Similarly, buckets and namespaces can be secured from unwanted access, and this is part of the object storage hierarchy. However, the indexer cannot choose to ignore any part of the hierarchy, because that would put the burden on the user to decide what to index. Technically, this is possible by blacklisting any of the hierarchy artifacts, but it is not a good business policy unless the customer needs advanced controls.
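For customers who do need such advanced controls, blacklisting hierarchy artifacts could look like the following minimal sketch. It assumes blacklist entries are written either as a bare bucket name (exclude the whole bucket) or as `bucket/prefix` (exclude a subtree); the function names and this entry format are hypothetical and are not part of any S3 API. With the default empty blacklist, everything is indexed, which matches the default policy described above.

```python
def is_blacklisted(bucket, key, blacklist):
    """Return True if the object falls under a blacklisted bucket or bucket/prefix.

    Hypothetical entry format: "bucket" excludes a whole bucket,
    "bucket/prefix" excludes every key under that prefix.
    """
    path = f"{bucket}/{key}"
    for entry in blacklist:
        entry = entry.rstrip("/")
        # Match the exact object, or anything beneath the entry as a directory.
        if path == entry or path.startswith(entry + "/"):
            return True
    return False


def objects_to_index(objects, blacklist=()):
    """Return the (bucket, key) pairs the indexer should crawl.

    An empty blacklist (the default) means every object is indexed.
    """
    return [(b, k) for (b, k) in objects if not is_blacklisted(b, k, blacklist)]


objects = [("finance", "q3/report.txt"), ("hr", "salaries.txt"), ("public", "readme.txt")]

print(objects_to_index(objects))          # default policy: index everything
print(objects_to_index(objects, ["hr"]))  # opt-in control: skip the hr bucket
```

Keeping the blacklist opt-in and empty by default leaves the decision with the customer rather than pushing every user to curate what gets indexed.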
This has the unintended consequence that users with sensitive information in their objects may divulge it to search users, because those documents will be indexed and will match search queries. This has been observed in many document libraries outside object storage. There, the solution did not involve blacklisting. Instead, users were informed that the library is not the place to save sensitive information. We are merely following the same practice here.
Finally, users are not required to write different search queries for different document collections. All queries can target the same index, which is generated for all document collections regardless of their location.
