Saturday, August 30, 2014

I came across an interesting topic about how to store key values in relational tables and whether we should move to NoSQL just for the sake of storing key values. The trouble with storing key values in relational tables is that the same key has multiple values. If we keep each record for a key value pair soon we have a flooding of the table but more importantly this just seems like  a collection and not an entity which is fundamental to the database model. We could alleviate the storage concerns and call them different entities by say calling one table as keys and another table as values and adding a relation between them.
That said, probably an easy to implement way is to store it as an XML in a single column. This alleviates the problem of having two columns where one is the fieldvalue and the other is the fieldvaluetype. Moreover, the column can be indexed and queried with XPath. This is why it is preferred over JSON.
Another approach is to use the EntityAttributeValue model also called the EAV model. Here the attributes are available as columns and they can be numerous with only a few columns holding values at some time. This is also called sparse matrix.
The thing to note here is that the tendency to add custom properties is not restricted to a single table and can become an epidemic in the system. That is why the data model may need to be redesigned or at least extended if such things crop up. The normalization is important just as much as the convenience of using key-values.
Key Value are generally stored using a hashing function because they are essentially collections. The hashing allows to bucket the keys and collisions are resolved by overflow lookups.
The NoSQL stores such as MongoDB serve more purposes as well. They are better suited for the following use cases:
column stores
key value stores
document stores
graph stores
If we look at the

use EventsDB

go
 

CREATE TABLE dbo.Event

( ID int identity not null,

Timestamp datetime not null

CONSTRAINT IX_Event_Timestamp

PRIMARY KEY CLUSTERED (Timestamp, ID)

WITH (IGNORE_DUP_KEY = OFF),

Host nvarchar(4000) not null,

Source nvarchar(100) not null,

SourceType nvarchar(100) not null,

FieldMap xml null,

);

INSERT INTO dbo.Event VALUES (GETDATE(), HOST_NAME(), N'Source1', N'SourceType1', NULL);

INSERT INTO dbo.Event VALUES (DATEADD(DD, 1, GETDATE()), HOST_NAME(), N'Source2', N'SourceType2', NULL);

UPDATE dbo.Event set FieldMap = '<xml><Source>Source1</Source><SourceType>SourceType1</SourceType></xml>'

WHERE SOURCE=N'Source1'

 
 
UPDATE dbo.Event set FieldMap = '<xml><Source>Source2</Source><SourceType>SourceType2</SourceType></xml>'

WHERE SOURCE=N'Source2'

SELECT * FROM dbo.Event

go

ID Timestamp Host Source SourceType FieldMap
1 2014-08-31 09:22:28.193 ADMIN-PC Source1 SourceType1 <xml><Source>Source1</Source><SourceType>SourceType1</SourceType></xml>
2 2014-09-01 09:22:28.193 ADMIN-PC Source2 SourceType2 <xml><Source>Source2</Source><SourceType>SourceType2</SourceType></xml>


SELECT ID, FieldMap.query('data(/xml/Source)') as Value

FROM dbo.Event

WHERE FieldMap.exist('/xml/SourceType') = 1

No comments:

Post a Comment