Monday, July 8, 2013

Full text and semantic extraction in SQL Server 2012

Here are some sample queries for semantic extraction of keyphrases in SQL Server 2012.
SET @Title = 'TestDoc.docx'

SELECT @DocID = DocumentID
FROM Documents
WHERE DocumentTitle = @Title

# Finds the keyphrases in a document.
SELECT @Title as Title, keyphrase, score
FROM SEMANTICKEYPHRASETABLE(Documents, *, @DocID)
ORDER by score DESC

# Finds similar documents
SELECT @Title as SourceTitle, DocumentTitle as MatchedTitle,
DocumentID, score
FROM SEMANTICSIMILARITYTABLE(Documents, *, @DocID)
INNER JOIN Documents ON DocumentID = matched_document_key
ORDER BY score DESC

# Finds keyphrases that make documents similar or related
SELECT @SourceTitle as SourceTitle, @MatchedTitle as MatchedTitle, keyphrase, score
FROM SEMANTICSIMILARITYDETAILSTABLE(Documents, DocumentContent, @SourceDocID, DocumentContent, @MatchedDocID)
ORDER BY score DESC

You can use FileTables to store documents in SQL Server. These are special tables built on top of FILESTREAM.
A FileTable enables application to access files and documents as if they were stored in the filesystem without
requiring any changes to the application.

You can enable semantic search on columns using semantic index.
To create a semantic index when there is no fulltext index

CREATE FULLTEXT CATALOG ft as DEFAULT
GO

CREATE UNIQUE INDEX ui_ukDescription
ON MyTable.Description(DescriptionID)
GO

CREATE FULLTEXT INDEX ON MyTable.Description
(Description, Language-1033, Statistical_Semantics)
KEY INDEX DescriptionID
WITH STOPLIST = SYSTEM
GO


or Add semantic indexing to one that has fulltext index
ALTER FULLTEXT INDEX ON MyTable.Description
    ALTER COLUMN Description
        ADD Statistical_Semantics
    WITH NO POPULATION
GO

No comments:

Post a Comment