I started looking at SSMS version 17.5 on prem MSSQL DB promising the capabilities (amongst other things) to align with offerings on the Azure SQL DB portal. My focus is mainly on the data discovery capabilities.
As with many developers working predominantly with data, handling sensitive data is a very much hot topic with the likes of “GDPR” etc. The documentation on this suggest that it auto-discovers and allows labelling for sensitive data. Here goes my test drive on this….
- Open SSMS as usual, on object explorer connect to one of the SQL instance.
Right Click a database, All tasks and choose to Classify Data..:
When running this first time, the report will return with recommendations: 2. Looking closely to the classifications:
- Sensitivity Label:
8 categorisations here indicating data sensitivity levels from public through to highly confidential.
Interestingly data possibly labelled as confidential can further be indicated if they are GDPR related data. This can significantly reduce time and effort in IT when highlighting data affected by GDPR and making efforts around identifying impact and analysis on these much simpler.
- Information Type:
12 categorisations including ‘other’, and ‘n/a’ attempt to categorise use of the data in the business – finance, contact info etc.
Recommended categorisations can be amended, and your own categorisations can be added and saved.
I have tried adding table and column to see the impact that this have on recommendations:
- Add Table Name [dbo].[Person] with Column Name [Name].
No change in the recommendations.
- Add My name in that table
No change in the recommendations
- Change [Name] to [ForeName]
No change in the recommendations
- Change [ForeName] to [FirstName]
Recommendation now includes this new column.
So.. It looks like column names will need to be rather specific; ie firstname rather than forename.
I took this further, profiling the statements ran when this task is being actioned. I noticed this:
That these recommendations/suggestions are made on the fly based on some assumptions (search patterns on the column name) :
Earlier was an excerpt of the statement before the categorisations were saved for the database. (Point 8). These returned a query result of microsoft’s suggested categorisations based on a lookup pattern for each of the categorisation stored in table variables being looked up by querying system tables.
For a ‘saved’ set of recommendation, running the ‘classify data..’ action runs this query:
SELECT s.name AS schema_name, t.name AS table_name, c.name AS column_name, EP1.value AS information_type_name, EP2.value AS information_type_id, EP3.value AS sensitivity_label_name, EP4.value AS sensitivity_label_id FROM sys.columns c LEFT JOIN sys.tables t ON t.object_id = c.object_id LEFT JOIN sys.schemas s ON s.schema_id = t.schema_id LEFT JOIN sys.extended_properties EP1 ON c.object_id = EP1.major_id AND c.column_id = EP1.minor_id and EP1.name = 'sys_information_type_name' LEFT JOIN sys.extended_properties EP2 ON c.object_id = EP2.major_id AND c.column_id = EP2.minor_id and EP2.name = 'sys_information_type_id' LEFT JOIN sys.extended_properties EP3 ON c.object_id = EP3.major_id AND c.column_id = EP3.minor_id and EP3.name = 'sys_sensitivity_label_name' LEFT JOIN sys.extended_properties EP4 ON c.object_id = EP4.major_id AND c.column_id = EP4.minor_id and EP4.name = 'sys_sensitivity_label_id' WHERE (EP1.value IS NOT NULL OR EP2.value IS NOT NULL OR EP3.value IS NOT NULL OR EP4.value IS NOT NULL)
To surmise, it is an exciting capability that will make our lives (data geeks) a whole lot easier when determining which data we will need to treat sensitively, mainly in the dev/test environments. And as these will be saved as extended properties for these columns we can do reporting directly against these metadata, assess them with help from data governors and then act on them accordingly.