Following on from yesterdays blog I was wondering about the comparison of objects that were the same and how the IgnoreWhitespace, IgnoreComments, IgnoreKeywordCasing and IgnoreSemiColonsBetweenStatements flags affected the comparison. To be fair I was only interested in IgnoreWhitespace but actually it turns out that those four are very closely related.When the deploy happens, where a script in the source and target are compared the process is:
- 1. Loads of things we will skip
- 2. Any cmd variables in the scripts are replaced with their appropriate values
- 3. If both the scripts are null – the comparison returns true. This has to be the best for performance but the worse for functionality 😉
- 4. If one script is null but not the other then the comparison returns false. This actually has to be the best for comparison performance but worse for deploy performance!
- 5. We then get a good old fashioned String.Equals, the standard .net compare goes: if both strings are not null and the lengths are the same do a check on each byte in the strings
- 6. If the strings are equal we have a match, happy days no more action required
It is what happens if the strings do not match that it starts to get a bit more interesting, if the strings are not equal and any of those four ignore options are True then we then fall down into doing a further comparison but after the scripts have been normalized using the script dom and antlr which is an expensive operation in itself (this also happens to be my next topic!).Once the normalization has been done we end up in the actual compare which goes like this:
- Turn the script into a stream of tokens
- If the token is a comment and ignore comments is set, skip it
- If the token is whitespace and ignore whitespace is set, skip it
- If the token is a semi-colon and ignore semi-colon's is set, skip it
- Then compare the tokens which itself does things like use IgnoreKeywordCasing and removes quotes around quoted identifiers – it isn't a straightforward String.Equals
- If any of the tokens don't match then it is a failure and the script needs to be changed
So what?
So blunt. Anyway, basically what this means is that the default options in the sqlpackage.exe are set to allow things like different cased keywords and whitespace and to allow that we end up taking longer to do deployments where we actually make use of the default features.
huh?
If you have a database with lots of code and you have the code in SSDT but you do things like change the comments when you deploy and rely on IgnoreComments (this is a real life secenario I have seen, someone adding a custom comment header) then you will have slower deployments and as slower deployments are the opposite of what we want you should:
- Have the same code in your database as you have in your project
- Have the same code, including the same case of keywords, comments and whitespace in your database that you have in your project
- Disable the defaults and set IgnoreWhitespace, IgnoreComments, IgnoreKeywordCasing and IgnoreSemiColonsBetweenStatements all to false
What effect does it have?
If your database a project code are exactly the same, then no effect you neither gain nor lose anything.If your database and code are different by comments, case, semi-colons etc and you have lots of files that are different then you will gain quite a bit. On my machine here I created a database with 1,000 stored procedures like "select '—-'" (I used replicate to make it large) I then imported the procs into SSDT and added a space between the select and the text and did a deploy using sqlpackage (in fact I did a few to get an average time), with the default IgnoreWhitespace=true the deploy took about 24 seconds (remember this is on a machine following yesterdays recommendations. lots of memory, fast CPU and SSD) – when I removed the defaults and set them to false – firstly the deploy took 34 seconds because naturally it had to deploy the procs then re-running it took around 17 seconds – about 7 seconds from a 24 second deploy which i'll take.The thing that you will really gain is that your project code and database will be the same which should really be the end goal, if you can honestly say that you have to have:
- Different whitespace
- Different keyword casing
- Different semi-colons
- Different comments
I would be inclined to find out why as it sounds like an interesting project 🙂
Start the discussion at forums.toadworld.com