SQLintersection session – Top Customer Problems: How to Avoid Calling Bob

November 22, 2014 Allison Benneth-Hansen Leave a comment

I attended the SQL intersection conference in November 2014 in Las Vegas, Nevada. I am including my notes from the sessions in this blog, primarily for my own reference later. For anyone that comes across these notes, please take them with a grain of salt – I may very easily have misheard something or have written it down incorrectly.

Session Title: Top Customer Problems: How to Avoid Calling Bob
Instructor: Bob Ward

Biggest issues (in order)
- Performance
  - Is it a “running” problem (high CPU) vs a “waiting” problem (IO bottleneck, blocking, latch waits, other waits)?
    - Blocking – usually a problem within the app
    - IO bottleneck (PAGEIOLATCH or WRITELOG)
    - Latch waits, hotspots (PAGELATCH_XX)
    - RESOURCE_SEMAPHORE waits
    - ASYNC_NETWORK_IO can be a network problem, but usually a problem with client result processing
    - Other waits, check DMV
  - If the problem is current: Performance Dashboard Reports and DMVs
    - Performance Dashboard Reports
      - DMVs only
      - Create SP in master
      - 2005 version (use for 2005, 2008, 2008R2)
      - 2012 version (user for 2012, 2014)
    - Check for other process using lots of resources
      - High kernel time (i.e., anti-virus)
    - Within SQL Server
      - Background/system tasks
      - High recompiles
    - Query
      - Usually about cardinality estimation
        
        Many root causes, see table
        
        Tables variables (estimate = 1)
        
        TF 2453 (2012 SP2+)
        
        Data type mismatch (cannot make estimate, change data types)
      - Poor/missing indexes
      - Query design (lots of scans)
      - Lots of parallelism (check DOP)
    - Post-mortem problems: PSSDiag, SQLNexus
  - Backup/Restore
    - Backup
      - OS error 64 – Network problems (backup started, but connection dropped)
      - OS error 1450 – OS memory problems (backing up to a remote drive)
      - Transaction log growth (full recovery with no t-log backups)
      - VDI backup failure (3rd party app problems)
    - Restore
      - Disk bottleneck
      - Long-running recovery
      - VLF fragmentation
      - CHECKSUM failures
        
        Use WITH CONTINUE_AFTER_ERROR to override
      - Consider restoring to a different DB vs replace (if error while restoring, still have fallback)
    - Consistency Issues
      - Message 824 error / DBCC CHECKDB
      - Checksum errors
      - Suspect DB
      - No “magic tricks” to fix
      - Almost always IO path error (try SQLIOSIM)
      - Can also be memory corruption
        
        Constant page sniffing – thread that randomly check memory pages
      - Could be a SQL bug (replay the log)
      - Session id of -3 = recovery (deferred)
      - Page-level recovery
    - Stackdumps/Exceptions
      - Verify that the faulting module is SQL
    - Obscure errors
      - Message 1204 – Deadlock
      - Message 833 – IO delay warning (15 sec IO delay)
      - Message 17883 – Non-yielding scheduler (almost always a SQL bug)
      - Message 17884, 17888 – usually a long blocking chain
      - Message 844, 845, 846, 847 – latch time out
      - Use error_reported XEvent to track down a specific error
    - Startup problems
      - Permissions problem (may have changed service account followed by a hard restart)
      - Config (i.e., max server memory)
      - Damaged files
        
        Keep file copies of master (shut down SQL, copy file off) and binaries
      - Backup tail of model causes restart problems
    - Memory issues
      - Out of memory
      - Memory grown (OK if below max server memory)
    - Hidden gems
      - Sytem health session persisted (2014)
      - NUMA guidance
      - System Center Advisor

tempdb bulk operation optimized (fix for 2012, built in to 2014)

Uncategorized

SQLintersection session – In-Memory OLTP Overview

November 22, 2014 Allison Benneth-Hansen Leave a comment

Session Title: In-Memory OLTP Overview
Instructor: Kevin Farlee

Hekaton = In-Memory OLTP
Columnstore = In-Memory DW
CPU clock rates stalled out about 2005, but more power now coming in the form of multiple cores
Memory costs continue to plummet
Saying “It’s fast because it’s in memory” doesn’t really cover the Hekaton story
- Can add enough memory to load file-based tables into buffer pool
- Hekaton has many more speed optimizations
Focus on data hotspots, and load that data into Hekaton. Doesn’t have to be the entire dataset or even a significant portion of the dataset
Compiled stored procedures don’t have to context-switch (but still subject to quantum)
Cannot now replicate from Hekaton to another system, but can replicate from a disk table to a Hekaton table
Based on optimistic concurrency – assume that no one else will access simultaneously
Validation errors cause rollback; have to treat like deadlock in code (i.e., retry)
- SQL does retry in some very limited conditions
Works with Always On, AG, backups, etc.
Not recommended for work queues
Can enable delayed durability on Hekaton tables
If run out of memory, can still delete rows, but cannot update or insert
Indexes are not persisted on disk; they are always built in memory
SCHEMA_ONLY does not even write to transaction log (other than table creation)
Case example of online gambling company that accessed web session state from SQL
- Using Hekaton, went from 15,000 txn/sec to 250,000
Uses more efficient logging to improve write throughput
Need to set bucket count an order magnitude (or two) larger than expected row count
Tables can be schema and data or schema only
Can create natively compiled SPs
- Calls out to C compiler to create
- Entire SP is atomic
Some transactions cannot run in read committed
- Must use snapshot, repeatable read, or serializable
Cannot comment on whether this will work its way into Azure, but wouldn’t be surprised
Don’t necessarily go big bang – convert tables a few at a time until goals are achieved

Uncategorized

SQLintersection session – Parallel Query Execution, Part 2

November 22, 2014 Allison Benneth-Hansen Leave a comment

Session Title: Parallel Query Execution, Part 2
Instructor: Paul White

Batch mode parallel execution

- Geared toward DW scenarios
- Row mode model not as efficient on modern CPU architecture
  - Frequently has to access memory rather than L1, L2, L3 cache
- Batch mode
  - Batches contain 64-900 rows
  - Code designed to fit on L1 cache
  - Data sized to fit on L2 cache
  - Geared toward 64-bit register use
  - Uses push model (whereas row mode is a pull model)
  - Avoids expensive exchanges with low-contention shared structures
  - Has high startup cost, so not good for OLTP workloads
  - Threads are dynamic and can move between operators
- Batch is a 64KB structure designed to fit in L2 cache
  - Contains a column vector that fits in L1 cache
  - Allocated one time, reused many times
- TF 9453 – Disable batch mode
- Must keep columns small so that at least 64 rows will fit into 64KB structure
- Batch mode not implemented on string aggregates
- Columnstore
  - High compression, reduces IO
  - Stored as LOBs, optimizations for LOB
  - Created by partitioning into row groups of 1024^2 (about one million)
    - Partitioned into segment; reordered for compression
    - Dictionary per column per segment, plus global dictionary per column
  - SQL 2012 had many limitations, including being read-only
    - Also had very limited plan operators that were supported
    - Small changes to query could change to row mode and be much slower
    - Can often rewrite query to stay in batch mode, but can get ugly and requires a fair bit of internals knowledge – not well suited for end users
    - If operators spill to tempdb, query could start over in row mode
  - SQL 2014 has many improvements
    - Writeable, updateable, better compression, DDL allowed, more column types
    - More operators added, but still limited
      - No sort still
    - Spills stay in batch mode
    - Allows mixed mode (i.e., plan contains row mode, switch to batch mode, back to row mode) leaving more options for optimizer
    - Improved bitmaps
      - Any supported datatype
      - Both simple and Bloom filters
    - Bookmark support (this is not clustered index/RID lookups)
      - Unique key for columnstore indexes
    - Delta store for inserts; tuple mover to merge when full
      - Deleted rows stay in columnstore (marked as deleted) until REBUILD
      - Update = delete + insert
    - Still has limitations
      - No PK, FK, nonclustered indexes, unique constraints (can work around with materialized views), replication, CDC
      - Not readable on AG secondaries
      - (Interesting note – bug in error message when attempting to create FK – error message lists wrong table!)
    - TF 8607 – Output text of internal execution plan
    - TF 3604 – Output to client
    - TABLESAMPLE SYSTEM (15 PERCENT) – sample 15% of a table

Uncategorized

SQLintersection session – Parallel Query Execution, Part 1

November 22, 2014 Allison Benneth-Hansen Leave a comment

Session Title: Parallel Query Execution, Part 1
Instructor: Paul White

Within a parallel plan, each thread is effectively running a serial plan

Parallel seeks and scans
- 2005 and earlier use parallel page provider
- 2008+ use range enumerators
  - May see ACCESS_METHODS_DATASET_PARENT latch waits
- Only forward scans and seeks are supported
  - Backward scans can only happen serially
- Exchange types
  - Hash – Distributed based on hash function
  - Round Robin – Sent to next available consumer
  - Broadcast – Pushed to all consumers
  - Demand – Row at a time (rare)
  - Range – Mostly index build plans
- Exchanges can be sort order preserving or not
- Root node has property called ThreadStats
  - Will show number of threads reserved and number of threads used
  - May not use all reserved threads depending on timing of thread allocated in the query (can reuse threads that are already complete)
- exec_context_id is the thread number
  - 0 = serial
- Query to identify node id where CXPACKET wait may be happening
- Parallel merge join requires order preservation
  - Does not scale well to high DOP
- Parallel hash join
  - Generally scales well to high DOP
  - Bitmap reduction – can only appear in parallel plans, and can greatly reduce work to be done (i.e., early reduction)
- Parallel nested loops join
  - Inner side runs serially
  - Optimizer is “biased” against them
  - TF 8649 – Force parallel plan
  - Various tricks or plan guides to force usage
    - Such as specifying row goals
    - Express queries as APPLY – As query is pushed toward parallel nested loop queries, can push toward smaller memory grants and more efficient use of resources
  - Parallel sorts
    - Observer memory fraction percentage
    - Tricks to avoid spills
      - Cast strings to larger strings (i.e., select cast(str as varchar(100))) to force larger memory grant
      - Split sort into smaller sorts (initial select into temp table, then cross apply to temp table)
    - Avoid too much parallelism – set the cost threshold for parallelism
    - Parallel deadlocks
      - Once deadlock is detected, contents of CXPACKET buffers spill to tempdb
      - Interesting crafted demo that causes spills only at even DOPs (but not with odd)!

Uncategorized

SQLintersection session – Deploying SQL Server with Microsoft Azure Virtual Machine

November 22, 2014 Allison Benneth-Hansen Leave a comment

Session Title: Deploying SQL Server with Microsoft Azure Virtual Machine
Instructor: Bob Ward

Models

- Azure portal (pay-as-you-go)
- Bring own VM (use own license)
- Management Studio (pay-as-you-go)
- Azure VM Templates (capture Azure VM) (pay-as-you-go)
Gallery images
- Optimized for OLTP or DW
VM Sizing
- A series
  - Don’t use basic tier except for testing
  - A2+ for Standard Edition
  - A3+ for Enterprise Edition
  - A4, A7-A9 optimized
- D series (faster CPU, SSD)
  - D4, D13-14 optimized
- Can be changed later
Cloud services for things like Availability Groups, Availability Sets, load balancing
See MS decision tree for virtual networks
Must create TCP endpoint for MSSQL port (1433)
Takes about 10 minutes to provision
Can provision via SSMS (Tasks, Deploy to Azure VM)
Be sure to connect via RDP right after provisioning
Storage decisions
- 500 IOPS/drive limit, but can span across drives
- Use Windows Server 2012+ to create storage pools
  - One for data, one for log
  - Span across multiple disks
- Be sure not to use system drive ever (including for system databases)
- Do not enable Azure options for caching on storage drives (disabled by default)
- D drive – do not use on A series
  - On D series, use D drive (SSD) for tempdb or BPE
- Enabled instant file initialization
- Enabled locked pages
- Turn off geo-replication (doesn’t guarantee write ordering, can cause transactional inconsistencies)
- AZCOPY – like robocopy for Azure
- Apply fix for tempdb eager writes (do this both on-premise and in Azure) for SQL2012
- Disable SSAS, SSIS, SSRS if not used
- Check that SQL authentication is enabled
- Windows firewall – open up 1433 or SqlServer
- Consider ACLs on TCP endpoints
- Optimized images
  - 12 TB storage pool for data
  - 2 TB storage pool for log
  - Startup –E
  - TF 1117, 1118
  - 8 tempdb files
  - New DB appears to be on c: drive, but is on a mount point
  - Instant file initialization
  - Locked pages enabled
- Disk transfers/sec on perf monitor to measure IOs
- Configuration of Availability Groups
  - 3 primary options
    - All in Azure, same datacenter
    - Secondaries in different datacenter
    - Primary on-premise, replica in Azure
  - Requires DC in Azure
  - Coming soon: Always On template (about 5 images)
  - Add Replica Wizard in SSMS (requires hotfix)
  - Don’t try to upload AG VMs into Azure – won’t work and will have many problems
- No failover clustering options (yet)
- Can back up on-premise to Azure storage (2012+)
- Can store data and log files on Azure storage (2014+)
- Performance considerations
  - Batch transactions when possible
  - Autogrow considerations for log
- Do not use autoscale for Azure
- Future
  - G Series coming soon (more cores, RAM, larger SSDs)
  - Premium storage
  - Azure file storage (shared file system)

Uncategorized

SQLintersection session – Squeezing the Most out of Row and Page Compression

November 22, 2014 Allison Benneth-Hansen Leave a comment

Session Title: Squeezing the Most out of Row and Page Compression
Instructor: Andrew J. Kelly

A highly underutilized feature in SQL Server

- Row compression
  - Fixed length columns stored as variable
  - Does not compress BLOB/LOB (i.e., LOB pages)
  - Storage format is a superset of vardecimal
  - Overhead for more metadata (i.e., variable lengths), but optimized for this
  - Recommended for high update systems
  - Overall better compression than page compression
- Page compression
  - Does row compression first
  - Once page is full, further compression based on column and page dictionary
  - Also no BLOB/LOG (LOB pages)
  - Recommended for high read workloads
- Can be enabled at table, index (clustered and/or nonclustered), partition level
  - Partition can be some or all
- sp_estimate_data_compression_savings
  - Copies sample into tempdb and compresses it; extrapolate to entire table
- Can be enabled or disabled online (essentially does an online rebuild)
- Nulls and zero stored as four bits
- No trailing spaces on char columns
- Column prefix
  - Determine common prefix on columns
  - Even on binary and integer data
  - Stores prefixes, then uses pointers to prefixes
- Page dictionary
  - Find repeating columns
  - Store dictionary, use pointer to dictionary entries
- If bulk-inserting or inserting into a heap with page compression, will only get row compression
  - Unless TABLOCK is specified
- May increase size of compressed backups
- Don’t compress small objects without a good reason

Uncategorized

SQLintersection session – A Tour through C# and Visual Basic Language Changes

November 22, 2014 Allison Benneth-Hansen Leave a comment

This is one of two development session I attended.

Session Title: A Tour through C# and Visual Basic Language Changes
Instructor: Kathleen Dollard

Visual Studio v14 is coded in native language (C#, VB)

- Open source, transparent, etc.
- Language extensibilities through analyzers
- Language features have cleaner semantics
  - Some of these are in flux and may not happen
  - public int Id {get;} [=1;]
    - Can only init during construction; or, have initializer in code
  - Remove class name from static method calls
  - Initializers for dictionaries
    - new Dictionary<string, Customer> = { [“A”] = new Cust(…), [“B”] = new Cust(…) }
  - await in catch/finally blocks
  - Conditional catch (exception filters)
    - catch (Exception ex) if (ex.Code == 4) { … }
  - Expression body members
    - public string FullName => FirstName + “ “ + LastName;
  - Null propagation (aka null conditionals; sometimes call Elvis operator!)
    - x ?. x.Y
    - Evaluates to null if x is null, else evaluates to x.Y
    - Example
      - If (Event != null) { Event.Invoke (…); }
      - Now becomes: Event?.Event.Invoke (…)
    - array ?[index]
      - Checks if index has value (?)
    - String interpolation
      - Format(“{0} {1} {2}”, x, y, z)
      - Can now be written as $”\{x} \{y} \{z}” (maybe will require \ prefix instead of $; still in limbo)
    - nameof
      - void method(int abc) { if (abc < 0} throw new ArgumentOutOfRangeException(nameof(abc)); }
    - Visual Studio features
      - Breakpoint peek
        
        Debug breakpoints (conditions, actions)
        
        Tip: set hit count = 999999, will show actual hit count on exception
        
        Tip: if breakpoints don’t hit, close VS, delete sou files and restart VS
      - Performance tips
        
        Stepping through code will show elapsed time for running
      - Name refactoring
        
        Ctrl RR while on an identifier will change all references
      - Other refactoring
        
        Select code block, extract block
      - Touch to zoom or scroll
      - Improved coloration (i.e., hover tips on identifier)
      - Word part intelligence (ArgExp will find ArgumentException)
      - Proper case menus

Peek in XAML

Uncategorized

SQLintersection session – Anatomy of a Data Modification: Locking and Isolation

November 22, 2014 Allison Benneth-Hansen Leave a comment

Session Title: Anatomy of a Data Modification: Locking and Isolation
Instructor: Kimberly Tripp

Versioning is a powerful and underused mode that can solve many problems
SQL uses locks on resources as indicators (i.e., intent shared, intent exclusive)
Use SSD for most random IO drives; for server consolidation scenarios, this may be a common log drive
SQL Server can be in 1 of 4 states at any given time
- Locking only
- Locking + versioning
  - RCSI (statement-level read consistency)
  - Snapshot isolation (transaction-level read consistency)
  - RCSI + snapshot isolation
- Lock compatibility
  - X – not compatible with IX, IS, S
  - IX – not compatible with S; is compatible with IS
  - IS – compatible with S
- dm_exec_sessions: isolation level + session setting
  - 1 = Read uncommitted
  - 2 = Read committed (no distinction between locking vs. versioning)
  - 3 = Repeatable reads
  - 4 = Serializable
  - 5 = Snapshot
- Whitepapers
  - Working with tempdb in SQL Server 2005
  - SQL Server 2005 Row version-based transaction/isolation

Uncategorized

SQLintersection session – Performance Troubleshooting Using Waits and Latches

November 22, 2014 Allison Benneth-Hansen Leave a comment

Session Title: Performance Troubleshooting Using Waits and Latches
Instructor: Paul Randal

Keep historical snapshots of wait stats
See 2014 whitepaper on using wait stats
- Or older, more in-depth 2005 whitepaper [.doc]
dm_os_schedulers; interesting columns
- visible_online, visible_online (dac), hidden_online
- context_switch: number of context switches
- loads
- quantum_length_us: length of the scheduler’s quantum in microseconds (time until the thread must yield)
Waiter list is completely unordered. Each thread will be notified when its resource is available, and can be notified in any order. It is then moved to the runnable list
No limit to how many threads are on the waiter list
No limit to how long a thread can be on the list
When a thread’s quantum is exhausted, it is automatically moved to the runnable list
- “Check for abort or yield” function called by thread – SQLOS will notify if quantum is exhausted or if the user has cancelled
- NON_YIELDING_THREAD error is always a bug. Must bounce SQL to kill it
- Must yield even if no other threads are on the scheduler
Threads on the waiter list do not periodically wake up to see if resource is available
When a thread owning a lock releases the lock and another thread is waiting on the resource, it transfers ownership to the second thread and moves it off the waiter list. That is, the thread is “signaled.”
Time that a thread sits on the runnable queue is called signal wait time. This is usually very small and generally not of interest.
Runnable queue is ordered as first-in, first-out
- Special case: Resource governor allows for threads to be given priority. However, this is hard to set up and nobody uses it
dm_os_schedulers.runnable_tasks_count. If this amount is large (perhaps >= 10 per scheduler) this indicates CPU pressure
Waits happen. It is simply a part of SQL Server.
Wait time = resource wait time + signal wait time
dm_os_waiting_tasks – What is happening right now?
dm_os_wait_stats – What has happened
- Aggregate view of wait stats since they were last cleared
- Capture snapshots over time. Every 15-30 minutes should be sufficient
- Figure out how long to retain
Coming by about the end of the year: sqlskills.com will have a repository to wait types! (and latch types)
Capture waits via extended events: sqlos.wait_info event
A couple of demos on collecting XEvent, including SQL Server call stacks!
Latch whitepaper [.pdf]
Superlatches – efficiency mechanism on NUMA nodes
- Can produce benign error message in log: “Failure to allocate super-latch promotion threshold” – nothing to do about it
dm_os_latch_stats: Ignore unless wait stats show latch waits
Clearing wait/latch stats: DBCC SQLPERF
- When were they last cleared? See here
Spinlock – synch mechanism that is lighter than a latch
- Uses CPU-level instruction to get a bit
- Tries a number of times (500 to 1000); if still cannot acquire will backoff and sleep on the CPU, then try again
- Does not yield thread, so runnable threads have to wait –high signal wait times will be the result
- In practice, this does not result in a SOS_NONYIELDING thread scenario (which takes about 60 seconds to come up)
- Rare to see, but can happen
- Diagnosing requires lots of internals knowledge; see this whitepaper [.pdf]
Common wait types
- CXPACKET – parallelization
- OLEDB – Linked servers or DMVs
- ASYNC_NETWORK_IO – TDS packets sent to client; awaiting ack.
Common latch types
- ACCESS_METHODS_DATASET_PARENT – multithreading where worker thread is getting next set of work
- ACCESS_METHODS_SCAN_RANGE_GENERATION
- ACCESS_METHODS_HOBT_COUNT
Wait types
- PAGEIOLATCH_XX – Waiting for data to be read into memory
  - XX = SH – Page to be read
  - XX = EX – Page to be written
- Determining which table a page belong to
dm_os_buffer_descriptors – one row per page in the buffer pool
dm_os_virtual_file_stats
- For each DB file – IOs, etc.
- Snapshot this table from time to time
PAGE_LATCH_XX
- Waiting for access to memory page
- Often indicates tempdb contention
- TF 1118 – Removes SGAM contention
  - Also, TF 3226 – Don’t write backup success messages to the log
  - TF 4199 – changes query opt., not necessarily recommended for all systems
- For PFS contention, add move tempdb files
  - If logical cores < 8, tempdb files = number of logical cores
  - Else use 8 tempdb files; if contention in tempdb not alleviated, add 4 more at a time until problem is alleviated
- May also indicate excessive page splits or an insert hotspot
LCK_M_XX wait
- Analyzing – see this Michael Swart blog post
- Especially look for LCK_M_RS_XX; indicates serializable isolation level
- On SharePoint systems consider using snapshot isolation
Row locks escalate to table locks
Page locks escalate to table locks
Row locks never escalate to page locks
WRITELOG wait
- LOGBUFFER wait – copying log block to log cache
- Once log block is written, waits for SPID 1 (the log writer) to asynchronously flush the log to disk; when done, signals the original thread
- Limit on how many log writes per database can be enqueued at once
  - Pre-2012: 32
  - 2012+: 112
- Can be symptomatic of page splits or many small txns
- Delayed durability
  - Only enable is SLA has data loss permissible
  - Can significantly improve throughput on small transactions
- CXPACKET
  - Completely normal on thread 0 (execution_context_id = 0) (controller thread)
  - Only of concern if wait is high on a thread other than 0
  - See Bob Door’s blog post on how to track foreign memory access (foreign pages in performance monitor)
  - Consider increasing cost threshold for parallelism
- SOS_SCHEDULER_YIELD
  - Have to use XEvents to view call stacks to really identify source!
- LATCH_XX
  - Latch class largely controls understanding of root cause; see this resource page
- FGCB_ADD_REMOVE latches
  - Often auto-growth related
- DBCC_XX latches
  - Related to CHECKDB
  - Nothing to do about these
  - Note that persisted computed columns can significantly increase latching (DBCC_OBJECT_METADATA)
- PREEMPTION_OS_XX waits
  - Call-out to Windows
  - XX = name of Win API method; Google for “MSDN XX”
- ASYNC_NETWORK_IO waits
  - Client is likely doing RBAR on the resultset rather than caching the results client-side

Many, many other wait types

Uncategorized

SQLintersection session – Mastering SQL Server Execution Plan Analysis

November 22, 2014 Allison Benneth-Hansen Leave a comment

Session Title: Mastering SQL Server Execution Plan Analysis
Instructor: Paul White

SQL technically has “bags” not “sets”
Binding order: from, on, join, where, group by, cube/rollup, having, select, distinct, order by, top
- Can refer to elements defined in previous items
“Anybody using Hekaton? No, me either. Don’t see any need for it.”
Query optimization pipeline – multiple steps
- NNF Convert information
SQL Server not strongly typed; implicit type conversions can have serious consequences for performance
Constant folding: Evaluate deterministic expressions, such as 1 + 1
OPTION RECOMPILE vs WITH RECOMPILE
- Option recompile is very powerful tool, often worth paying the price to recompile for each value of a parameter
- Does not work if assigning to variable
- Option recompile enables parameter embedding
- Can provide more aggressive (better) query plans
- Substitutes parameters, does constant folding
- Use with caution!
TF 3604 – Output to client
TF 8605
- Outputs tree in message tab
- Particularly useful to view when plan does not optimize as expected
SQL Server does not treat CTEs as a “fence” – i.e., not evaluated separately; rather, rolled into the plan (treated as internal view)
- Even causes view read counts to increase
Simplification
- Eliminate unused columns
- Rewrite logical subqueries as inner/outer join/apply
- Full and outer joins to inner joins (if, i.e., WHERE clause does not allow nulls)
- Remove redundancies and empty expressions
- Primarily targets auto-generated SQL
- WITH QUERYRULEOFF XX
- Can combine query with things like check constraints to further restrict/simplify query
- Computed columns (persisted) can be simplified, but only if expression is in the same order!
- Joins that check for existence but FK exists
  - See MVP Deep Dives Volume 1 for a method of querying against a generic view (Rob Farley) (or here)
  - Unless FK is multicolumn
- Cardinality estimation
  - Uses histograms, density, trie trees (short strings)
  - Uses derived stats
  - Heavily influences initial join order
  - When writing expressions, always ask how the CE will come up with an estimate
- New 2014 cardinality estimator
  - The compatibility level of the context DB controls which CE is used
  - TF 2312 = new, TF 9481 = old
  - Lots of square roots (exponential backoff)
    - Example – join 2 tables
      - t0 = most selective
      - t1 = less selective
      - old: t0 * t1
      - new: t0 * sqrt(t1)
    - “If you want an exciting support life, use linked servers and throw in lots of new features”
    - Trivial Plan Queries
      - Queries without joins only
      - Property on root node of query plan: OptimizationLevel = Trivial vs Full
      - Subqueries prevent trival plans, so can use SELECT (SELECT col) FROM tbl to force full optimization
      - TF 8757 also prevents trivial
    - Cost-based optimization
      - Cascades framework (SearchSpace) – readily available
      - TF 8606 – show tree at different stages
      - TF 8612 – Show cardinality estimate (2012+)
    - Memo groups
      - Logical properties per group are the same (column lists)
      - Physical properties vary – different ways explored for doing the same thing
        
        Sort order, cardinality est., parallelism, execution mode, Halloween protection
      - Visualization: tree with single node, then add variations to grow tree on out
      - Based on applying rules (395 of them in SQL2014)
        
        Most rules are not used most of the time
        
        Small set of rules most of the time
      - Works very well in vast majority of cases
      - Query hints work by disabling rules in the optimizer
        
        FORCE ORDER query hint disables many rules – has a big impact on plan
        
        Other hints imply FORCE ORDER, such as inner LOOP join
        
        OPTION (QUERYRULEOFF)
        
        DBCC SHOWONRULES – List of rules currently enabled. Examples:
        
        JNToNL = logical join to nested loop
        
        LOJNToNL = left outer logical join to nested loop
        
        FOJNnoneqToSM = full outer join, non-equality, to sort merge
        
        DBCC SHOWOFFRULES
        
        dm_exec_query_transformation_stats – shows how frequently rules are used
        
        See Paul’s query Q07 – which rules for test query
        
        TF 8619 (2012+) – also shows applied rules
        
        TF 8621 (2012+) – more detailed
      - Optimization phases
        
        Transaction processing (Search 0)
        
        At least 3 tables
        
        Mostly OLTP queries, small number of rows with nested loops
        
        Quick plan (Search 1)
        
        Subset of rules
        
        If cost exceed threshold for parallelism, run second time with parallel enabled
        
        Full optimization (Search 2)
        
        At least 5 tables
        
        If Search1 resulted in serial, only serial plans searched; same for parallel
        
        All rules available
        
        Costing
        
        Always assumes cold cache
        
        Doesn’t consider specific hardware, mostly (but does account for CPU cores and memory)
        
        Many assumptions that may or may not be true
        
        Costing numbers originally mean an estimate of number of seconds on “Nick’s” machines (dev at MS)
        
        Now completely meaningless for most purposes
        
        Not really valid to compare two queries
        
        Not even valid within a query; i.e., if it says sort of 60% of query, doesn’t make it so
        
        Useful only as a sign post
        
        Plan guides don’t necessarily reduce compilation time
        
        Still runs Search 2
        
        Guide used to shape search
        
        TF 8608 – Shows initial memo groups
        
        TF 8675 – Shows search phase info
        
        TF 8606 – Shows tree info
        
        TF 8615 – Shows final memos
        
        TF 8607 – Output tree from optimizer
        
        TF 7352 – Copy out tree (similar to plan)
        
        Query execution engine
        
        The “true” execution plan is called the MXC (“execution context”)
        
        Far more detailed that the user execution plan
        
        Many internal details are hidden
        
        The building blocks of the execution plan are “iterators”
        
        Have a generic interface
        
        fn_PhysLocFormatter(%%physloc%%)
        
        Returns file, page, slot for row
        
        TF 8666 – Adds some debugging data to plan (see the F4 properties, InternalDebuggingInfo)
        
        Interesting query with 9 different seeks:
        
        LastName in (‘aaa’, ‘bbb’, ‘ccc’) AND FirstName in (‘ddd’, ‘eee’, ‘fff’)
        
        Plan gives only one index seek, but:
        
        Properties show 9 different seek predicates
        
        Physically does 9 seeks
        
        Limited to 64 seek operations
        
        Beware that seeks can be hidden in this way
        
        Filter iterator – predicates than can’t be used in seeks
        
        May have start-up expression: can control whether or not entire subtree is executed at all
        
        Residual – hidden filter
        
        Filter predicate is pushed to the child iterator
        
        Because it is hidden, is easy to overlook
        
        “In a lot of ways, this is the major takeaway of the day”
        
        TF 9130 – Reveal pushed predicates (doesn’t work for join predicates)
        
        Also: If predicate is too large to fit on the tooltip, SSMS will simply leave it off! Have to look at properties window.
        
        Nested loops
        
        Be sure to look at “outer references” property
        
        SSMS shows estimated execution stats based on one loop; actual exec stats on aggregated loops, so they generally don’t correlate
        
        If having performance issues with nested loop joins, be sure to check for residual predicates. May be able to rewrite to push predicate further down the tree
        
        TF 2324 – Disables implications of predicates
        
        Row-mode parallel plans
        
        TF 8649 – Force parallel plan
        
        If a table has a computed column, and the table is referenced in the query (even if the computed column is not referenced), will not get a parallel query (computed column is considered a scalar function)
        
        Batch mode parallel execution – Rather than operating one row at a time, operates on between 64 and 900 at once
        
        Queue up batches; and thread can process next batch
        
        Merge Joins – Check if many-to-many property is true; usually indicates a query error
        
        See Paul’s blog on residual predicates on hash joins (make sure types match)
        
        Bitmap – either simple or Bloom filter bitmap (for parallel plans only)
        
        TOP (n) PERCENT must read the entire set and will include an eager spool
        
        Sorts have a memory fraction property
        
        Spool is a place in tempdb to cache rows in case they may be needed later
        
        Eager spool often indicates a missing index
        
        Merge join operator can be performing a UNION ALL rather than a standard join!
        
        SSMS update operator can actually be updating multiple indexes; view properties to see for sure
        
        Wide vs. Narrow update plans
        
        Wide: Each index updated by separate operator
        
        Narrow: Multiple indexes updated by one operator
        
        Split, Sort, Collapse operators in a row generally indicates that a unique index is being updated
        
        TF 4138 (documented, supported) – Disable row goal behavior
        
        TOP adds row goal behavior
        
        Queries that execute LIKE against a variable can have a Constant Expression that computes at run-time the range begin/end values
        
        GetRangeWithMismatchedTypes (only in XML) indicates a type mismatch that must be converted to a range (i.e., datetime to date)
        
        One way to handle catch-all search queries (rather than dynamic SQL or option recompile) is to use a columnstore index

BreannaHansen.com

SQLintersection session – Top Customer Problems: How to Avoid Calling Bob

SQLintersection session – In-Memory OLTP Overview

SQLintersection session – Parallel Query Execution, Part 2

SQLintersection session – Parallel Query Execution, Part 1

SQLintersection session – Deploying SQL Server with Microsoft Azure Virtual Machine

SQLintersection session – Squeezing the Most out of Row and Page Compression

SQLintersection session – A Tour through C# and Visual Basic Language Changes

SQLintersection session – Anatomy of a Data Modification: Locking and Isolation

SQLintersection session – Performance Troubleshooting Using Waits and Latches

SQLintersection session – Mastering SQL Server Execution Plan Analysis

BEGIN TRAN … COMMIT