September 2008
Mon Tue Wed Thu Fri Sat Sun
« Aug   Oct »
1234567
891011121314
15161718192021
22232425262728
2930  

Month September 2008

Avoiding Cross Database Ownership Chaining

Cross database ownership chaining is bad. Very bad. How bad? Despite being included in SQL Server 2005, this feature is disabled by default. SQL Server 2005 Books Online goes on to say “Setting cross db ownership chaining to 1 is not recommended unless all of the databases hosted by the instance of SQL Server must participate in cross-database ownership chaining and you are aware of the security implications of this setting. For more information, see Ownership Chains.”

Why would you want to use this? Well, you might want to write a stored procedure or function that pulls data from two databases in the same query joining between the two of them.

At my current client, I have two databases on the server — a data mart and a web reporting database. There are materialized report tables in the web reporting database. Some of these tables are only used to provide filters for pulling data from the data mart. Access to the database is very limited — the web user can only execute stored procedures.

This afternoon, after struggling with cross database permission chaining for a day and a half, I posted on sqlservercentral.com’s forums. I was ultimately helped out by a user who suggested using the database option TRUSTWORTHY. I had never heard of the TRUSTWORTHY option, so I had to do some digging to get a better understanding before I suggested it as an option to my client.

What does the TRUSTWORTHY option do?

“When ON, database modules (for example, user-defined functions or stored procedures) that use an impersonation context can access resources outside the database.

When OFF is specified, in an impersonation context cannot access resources outside the database.

TRUSTWORTHY is set to OFF whenever the database is attached.”

TRUSTWORTHY allows databases to use impersonation to access resources in a different database. Here’s an example:

  • There are two databases on the server: data_mart and web_reports
  • There is a server level login: test_user
  • The test_user login is mapped to the test_user user on both data_mart and web_reports

At this point, TRUSTWORTHY is set to OFF. If I try to execute a stored procedure in the web_reports database, I will get an error message that looks like:

The server principal “test_user” is not able to access the database “data_mart” under the current security context.

The only way to fix this is to turn on the TRUSTWORTHY option on both databases:

ALTER DATABASE web_reports SET TRUSTWORTHY ON;
GO
ALTER DATABASE data_mart SET TRUSTWORTHY ON;
GO

After executing the ALTER DATABASE code, stored procedures that cross databases will execute properly.

It’s important to note that, in this situation, the stored procedures will either need to be owned by test_user or else they will need to be set to EXECUTE AS ‘test_user’.

Cursing at cursors

Cursors. Many database developers have to suppress a shudder of horror when they open up a stored procedure only to find a cursor lurking inside. They certainly have their place in the database developer’s arsenal of tools, but they can usually be rewritten as an easily understood set operation.

I’ve spent the first three days at my new client learning the database schema and migrating four load routines from the old to the new database schema. The new schema is, admittedly, easier to understand simply because of the virtue of using design up front principles. But I digress.

These four load processes make use of cursors and, in one instance, nested cursors.

Without divulging too much of the client’s inner workings, the first cursor I encountered looked something like this:


DECLARE @start_date DATETIME;
DECLARE @end_date DATETIME;
DECLARE @end_date2 DATETIME;
DECLARE @account_no INT;

DECLARE my_cursor CURSOR
FOR
  SELECT account_no, start_date, end_date
  FROM Bill;
-- obviously the real query was more complicated than this

OPEN my_cursor;
FETCH NEXT FROM my_cursor
INTO @account_no, @start_date, @end_date;

SET @end_date2 = @start_date

WHILE @@fetch_status = 0
BEGIN
  IF DATEDIFF(dd, @start_date1, @end_date2) > 7
    INSERT INTO some_table (some values);
  END
END;

SET @end_date2 = @end_date;

FETCH NEXT FROM my_cursor
INTO @account_no, @start_date, @end_date;

Essentially, this will insert any records that are at least 7 days older than the previous record. The downside is that, even in my overly simplified example, this isn’t easy to read. When the full query logic is involved this becomes very difficult to read.

I re-wrote the original query using a CTE:


WITH my_cte (id, account_no, start_date, end_date)
AS (
  SELECT
    ROW_NUMBER() OVER (account_no, start_date, end_date, some_values),
    account_no,
    start_date,
    end_date,
    some_values
  FROM Bills
  GROUP BY account_no, start_date, end_date
)
INSERT INTO some_table (some_columns)
SELECT DISTINCT
  account_no,
  start_date,
  end_date,
  some_values
FROM my_cte AS o
INNER JOIN my_cte AS o2 ON o2.account_no = o.account_no
WHERE o2.start_date < o.start_date
  AND DATEDIFF(dd, o2.start_date, o.start_date) > 7

Instead of having to process each record one at a time, the new query is able to process several million records and insert approximately 60,000 records in 11 seconds and is much more readable and reusable as a result.

This CTE pattern has become the basis for several data load procedures that I've re-written from cursors, including a pair of nested cursors.

Special thanks, once again, to my brother for helping me rubber duck my way through this and for providing the initial idea that resulted in using a CTE.

Get dates quickly with SQL

That’s right, now you too can use T-SQL to get dates quickly!

Today at the client I came across the following chunk of SQL in a where clause:

dateadd(m,-24,(cast(datepart(m,getdate()) as varchar(2))+'/1/'+cast(datepart(yyyy,getdate()) as varchar(4))))

It’s used to return the first of the month two years ago. So, for today, this would return 2006-09-01

What’s wrong with this? Well, in this query there are two string concatenations, two explicit conversions from an INTEGER to a VARCHAR, and an implicit conversion from VARCHAR to DATETIME. When run across a large number of dates, this can rapidly become an expensive operation. Especially considering that this was half of a BETWEEN statement that had something similar on the other side of the AND.

Is there a better way to do this? You bet!

SELECT DATEADD(mm, -24, DATEADD(mm,DATEDIFF(mm,0,GETDATE()),0))

This involves no implicit or explicit casting, just some basic date math. While I haven’t done any benchmarking on either option, I’m going to go out on a limb and state that the second option is much more efficient and, arguably, a lot clearer.

In addition to being cleaner, using DATEADD and DATEDIFF makes no assumption about the current locale of your database. This could save you if you are dealing with databases that are spread across the globe or if you have to deal with localized tables within a database.

More information can be found either from Pinal Dave or from Shane Cooper. I’ve bookmarked both of these links and, if you’ve ever had to write SQL code, you should to.

N.B. If you’re using this over any more than a few records you will want to create a variable to store the result of the date math since date functions are non-deterministic. Meaning, if you perform a SELECT on 2 million records, the math will be performed at least 2 million times (4 million if you’re using a BETWEEN).

Edit: Special thanks to my brother for coming to my rescue today and helping me track down an easier, SQL-based, way to do this.

CodeThinked | What is your strategy for becoming a better developer?

My amazing co-worker pointed out a great post from Justin Etheredge on strategies for becoming a better developer.

Self-improvement is a huge thing for me, both personally and professionally. I like both Justin and Rick’s ideas – Emulate, Get Involved, Read, and Stretch.

My strategies tie these all together and are pretty simple, really: Learn and Teach.

Teaching

I’ve gained the greatest understanding of programming by sharing my knowledge with other developers. Teaching has forced me to develop a solid understand of whatever subject I’m teaching.

But teaching doesn’t just involve teaching brown bags or presenting at user groups/conferences. Teaching also includes mentoring colleagues, co-workers, and junior developers. By mentoring, you develop a longer term teaching relationship. You grow along with the person you’re mentoring. Their questions often delve into areas of expertise that you may never have investigated which gets you both learning that subject together and makes use of the collective intelligence to improve your mutual understanding.

Learning

Never stop learning. Learn new techniques in your current language. Learn how your language’s compiler/VM works. Learn about different types of garbage collection. Learn a new language. Just keep learning. Never stop following your passion for technology. Sometimes these technologies may not be immediately applicable to your current project, but learning new concepts will help you approach your current work differently and more efficiently (check out Array.ForEach() in the .NET Framework)

Shingling – it’s not just for roofers!

I was catching up on the Information Architecture Institute mailing list and some feed reader backlog when I came across the concept of shingling. Seeing as how I have never heard of this term in the 8+ years I’ve been working, I decided it was high time that I learned about it.

In essence, shingling seeks to solve the problem of indexing large quantities of data:

  • How can you tell if two pieces of content are the same? You compare them.
  • What happens when you want to compare a lot of pages? You have to make a very large number of comparisons.
  • What happens when you’re google? You can’t solve this problem by bulk comparisons

The number of computations required to compare all of your content rapidly trends toward infinity. Enter shingling.

Shingling is nothing more than taking a single document, splitting it into smaller chunks, and generating a sufficiently sized unique fingerprint from the shingles. This way, when you are indexing content, you can immediately break down the document into the constituent shingles, generate a fingerprint, and then see if that fingerprint already exists. What you do at this point is up to you.

I ran into a similar problem when implementing an n-gram search engine that ultimately proved to be inefficient. Essentially, the computation required to generate the n-grams for a given page of content, check the index for an existing n-gram, and associate the indexed word/content chunk with the given n-gram soon proved terribly inefficient. This n-gram search engine only had to index names, addresses, and email addresses — it wasn’t even attempting to provide an index of user generated content.

I have to wonder how full-text indexing providers, like SQL Server and Ferret, will (or already have) handle the challenge of shingling indexed content. It seems like it would be a concern both on the front of storage consolidation and for the purposes of optimizing CPU cycles — for retrieval, indexing, and comparison.

References
Near-duplicates and shingling
Navigating the network of knowledge: Mining quotations from massive-scale digital libraries of books
N-gram, From Wikipedia, the free encyclopedia

Navigating Code with VS2008

Here are a few keyboard shortcuts in Visual Studio 2008 that I’ve recently (re)discovered:

  • CTRL + - Move to the last position in code.
    This is very helpful if you’ve been jumping through a file while searching for a particular function, variable, whatever.
  • CTRL + SHIFT + - Just like the above, but in reverse.
    This moves forward through code, if any forward position is available. Otherwise you’ll just wonder why you’re pressing CTRL + SHIFT + – over and over.
  • CTRL + K, CTRL + K Add a new bookmark.
  • CTRL + K, CTRL + N Move to the next bookmark in the file.
  • CTRL + K, CTRL + P Move to the previous bookmark in the file.
  • CTRL + K, CTRL + L Clear bookmarks
  • CTRL + K, CTRL + C Comment line(s).
    Comment out the current line or selected lines. Take note, this will not put a comment at the beginning of empty lines, which drives me nuts.
  • CTRL + K, CTRL + U Uncomment line(s).

The bookmark functionality is great when you’re focusing on refactoring several common code blocks that have subtle differences and you want to make sure you account for all edge conditions when you’re combining them into a single method.

Thinking about process

I’ve often heard (and probably said on at least one occasion) words like “We don’t have time for code review” or “The deadline is just too short to for unit tests.” These are words that run directly against meeting deadlines.

Sure, it’s possible to meet deadlines without continuous integration, unit tests, code review, or any other quality assurance mechanism. But what is the cost? Over time, bugs will creep into the code. Bugs that might not be found until code is in production. Bugs that might be, potentially, very time consuming to fix.

The initial cost of setting up continuous integration, of doing test first (or test driven) development, of weekly code reviews comes out to be much less than the cost of plowing headfirst into development efforts and fixing bugs later, when they’re found.

Defensive programming practices are, ultimately, agile practices. They force us to think about what we’re doing. By following process and thinking clearly about the choices we make when we’re developing our code we can ensure that the final product is largely free of defects and can be easily maintained and repaired when a defect is found.

Creating HTTP Redirects in IIS6

We’re putting the finishing touches on a new web site for a client. The new pages are in place on the test server and load testing is about to begin. Everything is going great. “Oh, we almost forgot to mention that every URL from the previous site has to be permanently redirected to the new site.”

Knowing that this is easy in Apache, I agreed that this could be quickly accomplished with IIS and that we’d be able to get this in place before the site goes live in two weeks.

Unfortunately, I was mistaken. This is not easily accomplished in the IIS control panel, and impossible when the URLs do not exist. Instead, it’s necessary to use the IIS Metabase Explorer from the IIS 6 Resource Kit.

Download and install the IIS 6 Resource Kit Tools from Microsoft (Internet Information Services (IIS) 6.0 Resource Kit Tools). Once you have installed it, launch the installer and walk through the installation wizard, just like you would for anything else.

You should end up with multiple new menu entries:
IIS 6 Resource Key - New Menu Options

Fire up the IIS Metabase Explorer and navigate to your local website. In my case I used the root website, so I opened up [MACHINE NAME]\LM\W3SVC\1\ROOT. So far, this should look similar to what you would see in the IIS Administration Console.

Right click on ROOT and select New > Key. A new key equates to either a file or directory – at least for the purposes of this tutorial. Name this key facility9.html – we’re going to send me some more traffic. Once you have created the key, right click on it and select New > String Record

A new window will pop up for the new record’s details. Select HttpRedirect from the Record Name or Identifier combobox and click OK.

You should now have a new, empty, string record sitting inside your facility9.html key.

Double click on the new, empty, record to bring up the HttpRedirect Properties window. In the Value Data text box, type ‘http://facility9.com,PERMANENT’, without the quotes (obviously). This will set up a permanent redirect in IIS from http://yourwebsite.com/facility9.html to http://facility9.com. Switch to the General tab and set the User Type to File. This forces IIS to respond to your HttpRedirect as if it were an actual file. If you don’t set this, you will get a 404 error.

You can find more details about the HttpRedirect Metabase Property (and other metabase properties) from the Windows Client Center of TechNet: http://technet.microsoft.com/en-us/library/cc757144.aspx.

Fast VPC boot times

Sick and tired of virtual machines booting slowly? Yeah, me too. I’m impatient.

Apparently Greg Low was even more annoyed at it than I am and wrote a post about Virtual PC performance using flash drives.

Essentially, use a sizable flash drive to make your Virtual PC boot twice as fast. 2x the fun! 2x the productivity! 2x the USB slots!

This site is protected with Urban Giraffe's plugin 'HTML Purified' and Edward Z. Yang's Powered by HTML Purifier. 219 items have been purified.