Retrieve the top X random rows from a query

Let’s suppose for a minute that you want to retrieve the top x number of rows from any given query. Keep in mind that x is a value that your users can supply.

Sure, you could do string concatenation and EXEC sp_executesql ‘SELECT TOP ‘ + @count + ‘ x FROM y;’. But, then again, a vicious hacker could get all cute and decide to change your form variable to ‘; DECLARE @bad_var AS NVARCHAR(MAX); SET @bad_var = ””; SELECT @bad_var = @bad_var”DROP TABLE ” + name + ”;” FROM sys.tables; SET @bad_var = @bad_var + ”–”; EXEC sp_executesql @bad_var;’ Or something along those lines but probably infinitely more cunning.

Well, you could also accomplish this a different way using a subquery:

DECLARE @count AS int;
SELECT @count = 2;

SELECT ContactId,
       FirstName,
       LastName,
       EmployeeId,
       AddressId,
       AddressLine1,
       AddressLine2,
       City,
       PostalCode
FROM (
  SELECT ROW_NUMBER() OVER (ORDER BY NEWID()) AS rownum,
         c.ContactId,
         c.FirstName,
         c.LastName,
         e.EmployeeId,
         a.AddressId,
         a.AddressLine1,
         a.AddressLine2,
         a.City,
         a.PostalCode
    FROM HumanResources.Employee AS e
          INNER JOIN Person.Contact AS c
            ON e.ContactId = c.ContactId
          INNER JOIN HumanResources.EmployeeAddress AS ea
            ON e.EmployeeId = ea.EmployeeId
          INNER JOIN Person.Address AS a
            ON ea.AddressId = a.AddressId
) AS x
WHERE rownum <= @count;

It’s not the prettiest thing on earth, but it works. Thanks to Rick Kierner over at rickdoes.net for asking me to help out with a similar query late last week.

Comments

4 Comments so far. Leave a comment below.
  1. you can also use SET ROWCOUNT n, still works in 2008 but will be deprecated in the next version I believe, make sure that it is set to 0 after the query

    declare @id int
    select @id = 5

    set rowcount @id

    select * from sysobjects
    order by newid()

    set rowcount 0

  2. I just found this:
    DECLARE @Count int
    SELECT @Count = 5

    SELECT TOP (@Count),
    c.ContactId,
    c.FirstName,
    c.LastName,
    e.EmployeeId,
    a.AddressId,
    a.AddressLine1,
    a.AddressLine2,
    a.City,
    a.PostalCode,
    at.Name AS AddressType
    FROM HumanResources.Employee AS e
    INNER JOIN Person.Contact AS c
    ON e.ContactId = c.ContactId
    INNER JOIN HumanResources.EmployeeAddreess AS ea
    ON e.EmployeeId = ea.EmployeeId
    INNER JOIN Person.Address AS a
    ON ea.AddressId = a.AddressId
    INNER JOIN Person.AddressType AS at
    ON a.AddressTypeId = at.AddressTypeId

    Is this better/worse/same?

  3. It actually turns out that the execution plans are almost exactly the same (in this case). Using TOP (@Count) is slightly more efficient within this scenario in AdventureWorks (the plan cost for my query is 0.412827 and for TOP it is 0.412769). The difference is made up by some additional operators and a filter that I’m using to replace the functionality of TOP via my use of ROW_NUMBER and the WHERE clause.

    Which brings up the point that we’re always supposed to be using the parens with TOP and I just found that out this morning while reading Books Online.

Add Your Comments

Disclaimer
Your email is never published nor shared.
Required
Required
Tips

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <ol> <ul> <li> <strong> <p>

Ready?

This site is protected with Urban Giraffe's plugin 'HTML Purified' and Edward Z. Yang's Powered by HTML Purifier. 226 items have been purified.