SQLPeople the First

This past weekend I had the honor and pleasure of speaking at the first SQLPeople event in Richmond, Virginia.

The Back Story

Back in February, Andy Leonard asked me to speak at a new event. The idea behind his new event was inspiration. Instead of focusing on educating others, Andy asked us to share our own inspiration. Instead of presenting a seminar or training course, the idea was to talk about my work, my vision, and my passion for database technology (to steal Andy’s own words).

It didn’t take me long to say “Yes!” It’s not every day that I get asked to talk about myself for more than 30 seconds, much less be openly invited to talk about myself with a group of people, slides, a projector, and a microphone. Can you imagine that? 40 minutes of nothing but me? Already, I could see that Andy was a man of vision.

My Original Idea

My original idea was to talk about federated databases, SQL/MED, and hybrid data. I’m really glad that Andy asked me for my abstract a second time; once he did, I couldn’t work those ideas together into anything vaguely resembling a coherent narrative.

The funny thing about inspiration is that it’s different from interest. I think federated databases are interesting; I’ve written about them enough. I think that SQL/MED is really cool; that’s how I dug into federated databases. I certainly think that hybrid data/polyglot persistence/buzzword du jour is an interesting idea; I’m talking about it at Stir Trek.

Ultimately none of these things get me excited at a base level: they’re interesting but not intriguing.

The Evolution

I sat down in front of the computer and brain stormed. I don’t think the idea really gelled until I was furiously re-typing my abstract over and over again. As I wrote, the core idea changed a number of times. Free-writing is a wonderful tool, but it’s incredibly painful when you’re trying to write something that you promised to a dear friend. Especially when you realize how late you are.

As I kept iterating over my original ideas I realized that I wasn’t excited and moved by them. I just kept typing, though. I know from experience that I can iterate through ideas over and over again and eventually get to something great. Turns out that’s exactly what happened. After slapping at the keyboard for a while, I cranked out this beauty:

If you asked Jeremiah Peschka to pick three things he’s interested in about computers, he’d say “data” and then look at you funny. If you asked him again, he’d saying “data design, database design, and designing around the limits of the first two.” This is a rapid tour of the building blocks of databases, how those choices affect what we do with data, and why we have to break the rules from time to time to get things done.

The Talk

I had a lot of fun talking about how different aspects of software design, database design, and hardware design influence the design of databases. The talk covered some of the theoretical underpinnings of databases at a very high level, it only scratched the surface of some of the things that I wanted to discuss. Computer science is such a broad field and even a small piece of it like database design can contain a huge amount of information that it’s difficult to pick and choose the relevant parts.

The twenty minute format was a welcome limitation; it forced me to focus on what was most important in my topic. I had to focus on what inspires me to keep learning and what inspires me to share part of my journey with the other people.

Thank you to everyone who put on the event and everyone who attended. It was a great opportunity to share the things I enjoy about this field.

The Future

You’d think things would end here, after all I gave a talk about my inspiration once, right?

I had so much fun putting the talk together and got so much great feedback from the attendees that I thought I would refine the short talk into a longer version. There’s a lot of stuff that I left out of the presentation. It’s not that it wasn’t relevant, but I had to trim material so I could finish the talk in 20 minutes. I want to revisit the talk an add more material; there’s so much interesting information out there that it seems a shame to not share it.

Meme Monday: Eleven Words or Less

For best performance, use RAID 10. Always.

Tagged: Erin, Jes.


Tom LaRock challenged me to write an 11 word or less blog post. I decided to offer some succinct advice as well as tag two people, all in under 11 words.

Building Our Own Federated Database

We’ve already talked about The Promise and Failure of Federated Databases and Why Don’t We Have Federated Databases. At the end of the second post I concluded that the only real way to solve this problem is to build the federated database ourselves. Before you ask, “Does he really want us to roll our own database” take a deep breath and relax; nobody is going to be writing a database.

What is a Federated Database?

When I was looking at the definition for a federated database, it dawned on me that a federated database doesn’t have to be an actual RDBMS, or any other type of existing DBMS. A federated database is a meta-database management system (or so Wikipedia claims). Looking at the other major explanation of a federated database we end up with “A federated database architecture is described in which a collection of independent database systems are united into a loosely coupled federation in order to share and exchange information.”

I’ve already talked about the implications of the first definition – it’s leading us down the path of a monolithic master server that must be aware of the other servers in the federation. New servers won’t be a part of the federation until we make the federating server aware of them. The other road, a loose collection of independent servers, is beginning to gain ground as companies bring more databases online in their data centers. When I say “more databases” I’m not just talking about a larger number of databases from a vendor, I’m also talking about databases from different vendors. Many people are exploring this route right now, some of them have attached the name of polyglot persistence to this approach.

Why Do I Want a Federated Database?

There are a couple of reasons that you’d want to roll your own federated database. I touched on them in the first part of this series: you may want to query across databases, you might have legacy systems, you may have merged with another company, or you might be using the most appropriate database for the job. Whatever you’re doing, you probably have a number of databases and you need to stitch them together.

Where Do I Get Started?

There are a number of ways we could go about creating a federated database. A lot of the ways to build a federated database solution are incredibly complex and involve creating meta-data databases as well as devising ways to link the databases together in an easily query-able way. I’m going to propose something different. Instead of designing something on your own, use the technology you already have and that your programming platform already comes with and understands: TCP/IP.

Nearly every programming language is capable of talking to other programs over TCP/IP. Instead of creating custom databases and worrying about meta-data management and cross server querying, create common services that answer common questions. Break your monolithic application down into manageable services and write those services using the most appropriate technology. Over my career, I’ve found that very few users need the ability to run ad hoc reports over the entire corporate data set. If users don’t need to be able to interactively query the entirety of their corporate data set, what do they need?

Almost all users need a small set of reports and data. Even when we expand the definition of “users” to include applications, services, APIs, and protocols, most activities are incredibly limited. Our users are asking the same sets of questions: How many accounts receivable have aged more than 30 days? What do the sales figures for the New England region look like for the last three years? Even when users are adding data to our databases they’re still performing a limited set of actions like saving an entire order, signing up for a new account, or adding a new accordion to their shopping cart. The activities that users perform data are very limited.

Knowing that users only perform a few activities with our live data, we can safely make some assumptions about the type of data access people will need. Keeping that in mind, it’s a lot easier to see how we can build our own federated database: we’re not going to. We’re going to build our own system using what many people call polyglot persistence.

Designing for Polyglot Persistence

The idea behind polyglot persistence is that we keep our data in the best database for storing that particular kind of data. Achieving this goal is achievable, but if that were the end game, it wouldn’t be the most useful goal for the business – business users want to see reports and combine data across applications and business units.

Going one step beyond the basics of polyglot persistence, we want to add another layer – a caching/service layer. It’s in this layer that we can start to really add rich functionality to the data that the business needs. Instead of having to replicate data across multiple data sources, we can query two separate servers and combine the data together before we return it to the client.

We’ve been doing this for years – it’s nothing new. The only thing that is remotely new is storing our data in the most suitable database. Well, that and telling our caching/service layer to cache as much data as possible while writing in the background. If we keep most of our data in cache, we don’t have to worry as much about write performance in the back end. We can queue writes to make sure they commit during idle times, we can spread them across many servers, and we can write to many reporting databases at once to make sure that reports are up to date. By moving application and reporting logic into an application and reporting tier, we free the database to focus on the tasks that databases excel at: storing and retrieving data. Complex logic and strange data mucking can be handled in the application layer by simple (or highly specialized) algorithms.

Polyglot persistence becomes incredibly valuable when we build mechanisms to load data from all of our disparate line of business systems into a single enterprise data warehouse. Once we have all of our data in a single warehouse, we’re able to write queries across business boundaries. The enterprise warehouse doesn’t need to be in a single monolithic RDBMS server; it could use Microsoft SQL Server Parallel Data Warehouse, Oracle ExaData, Postges-XC, HBase, Cassandra, or any other database that is up to the task.

Wrapping it up

Polyglot persistence seems to be the best answer to building a federated database. It doesn’t provide any kind of automated meta-data management or support for distributing queries automatically across many servers. Instead, polyglot persistence makes it easier to build a robust system that answers the questions business users both want and need while remaining fast and flexible. Is it the be all end all solution? No. Is it a step in the right direction? Yes.

PostgreSQL Row Storage Fundamentals

I answered a question a few days ago on Stack Overflow about PostgreSQL & SQL Server btree storage fundamentals. After I answered the question, I started thinking more and more about how PostgreSQL stores data on the row. Rather than be content to live in ignorance, I went digging through PostgreSQL’s source code and some hex dumps to find out what was going on inside a row.

Getting Started

To get started, we’ll create a separate test database that we’ll cunningly call test.

createdb test --username=jeremiah
psql test -f pageinspect.sql

The next step is to set up our sample data. Here we create a small sample table with four rows:

CREATE TABLE tree (key int NOT NULL, id int NOT NULL);
ALTER TABLE tree ADD CONSTRAINT pk_tree PRIMARY KEY (key, id);
INSERT INTO TREE (Key, ID) VALUES (1, 1), (1, 2), (1, 3), (1, 4);

You can copy this and run it in psql or any other tool that lets you connect to PostgreSQL and run commands. When we run the ALTER TABLE to create the primary key PostgreSQL will go ahead and create an index for us: NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "pk_tree" for table "tree". SQL Server will do something similar. The implementation is different, but we’ll get to that in a bit.

Read page items

In order to view data on the page we have two options. Option 1: shutdown PostgreSQL, open our copy of the source, fire up our trusty hex editor, and then hate life for a long time. Option 2: use the pageinspect contrib module to give you some extra functionality that will help us read page data.

heap_page_items will show us the data in the table. PostgreSQL stores table data in unordered heaps, hence the name heap_page_items. The heap_page_items function takes one input: a byte array representing the page. Thankfully, we can get this from using get_raw_page. This is what the first row of output will look like (magically formatted for your display):

SELECT * FROM heap_page_items(get_raw_page('tree', 0));
-[ RECORD 1 ]------
lp          | 1
lp_off      | 8160
lp_flags    | 1
lp_len      | 32
t_xmin      | 1019
t_xmax      | 0
t_field3    | 0
t_ctid      | (0,1)
t_infomask2 | 2
t_infomask  | 2048
t_hoff      | 24
t_bits      | (NULL)
t_oid       | (NULL)

That’s a lot of gibberish, but it’s going to help us find our row data when we look at the raw page in a hex editor.

Reading the Raw Page

While we’re on the subject, let’s look at the raw page:

SELECT  get_raw_page::text
FROM    get_raw_page('tree', 0);

This will look like pure gibberish (unless you like reading hex dumps_. I took the output and pasted it into a hex editor. It’s still gibberish, but at least we can hope to read this with the hex editor making life easier.

To find the first row that we inserted (1,1) we’re going to use the lp_off from heap_page_items to get the offset from the start of the database page This is where we are going to start staring blankly. In order to figure out just how much data we need to look at, the lp_len column helpfully tells us the length in bytes of the row (including header). To look at the first row that we inserted, we want to start at byte 8160 and look at the next 32 bytes; we’ll be reading the last 32 bytes in the page.

Looking at the Row

Now that we know where to look, we can look at the row structure of the row. What we’re really interested in is the data on the page, but we’re going to take a look at the row header first.

PostgreSQL’s row structure is covered in detail in the Database Page Layout documentation, but let’s have some fun and break down the technical jargon.

What's in a page - the first part

The first 4 bytes are a TransactionId: t_xmin. This is the first transaction where our row is visible. Since PostgreSQL uses MVCC for data management in transactions, t_xmin will refer to the transaction when this particular row was inserted. In my case, this row was inserted during transaction id 1019.

You might notice that a lot of these columns start with t_ That’s because they’re columns that refer to the current tuple. lp_ refers to a line pointer. This is the pointer to the location on the page.

The next 4 bytes are another TransactionId: t_xmax. Logically, if t_xmin is the first time a row was visible, t_xmax would refer to the last time a row is visible. This is the transaction ID that identifies when this row was deleted. One of the interesting things about PostgreSQL’s implementation is that there may be many copies of a row at the same time in the database. These rows will all have different t_xmin and t_xmax values. Over time PostgreSQL’s vacuum process will remove the deleted rows from the table.

After the t_xmin and t_xmax headers, we’ve got another 4 bytes that heap_page_bytes refers to as t_field3. Thankfully, we can take a look at the source code, which has far more information for us. The developers have provided a lot more detail to make it easier to understand what’s going on . t_field3 can actually contain two pieces of information: t_cid or t_xvac. This field will contain either a pointer to the current version row (it could even refer to itself) or else the transaction ID that deleted this row. In my case, this field is 0x00000000. Initially I was confused by this and couldn’t figure out what it meant. Then I started digging around in the code again (see how this helps?) and I found another gem of a comment referring to what is really going on. The t_cid is a composite field that can be used to backtrack to a min and max command but is typically used when inserting or deleting so we can backtrack to the original version of the row. The min and max command are only interesting within context of that transaction. The other information that can be stored in t_field3 is an Xvac identifier. This is an older feature that tracks the transaction id of the older VACUUM FULL command. t_field3 is blank in this case because there is not a transaction in progress.

The next field seems somewhat similar: t_ctid. This stores a pointer to the current version of this row. When a row is updated, PostgreSQL will write a new version of the row and mark the old version as deleted (a new version of the row is written and the original row’s t_xmax field will gets a value). To make minimal changes on disk PostgreSQL will write the new row, update the t_xmax of the original row, and write to the t_ctid field of the original. This has the effect of creating a short lived row pointer. Eventually this will be cleaned up by the VACUUM process, but it’s worth noting. The t_ctid is 6 bytes – 4 bytes to store the BlockIdData (the physical location of the page in the database file) and the OffsetNumber (a 1-based index to a row on a disk page). Combining these two gives us the coordinates of the row on a page. Pretty nifty, eh? Here, the t_ctid is the same as the current row because no updates have occurred.

What's in a page - the second part

Both t_infomask2 and t_infomask are flags and attributes. There is a lot of information that can be contained about NULLs, variable width columns, the current state of the row (is it in a transaction, the state of any transaction, how the row arrived at this location). This is documented very well in the source code. Normally I wouldn’t say that, but the source is fantastic and I’d just be repeating it here.

The last two items before the row data are t_hoff and t_bits. The t_hoff field is the header offset – this tells us where we can find the beginning of data inside the row. This is important because t_bits is a variable length bitmap of null values. The value stored in t_hoff is the size of the header including the size of the t_bits bitmap plus any associated padding that may be lurking about in the row.

We finally get to the data. There are two 4 byte integers. For this first row (id: 1, key: 1), both integers are stored as 0x00000001 (they’re both 1). If you look in the background of either picture, you’ll see how the values increase with each row.

What the Hell Did I Just Say?

The original question on Stack Overflow was asking “if data is a b-tree, how the devil is it stored on disk?” The poster was a bit confused and thought that composite keys would be stored as a tree and might look something like this (ASCII art is great):

Key:           1
                    |
         +----+-----+----+
         |    |     |    |
Value: 1    2     3    4

We just went and looked at the table structure in PostgreSQL and we know that’s not the case for the heap structure. Every value in every column is stored in the heap.

What About the Index?

Yes, what about that index?

If I had never created that primary key, we would have nothing to worry about – the table would be a heap and that’s it. Since I created the primary key (and PostgreSQL created a unique index to support that primary key), we have to worry about what the index looks like. Instead, we’re stuck with this index that we have to figure out.

Thankfully we have some functions that make it easy to see how the index is put together. We’re only going to look at bt_page_items. The other functions are interesting and let you see how the index page is put together but, frankly, we’re more worried about the actual index data itself.

SELECT * FROM bt_page_items('pk_tree', 1);

  itemoffset | ctid  | itemlen | nulls | vars |          data           
------------+-------+---------+-------+------+-------------------------
              1 | (0,1) |      16 | f     | f    | 01 00 00 00 01 00 00 00
              2 | (0,2) |      16 | f     | f    | 01 00 00 00 02 00 00 00
              3 | (0,3) |      16 | f     | f    | 01 00 00 00 03 00 00 00
              4 | (0,4) |      16 | f     | f    | 01 00 00 00 04 00 00 00

The itemoffset is similar to the lp_off that we used earlier – it tells us the offset of this particular b-tree record within the page. We can use a combination of itemoffset and itemlen to calculate the location of the row in the page. Pretty slick, eh?

The next column, ctid helps PostgreSQL identify the current version of the row in the heap to read from once all of the lookups are done. ctid is stored on the b-tree page as two 2 byte integers. The first integer is the page the row exists on and the second integer is the row’s offset on the page. The t_ctid field in the heap row’s header may point to a newer version of the row.

Whenever possible, PostgreSQL will perform a HOT update (Heap-Only Tuple update) and only update the heap record. Why? It’s a heck of a lot cheaper than updating rows inside indexes. The MVCC model doesn’t extend to indexes, so modifying data on a b-tree page is more expensive than modifying data in a heap.

There is also a bitmask that lets us know if there are any nulls or variable length columns in the b-tree (just like in the heap).

Finally, we come to the data. If you look back at the data in the heap pages and compare it to the data in the b-tree page, it’s exactly the same. PostgreSQL isn’t doing anything fancy at the leaf-level of the index to store lookup data in any kind of tree structure. Our data is being written in the same format we’d expect at the page level.

Multi-Level B-Trees

Things are a bit different when you look at a b-tree with multiple levels. WTF is a multi-level b-tree?

B-trees are a data structure that makes it easy to find the particular piece of data that you’re looking for. To make these lookups easier, b-trees are structured into levels. The top level is like the markings in a dictionary pointing to a letter: they make it easy to know roughly where word might be located. Then there is an intermediate level that’s like the guide at the top of the page telling you that this page contains ‘aardvark – anteater’.

PostgreSQL (as with most RDBMSes) uses a variant of the b-tree called a B+ tree. B+ trees are optimized for block operations and contain additional optimizations that improve performance. As a result, they’re wonderful for databases, filesystems, and anything else that has to find and read data in large chunks.

What’s the B+ Tree Look Like?

Enough about the theory, let’s see some bytes!

Since the first table wasn’t large enough, I created a second table with much more data:

CREATE TABLE tree2 (key int NOT NULL, id int NOT NULL);
ALTER TABLE tree2 ADD CONSTRAINT pk_tree2 PRIMARY KEY (key, id);

DO $$
BEGIN
  FOR i IN 1..10000 LOOP

    FOR j in 1..100 LOOP
      INSERT INTO public.tree2 VALUES (i,j);
    END LOOP;

  END LOOP;
END
$$ LANGUAGE 'plpgsql';

Once we have 1,000,000 rows in the table, we can take a look at the index meta-data to see what we’re dealing with:

SELECT * FROM bt_metap('pk_tree2');

 magic  | version | root | level | fastroot | fastlevel 
--------+---------+------+-------+----------+-----------
 340322 |       2 |  412 |     2 |      412 |         2

From the bt_metap function, we can tell that there are 2 levels in the table and the root of the tree is on page 412. Let’s look at what’s on this root page:

SELECT * FROM bt_page_items('pk_tree2', 412);

 itemoffset |   ctid   | itemlen | nulls | vars |          data           
------------+----------+---------+-------+------+-------------------------
          1 | (3,1)    |       8 | f     | f    | 
          2 | (411,1)  |      16 | f     | f    | 14 04 00 00 0b 00 00 00
          3 | (698,1)  |      16 | f     | f    | 27 08 00 00 15 00 00 00
          4 | (984,1)  |      16 | f     | f    | 3a 0c 00 00 1f 00 00 00
          5 | (1270,1) |      16 | f     | f    | 4d 10 00 00 29 00 00 00
          6 | (1556,1) |      16 | f     | f    | 60 14 00 00 33 00 00 00
          7 | (1842,1) |      16 | f     | f    | 73 18 00 00 3d 00 00 00
          8 | (2128,1) |      16 | f     | f    | 86 1c 00 00 47 00 00 00
          9 | (2414,1) |      16 | f     | f    | 99 20 00 00 51 00 00 00
         10 | (2700,1) |      16 | f     | f    | ac 24 00 00 5b 00 00 00

This is showing us how data is laid out throughout the index. We can look on pages 3, 411, 698, 984, 1270, 1556, 1842, 2128, 2414, and 2700 and see what data is on the intermediate (non-root, non-leaf) pages of the index. It’s similar to the data in the root of the index: it tells PostgreSQL how to find rows that fall within a certain range in the index. Whenever we finally drill down to the leaf level itself, we’ll see something that looks a lot like what we saw when we looked at the index pk_tree.

Summary

Long story short, PostgreSQL does not perform any shenanigans or trickery to compact the data on disk. If we have what seems like a hierarchical index on two columns (parent, child), all values will still be stored on disk. Data is stored in a heap structure that is supported by additional indexes that speed up lookups.

I’m Presenting at SQL Saturday 67

No, this isn’t a re-run! I’ll be presenting about Refacatoring SQL at SQL Saturday 67 in Chicago this coming Saturday.

I’m really excited about this opportunity. I had a blast presenting in Chicago last year and I’m looking forward to doing it again this year. There’s a great line up of speakers. If you’re in the Chicago area and want to get your learn on, I suggest you swing on by the DeVry Addison campus and check it out.

Here’s the title and abstract:

Refactoring SQL

Refactoring SQL is not like refactoring application code. This talk will demonstrate proven SQL refactoring techniques that will help you identify where performance gains can be made, apply quick fixes, improve readability, and help you quickly locate places to make sweeping performance improvements. Jeremiah Peschka has years of hands on experience tuning SQL applications for performance, throughput, and concurrency.

Why Don’t We Have Federated Databases?

Federated databases are a dream that have not materialized. The SQL/MED extension to the ANSI/ISO SQL specification is a step in the right direction. In addition, both SQL Server and Oracle have proprietary extensions that make it possible to query external data sources. If all of this technology is available today, why aren’t more people using it?

Why Don’t We Have a Federated Database?

If federated databases are such a powerful thing, why hasn’t anyone built one? Surely this is something that many businesses are clamoring for, or at least in need of.

There are a number of problems facing anyone attempting to implement a federated database. Frankly, federated databases are fraught with technical difficulties. Every database vendor supports a different subset of the ANSI/ISO SQL standard, different vendors use different data types and metadata, concurrency is a huge concern with dealing with remote resources, and technology is a moving target.

Different Dialects of SQL

Anyone who has attempted to port an application from one database engine to another knows about the pain involved in translating queries. Different vendors adhere to the SQL standard to varying degrees. This problem can be solved by creating wrappers to translate sub-queries between different querying languages, but it’s still a problem that exists. This problem could be partially solved in a federated database by limiting the database vendors to a small subset of the ANSI/ISO SQL standard, but this doesn’t solve the problem, it merely avoids it by limiting functionality.

Different Metadata

Different databases may have different metadata – different data types are used to represent the same data and different structures are used to describe data. To solve this problem it becomes necessary to create elaborate mappings between columns that represent the same data. There may be situations where such a mapping becomes computationally intensive or functionally impossible. SQL Server allows the creation of .NET data types with custom methods for data searching, access, and manipulation. PostgreSQL features several data types (notably tsvector, hstore, and arrays) that would be difficult to convert to data types in other databases.

Metadata differences don’t just stop at the data type level. It’s possible to model data in a number of different ways; the type of an address could be indicated using an integer key value that references a lookup table in one database or as a VARCHAR column with values constrained by the database. It’s even possible for something as simple as Unicode text encodings to cause problems: SQL Server uses the NVARCHAR data type for storing Unicode strings while other databases do not use a separate data type.

Concurrency

Concurrency, depending on your database, may be a concern. Managing concurrent operations within a single database is a difficult task, much less managing concurrency across multiple databases. Unfortunately, correct handling of concurrency across all components of a federated database is critical.

Many potential problems of a federated database can be solved through different trade offs. However, managing concurrency is a nearly impossible task. To properly and effectively manage concurrency across multiple databases is to ask the impossible. Not only would this require the federated database vendor be able to account for all possible concurrency issues in relational databases, but they would need to be able to account for potential concurrency issues in any database that integrates with the federated database.

The Moving Target

Even were a database vendor to take on this task, they’d be consistently aiming for a moving target. New features are added to relational databases all the time, and there are enough major players in the market to make it difficult for users to keep up to date, much less a federated database vendor. Once you factor in the wealth of other, non-relational databases, the idea of creating a federated database system to handle metadata mapping, concurrency control, and query language resolution trends toward impossible.

The State of The Industry

Where we stand now, there is almost no chance of any independent software vendor creating a true federated database. There is hope, but not from where you would expect it.

Enterprise data warehouses can fulfill much of the function of a federated database, but they still require complex ETL and data mapping to be truly useful. Adding additional information to an enterprise data warehouse can require extra work to prepare the data warehouse and ETL processes for the new data. Unfortunately, enterprise data warehouses require too much manual intervention to be a candidate for a federated database.

As we’ve discussed, SQL Server and Oracle provide ways to reference remote database servers. These methods have their own problems. SQL Server linked tables are prone to problems with some objects not being remoteable. When we’re querying a remote server, we need to make sure that the parts of our query going to the remote server are handled on the remote server. This is difficult to get right. On some occasions we might even see an entire remote table be streamed across the wire to be filtered on the originating server. This is something that we don’t want to see happening. For a federated database to be a tenable product there must be an easy way to offload queries to a remote table and a guarantee of adequate performance.

Properly remoting queries is incredibly complex. Assume, for a moment, that we have a report that queries data on the sales department’s database server and we also need to include data from human resources data. Our query might look something like this:

SELECT sp.first_name,
       sp.last_name,
       eh.employment_duration,
       ts.year,
       ts.total_sales_by_year,
       r.average_review_score
FROM   public.sales_person sp
       JOIN HRDB01.employee_info.public.employee_history eh
            ON sp.employee_id = eh.employee_id
       JOIN ( SELECT o.employee_id,
                     o.year,
                     SUM(o.total) total_sales_by_year                     
              FROM   public.orders AS o
              GROUP BY o.employee_id, o.year
            ) AS ts ON sp.employee_id = ts.employee_id
       JOIN ( SELECT r.year,
                     r.employee_id,
                     AVG(r.score) AS average_review_score
              FROM   HRDB01.employee_info.reviews.review r
              GROUP BY r.year, r.employee_id
       ) AS rv ON sp.employee_id = rv.employee_id
                  AND ts.year = rv.year;

Looking at this query we’re hitting two separate remote objects in one remote database. In order for this query to be effective, our query optimizer must b able to re-write the query in a way that lets it build an intelligent query for the two remote tables HRDB01.employee_info.public.employee_history and HRDB01.employee_info.reviews.review. But, in order to effectively re-write the local query to properly reference remote objects, we need to know everything about the remote objects – our calling server must be aware of as much metadata as possible so the remoted query can be re-written before being sent to the remote server. While this is doable, it puts additional load on the calling server. This server now has to maintain information about remote database objects. But that’s not all! If we want our queries to be truly optimal, our federating sever will need to be aware of how data types will behave on the remote servers and how the remote data types will interact with local data types.

Once you examine the intricacies of a federated database, it becomes obvious why the federated database, as a boxed product, is beyond our reach. It’s not that the task is impossible; on the contrary such a task is very possible. The difficulty lies in coordinating all of the information available and using it to deliver data quickly. There are enough moving and potentially unknown parts that it’s non-trivial to create heterogeneous systems capable of filling out the promise of federated databases. Faced with this situation, the only viable solution is to build your own solution that answers the needs of the business.

http://railstips.org/blog/archives/2011/01/27/data-modeling-in-performant-systems/

The Promise and Failure of Federated Data

One of the biggest problems facing businesses today is integrating data from multiple sources. The holy grail of data integration is called a federated database system. Basically, a federated database stores meta data about other databases and makes it easier to integrate them through a single interface. Many relational databases have features that support integrating with other relational databases through SQL Server’s linked servers or Oracle’s database links. One of the problems with these features is that they only allow relational databases to talk to other relational databases. As the volumes of data we collect every day increase, more and more of that data is being stored outside of relational databases in CSVs, spreadsheets, log files, PDFs, and plain text as well as in a variety of non-relational databases like MongoDB, HBase, Riak, and Cassandra.

If the only thing we’re looking for is access to meta data for external data, SQL Server will provide the [FileTable][8] data type in SQL Server 2011. Admittedly, FileTable isn’t an acceptable solution because it’s really intended to make it possible to reference files in the database that are being managed by external applications through the filesystem and Win32 APIs. Clearly, this doesn’t suit our need for querying external data.

It’s also possible to use ETL tools to move data into relational databases. One of the problems with ETL tools (SSIS, Pentaho Data Integration/Kettle, or Oracle Data Integrator) is that they are effectively batch operations. New data insertions will have to be triggered by some external event; the data isn’t available until it’s migrated into some master system.

Luckily, there’s an extension to the SQL Standard to help us: SQL/MED. MED stands for Management of External Data. This is a way to link up any external data source to a database server. It doesn’t have to be another relational database – there’s already a twitter foreign data wrapper library. Unfortunately, PostgreSQL is currently the only major database player on the market with any potential for an implementation for SQL/MED.

The implementation of SQL/MED just isn’t here, yet.

Another promising project is HadoopDB. HadoopDB is a project coming out of Yale University. The aim is to make it possible to run analytical workloads in parallel across many commodity RDBMS servers. One of the goals of HadoopDB is to excel in areas where parallel data warehouses simple do not perform well. Many of these situations are outline in the paper HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads. To summarize, parallel data warehouses provide near linear scaling up to several hundred nodes running on homogeneous hardware. Parallel data warehouse also operate under the assumption that failures are rare. Google and others have demonstrated that hardware failure is inevitable at scale. HadoopDB presents a phenomenal way to scale databases and integrate disparate technologies.

Despite its promise of scaling databases, HadoopDB still doesn’t solve the problems that we face when trying to build a federated database system. The truth is a depressing one – there is currently no solution for building federated databases that incorporate data from across the enterprise. Database vendors, DBAs, and more traditional corporate IT departments will tell you that this is a Good Thing™. I’ve mentioned before that you should choose the database that is best suited for the task at hand.

Where do we go from here? SQL/MED doesn’t meet its own promises – only one vendor is implementing the SQL/MED standard and that support is going to depend on third parties releasing drivers. HadoopDB isn’t a federated database so much as it is a way to avoid scaling a relational database into thousands of cheap nodes and paying millions of dollars in licensing fees for Teradata, Microsoft’s Parallel Data Warehouse, or Oracle’s Exadata. The unfortunate truth is that if we want a federated database we’re going to have to build it ourselves.

What sounds like a Sisyphean task isn’t as difficult as it sounds. If we’re collecting data in multiple databases, the problem is already almost solved. Some of those technologies are already here. LINQ lets us treat all data sources equally; we can query an array of objects as easily as we can query a database. ARel is a relational algebra for Ruby. While ARel is specifically focused on working with relational databases, it could be extended to work with many different data sources. Business intelligence vendor Jaspersoft recently announced support for a number of non-relational databases to complement their existing business intelligence products. Quest Software, makes Toad for Cloud Databases – a tool for querying both relational and non-relational databases.

A federated database may never materialize, but it’s already possible to build a hybrid database solution today.

References

PostgreSQL Extensions

PostgreSQL Extensions

Dimitri Fontaine has been working on adding extension packaging functionality to PostgreSQL for a while. His hard work has paid off and PostgreSQL 9.1 will have extensions.

How will extensions work? At a cursory glance, extensions are similar to libraries or DLLs in the programming world. Extensions are packaged units of code that can be installed using the CREATE EXTENSION command. This command will execute a script file that installs any of the extensions objects, including any C code that needs to be built or loaded. If this was the only thing that extensions provided, it would be a nice feature. Interestingly, there’s more.

PostgreSQL’s extensions may consist of any number of database level objects – tables, functions, views, etc. These objects are versioned together as part of a discrete package. While PostgreSQL won’t let you drop any database object that’s part of an extension, you could go ahead and break an extension using CREATE OR REPLACE FUNCTION. Those changes won’t be tracked as part of the extension.

Once our extension has been installed, it’s going to be managed as a versioned chunk. The best way to make changes it to upgrade the extension using code supplied by the extension developer. The extension code lets developers supply a version number, as well as some other metadata. To be fair, the extension code in PostgreSQL requires that extension developers supply a version number; extensions without a version number will fail on creation with some kind of helpful error message about an invalid version name. But, once we’ve created an extension with a version number we get a few niceties.

  1. We can tell which version of an extension in installed in every database on our server by looking in the pg_extension table of each database.
  2. We can perform updates by comparing the current version to update scripts. Admittedly, ALTER EXTENSION looks like it should handle this for us, but it’s still cool.

What else do we get from the new extensions?

Configuration! Extensions have to ways of setting up their configuration: control files and configuration tables. Extension developers can supply limited configuration through a control file. The control file lets an extension developer list things like the name, a description, and any other extensions that are needed for proper functionality.

In addition to the control file, extension specific configuration information through configuration tables. Not only can we supply additional functionality, but it’s possible to make our functionality flexible. Like all important data, it’s important to back up the configuration. To make sure that users’ configuration information is backed up, extension developers can use the pg_extension_config_dump function in their extension installation code to mark extension tables that should be a part of back ups; PostgreSQL will ignore extension tables when pg_dump is run. Interestingly, pg_extension_config_dump takes a second parameter that will be used as a search predicate when pg_dump is run. This can be used to only backup user supplied parameters or work through whatever convoluted scheme you’d like (parameters are only backed up on bank holidays).

We also get extra control about where extensions live. Have you ever installed software from a vendor and been a bit annoyed about where it ends up in your database? Both the CREATE EXTENSION and ALTER EXTENSION commands make it easy to relocate an extension to a new schema. DBAs can opt to install extensions into one schema and move them into another at a later date. We could install a vendor library and move it to a safe schema at another time, for example ALTER EXTENSION magic_vendor_security_audit_tools SET SCHEMA vendor_audit;. ALTER EXTENSION will even update the definition of tables that already exist.

I suspect that the extension mechanism in PostgreSQL 9.1 is going to have far reaching effects for PostgreSQL. The PostgreSQL development team are already moving procedural languages into extensions and bundling additional functionality (formerly living in the contrib folder) into extensions. This addition to PostgreSQL should also make it much easier for software vendors to supply additional functionality for PostgreSQL.

Writing Tips

Past Articles

I’ve written a few times about writing – The Act of Reading and The Act of Writing. I thought that it would be good to put together a grab bag of some of my favorite writing tips and tricks to make life just a little bit easier for people who want to get better at writing.

On the Act of Reading

In The Act of Reading I talk about how it’s important to actively read as part and parcel of getting better as an author. Reading is important because it spreads ideas. Nothing exists in a vacuum, after all.

Read the People You Admire

Have a favorite author that you admire? Grab your favorite work and read it again. This time around, read it slowly. Savor every word. Keep a pen, note pad, dictionary and thesaurus nearby. When you hit a particularly powerful passage, make a note of the page.

Once you’ve finished a chapter, go back and take a look at the passages you highlighted. What makes them special to you? Is it the word choice? The flow? Does the author build up interleaving sentences that leave you gasping for air as the hero runs for cover?

Figure out what makes that writing work for you. It’s not going to make you hate your favorite authors. If anything, you’ll appreciate their writing even more.

Read the People You Hate

Once. Just read it through once. But figure out why you hate it.

I read most of Cherie Priest’s Boneshaker before I put it down. I stopped reading the book because I couldn’t stand it. I’m sure the book was enjoyable for someone, but it didn’t work for me. The instant I realized the book didn’t resonate with me, I started asking myself questions to figure out what I didn’t like. It was important to know why I didn’t like the book; by understanding what I didn’t like, I can avoid the same pitfalls in the future. It made me happy to understand why I didn’t like the book. Once I understood my opinion I wasn’t pissed off that I had spent time reading the book. I was relieved. I learned something about what I want out of myself as a reader and as a writer.

Writing is a complex thing – there are many different pieces that make writing good or bad. Understanding what makes one piece of writing bad is just as important as understanding what makes another piece of writing good. Understanding what you don’t like about other writers will help you strengthen your own voice as a writer. Looking at it another way, if you know why something annoys you then can avoid similar problems in the future.

Read the People who Sell

As much as we might had to admit it, writing is a business. While creating art for the sake of art is a lofty goal, we all need to eat. There are many starving artists, but a few of them manage to make a good living. Read the best sellers. While you probably aren’t planning to write the great American novel or a New York Times bestseller, it doesn’t hurt to figure out what makes something a hit. Even if you don’t like popular writing, there is a lot to be learned from it: it sells, figure out why.

Read Reviews

The final suggestion I have is to read book reviews. This will be especially helpful if you’ve never tried to pull a book apart and see how it works. Book reviews can be daunting, or outright boring, if you’ve never read one before. Stick with it, you’ll learn a lot about how people read, write about, and deconstruct books. You’ll also get a glimpse at what critics like about authors – this can be a short cut to understanding what you love, hate, and what sells.

People I Read

Not that it matters, but these are some of the people I read:

Every time I read something that one of these people write, I read it to enjoy it. Then I read it again at some later time to figure out how it works.

On the Act of Writing

Writing is a daunting task. From time to time I’ll joke that I got an English degree because I just wanted to talk about things. I’ll follow it up by saying I got a non-fiction writing degree because I wanted to write about things that were already sitting around. The truth is that I knew I could figure most things out, but my writing was something that I valued and wanted to improve.

Here’s two quick tips: don’t edit as you go and don’t format as you go.

Don’t Edit as you Write

Don’t edit yourself while you’re writing. Just don’t do it. Instead of trying to get one sentence perfect, try to get the ideas out of your head as fast as you can. Editing while you’re trying to write can be distracting; you may lose your train of thought and spend most of your time focusing on a few words. In short: don’t miss the forest because you’re trying to describe the moss on a tree.

When I’m working on a blog post, especially when a longer one, I’ll make notes as I write so I can go back. Instead of editing a sentence, I’ll add a comment like “[NOTE: Don't like the sound of this, revise later, wishy washy]“. Once I’ve finished writing my first draft, I go back and look through my notes. I use these as the basis for my revisions. Sometimes I know how I want to say something but I don’t know exactly how to say it. I’ll make a note telling me to revisit a particular sentence or paragraph.

When I write, I focus on writing. When I polish my writing, I polish my writing. Each one requires different skills and different ways of thinking. I like to keep the activities separate.

Don’t Format as you Write

When I write, I only want to write. I don’t want to fiddle around with different fonts, indentation, line spacing or anything else like that. When I’m writing formatting is just a distraction: it doesn’t help my writing.

In college, and even after college, I would spend time getting my document “right” before I started writing. Word processors are a great way to procrastinate writing. It’s so easy to spend hours figuring out the optimum line spacing, tab levels, bullet formatting, and font choice to make sure that your document displays perfectly.

Your perfect document template doesn’t matter if you don’t have anything to fill it. When I write I use a plain text editor to create content. Admittedly, I use a pretty powerful text editor called Emacs and I write everything using the Markdown format. Markdown is a simple format, it converts easily to HTML, and it gets out of the way when I write. Emacs isn’t for everyone, it’s grumpy and ugly and has a steep learning curve. There are plenty of other tools out there. On the Mac there’s TextMate as well as Ulysses (that one is more for novelists). There are hundreds of editors out there that do a great job of doing nothing more than editing text. Some of them even have lovely full screen modes where they block out every other application. This is particularly good if you know you’re easily distracted.

The most important thing is to choose a tool that doesn’t distract you. Turn off spell check and grammar check while you’re writing. You can always turn it on later and fix your work. You’ll be much happier without all of those squiggles reminding you that you don’t know how to spell or that you’re ending a sentence with a preposition.

Getting Feedback

Writing shouldn’t happen in a vacuum. Writing evolves in response to feedback and experience. As you get more feedback you get better at writing.

One of the easiest ways to get feedback is to ask your friends what they think of your newest work. This can be a mixed bag, unfortunately. Your friends may not be writers, they may not be interested, or they may know nothing about the rules of grammar. On the bright side, your friends will be happy to help with your new found interest in becoming a better writer.

Writers have banded together in groups to discuss their work. Historically, there have been groups like the Inklings. You can find groups like Hugo House in every city where people have an interest in writing. While writing classes are helpful, some of the best feedback I received in college came from my fellow students. We had to ceaselessly review our peers, sometimes reviewing a paper on Monday only to read it again, with our revisions, on Friday. Watching something evolve in response to your criticism while re-shaping your own work in response to criticism is an enlightening process. Over time you’ll come to recognize your own colloquialisms, hang ups, and other linguistic oddities. You’ll learn what works and what doesn’t. Writing with other writers is one of the best ways to get better.

Proofreading

Proofreading is one of the nastiest jobs I’ve encountered while writing, especially when I’m proofreading my own writing. I’ve come up with a few tricks over the years to keep myself engaged in the task instead of rushing through it haphazardly.

Small Chunks

Work in small chunks of time. Your brain can only focus for a short period of time on something you don’t enjoy. If you’re anything like me, proofreading is something that you definitely do not enjoy. Instead of toughing it out, break your work up. If you can only focus on proofreading for 30 minutes, make sure you break up your work so that you can work on proofreading for 30 minutes and then have another task lined up. It seems simple and obvious, but I’ve tried to drive through the wall of boredom while proofreading. The results were disastrous.

Read to Yourself

Read your own writing to yourself. Don’t silently read to yourself. Print out a copy of your writing, stand up, and read it out loud. You’re going to feel silly at first. Don’t stop. Don’t talk in a quiet voice. Read in a firm, strong voice.

While you’re reading, keep a pen in one hand. Circle any part of your writing where you falter or stumble while you’re reading it. Anything that sounds weird or awkward gets circled too. Your writing should sound like you. You don’t want your readers pausing awkwardly while they read. Anywhere you stumble, they’re likely to stumble.

One final tip about reading to yourself: once I’ve gone through a particular article reading it out loud, I’ll run through specific parts again but I’ll read it in a silly voice. I’ve frequently caught awkward and just plain bad writing using this method. You’ll feel weird doing it, but it really works.

Three Proofreading Tricks

Some people like to proofread in their word processor. While the word processor is great for catching some basic spelling and grammatical mistakes, it isn’t the best place to do proofreading. I use three tricks when I’m proofreading to help me spot my spelling, grammatical, and stylistic mistakes.

The first thing is to print out a copy of your writing. I prefer to proofread by marking up a physical copy of whatever I’m working on. I’ll print out a nicely double spaced copy of a blog post and go to town on it with a red pen. this makes it very easy to cross out words, add words, and amend my revisions without losing any of the original copy and without having to refer to the margins to see what’s changed since I started.

After a while, you’re going to get bored proofreading. It’s not that exciting. If you do like proofreading, skip ahead. For the rest of you, here’s two tricks to keep you going. When you’re proofreading, take that print out and turn it upside down. Shockingly, you will have no problems reading upside down but you’ll have to work at it: by turning the paper upside down you force your brain to focus on reading. Since you have to focus more to read you focus more on the words. As a result you’ll notice more mistakes and awkwardness. When you get sick of reading upside down you can head to the bathroom and read in the mirror instead of turning the paper upside down.

Be Yourself

No matter what I say, no matter what anyone else says, the most important part of writing is to be yourself. Experiment with a voice, a style, and a tone that’s all your own. Play around with different writing techniques until you find something that works well for you. I listen to The Orb’s Adventures in the Underworld while I write. You may not want music on at all.

Above all else, have fun. Your writing should show your passion. Words aren’t some collection of dead sticks that you arrange to communicate basic facts; words communicate your passion and expertise.

Copy, Paste, Cloud

One of my favorite features of EC2 is the ability to create virtual machine templates and re-use them to create fresh copies of a virtual machine. This is great but things rapidly get onerous when you’re trying to duplicate infrastructure.

Amazon recently unveiled a new service called AWS CloudFormation. There are currently many Amazon cloud offerings available: S3, Elastic Block Storage, EC2, and Elastic Beanstalk are just a few. AWS CloudFormation is more than just another member of the family: it ties them all together.

The idea behind AWS CloudFormation is to make it easy to create a collection of AWS resources and then deploy them the same way every time. AWS CloudFormation is similar to using Chef recipes to deploy software configuration. In this case we’re deploying an entire infrastructure stack to multiple virtual machines via a recipe. We can design our infrastructure on AWS. As our business grows we will be able to quickly and easily duplicate crucial parts of our infrastructure.

AWS CloudFormation makes it simpler to manage all of your infrastructure. Deployments of new infrastructure become a matter of pushing out a template. If there are problems with a deployment, the changes can be rolled back and a clean up happens to make sure you aren’t charged for anything that you’re not using.

Deployments

In a traditional IT department, there is a design, purchase, deploy cycle that can potentially take a very long time. In previous jobs, we’ve had to design the infrastructure based on obscure internal capacity planning metrics. Once we’d made predictions/guesses about our future growth, we would then wait for weeks or even months to acquire new hardware. Once we had the hardware, it might even sit around for days or weeks before we were finally able to provision, configure, and deploy the servers on the network. That doesn’t even include deploying our own software on the server.

On the flip side of that coin, by combining AWS CloudFormation plus Chef/Puppet, we can now push a new batch of servers out into the cloud in a matter of minutes and have them running in a few hours. Our software can be automatically installed and configured with Chef or Puppet. While we still need to write templates, once we’ve created and tested our templates for specific purposes (blog, database, community site, whatever), we’re able to fully automate deployments.

Scaling Out

AWS CloudFormation can also ease the pain of scaling out our applications.

Typically when we scale out an application, we’re starting from a monolithic application stack. All of the assets in our stack have been scaled up to a point where it’s cost prohibitive to keep scaling. At this point, we’d examine each layer of our application and determine the best place to add caching or scale out to use multiple application or database servers. As we keep scaling our stack, we need to add more load balancers, caching servers, and database read slaves until we’ve exhausted our options and have to revisit our application design.

Rather than engaging in the exercise of attempting to scale all of our customers at once, why don’t we start out by sharding all application resources at the customer level? While this increases the overall cost of operating our business, it makes it easy to scale elastically in response to a changing customer base. The busiest customers will get larger servers and increased performance that meets their needs. It also becomes possible to locate our data close to your customer in one of several Amazon zones.

For businesses offering software as a service, this makes a great deal of sense. They get an easy way to monitor usage per customer and can scale appropriately within known guidelines and with well known costs.

Wrap Up

AWS CloudFormation makes it possible to provision and deploy infrastructure using a set of templates. When you combine CloudFormation with Chef or Puppet, it becomes very easy to deploy infrastructure and then deploy additional configuration changes on top of the infrastructure. Ultimately, AWS CloudFormation makes it easy to quickly and easily deploy new infrastructure in response to changes in load or customer demand.

If you’re interested in some of the discussion around AWS CloudFormation, be sure to check out the Hacker News thread on the subject.

This site is protected with Urban Giraffe's plugin 'HTML Purified' and Edward Z. Yang's Powered by HTML Purifier. 531 items have been purified.