Angelos Petropoulos' WebLog

Thoughts occasionally worth reading

Using XML as a datatype in SQL Server 2005

clock November 28, 2007 22:14 by author angelosp

There are many articles and posts out there that describe the XML capabilities of SQL Server 2005. Some of the top links are

The Fundamentals of the SQL Server 2005 XML Datatype
XML Support in Microsoft SQL Server 2005
Introduction to XQuery in SQL Server 2005
Denis Ruckebusch's blog

The only problem is that you need to work rather hard to put all of this information together before you can get a decent working example. I went through this excercise myself, so I thought it would be worth posting it here. The example is simple, I want to store an XML in one of my columns and I want the XML to be typed (i.e. to adhere to a schema). The first step is to define the schema for the XML file and to import it into SQL Server 2005.

The schema defines the following structure

Order
    OrderItem
        Name
        Price
        Quantity 

Importing the schema into SQL server is pretty simple, just use following T-SQL statement

CREATE XML SCHEMA COLLECTION [dbo].[OrderCollection] AS
'<?xml version="1.0" encoding="utf-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="Order">
    <xsd:complexType>
      <xsd:complexContent mixed="false">
        <xsd:restriction base="xsd:anyType">
          <xsd:sequence>
            <xsd:element minOccurs="0" maxOccurs="unbounded" name="OrderItem">
              <xsd:complexType>
                <xsd:complexContent mixed="false">
                  <xsd:restriction base="xsd:anyType">
                    <xsd:sequence minOccurs="0">
                      <xsd:element name="Name">
                        <xsd:complexType>
                          <xsd:simpleContent>
                            <xsd:extension base="xsd:string" />
                          </xsd:simpleContent>
                        </xsd:complexType>
                      </xsd:element>
                      <xsd:element name="Price">
                        <xsd:complexType>
                          <xsd:simpleContent>
                            <xsd:extension base="xsd:decimal" />
                          </xsd:simpleContent>
                        </xsd:complexType>
                      </xsd:element>
                      <xsd:element name="Quantity">
                        <xsd:complexType>
                          <xsd:simpleContent>
                            <xsd:extension base="xsd:integer" />
                          </xsd:simpleContent>
                        </xsd:complexType>
                      </xsd:element>
                    </xsd:sequence>
                  </xsd:restriction>
                </xsd:complexContent>
              </xsd:complexType>
            </xsd:element>
          </xsd:sequence>
        </xsd:restriction>
      </xsd:complexContent>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>'


Using SQL Server Management Studio you can confirm that the XML Schema is now available to be used by navigating to the "Types" node as shown below 


It is now possible to create a table and define one of its columns to be of the datatype XML and to ensure that the XML inserted into the column is validated against the schema we just added using the following T-SQL statement

CREATE TABLE [dbo].[Order](
    [ID] [bigint] IDENTITY(1,1) NOT NULL,
    [CustomerID] [bigint] NOT NULL,
    [Value] [xml](DOCUMENT [dbo].[OrderCollection]) NOT NULL
)


Inserting a record into the new table is still exactly the same as before, with the only difference now that one of the parameters will be an XML document

INSERT INTO [Order]
(
    [CustomerID],
    [Value]
)
VALUES
(
    1625,
    '<Order>
      <OrderItem>
        <Name>Chair</Name>
        <Price>12.99</Price>
        <Quantity>4</Quantity>
      </OrderItem>
      <OrderItem>
        <Name>Table</Name>
        <Price>44.99</Price>
        <Quantity>1</Quantity>
      </OrderItem>
    </Order>'

)


Now comes the interesting part. Of course you can now select this record within your application code and manipulate the XML using the standard libraries, but what if you want to access this information within SQL Server (perhaps through an ordinary stored procedure). Another possibility is that you would like to perform some reporting on the data that is stored within the XML document.

The following T-SQL will turn the XML file into tabular format, with columns for each element that contains a value (i.e. the elements  "Name", "Price" and "Quantity") and a new row for each instance of the element "OrderItem".

-- Declare a variable of type XML
DECLARE @xml XML
 
-- Assign the value of the XML document
-- we want to manipulate to the variable @xml
SELECT
    @xml = [Value]
FROM
    [Order]
WHERE
    ID = 1
 
-- Present the data in tabular format
SELECT
    -- Select the element "Name" and convert its value to a VARCHAR(100)
    OrderItemFragment.Details.query('Name').value('.', 'VARCHAR(100)') AS [Name],
    -- Select the element "Price" and conver its value to DECIMAL(28,2)
    OrderItemFragment.Details.query('Price').value('.', 'DECIMAL(28,2)') AS [Price],
    -- Select the element "Quantity" and convert its value to INTEGER
    OrderItemFragment.Details.query('Quantity').value('.', 'INTEGER') AS [Quantity]
FROM
    -- Split the XML document into individual pieces, one for each
    -- instance of the element "OrderItemFragment". Name the tabular result
    -- "OrderItem" and give its single column the name "Details"
    @xml.nodes('/Order/OrderItem') AS OrderItemFragment([Details])


The code is not really that complicated, but there are some rules.

  1. The function ".nodes()" used to split the XML file into fragments cannot be performed on value of the datatype XML. The reason I am selecting the XML file that I want to manipulate into a local variable first is to work around this limitation. I tried long and hard, but I could not get "Order.Value.nodes()" to work. (see "Update 29/11/2007")
  2. The output of the function ".nodes()" is tabular and has to be named and its columns have to be specified.
  3. The output of the function ".nodes()" can only be accessed through SQL XML functions such as ".value()", ".query()", ".count()", etc.


The line "@xml.nodes('/Order/OrderItem')" navigates through the XML document by first finding the element "Order" and then the element "OrderItem". Every time it comes across this relationship of elements (i.e. "Order" is parent and "OrderItem" is child) it seperate it from the rest of the document and returns an XML fragment that is the contents of the "OrderItem" element it has identified. The result of the XQuery statement is the collection of these XML fragments.

The line "OrderItemFragment.Details.query('Name').value('.', 'VARCHAR(100)') AS [Name]" uses the ".query()" function to find the element called "Name". It then uses the function ".value()" to retrieve the identified element's value and convert it to a SQL datatype.

The result are exactly what you would expect a T-SQL statement to return 

Update 29/11/2007: There had to be a way and sure enough I found it. You can use the ".nodes()" function on an XML columns straight in the "FROM" part of the statement rather than having to user a variable. The previous SQL statement is equivalent to this one

SELECT
    OrderItemFragment.Details.query('Name').value('.', 'VARCHAR(100)') AS [Name],
    OrderItemFragment.Details.query('Price').value('.', 'DECIMAL(28,2)') AS [Price],
    OrderItemFragment.Details.query('Quantity').value('.', 'INTEGER') AS [Quantity]
FROM
    [Order] CROSS APPLY [Value].nodes('/Order/OrderItem') AS OrderItemFragment([Details])
WHERE
    ID = 1

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5


Persisting .NET DateTime

clock November 19, 2007 20:08 by author angelosp

In a previous post I pointed out how .NET's DateTime and SQL Server's DateTime have different percisions and comparing two instances of what was originally the same value could result in a mismatch. I also provided a solution that uses a BIGINT field in SQL Server to store the value as ticks and a user defined function to convert them to SQL's DateTime and back.

The problem with that solution is the extra processing that is required for performing the simple task of saving and loading a date and time value. This becomes an issue when dealing with a large amount of data, such as in the case of rendering of a historical report for example. There is an alternative that eleviates the need for any extra processing when saving and loading the value, but its a compromise.

The difference in percission is minor (only at the millisecond level). Looking at the situation pragmatically, in most scenarios it doesn't really matter. We rarely deal with time down to the level of milliseconds, typically seconds are more than enough. If the milliseconds are ommited both in .NET and in SQL Server, then there is no discrepancies between the values. This means that there is some data loss as milliseconds are removed from the date and time, but if they were not needed in the first place then this is not necessarily a problem.

The following method can be used to strip the milliseconds off a .NET DateTime

public static DateTime PrepareForSql(DateTime dateTime)
{
    return new DateTime((long)
        Math.Floor((decimal)dateTime.Ticks / 10000000) * 10000000);
}


Update:  To those that know me it won't be a surprise, but I'm dumb. I thought I was being clever by using .Ticks and Math.Floor, when just doing the following has exactly the same effect and it is more performant

public static DateTime PrepareForSql(DateTime dateTime)
{
    return new DateTime(dateTime.Year, dateTime.Month, dateTime.Day,
        dateTime.Hour, dateTime.Minute, dateTime.Second);
}


Run #10
  TestNewDate          00:00:00.1702448
  TestNewDateUsingTicks 00:00:00.2002880 

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5


The VS2008 effect - MSDN funky error message

clock November 19, 2007 16:55 by author angelosp

A co-worker came across this error today

 

Looks like someone went live with an update to the source code and forgot to deploy some of the referenced DLLs.

BTW remote debugging is enabled ... uncool MS, uncool! 

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5


SQL Server 2008 - Intellisense finally included

clock November 19, 2007 10:26 by author angelosp

All I can say is, finally!

I've used 3rd party tools in the past to get intellisense support within Query Analyser and  SQL Server Management Studio, but to be honest they don't quite work. Sometimes the auto-complete list does not show up, other times it does things you didn't want it to do such as insert the first item from the list because you pressed tab even though all you meant to do was indent the text, etc.

With all these different naming convensions enforced on developers, typically authored by people that never actually write SQL and their goal is to define every aspect of a database object within its name (e.g. transTblCustomer.bigintCustomerID kill me now), this is more than just a nice to have.

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5


TechEd Wisdom – Still not convinced about LINQ

clock November 16, 2007 08:02 by author angelosp

First of all, let’s get something straight. LINQ is not SQL. LINQ is a query framework integrated to the language. What it queries is irrelevant and could be pretty much anything (which is why you have all these specific implementations such as LINQ to SQL, LINQ to XML, LINQ to SharePoint, etc). The only thing that LINQ and SQL have in common is similarity in syntax.

With that out of the way, let’s talk “advantages” (I also did the air-quote thing as I typed this).

Unified way of accessing different types of data

*cough*bullshit*cough*. We have been drinking this cool aid for years now. Wasn’t this the tag line for ADO.NET as well when it was first introduced? The reality is that you build an application and you either connect to a database or use XML files. You will not switch between the two because someone raised a change request.

I don’t care that I can use LINQ to query both XML and relational data. I am happy to use SQL for one and XPath for the other. Just as I am happy to write my own code to query proprietary text files and third party libraries to query PDF files. I just don’t care because in the real world it adds no value. If I have been using LINQ to SQL for 3 years it does not mean I also have 3 years of experience in LINQ to XML. I still don’t have a clue what the best practices and related architectural decisions are around using XML.

Look at it pragmatically. When was the last time you wanted to write a SQL-like statement to access your collection of objects or XML file? Personally I can’t even remember, but if the occasion does arise, I will write a method that does it and re-use it; big deal. You see these examples on the internet where they create a class scope array with random data and then use LINQ to find the item that has a length greater than 3 and then order the results alphabetically. Are you kidding me? God, you are not kidding. OK, a couple of questions

  1. In a production system, where is the data held? The answer 99.9% of the time is a database.
  2. In a production system, would you load random data from the database, get it through all the tiers and then do your calculations/filtering/etc. in the front end tier? If you answered yes, I don’t want to ever use your code or have you in my dev team; you are terrible.


Disclaimer: There will always be that one in a thousand situations where something unorthodox is required to make a very specific solution work. I don’t care. We are discussing a feature that has been embedded to the programming language itself. This is not about the exceptions; this is about the general rule.

Support within the IDE with intellisense and compile time checks

Well last time I checked, all code does. Why on earth is this considered an advantage? Its an advantage over using something terrible like scripting, sure, but this is effectively lowering the standards so you can brag about something that in reality is a given. Who would be happy using something new that does not provide you with intellisense, does not type check and only has run-time error checking? Even the JavaScript developers won’t have to put up with that when using VS 2008.

You don’t have to write SQL!

Clearly this is only applicable to LINQ to SQL and not LINQ in general. I’m addressing it because it makes up the biggest *wow* factor in the eyes of most developers who can’t wait to get their hands on LINQ.

So, what do you gain by using LINQ to SQL?

  1. You don’t have to write T-SQL (yes T-SQL, because LINQ to SQL only works with SQL Server) and you save time and effort!
  2. You have intellisense and compile time checking!
  3. Umm…that’s about it


Let’s look at real world applications and enterprise scale applications that use a database back end. Does the database expose its tables and then the application executes dynamic T-SQL at run-time? If you answered yes, shoot yourself now and preferably in the head.  Do it, I’ll wait. The database is locked down as much as possible. You use stored procedures as a façade between the data access layer and the data. This way you can lock down access, you can ensure there is no way SQL injection can ever take place no matter how stupid the application is, etc. So it turns out, you have to write exactly the same amount of T-SQL as you did before, regardless if you use LINQ to SQL or not.

Now, you have your data within your application. Do you use datasets to hold your data? You said yes didn’t you? I thought I told you to shoot yourself. Data is stored in business entities so that validation can take place and business rules can be applied. So both the business and the data access layer remain unchanged regardless whether LINQ to SQL was used or not.

So you have written all this code you would write if you did not use LINQ to SQL and now … well now you have to write more code so that you can use this cool thing called LINQ. Except there is no point using LINQ now as you have loaded your data from the database, you have already created business entities, applied the validation and business rules and the job is done. Where was LINQ useful in any part of this process? And please don’t say that you might want to do filtering and further data manipulation at the presentation layer because I will shoot you myself.

Why am I bitching so much?

LINQ has polluted .NET

The languages and .NET in general have been polluted by LINQ. Over 90% of the features added to .NET 3.5 are to accommodate LINQ and most of them are terrible or just not necessary. I’m not the only one who thinks so either.

So if I don’t like something I should just not use it and shut up right? I would if I could. Unfortunately I develop enterprise level applications for a living. This, more often than not nowadays, means large development teams and use of outsourcing. I have to make a system that is developed by 30 different developers of a different background and competence level, consistent and supportable with readable source-code. With extension methods that can take over class functionality from anywhere and type inference combined with lambda expressions that allow you to write hundreds of lines of completely unreadable and un-maintainable code, my professional life has just turned into hell.

Update: This is an exhaggeration of course to prove a point, but still ...

Currently rated 5.0 by 1 people

  • Currently 5/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5


Subscribe

Syndicate
AddThis Feed Button
AddThis Social Bookmark Button

Search

Calendar

<<  August 2008  >>
SuMoTuWeThFrSa
272829303112
3456789
10111213141516
17181920212223
24252627282930
31123456

Archive

Tags

Categories


Blogroll

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in anyway.

© Copyright 2008

Sign in