Introduction

SQL Server provides an easy way to integrate custom user-defined aggregated function implemented in .NET project. While a simple function with internal flat type variables  (i.e. blittable: Int, Double) requires few easy steps, a function with  a more complex structures, such as internal classes or arrays, requires extra care. While in the first case, the framework handles all the necessary steps to serialize and deserialize the structure, when dealing with internal classes we are required to make sure that we are searlizing and desearializng out internal objects correctly. In the following example we will explore a simple way to create a user-defined aggregate function using Visual Studio 2013 and .NET 4.5

Step 1

Open the visual studio 2013 and create “SQL Database Project”, by selecting New from the File menu and then New Project. A new dialog window will be opened. Enter the name of the project (“AggregateFunctionProject”) and its location on the disk and press enter

Screen Shot 1
Step 1: Creating a new Visual Studio Database Project

Then select New Item by using the right click on the new project (“AggregateFunctionProject”)
that appears in the Solution Explorer window. A new dialog will appear and then you need to select SQL CLR C# from the left list and SQL CLR C# Aggregate from the right list.

A new file will appear containing a template function of an aggregated function

Select SQL CLR C# Aggregate to create a new template
Select SQL CLR C# Aggregate to create a new template

Step 2

An aggregated function is created by defining a struct with SqlUserDefinedAggregate attribute.  The attribute defines the following

  • Format: Serialization format for the struct. This is typically either Native or UserDefined. In case of Native format, the framework handles all the necessary steps to serialize and deserialize the structure.
  • IsInvariantToDuplicates (bool): Does receiving the same value twice or more affect the result
  • IsInvariantToNulls (bool): Does receiving a NULL value change the result
  • IsInvariantToOrder (bool): Does the order of values affect the result
  • IsNullIfEmpty (bool): Does an empty set result to a NULL value
  • Name (string): Name of the aggregate

The struct itself must contain at least the following methods:

  • Init: This is called when a new group of values is going to be handled using an instance of the structure
  • Accumulate: Each value is passed to the Accumulate method which is responsible for making the necessary calculations, etc.
  • Merge: This method is used when the original set of values is divided into several independent groups and after accumulating the group specific values, the group is merged to another group.
  • Terminate: And finally when all values have been handled, Terminate returns the result.

Each of these four functions should be implemented: Init will initialise the variables. Accumulate will be used to acquire the data and calculate the value iteratively. Merge will be used to merge several sub groups into one single group and Terminate will return the final value for each group. Not in all cases the calculation can be done iteratively during the process of the Accumulate function. Sometimes we can only acquire the data in the Accumulation and the actual calculation is done in the Merge or/and in the Terminate method. In these cases, the acquisition of the data requires an internal structures that are not blittable.

Step 3

For example suppose we would like fit a linear curve to some data points in few different experiments. The data would look like this:

DataPoints

Our aim is to calculate the best linear fit for each one of the experiments. To do that we would use the following script:

SELECT ExperimentId, dbo.curveFitting(datapoint1,datapoint2,’a’) ‘a’, dbo.curveFitting(datapoint1,datapoint2,’b’) ‘b’ FROM data

GROUP BY ExperimentId

 The script group each one of the experiment and calculate the a parameter and the b parameter separately. The curveFitting function gets 3 input parameters: datapoint1, datapoint2 and the type of output (‘a’ for the a parameter and ‘b’ for the b parameter in the Y=aX + b linear curve equation)

To implement the curve fitting function we will use the following script:

//------------------------------------------------------------------------------
// <copyright file="CSSqlAggregate.cs" company="Meaningo">
//  All rights reserved.
// </copyright>
//------------------------------------------------------------------------------
using System;
using System.Data;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Runtime.InteropServices;
using DatabaseProject2;


[SqlUserDefinedAggregate(Format.UserDefined,
        MaxByteSize = -1,
        Name = "InPlayIPI", IsInvariantToDuplicates = false,
        IsInvariantToNulls = true, IsInvariantToOrder = true,
        IsNullIfEmpty = true)]
[Serializable]
[StructLayout(LayoutKind.Sequential)]
public struct inPlayIPI : IBinarySerialize
{
    public void Init()
    {

        curve_fitting = new CurveFitting();
        num = 0;
    }

    public void Accumulate(SqlDouble dataPoint1, SqlInt32 dataPoint2, SqlString sqlOutputParameter)
    {
        string output = (string)sqlOutput;
        curve_fitting.Accumualte((double)dataPoint1, (double)dataPoint2);
        switch (sqlOutputParameter)
        {
            case "a": num = 1;
                break;
            case "b": num = 2;
                break;
            default:
                break;
        }
    }

    public void Merge(inPlayIPI Group)
    {
        this.curve_fitting.data_points.AddRange(Group.curve_fitting.data_points);
    }

    public SqlDouble Terminate()
    {
        try
        {
            
            Tuple<double, double, double> parameters = curve_fitting.FitCurve();

            switch (num)
            {
                case 1:
                    return new SqlDouble(parameters.Item1);
                case 2:
                    return new SqlDouble(parameters.Item2);
                default:
                    return new SqlDouble(0);
            }
        }
        catch (Exception e)
        {
            return new SqlDouble(0);
        }

    }

    // Our private variables  
    private CurveFitting curve_fitting;
    private int num;
}

Step 4

now the final step is to complete the two serialization functions write() and read(). In the write function we need to serialize the curveFitting class, variable after a variable in the same order that we will later read it using the read() function.

    public void Read(System.IO.BinaryReader r)
    {
        num = (Int32)r.ReadInt32();
        for (int i = 0; i <= itemCount2 - 1; i++)
        {
            this.curve_fitting.data_points.Add(new Tuple<double,double>((double)r.ReadDouble(), (double)r.ReadDouble()));
        }
    }

    public void Write(System.IO.BinaryWriter w)
    {
        w.Write(num);
        foreach(var item in curve_fitting.data_points)
        {
            w.Write(item.Item1);
            w.Write(item.Item2);
        }
    }

Leave a Reply

Your email address will not be published. Required fields are marked *