SSAS Best Practices Series – Part IV – Data Source Design Best Practices

In this post, we are continuing the SSAS Multidimensional Best Practices Series. Part I introduced the different flavors of analysis available with the most recent releases of Microsoft Office and SQL Server. Part II discussed the various aspects of a typical business intelligence system architecture, performance, and the SSAS internal engine. Part III discussed some of the best practices to consider when developing or working with the relational dimensional model database.

This post will cover some of the best practices to consider when developing or working with an SSAS multidimensional solution in Visual Studio, specifically related to data source design. Establishing a data source is the first step performed when creating an SSAS solution, whether in Business Intelligence Development Studio (BIDS) for SQL Server 2005, 2008, and 2008 R2, or in SQL Server Data Tools (SSDT) for SQL Server 2012 and 2014.

Visual Studio – Solution Explorer – SSAS – Data Source

In this scenario, there is already a data source (ds) available.

To begin working with the various settings and properties within the data source, simply right-click it and select Open.

Visual Studio – SSAS Solution – Open Data Source

This will open the Data Source Designer. All of the settings, from the connection string to the network packet size, are configured from within this designer.

OLAP databases in Analysis Services have several special requirements that are important to ensure trouble-free access to source data.

Connection Strings

Use only supported OLE DB Providers in a Data Source

Analysis Services was designed and tested to work with a specific set of OLE DB providers. Although other OLE DB providers are available, and the Data Source Wizard lets you choose any compatible provider, the capabilities and behavior of different providers can differ substantially.  This is true even for different providers that connect to the same database. Therefore, you should use only the supported providers.

Do not use the .Net SQLClient Data Provider to connect to a SQL Server data source

Because the Analysis Services server runs in native code, you can get better performance by using a native provider. Therefore, do not use the .Net SqlClient Data Provider; instead, use the Microsoft OLE DB Provider for SQL Server or the SQL Native Client provider.
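
For example, a data source connection string using a native provider might look like the following sketch (the server and database names are placeholders, and the provider version, here the SQL Server 2012 Native Client, depends on what is installed):

    Provider=SQLNCLI11.1;Data Source=MyDWServer;Initial Catalog=AdventureWorksDW;Integrated Security=SSPI;

A connection that goes through the .Net SqlClient Data Provider has no Provider keyword at all, which is a quick way to spot one in an existing solution.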

Visual Studio – SSAS Solution – Data Source Designer – Connection String

To access the Connection String properties and settings, click the Edit button in the Data Source Designer.

Visual Studio – SSAS Solution – Connection Manager – Data Source Providers

This will open the Connection Manager. Within the Connection Manager you will find the standard settings for server name, authentication, and the database in which the relational dimensional model resides. Additionally, the data source providers are available from within the Connection Manager. Typically, the provider is set by default when the data source is initially created. However, it is always a good idea to verify that the correct provider option is being used.

Network Packet Size

Selecting the All page in the Connection Manager will reveal all of the properties associated with the data source connection settings.

Visual Studio – SSAS Solution – Connection Manager – Network Packet Size

In addition to creating a high-speed network (yes, you need to work with your network folks!), there are some additional configurations you can change to further speed up network traffic.

Under the properties of your data source, increasing the network packet size for SQL Server will minimize the protocol overhead required to build many small packets. The default value is 4096. With a data warehouse load, a packet size of 32K (in SQL Server, this means assigning the value 32767) can benefit processing. Instead of changing the value on the SQL Server instance itself, override it in your data source.

There are architecture scenarios where the packet size setting can make a difference and scenarios where it will not.

Network Packet Size – Architecture When to Use

This is an overly simplified architecture compared to most that people may deal with, but the objective is to demonstrate that if you have a single instance (SCENARIO A) with the various SQL Server components on it, adjusting the packet size will have no real or net effect on the packet size being passed over the network.

Shared Memory

If your source system (SQL Server) and SSAS are both on the same machine (SCENARIO A), you should consider using the Shared Memory net-library for better performance. The performance benefit comes from the fact that it bypasses the physical network stack. It uses the Windows shared memory feature to communicate between SQL Server and the client (SSAS in this case). This net-library is enabled by default and is used when you specify a period (.), (local), localhost, or the machine name as the server name, or when you prefix the machine\instance name with lpc: when connecting to a SQL Server instance.
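
For example, a connection string that forces the shared memory protocol by prefixing the server name with lpc: might look like this sketch (the server, instance, and database names are placeholders):

    Provider=SQLNCLI11.1;Data Source=lpc:MyServer\SQL2014;Initial Catalog=AdventureWorksDW;Integrated Security=SSPI;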

Network/TDS Packet Size

Conversely, in System Architecture – Scenario B, where we have more of a distributed architecture and SSAS is installed on a separate server from the one hosting the relational dimensional data, you can improve your query and processing performance by changing the Packet Size property on the data source to allow larger packets to be transported over the network. This minimizes the protocol overhead of building many small packets, which can benefit your processing.

During cube processing, data moves from your relational data warehouse to SSAS in TDS (Tabular Data Stream) packets. As data movement between the relational data warehouse and SSAS is normally high, we should configure a bigger packet size (and therefore fewer packets) rather than a smaller size (and a higher number of packets) to minimize the overhead of breaking data down into multiple chunks/packets and reassembling it at the other end. To change the packet size, go to the Connection Manager, click the All page on the left side, and specify 32767 for the Packet Size property instead of its default value of 4096. Please note, changing the network packet size property might be good for a data warehousing scenario but not for OLTP-type applications, so it is better to override the packet size property for your connection separately instead of changing it on SQL Server for all connections.
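
The resulting data source connection string simply carries the larger value in its Packet Size keyword; a minimal sketch (server and database names are placeholders):

    Provider=SQLNCLI11.1;Data Source=MyDWServer;Initial Catalog=AdventureWorksDW;Integrated Security=SSPI;Packet Size=32767;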

Isolation, Query Timeout, and Maximum Number of Connections

There are a few things you need to be aware of when connecting to source systems using data sources in an SSAS project:

Visual Studio – SSAS Solution – Connection Manager – Isolation, Query Timeout, & Max Connections

Isolation – Sometimes when your source system is continuously changing and you want consistent data, you might need to pull data in using the Snapshot isolation mode, which in turn uses a version store in SQL Server. In other scenarios, use the default ReadCommitted isolation mode in order to avoid extra overhead/copies at the source system.

Query timeout – With this property you can specify the query timeout for queries being run against the source system. If you have huge source tables with inadequate indexes or out-of-date statistics, your queries are going to take longer, so make sure you specify an appropriate value, or specify 0 for an unlimited timeout. Actually, I would encourage you to make sure you don't have out-of-date statistics, fragmented indexes, etc. instead of spending time here. If you are spending a lot of time on this setting and the problem you are trying to solve never seems to go away, take a look at what is happening on the relational dimensional database side of things.

Number of connections – With this property you can specify the maximum number of connections that SSAS can create in parallel to pull data from source systems during cube processing. This really helps cube processing run in parallel by creating multiple connections to refresh several dimensions and facts at once. The default value is 10, and you should consider increasing it if you have a cube with lots of dimensions/facts and your source supports more parallel connections. This can greatly improve cube processing times.

Impersonation

This allows you to specify the credentials which will be used for connecting to and pulling data from source systems. For a production scenario, the best practice is to apply the Use the service account option.
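
To tie these settings together, here is a rough sketch of how a data source definition with these properties might be persisted in the XMLA/ASSL behind the designer (the IDs, names, and connection string are placeholders, and the exact serialization can vary by version):

    <DataSource xsi:type="RelationalDataSource">
      <ID>Adventure Works DW</ID>
      <Name>Adventure Works DW</Name>
      <!-- Native provider plus the larger packet size discussed above -->
      <ConnectionString>Provider=SQLNCLI11.1;Data Source=MyDWServer;Initial Catalog=AdventureWorksDW;Integrated Security=SSPI;Packet Size=32767</ConnectionString>
      <!-- Production best practice: run as the SSAS service account -->
      <ImpersonationInfo>
        <ImpersonationMode>ImpersonateServiceAccount</ImpersonationMode>
      </ImpersonationInfo>
      <Isolation>ReadCommitted</Isolation>
      <MaxActiveConnections>10</MaxActiveConnections>
      <Timeout>PT0S</Timeout> <!-- xsd:duration; PT0S = no query timeout -->
    </DataSource>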

Well, that mostly covers the data source design best practices. Taking the time to go through and set up these settings and properties will definitely help your Analysis Services solution perform better. Hopefully this information is useful to you and will give you a jump start on your SSAS solutions.

2 Comments

  1. I have question regarding the data source used to populate a SSAS Multidimensional cube. Is the ideal source of data a well-designed dimensional model using the best practices of a star-join-schema (i.e., facts & conformed dimensions)? What are the advantages of using a dimensional model? What are the downsides of not using a dimensional model? Much appreciate your insights to this question.

    • Gary, thanks for stopping by the blog.

      As you probably know, it is widely debated as to what is the most efficient modeling technique for reporting and analytics: star schema, snowflake schema, or even third to fifth normal form data vault models. All are considered by data warehouse designers as candidates. Note that the Analysis Services Unified Dimensional Model (UDM) is a dimensional model, with some additional features. No matter which model you choose as the end-user reporting model, performance of the relational model boils down to one simple fact: joins are expensive. This is also partially true for the Analysis Services engine itself. For example, if a snowflake is implemented as a non-materialized reference dimension, users will wait longer for queries, because the join is done at run time inside the Analysis Services engine. The largest impact of snowflake schemas occurs during processing of the partition data. For example, if you implement a fact table as a join of two big tables (separating order lines and order headers instead of pre-joined values), processing of facts will take longer because the relational engine has to compute the join.

      It is possible to build an Analysis Services cube on top of a highly normalized relational model, but be prepared to pay the price of joins when accessing the relational model. I would suggest NEVER doing this.

      In most cases, with anything beyond a star schema relational design, the price will be paid at processing time. In my experience as it relates to SSAS UDM design, I would recommend a star schema.

      I hope this information helps.

      Regards
