In this post, we are continuing the SSAS Multidimensional Best Practices Series. Part I introduced the different flavors of analysis available with the most recent releases of Microsoft Office and SQL Server. Part II, discussed the various aspects of a typical business intelligence system architecture, performance, and the SSAS internal engine. Part III, discussed some of the best practices to consider when developing or working with the relational dimensional model database.
This post will cover some of the best practices to consider when developing or working with a SSAS multidimensional solution in Visual Studio related to data source design. Establishing a data source is the first step performed when creating a SSAS solution in BIDS (SQL 2005, 2008, 2008R2), Visual Studio (SQL 2005, 2008, 2008R2), SQL Server Data Tools (SQL 2012, 2014).
In this scenario, there is already a data source (ds) available.
To begin working with the various settings and properties within the data source you simply right-click it and select open.
This will open the Data Source Designer. All of the settings from the Connection String to the Network Packet Size settings are done from within this designer.
OLAP databases in Analysis Services have several special requirements that are important to ensure trouble-free access to source data.
Use only supported OLEDB Providers in a Data Source
Analysis Services was designed and tested to work with a specific set of OLE DB providers. Although other OLE DB providers are available, and the Data Source Wizard lets you choose any compatible provider, the capabilities and behavior of different providers can differ substantially. This is true even for different providers that connect to the same database. Therefore, you should use only the supported providers.
Do not use the .Net SQLClient Data Provider to connect to a SQL Server data source
Because the Analysis Services server runs in native code, you can get better performance by using a native provider. Therefore, do not use the .Net SqlClient Data Provider; instead, use the Microsoft OLE DB Provider for SQL Server or the SQL Native Client provider.
To access the Connection String properties and settings, click the Edit button in the Data Source Designer.
This will open the Connection Manager. Within the Connection Manager you will find the standard settings for server name, authentication, and database in which the relational dimensional database is located. Additionally, the Data Source Providers are available from within the Connection Manager. Typically, this option will be set by default when the initial setup of the data source is established. However, it is always a good idea to check to make sure it is using the correct provider option.
Network Packet Size
Selecting the All tile in the Connection Manager will reveal all of the properties associated with the data source connection settings.
In addition to creating a high-speed network (yes you need to work with your network folks!), there are some additional configurations you can change to further speed up network traffic.
Under the properties of your data source, increasing the network packet size for SQL Server will minimize the protocol overhead required to build many, small packages. The default value is 4096. With a data warehouse load, a packet size of 32K (in SQL Server, this means assigning the value 32767) can benefit processing. Instead of changing the value of the SQL Server, override it in your data source.
There are architecture scenarios where the packet size setting can make a difference and scenarios when it will not.
This is an overly simplified architecture compared to most that people may deal with, but the objective is to demonstrate that if you have a single instance (SCENARIO A) with the various SQL Server components on it…adjusting the packet size will have no real or net affect on the packet size being passed over the network.
If your source system (SQL Server) and SSAS are both on the same machine (SCENARIO A), you should consider using the Shared Memory net library for better performance. The performance benefit comes from the fact that it bypasses the physical network stack. It uses the Windows Shared Memory feature to communicate between SQL Server and the client/SSAS. This Net-Library is enabled by default and used when you specify either a period or (local) as your machine name or localhost or machine name or by prefixing machine\instance name with lpc: when connecting to a SQL Server instance.
Network/TDS Packet Size
Contrarily, in System Architecture – Scenario B where we have more of a distributed architecture and SSAS is installed on a separate server than say where the relational dimensional data is can improve your query and processing performance by changing the PACKET SIZE on the Data Source to allow for larger packet sizes to be transported over the network. This minimizes the protocol to build many small packages (packets). This can benefit your processing.
During cube processing data moves from your relational data warehouse to SSAS in TDS (Tabular Data Stream) packets. As data movement between the relational data warehouse and SSAS is normally high, we should configure this to have a bigger packet size (therefore less packets) than using a smaller size (high number of packets) to minimize the overhead of breaking data down into multiple chunks/packets and reassembling it at other end. To change the packet size you can go to connection manager, click on the All page on the left side and specify 32KB for the packet size property instead of its default value of 4KB. Please note, changing the network packet size property might be good for data warehousing scenario but not for OLTP type applications and therefore it’s better to override the packet size property for your connection separately instead of changing it on SQL Server for all connections.
Isolation, Query Timeout, and Maximum Number of Connections
There are a couple of things you need to be aware of when connecting to source systems using data sources in an SSAS project:
Isolation – Sometimes when your source system is continuously changing and you want consistent data, you might need to pull data in using Snapshot isolation mode which in turns uses a version store in SQL Server. In other scenarios, use the default ReadCommitted isolation mode in order to avoid extra overhead/copies at the source system.
Query timeout – With this property you can specify the query timeout for queries being run against the source system. If you have huge source tables with inadequate indexes or out of date statistics, your query is going to take longer, so make sure you specify the value which is appropriate or specify 0 for unlimited timeout. Actually, I would encourage you to go make sure you don’t have out-of-date statistics, fragmented indexes, etc. instead of spending time here. If you are spending a lot of time here and the problem you are trying to solve never seems to work…I would take a look at what is happening on the relational dimensional database side of things.
Number of connections – With this you can specify the maximum number of connections that SSAS can create in parallel to pull data from source systems during cube processing. This really helps in cube processing to run in parallel by creating multiple connections to refresh several dimensions and facts in parallel. The default value for this is 10 and you should consider increasing this if you have a cube with lots of dimensions/facts and your source supports more parallel connections. This will greatly improve the cube processing times.
This allows you to specify the credentials which will be used for connecting and pulling data from source systems. The best practice from a production scenario is to apply the Use the service account option.
Well that mostly covers the Data Source design best practices. Taking the time to go through and setup these settings and properties will definitely assist your Analysis Services solution to perform better. Hopefully this information is useful to you and will give you a jump start on your SSAS solutions.