Tuesday, October 21, 2014

Microsoft 70-486: Design a caching strategy

Exam Objectives


Implement page output caching (performance oriented), implement data caching, implement HTTP caching, implement Azure caching

Quick Overview of Training Materials


Asp.net - Data caching in ASP.Net applications



Page output caching


Output Cache: 

The output of an action method on a controller can be cached using the [OutputCache] attribute on the method.  Action methods that return views will have the rendered page cached, while methods returning JSON data will have that data saved.  A number of properties on the OutputCacheAttribute class control how data is cached (I figured out toward the end of this section that there is a critical difference between OutputCacheAttribute and OutputCacheParameters: the former is part of the System.Web.Mvc namespace, the latter part of the System.Web.UI namespace):
  • CacheProfile - If a number of methods will have the same cache settings, it makes sense to use the web.config file to create a cache profile that can be used across all these methods.  The cache profile is created using the OutputCacheProfile class and looks something like this:

    <system.web>
        <caching>
            <outputCacheSettings>
                <outputCacheProfiles>
                    <add name="Long" duration="300" />
                    <add name="Short" duration="3" />
                </outputCacheProfiles>
            </outputCacheSettings>
        </caching>
    </system.web>

    Any of the attributes of an outputCache can be set and reused by using a cache profile.  Using the cache profile simply involves setting the CacheProfile property on the OutputCache attribute:

    [OutputCache(CacheProfile="Long")]
    public ActionResult Index(string searchTerm = null, int page = 1)
    { ... }

    One caveat to using Cache Profiles is that child actions will throw an exception if the duration attribute is not explicitly set.
      
  • ChildActionCache - This is not an attribute of the cache, per se, but instead is a static property of type ObjectCache that holds the cached output of child actions.  Using this property has its difficulties, as it is not clear how the unique ids for the cached objects are generated.
      
  • Duration - How long, in seconds, the output should be cached.  To save an item for 5 minutes, duration would be set to 300:

    [OutputCache(Duration=300)]
    public ActionResult Index(string searchTerm = null, int page = 1)
    { ... }
       
  • Location - This value is of type OutputCacheLocation enum.  It configures where cached output is stored, and corresponds to the HttpCacheability enumeration used for the Cache-Control Http header. It can have the following values (HttpCacheability equivalent in parens):
    • Any (public) - cache can be located on client, server, or any intermediate device (proxy) participating in the request.
    • Client (private) - cache is located on the browser.
    • Downstream (--no equivalent--) - cache can be located on any device besides the server, including proxy devices or the browser
    • None (nocache) - cache is disabled for requested page.
    • Server (server) - cache is located on the web server
    • ServerAndClient (private and server) - cache can be located on browser OR on server, but not on intermediate proxies.
         
  • NoStore - Boolean value, when set to true it instructs the browser not to cache information from the page locally.  One use case may be when caching is enabled for all pages via configuration settings, but a method dealing with sensitive information needs to be kept out of the cache.
      
  • SqlDependency - This attribute creates a relationship with a table in a database.  When values in that table change, output pages are removed from the cache.  Several configuration steps must be taken to enable SqlDependency to work (these are covered in depth in the walkthrough from Microsoft):
    • cache notification must be enabled on the Sql Server database table. This is done with the aspnet_regsql command line utility.
    • data connection is added to the project
    • the database and table are supplied to SqlDependency as such:

      [OutputCache(SqlDependency = "Database:Table")]

      The database and table names are separated by a colon.  This creates a link to the "Table" table in the "Database" database, just as "Northwind:Employees" would create a link to the "Employees" table in the "Northwind" database and "OdeToFood:Restaurants" would create a dependency on the "Restaurants" table in the "OdeToFood" database... ok, you get it.
        
  • VaryByContentEncoding - This parameter accepts a semicolon-separated list of encoding tokens used by the "Accept-Encoding" header. If this was set, for example, to "gzip;deflate", then output for requests with the "Accept-Encoding" header set to "gzip" would be cached separately from output for requests with the header set to "deflate".

    [OutputCache(VaryByContentEncoding = "gzip;deflate")]
    public ActionResult Index(string searchTerm = null, int page = 1)
    { ... }
      
  • VaryByCustom - This parameter allows you to implement customized cache variability. This involves some additional steps. First, you must override the GetVaryByCustomString method in the global.asax file, which will look something like this:

    public override string GetVaryByCustomString(HttpContext context, string arg)
    {
        if (arg == <your custom parameter>)
        {
            if (<predicate for condition you are varying by>)
            {
                return "ABC";
            }
            else
            {
                return "DEF";
            }
        }
        else
        {
            return base.GetVaryByCustomString(context, arg);
        }
    }

    Requests that meet the condition get the cached output associated with the value of ABC, whereas everything else gets what is associated with DEF. Use cases include returning different cached output based on whether the user is logged in or not, as in this example and this example; on the minor version of the browser, as in the MSDN example; or on the roles the user is in, as in this example using Web.Security.Roles and this example using a cookie value.

    Once the method override exists in the global.asax file, the method can be decorated with the OutputCache attribute using the VaryByCustom parameter with the name of the custom parameter, matching <your custom parameter> in the example above:

    [OutputCache(VaryByCustom = "<your custom parameter>")]
    public ActionResult Index(string searchTerm = null, int page = 1)
    { ... }

    The VaryByCustom value of "browser" is built into the caching module. In the above override, using "browser" as the string will bypass the custom functionality and call the base method, which will recognize "browser" and return the browser type (i.e. "IE9", "Chrome", etc.).
      
  • VaryByHeader - This parameter takes a semicolon-separated list of HTTP headers (MSDN list) by which to vary cached pages.  One example is the "Accept-Language" header, which would store a different version of the page for each language requested (which may vary based on resource files).  The OdeToFood MVC tutorial from Pluralsight uses this as an example.  The OdeToFood example also uses "X-Requested-With", which indicates that the request was an AJAX call, and is thus cached separately (in this case, AJAX calls got a different output, so separate cache entries were essential to make it work right).  Using "User-Agent" will produce more granular cached values for different browsers compared to VaryByCustom = "browser", since the same browser family may produce different user agent header values.

    [OutputCache(VaryByHeader="X-Requested-With;Accept-Language")]
    public ActionResult Index(string searchTerm = null, int page = 1)
    { ... }
     
  • VaryByParam - Using VaryByParam will cause different versions of the page to be cached based on the values in the query string or POST parameters.  At either end of the spectrum are "None", which caches one version regardless of any parameters, and "*", which caches a different version for every distinct combination of parameter values.  One use case for this is pagination: a query variable "page" tells the application which page of data is being requested, and the application caches each page separately.

    [OutputCache(VaryByParam="page")]
    public ActionResult Index(string searchTerm = null, int page = 1)
    { ... }
     
  • VaryByControl - ** Part of the Web.UI namespace.... doesn't seem that relevant to MVC **
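
To make the VaryByCustom pattern above a little more concrete, here is a minimal sketch that varies the cache on login state. The custom key "loginstate" and the two return values are names I made up for illustration, and the class names just follow the default MVC project template; treat it as one possible shape, not the only way to do it.

```csharp
using System.Web;
using System.Web.Mvc;

// Global.asax.cs
public class MvcApplication : HttpApplication
{
    public override string GetVaryByCustomString(HttpContext context, string arg)
    {
        // "loginstate" is a hypothetical key; it only has to match the attribute below
        if (arg == "loginstate")
        {
            return context.Request.IsAuthenticated ? "authenticated" : "anonymous";
        }

        // everything else, including the built-in "browser", goes to the base class
        return base.GetVaryByCustomString(context, arg);
    }
}

public class HomeController : Controller
{
    // one cached copy per login state and per value of "page", each kept for 60 seconds
    [OutputCache(Duration = 60, VaryByParam = "page", VaryByCustom = "loginstate")]
    public ActionResult Index(string searchTerm = null, int page = 1)
    {
        return View();
    }
}
```

With this in place, anonymous and authenticated users never see each other's cached copy of the page.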

Donut Caching: 

There are certain use cases where it makes sense to cache a whole page except for one piece of dynamic content that needs to be regenerated on every request.  This is the idea behind "donut caching", with the dynamic content being the "hole" in the otherwise cached "donut".  In previous iterations of ASP.NET, it was possible to implement donut caching by using the substitution API, but this is not supported in MVC 2+.  What I find interesting is that the Exam Ref talks about donut caching using HttpResponse.WriteSubstitution, though everything I've found indicates that Response.WriteSubstitution no longer works. Oops.

The solution (in the real world) is to use the MvcDonutCaching NuGet package.  Since this is third party software it won't be on the test, but knowing how it works and how it's used can't hurt.  First, the package overloads the Html.Action() helper method to add a flag that indicates whether the content of the action should be excluded from the cache.  Actions that should not be cached are flagged, and the helper method inserts an HTML comment in the cached page output indicating what content should be generated when the cached page is returned.  Next, a new attribute based on ActionFilterAttribute is created (due to the way OutputCache intercepts HTTP requests, it isn't feasible to base it off the existing attribute class). This new attribute, called DonutOutputCache, looks in the cached output for these HTML comments (inserted by the helper method) and replaces them with the newly generated output from that action.  Cached items can be managed with the OutputCacheManager class.
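
The placeholder-and-replace idea is easy to sketch outside of MVC. The following is NOT the package's actual code: the comment format "<!--hole:Name-->" and the names DonutSketch and FillHoles are invented here purely to illustrate how a cached page with marked holes can be completed at request time.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DonutSketch
{
    // Hypothetical placeholder format; the real package uses its own comment syntax.
    private static readonly Regex Hole = new Regex(@"<!--hole:(?<name>\w+)-->");

    // Replaces every placeholder in the cached page with freshly rendered output.
    // renderAction stands in for "execute the child action with this name".
    public static string FillHoles(string cachedPage, Func<string, string> renderAction)
    {
        return Hole.Replace(cachedPage, m => renderAction(m.Groups["name"].Value));
    }
}
```

The cached string is stored once; only the delegate passed as renderAction runs on each request, which is what keeps the "hole" fresh.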

Donut Hole Caching: 

The idea behind donut hole caching is the inverse of donut caching: just one small piece of the page is cached while everything else is rendered new.  This is also referred to as "partial page caching" and it is mostly supported in MVC natively.  Doing partial page caching is simply a matter of using the OutputCache attribute on a child action (i.e. one invoked with Html.Action or Html.RenderAction).

[ChildActionOnly]
[OutputCache(Duration = 60)]
public ActionResult GetFoo(string bar) 
{ ... } 

There are limitations, however, due to the way child actions are treated.  Only the "Duration", "VaryByParam", and "VaryByCustom" properties are supported on child actions, and Duration is required.  Not setting a duration will result in an exception.  CacheProfiles are not supported at all on child actions, and disabling cache in web.config is ignored in these cases.  Finally, the cache output for child actions is difficult to manage because of the way it is stored: in a separate collection (ChildActionCache) rather than with the parent page's output.  I found a workaround for these limitations here, though since it's from the community I'm sure it won't be tested. It's probably enough to be aware of the built-in limitations...

Data caching


Data that is accessed frequently but does not change often can be cached to improve performance, especially if the data is expensive to retrieve or calculate (such as a call to a database or, even more so, an external web service).  One method of caching data involves the use of the WebCache helper class.  Another implementation uses the HttpContext.Cache property, which is just the Cache object for the current application domain.

The basic pattern for using data caching in this way is something along these lines:
  • Application makes a call to a method used to retrieve a piece of data
  • That method checks to see if the requested data has already been cached
  • If it is in the cache, the method returns the cached data; otherwise it recreates the data by calling the data source, adds the data to the cache, and returns it
My previous post on state management discussed the cache object, though I didn't go into using it.  The following code is essentially the example code supplied by every class in MSDN in any way connected to runtime caching:

  //here we are declaring a reference to the default instance of MemoryCache
  ObjectCache cache = MemoryCache.Default;
 
  //attempts to load the contents of the "filecontents" cache item
  string fileContents = cache["filecontents"] as string;
 
  //if this value is null, then "filecontents" was not found in cache
  if (fileContents == null) 
  {
      //create a caching policy object setting duration to 10 seconds
      CacheItemPolicy policy = new CacheItemPolicy();
      policy.AbsoluteExpiration =
          DateTimeOffset.Now.AddSeconds(10.0);
 
      //code to fetch data we want
      List<string> filePaths = new List<string>();
      string cachedFilePath = Server.MapPath("~") +
          "\\cacheText.txt";
 
      filePaths.Add(cachedFilePath);
 
      //this essentially adds a dependency to the file
      //if the file contents change, the cached content will
      //be removed
      policy.ChangeMonitors.Add(new
          HostFileChangeMonitor(filePaths));
 
      // Fetch the file contents.
      fileContents = System.IO.File.ReadAllText(cachedFilePath) + "\n"
          + DateTime.Now.ToString();
 
      //add filecontents to the cache, which will persist for ten seconds
      cache.Set("filecontents", fileContents, policy);
 
  }
 
  return fileContents;
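
Since the check-then-load steps are the same for every cached item, they can be pulled into a small helper. This is only a sketch under my own naming (CacheHelper and GetOrLoad are not framework types), and it uses a plain absolute expiration instead of the file change monitor shown above.

```csharp
using System;
using System.Runtime.Caching;

public static class CacheHelper
{
    // Returns the cached value for key, or calls load(), caches the result
    // for the given lifetime, and returns it.
    public static T GetOrLoad<T>(string key, TimeSpan lifetime, Func<T> load) where T : class
    {
        ObjectCache cache = MemoryCache.Default;

        T value = cache[key] as T;
        if (value == null)
        {
            value = load();

            // ObjectCache.Set rejects null values, so only cache real results
            if (value != null)
            {
                var policy = new CacheItemPolicy
                {
                    AbsoluteExpiration = DateTimeOffset.Now.Add(lifetime)
                };
                cache.Set(key, value, policy);
            }
        }
        return value;
    }
}
```

A call then looks like CacheHelper.GetOrLoad("filecontents", TimeSpan.FromSeconds(10), () => System.IO.File.ReadAllText(path)). Note that two requests arriving at the same moment may both run the loader; for expensive loads a lock around the miss path would be worth adding.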


One important fact I think is worth emphasizing here is the difference between System.Web.Caching and System.Runtime.Caching.  The usage is very similar (though the exact methods are different).  I did find it a bit confusing that the Web cache is accessed through the HttpContext object.  While I initially thought this would mean that it was only applicable to the current request, this property does in fact return the Cache object for the current application domain, as described by the MSDN Library page.  Thus it is still accessible to every user. StackOverflow points out a few of the other, more subtle differences between the two caches.  The WebCache helper also seems to be application scoped, though it appears to be a less robust option since it does not allow for setting dependencies and offers no easy way of enumerating keys or clearing the whole cache.  According to this MSDN article, Microsoft recommends System.Runtime.Caching for all .NET 4+ applications, whereas applications using older versions (.NET 3.5 and prior) only have System.Web.Caching available.

The following code (using System.Web.Caching instead of System.Runtime.Caching) would do essentially the same thing as the above code:

  //here we are declaring a reference to the Cache object
  System.Web.Caching.Cache cache = this.HttpContext.Cache;
 
  //attempts to load the contents of the "filecontents" cache item
  string fileContents = cache["filecontents"] as string;
 
  //if this value is null, then "filecontents" was not found in cache
  if (fileContents == null) 
  {
    //set expiration time for cached item to 10 seconds
    DateTime AbsoluteExpiration =
        DateTime.UtcNow.AddSeconds(10.0);
 
    //code to fetch data we want
    List<string> filePaths = new List<string>();
    string cachedFilePath = Server.MapPath("~") +
        "\\cacheText.txt";
 
    filePaths.Add(cachedFilePath);
 
    //this adds a dependency to the file
    //if the file contents change, the cached content will be removed
    CacheDependency dep = new CacheDependency(filePaths.ToArray());
 
    // Fetch the file contents.
    fileContents = System.IO.File.ReadAllText(cachedFilePath) + "\n"
        + DateTime.Now.ToString();
 
    //add filecontents to the cache, which will persist for ten seconds
    cache.Insert("filecontents", 
                 fileContents, 
                 dep, 
                 AbsoluteExpiration, 
                 System.Web.Caching.Cache.NoSlidingExpiration);
  }
 
  return fileContents; 

HTTP caching


While Data Caching, as described above, involves storing data in memory on the server to make serving it up more responsive, it is also possible to cache information in the browser.  By and large, most of these capabilities are built into modern browsers; manipulating caching on the client is mostly a matter of setting the appropriate values in the response headers.  The headers relevant to client caching are the following:
  • cache-control (spec) - This header is the most influential regarding client caching. This header may contain a number of directives:
    • public/private/no-cache - Corresponds to HttpCacheability, is set on Response.Cache using the SetCacheability() method.  
      • public - response may be cached by client and proxies
      • private - response is only cachable by the client (not proxies)
      • no-cache - response must be revalidated with every request.  This can be done using the "etag" header described below.
    • no-store - set this directive to prevent any part of the response or request from being stored.  A new response is sent for every single request for the resource.  Contrast this with no-cache, which allows the user agent to store the value, it just needs to check that the resource hasn't changed (using validation).
    • s-maxage/max-age - The max-age directive specifies how long, in seconds, an item can be held in the cache before it's considered stale. If set, it overrides the "expires" header. The "s-maxage" directive is intended for shared, or intermediary, caches (such as CDNs), and overrides both max-age and expires in these cases.
      **("min-fresh" and "max-stale" are similar directives defined in the spec but not supported on the HttpCachePolicy object.)
    • no-transform - Certain proxies convert media file formats to gain performance (think of reformatting a jpeg to a smaller size gif).  This directive prevents that.
    • must-revalidate/proxy-revalidate - These directives require a cache to revalidate content with the server once it has become stale, rather than serving the stale copy. Both work essentially the same way, the difference being that must-revalidate applies to everyone, and proxy-revalidate only applies to shared (proxy) caches.  For proxy-revalidate to do anything, cacheability must be set to "public", otherwise proxies won't cache the responses anyway.
  • expires (spec) - serves the same function as the "max-age" directive.  While not deprecated, max-age will take precedence, though it may be appropriate to set a matching value in "expires" for older user agents that do not recognize "cache-control"
  • etag (spec) - the "fingerprint" of the resource, it is usually a hash of the resource contents.  Whenever the contents of the resource change, the etag should also change to ensure that user agents trying to validate old resources will fetch the latest version.  User-agents use the etag in combination with the If-Match and If-None-Match request headers to validate a resource with the server.
  • vary (spec) - This works similarly to the "VaryByHeader" output cache attribute described above, basically creating different cached versions of a resource depending on the headers specified by vary.  One common use case is vary = "Accept-Encoding" to ensure that gzip'ed and deflated content is cached separately.
The screenshot below shows a few of these headers:


These headers can be controlled by using the methods on the Response.Cache object. Here are some examples:

 
Response.Cache.SetETag("7b1dc8120a5eb9351097682f8ebd4f1f");
Response.Cache.SetETagFromFileDependencies();
Response.Cache.SetExpires(DateTime.Today.AddDays(1));
Response.Cache.SetCacheability(HttpCacheability.NoCache);
Response.Cache.SetNoStore();
Response.Cache.SetNoTransforms();
Response.Cache.SetMaxAge(TimeSpan.FromDays(1));
Response.Cache.VaryByHeaders["Accept-Encoding"] = true; //emits the "vary" header
Response.Cache.SetRevalidation(HttpCacheRevalidation.AllCaches);
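
To show how the etag validation described above plays out on the server side, here is a rough sketch of computing a tag from the response body and checking it against an incoming If-None-Match value. ETagSketch and its method names are my own, MD5 is just one convenient hash choice, and the comma split is a simplification of the real header grammar.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class ETagSketch
{
    // Builds a quoted ETag from an MD5 hash of the response body.
    public static string Compute(string body)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(body));
            string hex = BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
            return "\"" + hex + "\"";
        }
    }

    // True when the client's If-None-Match value matches the current ETag,
    // in which case the server can answer 304 Not Modified with no body.
    public static bool IsNotModified(string ifNoneMatchHeader, string currentETag)
    {
        if (string.IsNullOrEmpty(ifNoneMatchHeader)) return false;
        if (ifNoneMatchHeader.Trim() == "*") return true;

        foreach (string candidate in ifNoneMatchHeader.Split(','))
        {
            string tag = candidate.Trim();
            // If-None-Match uses weak comparison, so drop a leading W/ marker
            if (tag.StartsWith("W/")) tag = tag.Substring(2);
            if (tag == currentETag) return true;
        }
        return false;
    }
}
```

In an action method, a true result would translate into returning status code 304 instead of rendering the view again.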
 

One strategy that is useful for resources that are fairly static (such as CSS and JavaScript files) is to set the max-age for these resources to the maximum value (one year, or 31536000 seconds), and change the file name whenever they change, such as by adding a hash or version number to the file name (this really only makes sense, though, if resource names are being set programmatically... manually changing resource names with every new version invites mistakes).  When these resources are requested again, the changed file name will ensure they are downloaded again by the user agent.  The simplest way to set resource-specific cache behavior is to change the file properties in IIS.  If you don't have access to IIS configuration, however, it might be necessary to implement a custom HTTP handler to get such granular control of cacheability.  If your site serves different content based on User-Agent, then vary = "user-agent" would be appropriate.
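
Setting the file name programmatically can be as simple as splicing a short content hash in before the extension. The helper below is only an illustration (Fingerprint.Name is a name I invented, and keeping 8 hex characters of a SHA-256 hash is an arbitrary choice), not something built into ASP.NET.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class Fingerprint
{
    // Turns "site.css" into "site.<first 8 hex chars of the content hash>.css"
    public static string Name(string fileName, byte[] contents)
    {
        using (SHA256 sha = SHA256.Create())
        {
            string hex = BitConverter.ToString(sha.ComputeHash(contents))
                .Replace("-", "")
                .ToLowerInvariant();

            string stem = Path.GetFileNameWithoutExtension(fileName);
            string ext = Path.GetExtension(fileName); // includes the leading dot

            return stem + "." + hex.Substring(0, 8) + ext;
        }
    }
}
```

Because the name depends only on the contents, an unchanged file keeps its URL (and stays cached), while any edit produces a new URL that the browser has never seen.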

Azure caching


As of now (October 2014), it seems that Azure based caching is being pushed toward Redis Caching. The Managed Cache Service is also available at this time, though creating a new instance requires the use of PowerShell (it isn't supported in the management portal yet). The current recommendation from Microsoft is to use the Redis cache, but for the sake of completeness I'll cover the Managed Cache Service as well (this is all that is covered, in brief, in the exam ref).  Knowing my luck, none of it will be on the test and it will all be obsolete in a year... c'est la vie, eh?

Redis Cache

Creating a new Redis Cache is pretty simple in the new Azure Portal.  Just select "Add", choose Redis Cache, pick a pricing tier and a name, and badda boom, you're done:




Interacting with the cache can be done via the command line (redis-cli.exe), or by using a client library from within Visual Studio.  The StackExchange.Redis NuGet package is one such client and is recommended by redis.io for use with C#.  The MSDN article on Azure Redis data caching provides examples using this library, and the logic flow is very similar to the data caching described above.  First, a connection is established with the cache using the host name (under "PROPERTIES" in the portal blade) and the key (under "KEYS").  Then, when requests are made for data that is being cached, the cache is checked first.  If the data is there, the cached value is returned; otherwise the data is retrieved from the source and a new key/value pair is added to the cache.
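
A bare-bones version of that flow with StackExchange.Redis might look like the sketch below. The host name and access key are placeholders to be filled in from the portal, and the class name, the "greeting" key, and the five minute lifetime are all made up for the example.

```csharp
using System;
using StackExchange.Redis;

public class RedisCacheSketch
{
    // Placeholder host name and key; substitute the values from the portal.
    // The multiplexer is meant to be created once and shared, hence the Lazy wrapper.
    private static readonly Lazy<ConnectionMultiplexer> Connection =
        new Lazy<ConnectionMultiplexer>(() => ConnectionMultiplexer.Connect(
            "<your cache name>.redis.cache.windows.net,ssl=true,password=<your access key>"));

    public static string GetGreeting(Func<string> loadFromSource)
    {
        IDatabase db = Connection.Value.GetDatabase();

        // check the cache first
        RedisValue cached = db.StringGet("greeting");
        if (cached.HasValue)
        {
            return cached;
        }

        // cache miss: go to the source, then store the result for five minutes
        string fresh = loadFromSource();
        db.StringSet("greeting", fresh, TimeSpan.FromMinutes(5));
        return fresh;
    }
}
```

Other than the connection setup, this is the same check, load, store sequence used with MemoryCache earlier.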

Cache performance can be easily monitored from the Azure Portal.  On the standard pricing tier, Microsoft offers a 99.9% SLA, including replication, which in Redis is accomplished with master-slave replication.  The preview video from Channel 9 demonstrates this failover capability.

Managed Cache Service (AppFabric)

Currently, the only way to create a Managed Cache Service is through Azure PowerShell.  I followed the instructions in Microsoft's example to the letter, only to find out that the name must be globally unique (oops).  Not a huge deal:


Once created, the cache can be managed through the old version of Azure management.  I couldn't find the cache through the new portal preview (might not have been looking in the right place, but it seemed like the only option there was the Redis cache...)


Rather than recreate Microsoft's step by step guide on how to use Managed Cache, I'll stick to the highlights:

  • Install the NuGet package through the package manager.  I found it by searching the online section for "WindowsAzure.Cache" (as per Microsoft's recommendation)
  • Configuration elements (dataCacheClients and cacheDiagnostics) are added to the app/web.config by the NuGet package. 
    • dataCacheClients is where you add the endpoint information (e.g. troydemomanagedcache.cache.windows.net) and key (found under "Manage Keys" from the cache dashboard)
  • The Azure cache requires the Microsoft.ApplicationServer.Caching namespace.  A reference to the cache instance is created with the DataCache class.  This can be done by newing a DataCache directly (passing the desired cache name to the constructor), or by using the DataCacheFactory class.
  • Data can be divided into regions (you have to create regions first though).  Many operations are available, including the expected Add (like an "insert"), Put (like an "Add_or_Update"), Remove, Get... as well as operations on whole regions (ClearRegion, GetObjectsInRegion). Values can be appended and prepended. Locking and unlocking are supported, as are notification callbacks.  One last note: many of the methods listed on MSDN warn that they are not supported by Azure Shared Cache... that is NOT the same thing as this cache.  The Managed Cache Service appears to fully support the DataCache class.  Regions worked for me, at least.
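
For reference, a region round trip with the DataCache class looks roughly like this. It assumes the dataCacheClients section in web.config is already filled in, that the cache is named "default", and the region and key names are invented for the example; I'm sketching from the API as documented rather than reproducing my test project.

```csharp
using Microsoft.ApplicationServer.Caching;

public class ManagedCacheSketch
{
    public static object RoundTrip()
    {
        // connection details come from the dataCacheClients config section
        DataCache cache = new DataCache("default");

        // regions must exist before anything is stored in them;
        // CreateRegion returns false if the region is already there
        cache.CreateRegion("demoRegion");

        // Put adds the item or overwrites an existing one
        cache.Put("greeting", "hello", "demoRegion");

        // Get returns null when the key is not in the region
        return cache.Get("greeting", "demoRegion");
    }
}
```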




HTML5 Application Cache API


I covered the basics of the AppCache API in a previous post (manifest sections, events), so I'll concentrate here on how it fits into ASP.NET and MVC, and go into more depth on implementation strategies.

The following is Jake Archibald's list of "Gotchas" from his article "Application Cache is a Douchebag":
  1. Files always come from the application cache, even when you are online
  2. The Application Cache only updates when the manifest changes
  3. The Application Cache is an additional, not an alternative, cache
  4. Setting the manifest with a long expiration is a very bad idea
  5. Non-Cached resources will not load on a cached page
  6. No conditional downloads (think images and fonts)
  7. We don't know for sure why the fallback page is served
  8. Hard redirects are treated as failures
  9. AJAX requests have to be double checked (sometimes fail when they shouldn't)
While not an explicit "gotcha" he does note that implicit caching, as outlined in the Dive into HTML5 piece on AppCache, is a bad idea, as it doesn't have any control over how long implicitly cached pages are held in the cache, or when they are updated.  While we can force updates by changing the manifest file in some trivial way, it then will proceed to recheck EVERY page we have cached and redownload any with changes, which for a site that changes frequently (like Wikipedia in their example) is all the time.

His recommended approach uses the AppCache for static content like JavaScript and CSS files, and uses LocalStorage to store the data needed to render cached pages.  Using LocalStorage allows more control since pages can be easily added and removed.  In his case, there is a button available that toggles whether the page is available offline.  Also, to make pages behave themselves while online, the manifest was served by the fallback page in a hidden iframe in the body of the content pages (since the page containing the manifest is ALWAYS cached).

One method that bloggers Dean Hume and Craig Shoemaker, among others, use is to serve the manifest file from a controller action.  This provides a great deal of programmatic control over what goes into the manifest, as well as making version control of the manifest quite simple (whether by simple version number increments or hashed fingerprints).  The process is pretty straightforward, since the manifest is really just a text file.  It's just important to set the MIME type to "text/cache-manifest" and set HTTP caching to "nocache" or something equivalent, to make sure a page in the application cache isn't served a stale manifest from the browser cache (and thus isn't updated when the manifest changes).  This controller action can be quite elaborate: Kazi Manzur Rashid demonstrates on his blog how bundling can be made to work with the manifest.  One commenter on Dean Hume's article tried this in an Azure hosted app and ran into problems with naming the manifest action "manifest" (changed to "appmanifest" and it worked), but I haven't tested that myself.
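
The text of the manifest itself is easy to generate. The builder below is my own minimal sketch (ManifestBuilder is not taken from either blogger's code), producing a version comment plus the CACHE, NETWORK, and FALLBACK sections.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ManifestBuilder
{
    // Builds the manifest text; changing the version string changes the file,
    // which is what prompts browsers to refresh the application cache.
    public static string Build(string version, IEnumerable<string> cachedUrls, string fallbackUrl)
    {
        var sb = new StringBuilder();
        sb.Append("CACHE MANIFEST\n");
        sb.Append("# version " + version + "\n");

        sb.Append("\nCACHE:\n");
        foreach (string url in cachedUrls)
        {
            sb.Append(url + "\n");
        }

        // everything not listed above goes to the network
        sb.Append("\nNETWORK:\n*\n");

        // any page that cannot be loaded falls back to the given url
        sb.Append("\nFALLBACK:\n/ " + fallbackUrl + "\n");

        return sb.ToString();
    }
}
```

Inside the controller action, the result would be returned with something like return Content(manifestText, "text/cache-manifest"); after calling Response.Cache.SetCacheability(HttpCacheability.NoCache), which covers both the MIME type and the "don't cache the manifest" requirement.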

General strategies for using the application cache to take a web app offline include limiting the resources in the application cache to static pages, images, CSS, and script files.  AppCache is easiest to use fully when the page UI is separated from the data, with static HTML pages (along with other assets) being stored in the AppCache, and data being stored in another storage mechanism and updated asynchronously.


