File Access In MHP – TV Without Borders

Applications generally can’t do very much without access to a filesystem of some kind. Usually, filesystems are used for loading content such as graphics, audio clips or other data such as text.

In MHP, the only filesystem type that an application can usually access is a broadcast filesystem such as the one that the application was transmitted on. MHP 1.1 receivers may support loading applications over the return channel using HTTP, but no such boxes are actually deployed yet.

There may be some persistent storage in an MHP receiver that an application can access, but this is often limited in size and cannot be used for storing applications. While some receivers may use a hard disk for persistent storage, the vast majority of receivers on the market will use a small amount of FLASH memory that may also be used for storing settings and user preferences. Access to this local persistent storage is handled by the persistent storage API as discussed below.

When accessing both broadcast and persistent filesystems, it’s important to remember that relative filenames will not work interoperably in many cases. Relative filenames will always be considered relative to the application base directory indicated in the Application Information Table.
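One defensive habit is to resolve every name against a known base directory yourself and hand only absolute paths to the file APIs. The sketch below uses nothing but java.io to show the idea. The PathResolver class is our own illustration and not part of MHP, and a real application would take its base directory from the receiver (for example from a service domain mount point, described later) rather than hard-coding one.

```java
import java.io.File;

// Illustrative helper (not part of MHP): builds absolute paths from a
// known base directory instead of relying on relative-path resolution.
final class PathResolver {
    private final File baseDir;

    PathResolver(File baseDir) {
        if (!baseDir.isAbsolute()) {
            throw new IllegalArgumentException("base directory must be absolute");
        }
        this.baseDir = baseDir;
    }

    // Returns an absolute File for a name given relative to the base directory.
    File resolve(String relativeName) {
        return new File(baseDir, relativeName);
    }
}
```

A call such as resolver.resolve("graphics/image1.jpg") then always yields an absolute path, whatever the receiver considers the current directory to be.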

Accessing broadcast filesystems

Because most MHP applications have no access to local storage on the receiver, filesystem access is more complex than on a conventional platform. While an application can use the standard classes from java.io to access broadcast filesystems, the nature of a broadcast filesystem has some impact on the way that those classes work.

The most obvious difference is latency – in a broadcast filesystem such as a DSM-CC object carousel, the receiver has to wait for the module containing the file to be broadcast. Since this could take several seconds, there may be an extremely high latency on filesystem requests. Application developers can cope with this in two ways:

arrange the object carousel so that all commonly-used files are broadcast often enough to reduce the latency and so that commonly-used files are in the same module (making caching easier).
make sure that any content loaded from a broadcast filesystem is requested in plenty of time for it to be used (where possible) so that the user doesn’t notice the latency in loading content.

While these points can’t always be followed, it’s a good idea to follow them where possible. Not only does this make your application run a little smoother, but it also forces you to think a little more about the structure of your application and how the user interacts with it, which may give you a clearer idea of the quality of your design.

The second difference between DSM-CC and conventional filesystem access is the importance of caching. Since latency is so high, many receivers cache some DSM-CC modules to improve performance. However, it is possible for broadcasters to update the contents of a file while the carousel that contains it is being broadcast. Thus, the contents of the cache may be out-of-date. The DSM-CC API lets the application explicitly choose how the cache should be used. An application can check the cache first and then load the file from the carousel if it’s not present (the default), load a file only from the cache (ensuring low latency but risking not being able to find the file), or load a file only from the stream (ensuring the most up-to-date copy of the file at the expense of high latency).
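To make the trade-off between these three choices concrete, here is a toy model written in plain Java. It is not the MHP API: the RetrievalModeDemo class, its maps and its method names are all invented for illustration, and it assumes (as a simplification) that anything fetched from the stream is then kept in the cache.

```java
import java.util.HashMap;
import java.util.Map;

// A toy model of the three retrieval choices; not the MHP API.
final class RetrievalModeDemo {
    enum Mode { FROM_CACHE, FROM_CACHE_OR_STREAM, FROM_STREAM_ONLY }

    private final Map<String, String> cache = new HashMap<>();
    private final Map<String, String> stream = new HashMap<>();

    // Simulates the broadcaster putting (or updating) a file in the carousel.
    void broadcast(String path, String contents) {
        stream.put(path, contents);
    }

    // Returns the contents, or null if the file cannot be found in this mode.
    String load(String path, Mode mode) {
        switch (mode) {
            case FROM_CACHE:
                return cache.get(path);          // fast, but may miss
            case FROM_STREAM_ONLY:
                return fetchFromStream(path);    // fresh, but slow on a real receiver
            default:                             // FROM_CACHE_OR_STREAM
                String cached = cache.get(path);
                return cached != null ? cached : fetchFromStream(path);
        }
    }

    // Assumption: a copy fetched from the stream is stored in the cache.
    private String fetchFromStream(String path) {
        String fresh = stream.get(path);
        if (fresh != null) {
            cache.put(path, fresh);
        }
        return fresh;
    }
}
```

In this model, a file that is updated after it has been cached keeps returning the old contents in the default mode until it is loaded from the stream only, which is exactly the stale-cache risk described above.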

Like any other type of filesystem, there are some basic things that we will want to do with a DSM-CC filesystem. The most obvious of these is to open files and access the data they contain. Since DSM-CC is a broadcast filesystem, we can’t write to it, so we’ll just have to settle for read-only access. As we’ve already seen, an MHP receiver will basically treat the current DSM-CC object carousel as the standard filesystem, and an application doesn’t really need to use a different API to access files. So why does MHP provide an API for developers to access the object carousel?

First, the DSM-CC API offers new functionality to support the features of an object carousel that are not supported in the standard file API, such as asynchronous loading. Second, it provides access to those parts of DSM-CC that do not relate to files. It’s important to realize that a DSM-CC object carousel is used to broadcast objects, not just files. This may seem obvious, but what do we actually mean by objects? In this case, we mean files (which may be real files or objects representing directories), streams of data or stream events. DSM-CC is also used to carry DSM-CC normal play time information and some other non-filesystem information. In this section, we’ll only consider the aspects of the DSM-CC API that are related to filesystem access.

The DSMCCObject class

The DSM-CC API is contained in the org.dvb.dsmcc package, and the most important class in this package is the DSMCCObject class. This class (a subclass of java.io.File) is used to represent objects in a DSM-CC carousel. As with a standard File object, it is created with either an absolute or relative path name, and can then be used in much the same way as a standard file. The only restrictions are that if an instance of DSMCCObject represents a directory, the methods provided by java.io.File should be used to manipulate it, and that for instances representing files, the java.io.FileInputStream or java.io.RandomAccessFile classes must be used to access the contents of the file.

As we’ve already seen, the standard Java file operations will allow you to access the data in a particular object in the DSM-CC carousel. However, this doesn’t give you much flexibility in how you access the data. In particular, access to a DSM-CC carousel using the standard java.io file operations is synchronous, so the thread making the request will block until the operation has finished.

Given the latency issues involved in working with DSM-CC, this can cause real problems for a developer, especially if you want to preload content so that you can display it quickly when the user wants it. In a synchronous API, you would either have to load each file on a different thread or wait for one file to finish loading before loading another. Neither of these options is really acceptable when you have a large number of files to load.

To help solve this problem, the DSMCCObject class supports asynchronous loading of files. The asynchronousLoad() method allows an application to load a DSM-CC object without delaying the current thread. This method takes an AsynchronousLoadingEventListener object as a parameter, which will be notified when loading is complete. Thus, an application can get on with other tasks while waiting for content to load.

Similarly, the synchronousLoad() method loads an object synchronously. The advantage of this over other file loading methods is that it allows the application to abort the loading process from another thread using the abort() method (which can also be used to abort an asynchronous load). By explicitly loading the file first (either synchronously or asynchronously), an application can reduce the latency on other file operations to a known level.

It is possible to load directory information separately from the files in that directory. Hence, we can know the names and sizes of files within a directory, but not have any of those files loaded. This directory information is also contained in the object carousel, and will thus have high latency for access. The loadDirectoryEntry() method allows an application to load directory information asynchronously so that the information is available to the application when it is needed.

The DSMCCObject class provides two other methods to control the loading of objects. The prefetch() method allows the application to hint to the DSM-CC subsystem and cache that a file will be used in the near future, allowing the receiver some extra time to load the object before it is needed. On the other hand, the unload() method tells the receiver that an object is no longer needed, and can be flushed from the cache.

As we have already mentioned, DSM-CC filesystems allow the broadcaster to update the contents of a file on the fly. The addObjectChangeEventListener() and removeObjectChangeEventListener() methods allow an application to register and unregister respectively for notification when a particular object gets updated. For instance, a news ticker application could use this to receive notification of when the text of a story is updated.

Since an object carousel can contain objects other than files, the DSMCCObject class allows an application to determine what type of object is being referred to using the isObjectKindKnown(), isStream() and isStreamEvent() methods. We can also get a URL that refers to a particular file using the getURL() method. This can be useful when dealing with APIs such as the Java Media Framework that use a URL to refer to pieces of content.

The full interface for the DSMCCObject class is given below. There are a few methods that we have not discussed here, but these are much less commonly used:

public class DSMCCObject extends java.io.File {

  public DSMCCObject (String path);
  public DSMCCObject (String path, String name);
  public DSMCCObject (DSMCCObject dir, String name);

  public boolean isLoaded();

  public boolean isObjectKindKnown();
  public boolean isStream();
  public boolean isStreamEvent();

  public void synchronousLoad()
  throws InvalidFormatException,
         InterruptedIOException,
         MPEGDeliveryException,
         ServerDeliveryException,
         InvalidPathNameException,
         NotEntitledException,
         ServiceXFRException;

  public void asynchronousLoad (
    AsynchronousLoadingEventListener l)
    throws InvalidPathNameException;

  public void abort() throws NothingToAbortException;

  public static boolean prefetch(String path,
                                 byte priority);

  public static boolean prefetch(DSMCCObject dir,
                                 String path,
                                 byte priority);

  public void unload() throws NotLoadedException;

  public java.net.URL getURL();

  public void addObjectChangeEventListener(
    ObjectChangeEventListener listener);
  public void removeObjectChangeEventListener(
    ObjectChangeEventListener listener);

  public void loadDirectoryEntry (
    AsynchronousLoadingEventListener l);

  public void setRetrievalMode(int retrieval_mode);
  public static final int FROM_CACHE =1;
  public static final int FROM_CACHE_OR_STREAM=2;
  public static final int FROM_STREAM_ONLY=3;

  public X509Certificate[][] getSigners();
}

Service domains and object carousels

There is no requirement for a DVB service to contain just one object carousel. There isn’t even a requirement for an application to only access object carousels that are part of the same service. So, if we can access other object carousels, how do we do it?

Just like filesystems in Unix, object carousels can effectively be mounted into a position in the current directory hierarchy. Before we see how we do this, let’s discuss terminology a little. An object carousel is sometimes referred to as a service domain. To be precise, a service domain is actually a group of related DSM-CC objects. In a broadcast network, these are contained in an object carousel and transmitted to the client. In an interactive network, a client can manipulate them using the DSM-CC User-to-User protocol. We will not discuss this protocol, since we are concerned mainly with broadcast networks. However, the basic operation is the same in both cases.

The MHP DSM-CC API represents a service domain using the ServiceDomain class. Before a service domain can be used, it must be attached. This mounts the service domain in the filesystem hierarchy, in the same way that the Unix mount command does. A service domain is attached using the attach() method. There are three different versions of this method, each taking a different set of parameters:

  public void attach(Locator l)
  public void attach(Locator service, int carouselId)
  public void attach(byte[] NSAPAddress)

The first version takes a locator referring to a component of a DVB service which contains an object carousel. The second version takes a locator for a DVB service and a carousel ID, so that the receiver can determine which carousel on that service to use. The final version takes a 20-byte NSAP address to identify the carousel. Most applications are more likely to use one of the first two versions.

One thing that you will notice is that none of these methods allow you to specify where in the directory hierarchy the service domain is attached. This makes sure that the receiver can avoid conflicts and mount the service domain in the location that suits it best. An application can use the getMountPoint() method to get a DSM-CC object representing the mount point for the service domain.

After a service domain has been attached, files can be accessed as we have seen above. An application can unmount a service domain using the detach() method. This will give the receiver a hint that any files cached from that service domain can be discarded.

The fact that an application has attached to a service domain does not mean that service domain will always be accessible, however. If the receiver tunes away from the transport stream containing the carousel, for instance, then files in it may not be accessible even though the service domain is attached. If the receiver decides that it will never be able to connect to the carousel again, it may choose to automatically detach the service domain.

When the service domain isn’t accessible, any attempt to access a file in it will fail as if the file never existed. At the level of the DSM-CC API, inability to access an object carousel will cause an MPEGDeliveryException to be thrown. This may not be a permanent failure, however, so future attempts to access objects in the carousel may succeed.
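Since such a failure may be temporary, an application may want to try again a few times before giving up. The following is a minimal sketch of that idea in plain Java. RetryingLoader and its Attempt interface are our own hypothetical names, and the sketch assumes that the failure surfaces as a java.io.IOException (we believe the DSM-CC exceptions derive from it, but you should check this against the API documentation for your platform).

```java
import java.io.IOException;
import java.io.InterruptedIOException;

// Hypothetical helper: retries a load that may fail temporarily.
final class RetryingLoader {
    // One attempt at loading something; may fail with an IOException.
    interface Attempt<T> {
        T run() throws IOException;
    }

    // Runs the attempt up to maxAttempts times, pausing between failures.
    // Rethrows the last failure if every attempt fails.
    static <T> T loadWithRetry(Attempt<T> attempt, int maxAttempts, long delayMs)
            throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException lastFailure = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return attempt.run();
            } catch (IOException e) {
                lastFailure = e;
                if (i < maxAttempts) {
                    pause(delayMs);
                }
            }
        }
        throw lastFailure;
    }

    private static void pause(long delayMs) throws InterruptedIOException {
        try {
            Thread.sleep(delayMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new InterruptedIOException("interrupted while waiting to retry");
        }
    }
}
```

The number of attempts and the delay are design choices for the application. Given the latency of a carousel, a handful of attempts spaced a few seconds apart is probably more sensible than a tight loop.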

An example

This example shows how an application can use the DSM-CC API to access a file in a broadcast filesystem:

// create a new ServiceDomain object to represent
// the carousel we will use
ServiceDomain carousel = new ServiceDomain();

// now create a Locator that refers to the service
// that contains our carousel
org.davic.net.Locator locator;
locator = new org.davic.net.Locator
  ("dvb://123.456.789");

// finally, attach the carousel to the ServiceDomain
// object (i.e. mount it) so that we can actually
// access the carousel.  In this case, we don't specify
// the stream containing the carousel in the locator, so
// we pass in the carousel ID as part of the attach
// request
carousel.attach(locator, 1);

// we have to create our DSMCCObject with an absolute
// path name, which means we need to get the mount point
// for the service domain
DSMCCObject dsmccObj;
dsmccObj = new DSMCCObject(carousel.getMountPoint(),
  "graphics/image1.jpg");

// now we create a FileInputStream instance to access
// our DSM-CC object.  Alternatively, we could create
// a RandomAccessFile instance if we wanted random
// instead of sequential access to the file.
FileInputStream inputStream;
inputStream = new FileInputStream(dsmccObj);

// we can now use the FileInputStream just like any
// other FileInputStream

Of course, for this example there’s not much benefit in using the DSM-CC API. We could access the file using a java.io.File object for much of the actual file access (though not for mounting the filesystem to begin with). However, if instead of creating a FileInputStream we were to use the following approach, we would see the benefit of being able to load objects asynchronously:

// we can create a DSMCCObject instance without
// attaching to a carousel.
DSMCCObject dsmccObj;
dsmccObj = new DSMCCObject
  ("graphics/image1.jpg");

// create a new ServiceDomain object to represent
// the carousel we will use
ServiceDomain carousel = new ServiceDomain();

// now create a Locator that refers to the service
// that contains our carousel
org.davic.net.Locator locator;
locator = new org.davic.net.Locator
  ("dvb://123.456.789");

// finally, attach the carousel to the ServiceDomain
// object (i.e. mount it) so that we can actually
// access the carousel.  In this case, we don't specify
// the stream containing the carousel in the locator, so
// we pass in the carousel ID as part of the attach
// request
carousel.attach(locator, 1);

// now that the carousel is attached, we re-create our
// DSMCCObject with an absolute path name based on the
// mount point for the service domain.  The variable was
// already declared above, so we only assign to it here
dsmccObj = new DSMCCObject(carousel.getMountPoint(),
  "graphics/image1.jpg");

// this time, we load the file asynchronously.
// the myListener variable (which we haven't
// defined here) is the event listener that will
// receive notification when the object is fully
// loaded.
dsmccObj.asynchronousLoad(myListener);

// now we can start loading another file in the
// same thread
DSMCCObject dsmccObj2;
dsmccObj2 = new DSMCCObject(carousel.getMountPoint(),
  "graphics/image2.jpg");

// this one will also use the same event listener
// for notification
// that the object is fully loaded.
dsmccObj2.asynchronousLoad(myListener);

In this case, we load two objects from the same thread without having to wait for the first to finish loading. The myListener object will receive notification that each object has loaded, allowing us to use the data, but in the meantime we can continue doing other useful work in the current thread. In the previous example, we would have had to explicitly spawn new threads to do something like this. Otherwise, we would have to wait for one file to load before we could start loading another and this could take an unacceptable amount of time.
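The listener itself was left undefined above. To show the shape of the pattern without needing an MHP receiver, here is a small stand-in written with standard Java only. Everything in it is hypothetical: AsyncLoadDemo, LoadListener and the simulated delay are ours, and the real API uses AsynchronousLoadingEventListener and DSMCCObject instead.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Stand-in for a broadcast loader; all names here are hypothetical.
final class AsyncLoadDemo {
    // Called on the worker thread once an object has been loaded.
    interface LoadListener {
        void loaded(String path, byte[] data);
    }

    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    // Returns immediately; the listener is notified later.
    void asynchronousLoad(String path, LoadListener listener) {
        worker.execute(() -> listener.loaded(path, simulateCarouselRead(path)));
    }

    // Lets queued loads finish, then stops the worker thread.
    void shutdown() {
        worker.shutdown();
    }

    // Pretends to wait for the carousel; the "file" is just the path bytes.
    private static byte[] simulateCarouselRead(String path) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return path.getBytes(StandardCharsets.UTF_8);
    }
}
```

As in the example above, one listener can be passed to several load requests. Because it is called on a different thread from the one that made the request, any state it shares with the rest of the application needs to be thread-safe.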

Updating objects

One of the more useful features of DSM-CC is that each object in the carousel has a version number, and that objects can be updated as the carousel is being broadcast. This can be useful for updating data files for home shopping applications, quiz applications and many other purposes. Applications can detect these updates and then choose to reload the changed file, or keep using the old data.

In order to detect these updates, an application must implement the org.dvb.dsmcc.ObjectChangeEventListener interface. This interface provides the receiveObjectChangeEvent() method which is called to notify the application of any updates to the object. When the application wishes to monitor an object for changes, it simply calls the addObjectChangeEventListener() method on a DSMCCObject instance representing the object in question. To stop monitoring for updates, the application can simply remove the listener from the object.

There is a need to be careful when you’re doing this, however. Old copies of the object may still be present in the receiver’s cache even after the object has been updated. In order to be completely sure about getting the newest version of the object, the application should set the retrieval mode for the object (using the DSMCCObject.setRetrievalMode() method) to FROM_STREAM_ONLY. This forces the receiver to ignore any cached copies of the object and retrieve a fresh copy of the object from the carousel.

DSMCCObject myFile = new DSMCCObject(
  "application/some/file/path/file.txt");
myFile.addObjectChangeEventListener(
  myObjectChangeEventListener);
myFile.setRetrievalMode(DSMCCObject.FROM_STREAM_ONLY);

Some people have reported that it is necessary to load the object before changes can successfully be monitored, but this is not required by the MHP specification and is more likely a ‘feature’ of some implementations. If your application does not appear to be monitoring updates successfully, it may be worth trying to load the object before calling addObjectChangeEventListener().

Caching in detail

We have already mentioned that a receiver can choose to cache objects from the carousel in order to improve loading times. This can get to be a very complicated topic for developers, because this is one area that may dramatically improve the performance of an application, but it is also one of the areas where receivers vary most. There are many different caching strategies that could be applied, and there are many design decisions that manufacturers have to make:

How much data gets cached? Some receivers will cache the entire carousel if possible, some could choose to cache nothing. Most choose a middle ground, but that still offers very different behaviour.
How does the cache handle updates to objects in the carousel? Are these updates ignored, or are they reflected in the cache contents? Alternatively, do updated objects simply get flushed from the cache?

While the first of these is very much a matter for the receiver manufacturer because it only affects performance, MHP does give some guidance on the second issue because of the influence it has on the behaviour of applications. This gives broadcasters a baseline so that they know roughly how the receiver will behave in a given situation, and allows them to choose what kind of behaviour suits their application most. By setting a parameter in the object carousel, the broadcaster can choose one of three caching modes for a given piece of content:

Transparent caching means that the receiver can only use a cached copy of an object if it has checked the version number of the DSM-CC module containing the object within the past 0.5 seconds. If the version number has not been checked in the last 0.5 seconds, then the receiver must wait until it has re-checked the version number before it can use the cached copy of that object. This ensures that an application will always get the latest version of a file (0.5 seconds is considered to be within the bounds of error for a DTV broadcast, due to inherent uncertainties in the process of generating a transport stream).
Semi-transparent caching works in the same way, but allows the receiver to go up to 30 seconds between checking the version number of the module. This allows the application to retrieve a reasonably up-to-date version of the file without risking any delays.
With static caching, any updates to the cached object will be ignored. If an application wants the latest version of an object then it should explicitly request that the object is loaded from the stream only.
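The three modes boil down to a simple rule about how old the last version check may be. The sketch below expresses that rule in plain Java. The CacheFreshness class and its names are our own, the time limits are the ones quoted above, and treating a check made exactly at the limit as still valid is an assumption on our part.

```java
// Illustrative only: models the freshness rule for the three caching modes.
final class CacheFreshness {
    enum CachingMode { TRANSPARENT, SEMI_TRANSPARENT, STATIC }

    static final long TRANSPARENT_LIMIT_MS = 500;         // 0.5 seconds
    static final long SEMI_TRANSPARENT_LIMIT_MS = 30_000; // 30 seconds

    // True if a cached copy may be used without first re-checking the
    // version number of the module that contains the object.
    static boolean canUseCachedCopy(CachingMode mode, long msSinceVersionCheck) {
        switch (mode) {
            case TRANSPARENT:
                return msSinceVersionCheck <= TRANSPARENT_LIMIT_MS;
            case SEMI_TRANSPARENT:
                return msSinceVersionCheck <= SEMI_TRANSPARENT_LIMIT_MS;
            default: // STATIC: updates are ignored, so the cache is always usable
                return true;
        }
    }
}
```

For example, a copy whose module version was last checked two seconds ago can be used straight away under semi-transparent or static caching, but not under transparent caching.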

Transparent or semi-transparent caching may be active or passive. Active caching means that the receiver will monitor the version number of cached files to make sure that they are still up to date. In its most aggressive implementation, active caching uses a separate MPEG-2 section filter to monitor every cached module, which offers high performance but uses a large number of section filters (or conversely, only allows a small number of modules to be cached).

Passive caching allows the receiver to use far fewer section filters by simply not checking the version number of a file until it has to. With passive caching, the version number is only checked when something makes a request for that file – the application may have to wait a short while before the version number is known, but this delay is usually pretty short when compared to the time taken to load an object from the stream.

An aggressive approach to active caching is most likely to cause performance problems when it is used with transparent caching, because of the short time that a version number can go unchecked. For semi-transparent caching, it’s possible to get the same performance from a less aggressive caching strategy, where version numbers are only checked (for example) every five or ten seconds. This is really a combination of active and passive caching, and it allows the middleware to use a smaller number of section filters than would be necessary for the most aggressive approach. In practice, most implementations will use this combination of active and passive caching.

Persistent storage

An MHP receiver may support some persistent storage that applications can use. This may be a hard disk or NVRAM, for example, so the amount of storage that is available may be limited. The org.dvb.io package provides some classes to support access to this persistent storage.

In general, an application can treat persistent storage as a normal filesystem. The root directory of the persistent filesystem is given by the system property dvb.persistent.root (which can be read with the System.getProperty() method just like other system properties).

There are some restrictions on what the application can access in this filesystem. The directory <persistent_root>/<organization_id> is readable and writable by the application, as is the application’s ‘home’ directory and its subdirectories. Every application has a home directory – this is automatically created by the receiver at the location <persistent_root>/<organization_id>/<application_id>. Other directories under <persistent_root>/<organization_id> or outside this part of the directory hierarchy may not be readable or writable by the application.
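As a rough sketch, an application could build these paths as shown below. The PersistentPaths class is our own invention, the fallback root exists only so that the code runs outside a receiver, and writing the two IDs as lowercase hexadecimal strings is an assumption that should be checked against the MHP specification.

```java
import java.io.File;

// Illustrative helper for locating directories in persistent storage.
final class PersistentPaths {
    // Fallback used only when running outside a receiver; purely illustrative.
    static final String FALLBACK_ROOT = "persistent";

    // Reads the root of persistent storage from the system property.
    static File persistentRoot() {
        return new File(System.getProperty("dvb.persistent.root", FALLBACK_ROOT));
    }

    // <persistent_root>/<organization_id>
    // Assumption: the ID appears in the path as a lowercase hex string.
    static File organisationDir(File root, int organisationId) {
        return new File(root, Integer.toHexString(organisationId));
    }

    // <persistent_root>/<organization_id>/<application_id>
    static File homeDir(File root, int organisationId, int applicationId) {
        return new File(organisationDir(root, organisationId),
                        Integer.toHexString(applicationId));
    }
}
```

A file in the home directory can then be opened with the usual java.io classes, for instance new File(PersistentPaths.homeDir(root, orgId, appId), "scores.dat"), where the file name is just an example.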

Whatever the underlying filesystem used by the receivers, there is a standard set of attributes that is supported by the persistent filesystem. These can be accessed using the org.dvb.io.FileAttributes class. The first of these attributes is the file owner. For the <persistent_root>/<organization_id> directory, the owner will always be the MHP receiver (effectively superuser). For the application’s ‘home’ directory and any files created by the application, the application shall be the owner.

File permissions largely follow the Unix model, with separate access permissions for owner (the application), group (the organization) and world. Each of these may have read or write access to the file. These file permissions are encapsulated by the org.dvb.io.FileAccessPermissions class. Unlike in Unix, where permissions are set using a numeric bit mask, they are simply set as a series of boolean values in MHP. The setPermissions() method takes six boolean values as parameters: three pairs of read access/write access for world, organization and application respectively.
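If you are used to reading Unix permission strings, it can help to see how the six booleans map onto that notation. The helper below is purely illustrative and is not part of org.dvb.io. It takes its parameters in the order described above (world, organization, application) and prints them owner first, as ls would, without the execute bit.

```java
// Illustrative only: renders six MHP-style booleans as a Unix-like string.
final class PermissionString {
    // Parameter order follows the text: world, organization, application.
    static String describe(boolean worldRead, boolean worldWrite,
                           boolean orgRead, boolean orgWrite,
                           boolean appRead, boolean appWrite) {
        // Printed Unix-style: owner (application), group (organization), world.
        return pair(appRead, appWrite)
             + pair(orgRead, orgWrite)
             + pair(worldRead, worldWrite);
    }

    private static String pair(boolean read, boolean write) {
        return (read ? "r" : "-") + (write ? "w" : "-");
    }
}
```

A file that only its own application can read and write comes out as "rw----", while one that is also readable by other applications from the same organization comes out as "rwr---".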

By default, only the application will have any access to its directory and the files it has created. As we’ve already seen, files in the <persistent_root>/<organization_id> directory will be readable and writable by every application from that organization.

There are two other file attributes that we have not yet discussed. These are both related to the small amount of persistent storage that is typically available on an MHP receiver. In an MHP receiver, there is no guarantee that persistent files will not be deleted by the receiver – this prevents one application from using all the persistent storage without any other application being given a chance to use it.

All files will have an expiration date, after which the contents of the file are considered out of date and no longer useful to the application. Although the application can set this, ultimately the date that is actually used is set by the platform. This prevents an application from setting an expiration date far in the future to avoid this mechanism.

Typically, the platform will automatically delete files to reclaim space in persistent storage when necessary. Where possible, only files where the expiration date has passed will be deleted. It is possible, however, for files to be deleted before their expiration date if necessary.

The priority attribute is also used when the platform is managing available space in persistent storage. This attribute tells the platform about the relative importance of a given file – files marked with a high priority are less likely to be deleted automatically than files marked with a low priority.

Although it’s obviously possible for an application to set all files it creates to be high priority and with an expiration date 100 years in the future, this is not a good idea. After all, this is the information that allows the application to have a say in what files do get deleted (if any do). Ignoring this opportunity means that the receiver can delete files that may be important to your application as well as files that are unimportant.
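To see why honest values matter, consider one plausible way that a receiver might pick files to delete: expired files first, and within each group the lowest priority first. The sketch below models only that hypothetical policy. Real receivers are free to behave differently, and the EvictionOrder and StoredFile names are ours.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model of a deletion policy; not how any particular receiver works.
final class EvictionOrder {
    static final class StoredFile {
        final String name;
        final long expiresAtMs; // expiration date as a timestamp
        final int priority;     // larger value means more important

        StoredFile(String name, long expiresAtMs, int priority) {
            this.name = name;
            this.expiresAtMs = expiresAtMs;
            this.priority = priority;
        }
    }

    // Returns the files in the order this toy policy would delete them:
    // expired files before unexpired ones, then lowest priority first.
    static List<StoredFile> deletionOrder(List<StoredFile> files, long nowMs) {
        List<StoredFile> sorted = new ArrayList<>(files);
        sorted.sort(Comparator
            .comparing((StoredFile f) -> f.expiresAtMs > nowMs) // false (expired) sorts first
            .thenComparingInt(f -> f.priority));
        return sorted;
    }
}
```

If every file claims the highest priority and a distant expiration date, an ordering like this has nothing to distinguish them, and the choice of which file goes first is effectively out of the application’s hands.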