OC Internals Documentation

Draft: 01/12/2009
Last Revised: 06/02/2009

Introduction

This document is an ongoing effort to describe the internal operation of oc.

Parsers

DDS/DAS Parser: dap.y

The dap.y parser parses DDSs, DATADDSs, and DASs. The supported syntax is essentially the same as the Ocapi parser, but the actions are different. Take a production of this form, for example.
nonterm: nonterm1 nonterm2 nonterm3 ;
The corresponding action calls an external procedure named after the left-hand-side non-terminal, passing the values of the right-hand-side non-terminals as arguments.
{$$=nonterm(parsestate,$1,$2,$3);}
Note that this form of parsing action was requested by John Caron so that the same .y file could be used for C and Java parsers. In line with this, all non-terminals are defined to return a type of "Object", which is "void*" for C parsers and "Object" for Java parsers. The cost is the use of a lot of casting in the action procedures.

Note the extra "parsestate" argument. The parsers are constructed as reentrant and this argument contains the per-parser state information.

The bodies of the action procedures are defined in a separate file called "dapparselex.c". That file also contains the lexer required by the parser. Lex was not used because the lexemes are simple enough to recognize with hand-written code.
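Because the lexemes are simple, a hand-written scanner is enough. The following is a minimal sketch of that idea, not the lexer in dapparselex.c. The token codes, the `Lexer` struct, `lex_next`, and the punctuation set are all hypothetical; in particular, the character classes are an assumption rather than the actual DAP lexical rules.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes; punctuation is returned as its own character. */
enum { TOK_EOF = 0, TOK_WORD = 256, TOK_NUMBER = 257 };

static const char PUNCT[] = "{}[];=,:";   /* assumed punctuation set */

typedef struct Lexer {
    const char* input;  /* text being scanned */
    size_t pos;         /* current offset into input */
    char text[64];      /* text of the last WORD or NUMBER (truncated) */
} Lexer;

void lex_init(Lexer* lx, const char* input)
{
    lx->input = input;
    lx->pos = 0;
    lx->text[0] = '\0';
}

/* Copy input[start..pos) into lx->text, truncating long lexemes. */
static void lex_save(Lexer* lx, size_t start)
{
    size_t len = lx->pos - start;
    if (len >= sizeof lx->text) len = sizeof lx->text - 1;
    memcpy(lx->text, lx->input + start, len);
    lx->text[len] = '\0';
}

static int is_punct(char c)
{
    return c != '\0' && strchr(PUNCT, c) != NULL;
}

/* Return the next token code; WORD and NUMBER text is left in lx->text. */
int lex_next(Lexer* lx)
{
    const char* s = lx->input;
    size_t start;

    while (isspace((unsigned char)s[lx->pos]))
        lx->pos++;                              /* skip white space */
    if (s[lx->pos] == '\0')
        return TOK_EOF;
    if (is_punct(s[lx->pos]))
        return (unsigned char)s[lx->pos++];     /* single-character token */

    start = lx->pos;
    if (isdigit((unsigned char)s[start])) {
        while (isdigit((unsigned char)s[lx->pos]))
            lx->pos++;
        lex_save(lx, start);
        return TOK_NUMBER;
    }
    /* Anything else, up to white space or punctuation, is a word. */
    while (s[lx->pos] != '\0' && !isspace((unsigned char)s[lx->pos])
           && !is_punct(s[lx->pos]))
        lx->pos++;
    lex_save(lx, start);
    return TOK_WORD;
}
```

Scanning "Int32 f[2];" with this sketch yields WORD, WORD, '[', NUMBER, ']', ';' and then EOF.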

One of the issues that must be addressed by any bottom-up parser is handling the accumulation of sets of items (nodes, etc.).

The canonical way that this is handled in the oc parsers is to use the following form of production.

1  declarations:
2            /* empty */ {$$=declarations(parsestate,NULL,NULL);}
3          | declarations declaration {$$=declarations(parsestate,$1,$2);}
4          ;
The base case (line 2) action is called with NULL arguments to indicate the base case. The recursive case (line 3) is called with the values of the two right side non-terminals.

The corresponding action code is defined as follows.

1  Object
2  declarations(DAPparsestate* state, Object decls, Object decl)
3  {
4      OClist* alist = (OClist*)decls;
5      if(alist == NULL) alist = oclistnew();
6      else oclistpush(alist,(ocelem)decl);
7      return alist;
8  }
The base case is handled in line 5: it creates and returns an OClist instance; an OClist is a dynamically extendible array of arbitrary items (see Miscellaneous below). The recursive case is in line 6, where the list argument is assumed to be defined and there is a decl object to be appended to the list.

This pattern, in various forms, is ubiquitous in the parsers.
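The following self-contained sketch mimics how a bottom-up parser drives such an action: the empty production fires once with NULL arguments, and then the left-recursive production fires once per declaration, in source order. The `List` type and its helpers are hypothetical stand-ins for OClist, and the `state` argument is an unused placeholder for the parse state.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for OClist: a growable array of pointers. */
typedef struct List {
    void** items;
    size_t length;
    size_t alloc;
} List;

List* listnew(void)
{
    return calloc(1, sizeof(List));   /* zeroed: no items, no storage yet */
}

/* Append item, growing the array geometrically; returns 0 on failure. */
int listpush(List* l, void* item)
{
    if (l->length == l->alloc) {
        size_t n = l->alloc ? 2 * l->alloc : 4;
        void** p = realloc(l->items, n * sizeof(void*));
        if (p == NULL) return 0;
        l->items = p;
        l->alloc = n;
    }
    l->items[l->length++] = item;
    return 1;
}

void listfree(List* l)
{
    if (l != NULL) {
        free(l->items);
        free(l);
    }
}

/* Same shape as the declarations action: a NULL list means the base case. */
void* declarations(void* state, void* decls, void* decl)
{
    List* alist = (List*)decls;
    (void)state;                      /* per-parser state unused here */
    if (alist == NULL) alist = listnew();
    else listpush(alist, decl);
    return alist;
}
```

For three declarations a, b, c, the reductions amount to one call declarations(NULL, NULL, NULL) followed by declarations(NULL, v, a), declarations(NULL, v, b), and declarations(NULL, v, c), where v is the value returned by the previous call.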

Constraint Parser: ce.y

The ce.y parser parses DAP url projections (see DAPURL). Code to parse selections also exists, but since it is not needed, it is commented out. This does not mean that selections are not used, only that the selection string is passed unmodified to the server.

Currently there is no need for this parser, so although it is included in the source tree, it is not used.

OC Node Tree

As with Ocapi, the dap parser produces a node tree defining the DDS (or DAS) structure. The node structure (struct OCnode) is defined in ocnode.h and has the following fields.

OCtype octype: Defines the general kind of node.
OCtype etype: Used for attribute nodes and primitive nodes to define the primitive type.
char* name: The name, taken from the DDS.
char* fullname: Fully qualified name, such as a.b.c.
OCnode* container: Parent node of this node.
OCnode* root: Root node of the tree containing this node.
OCnode* datadds: The correlated DATADDS node, if any.
OCdiminfo dim: Extra information about dimension nodes.
OCarrayinfo array: Extra information about nodes that have rank > 0.
OCattinfo att: Extra information about OC_Attribute nodes.
OCdapinfo dap: Extra information about node sizes.
OClist* subnodes: The subnodes of this node.
OClist* attributes: Any attributes associated with this node.

This particular structure is relatively similar to that of the Ocapi node, but with all the extra data storage information elided.
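The fullname field can be derived by walking the container links up to the root. The sketch below uses a hypothetical, stripped-down `Node` with only name and container fields, and the helper `make_fullname` is likewise hypothetical. It assumes every ancestor, including the Dataset, contributes its name, following the D1.S1.f11 convention used in the instance table later in this document; whether oc's own fullname includes the Dataset name is not stated here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, stripped-down node: just the fields needed here. */
typedef struct Node {
    const char* name;
    struct Node* container;   /* parent; NULL for the root */
} Node;

/* Build "root.child. ... .node" by walking container links; caller frees. */
char* make_fullname(const Node* node)
{
    size_t len = 0;
    const Node* n;
    char* result;
    char* end;

    if (node == NULL) return NULL;
    for (n = node; n != NULL; n = n->container)
        len += strlen(n->name) + 1;          /* name plus '.' or final NUL */
    result = malloc(len);
    if (result == NULL) return NULL;

    /* Fill from the end backwards so that the root name comes first. */
    end = result + len - 1;
    *end = '\0';
    for (n = node; n != NULL; n = n->container) {
        size_t k = strlen(n->name);
        end -= k;
        memcpy(end, n->name, k);
        if (n->container != NULL) *--end = '.';
    }
    return result;
}
```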

OCstate Management

The overarching concept in the API is that of an OCstate, an opaque identifier representing a DAP state. It is used to maintain persistent information about the connection to a specific DAP server, as well as about the various requests and responses between the client and the server.

A good analog is to the FILE object used by C standard IO. Like a FILE, an OCstate provides the context for some operation or object.

The state is used for a variety of purposes and, as a rule, is the first argument of the API procedures.

OCstate Structure

The OCstate structure contains the following fields.
CURL* curl: The handle to a CURL connection. Its lifetime is that of the OCstate structure.
OClist* tree: The set of root objects for previously fetched DAP requests. See OCstate Trees.
DAPURL url: URL for fetching data.
char* constraints: The last set of specified constraints.
OCbytes* packet: Buffer for storing fetched data.
OCcontent* contentlist: Linked list of all created OCcontent objects.
char* code: Error code returned by the server.
char* message: Error message returned by the server.
long: HTTP result code from the last HTTP request.

OCstate Trees

Every time a DAP DDS, DAS, or DATADDS is fetched from a server, it is parsed and a tree of OCnode instances is constructed. The roots of these trees are kept in the OCstate; trees are created by fetching and destroyed by the appropriate interface procedures.

Associated with the root node of every tree is an instance of OCtree, which is used to store information about the fetch and the tree.

The OCtree structure contains the following fields.
OCdxd dxdclass: One of the enumeration values OCDAS, OCDDS, or OCDATADDS.
char* constraint: The constraint string used when fetching the DAP object.
char* text: The text of the DAP object as received from the server.
OCnode* root: Cross link to the root node to which this OCtree instance is attached.
OCstate* state: Cross link to the state containing the root.
OClist* nodes: A list of all nodes in the tree rooted at root.

When the dxdclass is OCDATADDS, the following additional fields are defined and used.
unsigned long bod: Offset in the DATADDS packet to the beginning of the binary XDR data.
char* filename: Name of the temporary file holding the DATADDS data.
FILE* file: FILE object for the temporary file.
unsigned long filesize: Size of the temporary file.
XDR* xdrs: XDR handle for walking the temporary file.
OCmemdata* memdata: Root of the compiled DATADDS packet.
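The bod offset marks where the DDS text ends and the XDR data begins. In a DAP2 DATADDS response the two parts are separated by a line reading "Data:". The hypothetical `find_bod` below (not oc's code) assumes that separator starts a line and ends with "\n" or "\r\n", and returns the offset just past it, or -1 if it is absent. It compares bytes with an explicit length because the packet is not NUL-terminated and the binary part may contain zero bytes.

```c
#include <assert.h>
#include <string.h>

/* Return the offset of the first byte after the "Data:" separator line,
   or -1 if no such line is found within the first len bytes. */
long find_bod(const char* packet, size_t len)
{
    static const char tag[] = "Data:";
    const size_t taglen = sizeof(tag) - 1;
    size_t i;

    for (i = 0; i + taglen <= len; i++) {
        /* The tag must start a line: at offset 0 or right after '\n'. */
        if ((i == 0 || packet[i - 1] == '\n')
            && memcmp(packet + i, tag, taglen) == 0) {
            size_t j = i + taglen;
            if (j < len && packet[j] == '\r') j++;      /* tolerate CRLF */
            if (j < len && packet[j] == '\n') return (long)(j + 1);
        }
    }
    return -1;
}
```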

Navigational Access to Fetched Data

A navigational interface has been defined that simplifies walking the data in a DATADDS packet. The navigational interface has been modified multiple times, and the one described here is a variation on the one designed by Patrick West for the IDL client for OPeNDAP. In addition to the OCstate and OCnode structures, the navigational interface defines an OCcontent structure with the following fields.
OCmode mode: The mode (see below).
OCstate* state: The state object with which this content is associated.
OCnode* node: The OCnode that templates the data of this content object.
unsigned int xdroffset: Location of this content in the XDR packet.
size_t index: The i'th field, record, or dimension.
int packed: Used to track OC_String and OC_Byte data specially.
struct OCcontent* next: Link to the next OCcontent object.
OCmemdata* memdata: Pointer to the in-memory data (if the DATADDS was compiled).

The OCcontent object represents a subset of the data (aka an instance) within the data part of a DATADDS response. The particular "kind" of subset is specified by a mode flag and corresponds to one of the following possible kinds of subsets.

The mapping between nodes and contents is one-to-many. That is, there often will be multiple data instances of a given node type in a DATADDS response. Consider the following example.

Dataset {
  Structure {
    int16 f11[2];
    float32 f12;
  } S1;
  Structure {
    int16 f21;
    float32 f22[2];
  } S2[3];
} D1;
If we have a data response with this DDS, then the following instances will exist.
Class  Count  Instances
D1     1      D1
S1     1      D1.S1
f11    2      D1.S1.f11[0]
              D1.S1.f11[1]
f12    1      D1.S1.f12
S2     3      D1.S2[0]
              D1.S2[1]
              D1.S2[2]
f21    3      D1.S2[0].f21
              D1.S2[1].f21
              D1.S2[2].f21
f22    6      D1.S2[0].f22[0]
              D1.S2[0].f22[1]
              D1.S2[1].f22[0]
              D1.S2[1].f22[1]
              D1.S2[2].f22[0]
              D1.S2[2].f22[1]
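The Count column follows from the declarations: for Structures and fixed-size arrays, the number of instances of a variable is the product of its own array size and the array sizes of every enclosing container, with a scalar counting as size 1. For f22 that is 3 * 2 = 6. The sketch below uses a hypothetical `Var` description and `instance_count` helper. It does not cover DAP Sequences, whose record counts are known only from the data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal variable description: the product of its own
   dimension sizes (1 for a scalar) and a link to its container. */
typedef struct Var {
    const char* name;
    size_t size;
    const struct Var* container;   /* NULL for the Dataset */
} Var;

/* Number of data instances of v: multiply sizes along the container chain. */
size_t instance_count(const Var* v)
{
    size_t count = 1;
    for (; v != NULL; v = v->container)
        count *= v->size;
    return count;
}
```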

API

The API is best understood by reading the user's manual and following the code for procedures of interest. The API is defined in oc.[ch].

OC Data Compilation

The original Ocapi operated by converting the xdr packet to an in-memory structure attached to the node tree of the DDS (although it did not handle the full DDS). The base version of oc, however, leaves the data in the packet and extracts it as needed while processing OCcontent operations; in effect, it does lazy extraction. The assumption is that if the data is accessed only once, lazy extraction is the appropriate choice. If the same data is accessed N times (N > 1), then pre-compiling the xdr data into an in-memory structure may be more efficient for some values of N.

In order to experiment with this issue, the API

extern int oc_compile(OCstate*);
has been added. It does a one-time conversion of the xdr data to an in-memory structure. The OCcontent API operations (ocfieldcontent, etc.) will use the memory version if it is available.
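The trade-off can be illustrated with XDR 32-bit integers, which XDR stores as 4-byte big-endian values. In the hypothetical sketch below, `lazy_get` decodes one element from the packet each time it is asked, while `compile_int32` decodes every element once into a malloc'd array. Neither is oc's code, and the assumed layout (n consecutive int32 values starting at a known address) ignores any length prefixes in a real DATADDS packet.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Decode one XDR int: 4 bytes, big-endian, two's complement.
   The final conversion avoids an implementation-defined cast. */
int32_t xdr_decode_int32(const unsigned char* p)
{
    uint32_t u = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
               | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    if (u <= 0x7FFFFFFFu) return (int32_t)u;
    return (int32_t)(u - 0x80000000u) - 0x7FFFFFFF - 1;
}

/* Lazy: decode element i on demand; no extra memory, cost paid per access. */
int32_t lazy_get(const unsigned char* data, size_t i)
{
    return xdr_decode_int32(data + 4 * i);
}

/* Compiled: decode all n elements once; the caller frees the result. */
int32_t* compile_int32(const unsigned char* data, size_t n)
{
    int32_t* out = malloc(n * sizeof *out);
    size_t i;
    if (out == NULL) return NULL;
    for (i = 0; i < n; i++)
        out[i] = xdr_decode_int32(data + 4 * i);
    return out;
}
```

The compiled array costs memory and a full decoding pass up front, which only pays off if elements are read repeatedly.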

Error Handling

Error handling in oc differs somewhat from that in Ocapi and mostly follows the netCDF model: procedures return simple numeric error codes to indicate success (OC_NOERR) or failure (OC_EXXX). The current error codes are defined in oc.h, but the set needs reorganization and extension.

One good feature of Ocapi was that it provided a mechanism for returning detailed error information strings. To keep something similar, oc has a log mechanism (oclog.[ch]). It can be used to dump extra error or warning information, as well as debug information (see the DEBUG macros in ocdebug.h).
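A minimal sketch of the netCDF-style convention: every procedure returns an int status, zero means success and negative values mean failure (as in netCDF), results travel through out-parameters, and a separate function maps codes to messages (netCDF's equivalent is nc_strerror). The SK_* codes, `sk_strerror`, and `sk_copyname` are hypothetical placeholders, not the actual codes or procedures in oc.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes: 0 for success, negative for failure. */
#define SK_NOERR   0
#define SK_EINVAL (-1)   /* invalid argument */
#define SK_ERANGE (-2)   /* output buffer too small */

const char* sk_strerror(int code)
{
    switch (code) {
    case SK_NOERR:  return "No error";
    case SK_EINVAL: return "Invalid argument";
    case SK_ERANGE: return "Output buffer too small";
    default:        return "Unknown error";
    }
}

/* Example procedure: the result goes into buf; the return value is
   reserved for the status code. */
int sk_copyname(const char* name, char* buf, size_t buflen)
{
    size_t len;
    if (name == NULL || buf == NULL) return SK_EINVAL;
    len = strlen(name);
    if (len + 1 > buflen) return SK_ERANGE;
    memcpy(buf, name, len + 1);
    return SK_NOERR;
}
```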

DAPURL

Surprisingly, it appears that libcurl does not export any kind of URL parsing capability. Therefore, the DAPURL type was created to support this. It is defined in dapurl.[ch]. The DAPURL API is as follows.

dapurlparse
    Arguments: 1. const char* url  2. DAPURL*
    Returns: int
    Semantics: Parses an oc url string into its component parts. Returns 0 on failure and 1 otherwise. The component parts are as follows.
  • dapurl->url: The url as passed in to dapurlparse.
  • dapurl->base: The base part of the url (minus client parameters and constraints).
  • dapurl->projection: The projection part of the url, minus the leading '?'.
  • dapurl->selection: The selection part of the url.
  • dapurl->params: The client parameters. They are parsed and stored in envv format, where parameter names and values alternate, and the whole list is terminated by a NULL value. The name and value parts are assumed never to be NULL; instead, the empty string ("") is used to indicate no value.
dapurlclear
    Arguments: 1. DAPURL*
    Returns: void
    Semantics: Reclaims all the allocated space in a dapurl. The DAPURL instance itself is NOT reclaimed.
dapurllookup
    Arguments: 1. DAPURL*  2. const char* name
    Returns: const char*
    Semantics: Searches the client parameters for name. Returns NULL if it is not found; otherwise returns the associated value.
dapurlreplace
    Arguments: 1. DAPURL*  2. const char* name  3. const char* value
    Returns: int
    Semantics: Replaces, inserts, or deletes a specified client parameter. Returns 0 if the parameter does not already exist and 1 otherwise. If value is NULL, the parameter is deleted; otherwise, if name is found, its value is replaced, else the new name/value pair is inserted.
dapurlsetconstraints
    Arguments: 1. DAPURL*  2. const char* constraints
    Returns: void
    Semantics: Replaces the constraints (projection and selection) with the specified new constraints.
dapurlgeturl
    Arguments: 1. DAPURL*  2. const char* prefix  3. const char* suffix  4. int withconstraints
    Returns: char*
    Semantics: Constructs a URL from the pieces of the DAPURL. The order is prefix, base url, suffix, constraints; the constraints are included only if withconstraints is nonzero.
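The base/projection/selection split performed by dapurlparse and the envv-style search performed by dapurllookup can be sketched as follows. This is not oc's implementation: `UrlParts`, `split_url`, `free_url`, and `envv_lookup` are hypothetical, client-parameter parsing is omitted, and the sketch assumes the projection runs from just after '?' to the first '&' and that the selection keeps its leading '&' (the table above does not say whether it does).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result of splitting a URL; every field is a malloc'd copy. */
typedef struct UrlParts {
    char* base;        /* everything before '?' */
    char* projection;  /* after '?' up to the first '&' (may be empty) */
    char* selection;   /* from the first '&' to the end (may be empty) */
} UrlParts;

/* Malloc'd, NUL-terminated copy of the first n bytes of s. */
static char* dupn(const char* s, size_t n)
{
    char* r = malloc(n + 1);
    if (r != NULL) {
        memcpy(r, s, n);
        r[n] = '\0';
    }
    return r;
}

void free_url(UrlParts* parts)
{
    free(parts->base);
    free(parts->projection);
    free(parts->selection);
}

/* Returns 1 on success and 0 on allocation failure. */
int split_url(const char* url, UrlParts* parts)
{
    const char* end = url + strlen(url);
    const char* q = strchr(url, '?');
    const char* amp = NULL;

    if (q != NULL)
        amp = strchr(q + 1, '&');
    if (amp == NULL)
        amp = end;                              /* no selection */

    parts->base = dupn(url, (size_t)((q ? q : end) - url));
    parts->projection = q ? dupn(q + 1, (size_t)(amp - (q + 1))) : dupn("", 0);
    parts->selection = dupn(amp, (size_t)(end - amp));
    if (!parts->base || !parts->projection || !parts->selection) {
        free_url(parts);
        return 0;
    }
    return 1;
}

/* Search a NULL-terminated name/value array such as
   {"name1","value1","name2","value2",NULL}. Assumes names and values
   are never NULL, so entries always come in complete pairs. */
const char* envv_lookup(const char* const* envv, const char* name)
{
    size_t i;
    for (i = 0; envv[i] != NULL; i += 2)
        if (strcmp(envv[i], name) == 0)
            return envv[i + 1];
    return NULL;
}
```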

Miscellaneous

The two datatypes OClist and OCbytes are used throughout the code. They correspond closely in semantics to the Java ArrayList and StringBuffer types, respectively, and help encapsulate dynamically growing lists of objects or bytes in order to reduce certain kinds of errors.

The canonical code for non-destructive walking of an OClist is as follows.

for(i=0;i<oclistlength(list);i++) {
    T* element = (T*)oclistget(list,i);
    ...
}

OCbytes provides two ways to access its internal buffer of characters: "ocbytescontents()" returns a direct pointer to the buffer, and "ocbytesdup()" returns a malloc'd, null-terminated copy of the contents.
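The two access styles can be illustrated with a hypothetical growable byte buffer; the `Bytes` type and its functions are not oc's API. Here `bytes_contents` returns the live internal pointer, which the next append may invalidate, while `bytes_dup` returns an independent, null-terminated copy that the caller must free.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct Bytes {
    char* buf;      /* storage; NUL-terminated once allocated */
    size_t length;  /* bytes in use, excluding the terminator */
    size_t alloc;   /* bytes allocated */
} Bytes;

/* Append n bytes, growing geometrically; returns 1 on success, 0 on failure. */
int bytes_append(Bytes* b, const char* data, size_t n)
{
    if (b->length + n + 1 > b->alloc) {
        size_t want = b->alloc ? b->alloc : 16;
        char* p;
        while (want < b->length + n + 1) want *= 2;
        p = realloc(b->buf, want);
        if (p == NULL) return 0;
        b->buf = p;
        b->alloc = want;
    }
    memcpy(b->buf + b->length, data, n);
    b->length += n;
    b->buf[b->length] = '\0';
    return 1;
}

/* Direct pointer to the internal buffer; invalid after the next append. */
const char* bytes_contents(const Bytes* b)
{
    return b->buf != NULL ? b->buf : "";
}

/* Malloc'd, NUL-terminated copy of the contents; the caller frees it. */
char* bytes_dup(const Bytes* b)
{
    char* copy = malloc(b->length + 1);
    if (copy == NULL) return NULL;
    if (b->length > 0) memcpy(copy, b->buf, b->length);
    copy[b->length] = '\0';
    return copy;
}
```

A buffer starts out as `Bytes b = {NULL, 0, 0};` and its storage is released with `free(b.buf)`.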

Multi-Dimensional Array Handling

Within a data packet, the DAP protocol "linearizes" multi-dimensional arrays into a single dimension. The rule for converting a multi-dimensional array to a single dimension is as follows.

Suppose the DDS contains the field declaration "Int F[2][5][3];". F holds a total of 2 X 5 X 3 = 30 integers, so its three dimensions are reduced to a single dimension of size 30.

A particular point in the three dimensions, say [x][y][z], is reduced to a number in the range 0..29 by computing ((x*5)+y)*3+z. The corresponding general C code is as follows.

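size_t
dimmap(int rank, size_t* indices, size_t* sizes)
{
    int i;
    size_t count = 0;
    for(i=0;i<rank;i++) {
        count *= sizes[i];    /* scale the offset so far by dimension i */
        count += indices[i];  /* then add the position within dimension i */
    }
    return count;
}
In this code, the indices argument holds x, y, and z, and the sizes argument holds 2, 5, and 3.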
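Going the other way, a linear offset can be turned back into per-dimension indices by repeated division, peeling off the last (fastest-varying) dimension first. The function name `dimunmap` is hypothetical; the sketch assumes the same row-major rule as the formula ((x*5)+y)*3+z above and an offset smaller than the product of the sizes.

```c
#include <assert.h>
#include <stddef.h>

/* Inverse of the linearization: fill indices[0..rank-1] from offset. */
void dimunmap(int rank, size_t offset, const size_t* sizes, size_t* indices)
{
    int i;
    for (i = rank - 1; i >= 0; i--) {
        indices[i] = offset % sizes[i];   /* position within dimension i */
        offset /= sizes[i];               /* carry into the slower dimensions */
    }
}
```

For F[2][5][3], offset 29 maps back to [1][4][2], since ((1*5)+4)*3+2 = 29.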

Change Log

Copyright

Copyright 2009, UCAR/Unidata and OPeNDAP, Inc.