/* Copyright 2009, UCAR/Unidata and OPeNDAP, Inc. See the COPYRIGHT file for more information. */

OC Internals Documentation

Draft: 01/12/2008
Last Revised: 01/31/2008

Introduction

This document is an ongoing effort to describe the internal operation of oc.

This code was produced by pulling it out of the DRNO code with which it had become entangled (bad design choice). As such, there are a number of netCDF style design elements still within the oc code.

Parsers

DDS/DAS Parser: dap.y

The dap.y parser parses DDSs and DASs. The supported syntax is essentially the same as the Ocapi parser, but the actions are different. Take a production of this form, for example.
nonterm: nonterm1 nonterm2 nonterm3 ;
The corresponding action calls an external procedure named for the left hand side and taking the values of the right side non-terminals as arguments.
{$$=nonterm(parsestate,$1,$2,$3);}
Note that this form of parsing action was requested by John Caron so that the same .y file could be used for C and Java parsers. In line with this, all non-terminals are defined to return a type of "Object", which is "void*" for C parsers and "Object" for Java parsers. The cost is the use of a lot of casting in the action procedures.

Note the extra "parsestate" argument. The parsers are constructed as reentrant and this argument contains the per-parser state information.

The bodies of the action procedures are defined in a separate file called "dapparselex.c". That file also contains the lexer required by the parser. Note that lex was not used because the lexer is simple enough to write by hand.

One of the issues that must be addressed by any bottom-up parser is handling the accumulation of sets of items (nodes, etc.).

The canonical way that this is handled in the oc parsers is to use the following form of production.

1  declarations:
2            /* empty */ {$$=declarations(parsestate,NULL,NULL);}
3          | declarations declaration {$$=declarations(parsestate,$1,$2);}
4          ;
The base case (line 2) action is called with NULL arguments to indicate the base case. The recursive case (line 3) is called with the values of the two right side non-terminals.

The corresponding action code is defined as follows.

1  Object
2  declarations(DAPparsestate* state, Object decls, Object decl)
3  {
4      Oclist* alist = (Oclist*)decls;
5      if(alist == NULL) alist = oclistnew();
6      else oclistpush(alist,(ocelem)decl);
7      return alist;
8  }
The base case is handled in line 5. It creates and returns a Sequence instance (the type appears as Oclist in the code above); a Sequence is a dynamically extendible array of arbitrary items (see below). The recursive case is in line 6, where it is assumed that the Sequence argument is defined and there is a decl object that should be appended to the sequence.

This pattern, in various forms, is ubiquitous in the parsers.
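
To make the accumulation pattern concrete, here is a minimal self-contained sketch. The List type and the names listnew, listpush, and accumulate are hypothetical stand-ins chosen for illustration; they are not the actual oc list implementation, and error handling is reduced to the bare minimum.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the growable list used by the parser actions. */
typedef struct List {
    size_t length;  /* number of items in use */
    size_t alloc;   /* number of slots allocated */
    void** items;
} List;

List* listnew(void)
{
    return (List*)calloc(1, sizeof(List));
}

/* Append item, doubling the storage when it is full; returns 0 on failure. */
int listpush(List* l, void* item)
{
    if(l->length == l->alloc) {
        size_t newalloc = (l->alloc == 0 ? 8 : 2 * l->alloc);
        void** newitems = (void**)realloc(l->items, newalloc * sizeof(void*));
        if(newitems == NULL) return 0;
        l->items = newitems;
        l->alloc = newalloc;
    }
    l->items[l->length++] = item;
    return 1;
}

/* Same shape as a list-building action: a NULL list marks the base case. */
void* accumulate(void* list, void* item)
{
    List* alist = (List*)list;
    if(alist == NULL) alist = listnew();
    else listpush(alist, item);
    return alist;
}
```

Calling accumulate(NULL,NULL) once and then accumulate(list,item) for each item mirrors the order in which a bottom-up parser reduces the empty production first and the recursive production afterwards.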

Constraint Parser: ce.y

The ce.y parser parses DAP url projections (see DAPURL). There is code to also parse selections, but since that is not needed, it is commented out. This does not mean that selections are not used, only that the selection string is passed unmodified to the server.

OC Node Tree

As with Ocapi, the dap parser produces a node tree defining the DDS (or DAS) structure. The node structure (struct OCnode) is defined in ocnode.h and has the following fields.

OCtype octype - Defines the kind of node.
OCtype etype - Used for attribute nodes and primitive nodes to define the primitive type.
char* name - From the DDS.
char* fullname - Fully qualified name such as a.b.c.
int active - True if this node participates in the datadds data packet; currently not used.
OCnode* container - Parent node of this node.
Diminfo dim - Extra information about dimension nodes.
Arrayinfo array - Extra information about nodes that have rank > 0.
Attinfo att - Extra information about attribute nodes.
Sequence* subnodes - (Sequence) The subnodes of this node.
Sequence* attributes - (Sequence) Any attributes associated with this node.
void* public - Place for users to attach arbitrary info to the node instances.

This particular structure is relatively similar to that of the Ocapi node, but with all the extra data storage information elided.

OC "API"

Currently, of course, there is no oc or Ocapi API as such; it is a work under construction. However, major parts of a complete API do exist.

Connection Management

The overarching concept in the API is that of a Connection, which is an opaque identifier representing a DAP connection; it is used to maintain persistent state about the connection to a specific DAP server, as well as the various requests and responses between the client and the server.

The connection is used for a variety of purposes and is as a rule the first argument of any of the API procedures. The basic connection API is as follows.

oc_open
  Arguments: N.A.
  Output(s): Connection
  Semantics: Return a reference to a new Connection.

oc_close
  Arguments: 1. Connection
  Output(s): Errno
  Semantics: Close a connection and reclaim any associated resources.

oc_fetchdds
  Arguments: 1. Connection
             2. char* url
  Output(s): Errno
  Semantics: Fetch a DDS from the DAP server. The specific DDS is determined by the url, which should itself not end in ".dds". The returned DDS is parsed and the root node of the parse is stored in the Connection state.

oc_fetchdas
  Arguments: 1. Connection
             2. char* url
  Output(s): Errno
  Semantics: Fetch a DAS from the DAP server. The specific DAS is determined by the url, which should itself not end in ".das". The returned DAS is parsed and the root node of the parse is stored in the Connection state.

oc_fetchdatadds
  Arguments: 1. Connection
             2. char* url
  Output(s): Errno
  Semantics: Fetch a DATADDS from the DAP server. The specific DATADDS is determined by the url, which should itself not end in ".dods". It may include constraints, however. The returned DDS is parsed and the root node of the parse is stored in the Connection state. The associated data, referred to here as the "data packet", is also captured and stored in a temporary file with a random name. For security reasons, the file must not already exist, and only the creator has read/write permission to the file.

oc_getdds
  Arguments: 1. Connection
  Output(s): Nodeid
  Semantics: Return the root node of the DDS. If oc_fetchdds has not been called or if the DDS is malformed, then the root will not exist and NULL will be returned.

oc_getdas
  Arguments: 1. Connection
  Output(s): Nodeid
  Semantics: Return the root node of the DAS. If oc_fetchdas has not been called or if the DAS is malformed, then the root will not exist and NULL will be returned.

oc_getdatadds
  Arguments: 1. Connection
  Output(s): Nodeid
  Semantics: Return the root node of the DATADDS. If oc_fetchdatadds has not been called or if the DATADDS is malformed, then the root will not exist and NULL will be returned.

Navigational API

A navigational interface has been defined that simplifies walking the data in a DATADDS data packet. The navigational interface has been modified multiple times, and the one described here is a variation on the one designed by Patrick West for the IDL client for OPeNDAP. The basic concepts of this interface are as follows.

The mapping between nodes and contents is one-to-many. That is, there often will be multiple occurrences of a given node type in a DATADDS response. Consider the following example.

Dataset {
  Structure {
    int16 f11[2];
    float32 f12;
  } S1;
  Structure {
    int16 f21;
    float32 f22[2];
  } S2[3];
} D1;
If we have a data response with this DDS, then the following instances will exist.

Class  Count  Instances
D1     1      D1
S1     1      D1.S1
f11    2      D1.S1.f11[0]
              D1.S1.f11[1]
f12    1      D1.S1.f12
S2     3      D1.S2[0]
              D1.S2[1]
              D1.S2[2]
f21    3      D1.S2[0].f21
              D1.S2[1].f21
              D1.S2[2].f21
f22    6      D1.S2[0].f22[0]
              D1.S2[0].f22[1]
              D1.S2[1].f22[0]
              D1.S2[1].f22[1]
              D1.S2[2].f22[0]
              D1.S2[2].f22[1]
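
The counts in this example follow a simple rule: the number of instances of a node is the product of its own dimension sizes and those of every enclosing container. The sketch below illustrates that rule only. The Node struct and the instancecount name are hypothetical and far simpler than struct OCnode, and the rule as written does not cover Sequences, whose record counts are not known until the data is read.

```c
#include <stddef.h>

/* Hypothetical, simplified node: a parent link plus a flattened array size. */
typedef struct Node {
    const struct Node* container; /* enclosing node; NULL for the dataset */
    size_t nelems;                /* product of this node's dimension sizes; 1 for scalars */
} Node;

/* Multiply the array sizes along the path from the node up to the dataset. */
size_t instancecount(const Node* node)
{
    size_t count = 1;
    for(; node != NULL; node = node->container)
        count *= node->nelems;
    return count;
}
```

For f22, which has 2 elements and sits inside S2 with 3 elements, the product is 6, matching the number of f22 instances listed for the example DDS.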

Navigating Content

The goal is to allow the user to navigate to all of the instances contained in a given DATADDS data packet and, in certain cases, extract the instance as usable data.

The basic API is as follows.

oc_newcontent
  Arguments: N.A.
  Output(s): Content
  Semantics: Return a reference to an empty Content object.

oc_freecontent
  Arguments: 1. Connection
             2. Content
  Output(s): Errno
  Semantics: Destroy a reference to a Content object and release any associated resources.

oc_clonecontent
  Arguments: 1. Connection
             2. Content
  Output(s): Errno
  Semantics: Create a new Content object with the same values as the input Content object.

oc_rootcontent
  Arguments: 1. Connection
             2. Content
  Output(s): Errno
  Semantics: Given a Content object, modify the content object to refer to the whole dataset; this corresponds to all of the data that was returned in response to a DATADDS request.

oc_dimcontent
  Arguments: 1. Connection
             2. Content
             3. Content
             4. size_t i
  Output(s): Errno
  Semantics: Given a reference to an existing content (the first Content argument) that is in Dimmode, modify the given content object (the second Content argument) to refer to the ith instance of the dimension. See the section on handling multi-dimensional arrays to see how a multi-dimensional object is reduced to a single integer index.

oc_recordcontent
  Arguments: 1. Connection
             2. Content
             3. Content
             4. size_t i
  Output(s): Errno
  Semantics: Given a reference to an existing content (the first Content argument) that is in Recordmode, modify the given content object (the second Content argument) to refer to the ith record of the sequence instance.

oc_fieldcontent
  Arguments: 1. Connection
             2. Content
             3. Content
             4. size_t i
  Output(s): Errno
  Semantics: Given a reference to an existing (parent) content (the first Content argument) that is in Fieldmode, modify the given content object (the second Content argument) to refer to the ith field of the parent content.

oc_getcontent
  Arguments: 1. Connection
             2. Content
             3. void* memory
             4. size_t memsize
             5. size_t start
             6. size_t count
  Output(s): Errno
  Semantics: Given a reference to an existing, defined, dimensioned content object, extract some subset of the data and store it in the space defined by the memory argument. It is assumed that the referenced content is in Dimmode, which means that it was reached using oc_fieldcontent(). The subset of count items beginning at start is extracted into the memory argument. This routine will also work for scalars, but the count must be one and the start must be zero.

oc_recordcount
  Arguments: 1. Connection
             2. Content
  Output(s): size_t
  Semantics: Given a reference to an existing content (the Content argument) that is in Recordmode, return the number of records associated with this content. Note that this can be an expensive operation because some part of the data must be processed to count the number of records.

oc_dimcount
  Arguments: 1. Connection
             2. Content
  Output(s): size_t
  Semantics: Given a reference to an existing content (the Content argument) that is in Dimmode, return the number of actual elements in the xdr packet. Because of projections, this may differ from the count determined by combining multi-dimensional arrays.

oc_fieldcount
  Arguments: 1. Connection
             2. Content
  Output(s): size_t
  Semantics: Given a reference to an existing content (the Content argument) that is in Fieldmode, return the number of fields.

Over time, the list of API procedures is likely to grow, so the above may be somewhat out-of-date. The file "occontent.h" should contain the definitive set of procedures.

Of course, it is possible to define a number of useful procedures on top of these basic operations. For example, it might be useful to define a variant of oc_dimcontent that takes multiple dimension indices and returns the associated content. In effect it would apply the multi-dimensional conversion algorithm on the user's behalf.

One other note about Content objects. The reason that there are explicit create and destroy operations is to allow/force the user to control the number of created Content objects and to reuse previously created Content objects. If the API created a new object for every call to, say, oc_dimcontent, then there would be an explosion of Content objects equal to the size of the dimension. There would be no way to reclaim them either because it is impossible to know which are still actively in use.

OC Data Compilation

The original Ocapi operated by converting the xdr packet to an in-memory structure attached to the node tree of the DDS (although it did not handle the full DDS). The base version of oc, however, leaves the data in the packet and extracts it as needed during the processing of OCcontent operations: in effect it does lazy extraction. The assumption is that if the data is to be accessed once, then lazy is the appropriate choice. If the same data is to be accessed N times (where N is > 1), then pre-compiling the xdr data into a memory structure may be more efficient for some values of N.

In order to experiment with this issue, the API

extern int oc_compile(struct OCconnection*);
has been added. It does a one-time conversion of the xdr data to an in-memory structure. The OCcontent API operations (oc_dimcontent, etc.) will use the memory version if it is available.

Error Handling

Error handling in oc is somewhat different than in Ocapi, and follows mostly the netcdf model. That is, procedures return simple numeric error codes to indicate success (OC_NOERR) or failure (OC_EXXX). The current error codes are defined in oc.h, but it needs reorganization.

One good thing about Ocapi was it provided a mechanism for returning detailed error information strings. In order to keep something like that, oc has a log mechanism (oclog.[ch]). It can be used to dump extra error or warning info and it can be used to dump debug info (see the DEBUG macros in ocdebug.h).

DAPURL

Surprisingly, it appears that libcurl does not export any kind of URL parsing capability. Therefore, the DAPURL type was created to support this. It is defined in dapurl.[ch]. The DAPURL API is as follows.

dapurlparse
  Arguments: 1. const char* url
             2. DAPURL*
  Output(s): int error
  Semantics: Parses an oc url string into its component parts. It returns 0 if it fails, 1 otherwise. The component parts are as follows.
    • dapurl->url: The base part of the url (minus client parameters and constraints).
    • dapurl->projection: The projection part of the url; this can be given to ce.y to parse.
    • dapurl->selection: The selection part of the url.
    • dapurl->params: The client parameters. They are parsed and stored in envv format where parameter names and values alternate. The whole list is terminated by a NULL value. It is assumed that the name part and the value part are never NULL. Rather, the empty string ("") is used to indicate no value.

dapurlclear
  Arguments: 1. DAPURL*
  Output(s): void
  Semantics: Reclaim all the allocated space in a dapurl. The DAPURL instance itself is NOT reclaimed.

dapurllookup
  Arguments: 1. DAPURL*
             2. const char* name
  Output(s): const char*
  Semantics: Search the client parameters for name. If not found return NULL, otherwise return the associated value.

dapurlreplace
  Arguments: 1. DAPURL*
             2. const char* name
             3. const char* value
  Output(s): const char*
  Semantics: Replace, insert, or delete a specified client parameter. Return 0 if the parameter does not already exist, and return 1 otherwise. If the value is NULL, then delete the parameter; otherwise, if the name is found, then replace the value, else insert the new name/value pair.
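
The envv layout used for the client parameters can be searched with a simple stride-two loop. The following is a minimal sketch of that idea; the envvlookup name and its signature are hypothetical and are not taken from dapurl.c. It relies on the stated assumption that values are never NULL, so a non-NULL name is always followed by a value slot.

```c
#include <stddef.h>
#include <string.h>

/* Search an envv-style list: name, value, name, value, ..., NULL.
   Returns the value paired with name, or NULL if name is absent. */
const char* envvlookup(const char* const* params, const char* name)
{
    size_t i;
    if(params == NULL || name == NULL) return NULL;
    for(i = 0; params[i] != NULL; i += 2) {
        if(strcmp(params[i], name) == 0)
            return params[i+1];
    }
    return NULL;
}
```

With a list such as {"cache", "", "log", "file.txt", NULL}, looking up "log" yields "file.txt", looking up "cache" yields the empty string (a parameter with no value), and looking up an absent name yields NULL.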

Miscellaneous

The two datatypes Sequence and Bytebuffer are used throughout the code. They correspond closely in semantics to the Java ArrayList and StringBuffer types, respectively. They are used to help encapsulate dynamically growing lists of objects or bytes to reduce certain kinds of errors.

The canonical code for non-destructive walking of a Sequence is as follows.

for(i=0;i<sqLength(seq);i++) {
    T* element = (T*)sqGet(seq,i);
    ...
}

Bytebuffer provides two ways to access its internal buffer of characters. One is "bbContents()", which returns a direct pointer to the buffer, and the other is "bbDup()", which returns a malloc'd, null-terminated string containing a copy of the contents.
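
The practical difference between the two access styles is ownership. The sketch below illustrates it with a hypothetical Buf type and the hypothetical functions bufcontents and bufdup; it is not the real Bytebuffer layout or API, only an illustration of a borrowed pointer versus an owned copy.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a byte buffer; not the real Bytebuffer layout. */
typedef struct Buf {
    size_t length;  /* bytes in use */
    char* content;  /* internal storage, not necessarily null terminated */
} Buf;

/* Borrowed pointer: valid only while the buffer is unchanged; do not free. */
char* bufcontents(Buf* b)
{
    return b->content;
}

/* Owned copy: malloc'd and null terminated; the caller must free it. */
char* bufdup(const Buf* b)
{
    char* copy = (char*)malloc(b->length + 1);
    if(copy == NULL) return NULL;
    if(b->length > 0) memcpy(copy, b->content, b->length);
    copy[b->length] = '\0';
    return copy;
}
```

The direct pointer avoids a copy but becomes unsafe once the buffer grows or is freed, whereas the duplicate remains valid independently of the buffer at the cost of an allocation.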

Multi-Dimensional Array Handling

Within a data packet, the DAP protocol "linearizes" multi-dimensional arrays into a single dimension. The rule for converting a multi-dimensional array to a single dimension is as follows.

Suppose we have the DDS field Int F[2][5][3];. There are a total of 2 x 5 x 3 = 30 integers in F. Thus, these three dimensions will be reduced to a single dimension of size 30.

A particular point in the three dimensions, say [x][y][z], is reduced to a number in the range 0..29 by computing ((x*5)+y)*3+z. The corresponding general C code is as follows.

size_t
dimmap(int rank, size_t* indices, size_t* sizes)
{
    int i;
    size_t count = 0;
    for(i=0;i<rank;i++) {
        if(i > 0) count *= sizes[i];
        count += indices[i];
    }
    return count;
}
In this code, the indices variable corresponds to x, y, and z. The sizes variable corresponds to 2, 5, and 3.
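
As a worked check of the formula, the point [1][2][1] in F[2][5][3] maps to ((1*5)+2)*3+1 = 22, and the last point [1][4][2] maps to 29. The sketch below repeats the forward mapping so that it is self-contained and adds an inverse. The dimunmap function is hypothetical, added here only to illustrate that the mapping is reversible; it is not part of the oc code described in this document.

```c
#include <stddef.h>

/* Forward mapping: fold one index per dimension into a single offset. */
size_t dimmap(int rank, const size_t* indices, const size_t* sizes)
{
    int i;
    size_t count = 0;
    for(i = 0; i < rank; i++) {
        if(i > 0) count *= sizes[i];
        count += indices[i];
    }
    return count;
}

/* Hypothetical inverse: recover the per-dimension indices from an offset,
   peeling off the fastest-varying (last) dimension first. */
void dimunmap(int rank, size_t offset, const size_t* sizes, size_t* indices)
{
    int i;
    for(i = rank - 1; i >= 0; i--) {
        indices[i] = offset % sizes[i];
        offset /= sizes[i];
    }
}
```

Note that the size of the first dimension never enters the forward computation; it only bounds the valid range of the first index.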

Change Log