
Mirror directories by Email


This is work in progress, so do not expect completeness.

The Situation

There is a collection of hundreds of files, e.g. HTML-pages documenting software modules. Replicas of these pages are stored in several locations, e.g. branches of the organisation using the software. One of these locations is a centre where all changes to the software are to be coordinated.

At any location, these pages could be edited. These changes should automatically be propagated to the other locations (via the centre).

Unfortunately the locations are separated far enough that for cost reasons there is no permanent IP connectivity, so tools like rsync are not an option. Also, editing takes place at short notice (just invoke the editor), so cvs-like check-out/check-in sequences are too slow.

At each location there is a LINUX-system running the email service for the LAN users of that location.

The Idea: Email

The idea is to use the existing email system to exchange the modified files.

In order to do this, several issues need to be addressed.

Design Goals

Conflict Detection

If a file was edited at different locations but simultaneously (close enough in time so that the update from location A didn't arrive at location B before B started to modify the file), then the system has to reliably detect the conflict. Resolution of the conflict will probably have to be done manually.

Directory Tree Capability

Not only the files of a single directory should be mirrored, but also all files in all subdirectories (if the administrator so desires).

Automatic Receive Operation

If a file arrives by email, the system should unpack it and integrate it into the local replica (and perhaps rebroadcast it from the centre to the other locations) without user intervention.

Authentication (some)

Email sender addresses can easily be forged, so without additional measures some bad guy could send a forged update. So there should be something like a digital signature to ensure that the files really come from an authorized file updater.

Confidentiality (some)

The replicated files are likely part of an organisation's intranet, so not everybody should be able to see the contents of files in transit. So there should be some form of encryption. This might well be combined with the authentication requirement.

Automatic Send Operation

When a file was edited, it should be sent to the other participating locations without any human having to remember the fact.

Low Volume of Traffic

Not every save of a (possibly large, hundreds of kB) file should trigger a new email message containing the entire file. The files are assumed to be edited by humans working by session. So e.g. there could be an hourly cron-job that checks whether any file has changed since the last transmission of that file. If the last change was less than, say, one hour ago, it is considered "under construction" and won't be transmitted now. If the last change was more than, say, one hour ago, it is considered "stable" and will be sent.
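The hourly check could look roughly like the following sketch. The function name files_ready_to_send, the last_sent bookkeeping and the one-hour default are assumptions made for illustration only.

```python
QUIET_PERIOD = 3600.0  # seconds without changes before a file counts as "stable" (assumed value)

def files_ready_to_send(mtimes, last_sent, now, quiet_period=QUIET_PERIOD):
    """Return the files that changed since their last transmission and are stable.

    mtimes:    filename -> current modification time
    last_sent: filename -> modification time of the version sent last
               (hypothetical bookkeeping; a missing entry means "never sent")
    """
    ready = []
    for name, mtime in mtimes.items():
        if mtime <= last_sent.get(name, float("-inf")):
            continue  # nothing new since the last transmission
        if now - mtime < quiet_period:
            continue  # still "under construction"
        ready.append(name)
    return ready
```

In a cron job, mtimes would typically be filled with os.path.getmtime() for every file in the mirrored tree, and now with time.time().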

Additionally, compression should be used.

Alternative Ways of Transport

It should be possible to integrate other ways of transport (e.g. mailed tapes, ftp sessions) into the system. So there probably should be an "incoming" directory that is being scanned at suitable intervals.

Similarly, there should be an "outgoing" directory, that is being scanned at suitable intervals.

How to Detect Conflicts? (Try 1, June 12, 1999, 1600-1945 CEST)

check whether the received version conflicts with a local edit (this will mean trouble anyway, but it must be detected)

A conflict exists, exactly if version 1 (assumed to be consistent across all replicas) of some file becomes modified by location A resulting in version 2.A and by location B resulting in version 2.B. This happens exactly if B starts its modification before the change made by A (version 2.A) has arrived at and been integrated into the replica at B.

This cannot be detected before version 2.A arrives at B. At that time, B has to detect that the file was modified after the previous bunch of files was sent from A to B. So B has to keep a record of "the newest file that has arrived from A without causing a conflict" (for that to be enough, it has to be assumed that a conflict-free arrival implies the complete arrival of all previously changed files).

The above assumptions seem too risky to rely on. So we really need a (kind of) database, consisting of one table containing these columns:

filename
location-ID
timestamp of the most recent modification of the file
checksum of the file (to detect modifications even if timestamps go crazy, e.g. by failing clocks)
timestamp of the most recent verification by [location-ID] of the correctness of this record (typically the time when that file was sent)

Such a record states: "as far as this location knows, at [verification timestamp] the newest copy of [filename] at location [location-ID] was the version dated [modification timestamp] with checksum [checksum]".

Primary key is the combination of filename and location-ID, i.e. for every combination of filename and location-ID there is exactly one record.
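As a rough sketch, such a table could be held in memory like this. The class and field names are my own invention, and a real implementation would of course need persistent storage.

```python
from dataclasses import dataclass

@dataclass
class ReplicaRecord:
    filename: str
    location_id: str
    modified: float   # timestamp of the most recent modification of the file
    checksum: str     # detects modifications even if timestamps go crazy
    verified: float   # when location_id last verified this record

class ReplicaTable:
    """Holds exactly one record per (filename, location-ID) combination."""

    def __init__(self):
        self._rows = {}

    def put(self, record):
        # Using the primary key as dictionary key means a newer record
        # silently replaces the older one for the same file and location.
        self._rows[(record.filename, record.location_id)] = record

    def get(self, filename, location_id):
        return self._rows.get((filename, location_id))

    def __len__(self):
        return len(self._rows)
```

Calling put() twice for the same file and location leaves a single record, which is the point of the primary key.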

The records describing the local location are assumed to be consistent with the local files.

Every outgoing package containing a data file should also contain an excerpt of the database. At the very minimum this should be the record for filename and the sending location. A confirmation of every arrived file would be nice too, so the sender knows that the file has arrived. To avoid unnecessary network traffic, re-sends should be delayed until the confirmation can be assumed to have arrived under normal conditions.

At first, exclude the odd cases:

mod-X-at-Y: (6 possible results) some kind of failure, call the administrator.

These cases can occur:

modification-time(filename,A) > modification-time(filename,B): file needs to be replaced at B, check for conflict (see below)
modification-time(filename,A) = modification-time(filename,B), checksums agree: file does not need replacement
any timestamp later than the current time: Some clock is wrong. Call the administrator to check ours.

If a file arrives from location A at location B, there are several records to be looked at:

entry of B at B (just before arrival of the update, reflects previous local replica state at B)
entry of A at B (just before arrival of the update, reflects B's previous knowledge of A's state)
entry of A at A (as sent together with the update, will update B's knowledge of A's state)
B-at-A would be nice too, so B knows which version A was starting its edit on.

Each of them contains 2 timestamps, so we have 3*2=6 timestamps (mod-B-at-B, ver-B-at-B, ...) to compare. There are n*(n-1)/2 comparisons, so 6*5/2 = 15 comparisons, resulting in at least one bit each. 2**15=32768 possibilities, which is too much, so at first exclude the obvious ones.

To ease the overview, I'll present each single comparison in a matrix. Each matrix element describes the case "row heading


	mod-B-at-B	ver-B-at-B	mod-A-at-B	ver-A-at-B	mod-A-at-A	ver-A-at-A
mod-B-at-B	=	clock trouble	recent edit at B, B should already have sent the file to A some time ago, A has not yet confirmed arrival of update from B. check A-at-A for possible conflict.	recent edit at B, B should already have sent the file to A some time ago, A has not yet confirmed arrival of update from B. check A-at-A for possible conflict. Also for =.	recent edit at B, B should already have sent the file to A some time ago, A has not yet confirmed arrival of update from B. A either resent an old version (compare A-at-A with A-at-B) or we have a conflict. Also for =.	recent edit at B, B should already have sent the file to A some time ago, A has not yet confirmed arrival of update from B. A either resent an old version (compare A-at-A with A-at-B) or we have a conflict. Also for =.
ver-B-at-B	should be always true (	=	should be always true (	should be always true (	should be always true (	should be always true (
mod-A-at-B	B knew for quite a time that A has a more recent copy, but the copy itself has not yet arrived, perhaps it's in the currently arriving package. Should not happen unless database contents travels on different paths than file contents.	clock trouble	=	clock trouble	A has certainly resent an old version, B already has a newer one. B already knew that A has something more recent than the arriving file.	A has certainly resent an old version, B already has a newer one. B already knew that A has something more recent than the arriving file.
ver-A-at-B	nobody knows... conflict can't be excluded due to propagation delay.	clock trouble	o.k., also =	=	no clarification.	A has certainly resent an old version, B already has a newer one. B already knew that A has something more recent than the arriving file.
mod-A-at-A	the arriving file is more current than the local one. Normal case. Certainly needs update, perhaps there is a conflict.	clock trouble	the arriving file is more current than the local one. Normal case. Certainly needs update, perhaps there is a conflict.	the arriving file is more current than the local one. Normal case. Certainly needs update, perhaps there is a conflict.	=	clock trouble
ver-A-at-A	nobody knows... conflict can't be excluded due to propagation delay.	clock trouble	nobody knows... conflict can't be excluded due to propagation delay.	nobody knows... conflict can't be excluded due to propagation delay.	o.k., also =	=

When Exactly Does A Conflict Happen? (try 2, Jun 12, 1999, 2000 CEST)

A conflict happens when (at least) two locations start modifying a file "simultaneously", i.e. B starts modifying the file BEFORE the modification made by A has arrived at B.

So to detect whether a conflict exists, one has to know for each modification, FROM what version the modification took place.

Formally

Version v1 is subjected to modifications m1 which result in version v2. In signs: v2=modif(v1,m1)

v2 is distributed by email. The process of distribution takes a considerable amount of time.

If there exist both v2=modif(v1,mA) and v3=modif(v1,mB) then there is a conflict.

If a file containing v2 arrives at some location B, B has to ensure that B's local copy still contains v1 before B adopts v2 as its local copy.

How To Identify Versions?

Timestamp alone is not enough, because edits may happen at several locations simultaneously.

So use time (of last edit) and location of storage. This eliminates accidental treatment of different versions as being identical. But it also introduces treating identical versions as different.

So use time of last edit and location of last edit. That's it!

To make it especially sure, include a checksum - that may remove the need for the location ID.
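A version ID along these lines might be sketched as follows. The names VersionID and make_version_id are hypothetical, and MD5 is just one possible checksum.

```python
import hashlib
from typing import NamedTuple

class VersionID(NamedTuple):
    edited_at: float   # time of last edit
    edited_by: str     # location of last edit (not the location of storage)
    checksum: str      # MD5 hex digest of the content

def make_version_id(content: bytes, edited_at: float, edited_by: str) -> VersionID:
    return VersionID(edited_at, edited_by, hashlib.md5(content).hexdigest())

# Two simultaneous edits at different locations get different IDs ...
id_a = make_version_id(b"text from A", 1000.0, "A")
id_b = make_version_id(b"text from B", 1000.0, "B")
# ... while the copy of A's version stored at B keeps A's ID.
id_a_copy = make_version_id(b"text from A", 1000.0, "A")
```

Because the ID names the location of the last edit rather than the location of storage, an unmodified copy compares equal wherever it is stored.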

How to Identify Conflicts?

Every file transmitted from location A to location B needs to be accompanied by TWO pieces of version information:

The version-ID FROM which A started the edit that resulted in this version
The version-ID of the transmitted version.

If the receiving location B can prove that its local copy is identical to A's FROM-version, then B can be certain not to suffer a conflict.
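The check at the receiving side then becomes very small. In this sketch the names receive and Conflict are made up, and version IDs are simply tuples of (time of last edit, location of last edit, checksum).

```python
class Conflict(Exception):
    """Raised when the local copy is not the version the sender started from."""

def receive(local_version, from_version, new_version):
    """Return the version ID B holds after processing an update from A.

    local_version: ID of B's current local copy
    from_version:  ID of the version FROM which A started its edit
    new_version:   ID of the transmitted version
    """
    if local_version != from_version:
        # B's copy has changed (or was never in sync); manual resolution needed.
        raise Conflict(f"local {local_version} is not FROM-version {from_version}")
    return new_version
```

If B has edited the file in the meantime, its local version ID no longer matches A's FROM-version, and the update is not adopted silently.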

The problem is to identify the FROM-version.

How To Identify the FROM-Version?

The modifying location A has to provide enough information that the receiving location B can be assured "A started modifying the same version that B still had as its local copy".

The question is, when to sample. If a too late ID is given, then false conflict alarms may be triggered. If a too early ID is given, then real conflicts will go undetected.

To avoid false alarms or undetected conflicts, the FROM-version should be a version that is known to be identical to all local copies of all other locations (unless the other location has edited the file).

The perfect solution would be to have a database as described in Try 1, storing for each location the ID of the version that's supposed (better: known) to be stored there. Whenever this database indicates "all locations (including me) have the same version", then this ID is sampled and stored separately (not just as the "current local version"), perhaps as an additional pseudo-location "local FROM-version" a.k.a. "last globally synchronized version".
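The sampling step might be sketched like this, for a single file. The function name, the dictionary layout and the exact spelling of the pseudo-location are assumptions; version IDs only need to be hashable (strings or tuples).

```python
GLOBAL = "last globally synchronized"   # pseudo-location holding the FROM-version

def sample_from_version(known_versions):
    """known_versions maps location -> version ID of one file, as perceived here.

    If all real locations (including the local one) hold the same version,
    store that ID under the pseudo-location and return True. Otherwise the
    previously sampled FROM-version is left untouched and False is returned.
    """
    real = {v for loc, v in known_versions.items() if loc != GLOBAL}
    if len(real) == 1:
        known_versions[GLOBAL] = real.pop()
        return True
    return False
```

The function would be called after every database update, so that the sample always reflects the most recent moment at which everybody agreed.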

The Process

Database Of Versions

Each location needs to maintain information about the versions stored in all other locations with whom it directly exchanges files.

One table, primary key: (location, filename); one additional location "last globally synchronized". Columns:

filename
location ("this record describes the state of that location as perceived here")
Version-ID (timestamp and [last editor (location-ID) or checksum]) of the version that has been confirmed to be stored at [location]
time of last attempt to send the file from here to [location] (this column is not replicated)

Even in simple setups (one centre, branches exchange files only with the centre), the branches won't get along without an explicit database, because if they edit files, they have to determine the FROM-version, which needs some type of storage.

Sending Files

Detect What Files To Send

For each file and each remote location (branches may consider the centre as the only one), check whether the local version differs from the remote version (timestamp, checksum). If it differs, then check whether it should be (re-)sent ([time now] minus [time of last attempt]

Special case: if the remote version is more current than the local version, then you missed something. Time to ask your friendly system administrator.
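A sketch of the send decision follows. It assumes that a file is only re-sent after some timeout has passed since the last attempt; the function name, the dictionaries and the timeout parameter are all hypothetical. The special case of a remote version being more current than the local one is not handled here.

```python
def sends_due(local_versions, remote_versions, last_attempt, now, timeout):
    """Return the (filename, location) pairs whose file should be (re-)sent now.

    local_versions:  filename -> local version ID
    remote_versions: (filename, location) -> version ID confirmed at that location
    last_attempt:    (filename, location) -> time of the last send attempt
    timeout:         assumed minimum wait between two attempts, in seconds
    """
    due = []
    for (filename, location), remote in sorted(remote_versions.items()):
        if local_versions.get(filename) == remote:
            continue  # the remote copy is already up to date
        waited = now - last_attempt.get((filename, location), float("-inf"))
        if waited < timeout:
            continue  # a send is probably still in transit, wait for confirmation
        due.append((filename, location))
    return due
```

A branch that talks only to the centre would simply have a single location in remote_versions.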

avoid sending files that have arrived from other places

Before sending a file, there should be knowledge of which version of that file is already available at the recipient's location. Fortunately, that's already solved by the conflict detection database.

But if a file arrives for the first time, then usually the confirmations of the other locations are not yet here, so the local system might think "oh, I have a file that's newer than all the others have, so let's broadcast it".

Solution (probably not the only possible one): if a new version arrives from the outside, store the time of arrival into the "time of last transmission attempt"-field for every other location. This might add a lot of stability into the system, because if the direct route from the updater to one of the sites fails, other sites are likely to re-send the file.

This requires that, upon reception of every single file, all others pretty reliably get the information about this new version at this place. Failure to send the database update (a.k.a. confirmation) will result in the file being re-sent.

The timeouts should be pretty different for each participant, to avoid quasi-simultaneous resends by multiple locations. Ideally, not only the timeout, but also the difference of the timeouts between each pair of stations should be larger than the roundtrip delay of email messages (confirmations). In a "centre and branches" setup, the centre should get the shortest timeout.

place the files into the "outgoing" directory

yet to be defined:

file for the database
file for the FROM-version information
file for the database-updates

send the contents of the "outgoing" directory by mail

Receiving Files

unpack incoming mail and place the contents into the "incoming" directory

Conflict Detection

see above. The FROM-version arriving together with the updated file must be the same as the version that was stored here locally. Otherwise there is a conflict.

place the files to their destinations

check which files need to be rebroadcast to other participants

tell the world what version I have now

Very important.

Thought Chunk #2, 2000/03/12

Detection of conflicts vs. determination of where to send modifications (locally generated ones vs. ones that have arrived from remote locations) to.

Core Data Structure: The Version History (VH)

Each location has to maintain a list of objects (e.g. files) that are to be included in the replication logic.

For each object, a version history needs to be maintained. The version history per object consists of a reasonable (see below) number of past version information records.

Information per Version Information Record (VIR)

For each object, store a reasonable (see below) number of version records

OID: Object-ID (e.g. file name)
TIM: timestamp of last modification
LOC: location of last modification
CS: checksum (pretty tough, swapped characters [corrected typo] must result in a different checksum), e.g. MD5. The goal is, to detect modified vs. identical copies

Number of Version Records to Store per Object

at least enough to have one common (to most other locations) version covered

Local Database Update Check

Either triggered manually (after modifications to files have been made) or automatically in reasonable intervals (e.g. each night), or if object updates from other locations arrive.

Algorithm

Take a look at each object. Per object do:

compute the checksum
record the object modification time (e.g. file modification time as reported by the file system)
if either differs from the most current VIR (MCVIR), i.e. a modification has been detected:
- if MCVIR.LOC is local, then update the MCVIR
- else (i.e. if MCVIR.LOC is not local) then create (append) a new VIR, which then becomes the new MCVIR
- in both cases trigger a broadcast of the modified object

It might be useful to never update VIRs but to always create new ones. At least every broadcast version should be recorded permanently in the version history.
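The per-object part of the update check could be sketched as follows. The VIR class mirrors the fields listed above; the function name update_check, the newest-last ordering of the list and the treatment of an empty history as "new object" are my assumptions.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class VIR:
    oid: str     # object ID, e.g. file name
    tim: float   # timestamp of last modification
    loc: str     # location of last modification
    cs: str      # checksum, here an MD5 hex digest

def update_check(vh, oid, content, mtime, here):
    """Check one object against its version history vh (a list, newest last).

    Returns True if a modification was detected, in which case the caller
    should trigger a broadcast of the object.
    """
    cs = hashlib.md5(content).hexdigest()
    mcvir = vh[-1] if vh else None
    if mcvir is not None and mcvir.cs == cs and mcvir.tim == mtime:
        return False                           # neither checksum nor time differs
    if mcvir is not None and mcvir.loc == here:
        mcvir.tim, mcvir.cs = mtime, cs        # MCVIR is local: update it
    else:
        vh.append(VIR(oid, mtime, here, cs))   # append a new VIR as the new MCVIR
    return True
```

To follow the "never update, always append" variant, the in-place update branch would simply be dropped.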

Broadcast of Modified Objects

Whenever it has been decided to broadcast an object, the following data has to be transmitted to the other participating locations

the modified object itself (perhaps only the diffs)
enough recent VIRs (ideally the entire VH) to be pretty sure that the version still stored in the remotest (having the most out of date copy) place is included. Failure to do so will result in false conflict alarms.

In the case of replicated web-page collections, this is likely to be implemented as an email message with 2 file attachments.

Propagation Path Ideas

The propagation path of updates is not prescribed by the logic presented here. Suggestion:

per location, list all "neighbouring" locations that should be immediately addressed (recipients of the data email)
if an update arrives from a remote location, check whether the arriving version is already stored locally
if the arriving version is to be considered new for the local location, then rebroadcast it to all neighbouring locations

Handling of Incoming Objects

Conflict Detection

For each incoming object, compare the remote (incoming, accompanying) VH with the locally stored VH using the following algorithm:

perform a local database update check (in order to update the local VH)

scan the remote VH, looking for a copy of the local MCVIR (compare CS; better yet if TIM and LOC agree), recording the position (ordered by TIM, newest first) of the local MCVIR within the remote VH. (This is the simplest version; one could include other checks to detect past conflicts. See section on "More General Version Status Description")

If the local MCVIR occurs in the remote VH, and its position is the newest, then it's just a duplicate broadcast - discard the incoming update.

If the local MCVIR occurs in the remote VH, and its position is not the newest, then we've received a real good update - copy the incoming object and the incoming remote VH and rebroadcast it (in case other locations use us as a hub).

If the local MCVIR does not occur in the remote VH, then we definitely have a conflict (or a buggy remote VH handler) - see section on conflict handling.
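The three cases above can be sketched in a few lines. For brevity this version compares checksums only; the function name and the result strings are made up.

```python
def classify_incoming(local_mcvir_cs, remote_vh_cs):
    """Classify an incoming object.

    local_mcvir_cs: checksum of the local most current VIR
    remote_vh_cs:   checksums of the accompanying remote VH, newest first
    """
    if local_mcvir_cs not in remote_vh_cs:
        return "conflict"      # remote history never passed through our version
    position = remote_vh_cs.index(local_mcvir_cs)
    if position == 0:
        return "duplicate"     # we already hold the newest version: discard
    return "update"            # adopt object and remote VH, then rebroadcast
```

For example, if the local copy has checksum "c1" and the remote VH reads ["c3", "c2", "c1"], the incoming object is a regular update.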

Conflict Handling

If a conflict is detected, there have been multiple simultaneous updates on one object. Store both incoming and local stuff in a safe place and resolve the conflict manually, typically by giving one version priority above the other (thus discarding one location's changes) and reapply the discarded changes. To be able to do that, it is of help to store the diffs of local changes until we can be sure that no conflict has occurred.

More General Version Status Description

(Virtually) append a field "source" to each VIR, fill it with "L" (local) and "R" (remote) for the local and remote VH respectively.

Sort (by TIM, newest first) and merge both (local and remote) VHs into one set of VIRs.

Check for adjacent identical (as determined by CS, regardless of source) VIRs and compress those into one record with source:="C" (common).

Build a string of the sequence of the values of the source fields, so it's a sequence of C's, L's and R's.

Compress every consecutive sequence of identical characters into one single occurrence of that character, e.g. any CC into C, any RR into R and any LL into L, until there remain no pairs of identical characters.

Now this is a quite precise description of the version status, thus let's call it "Version Status String" (VSS).

Every C represents a common, synchronized state. The hot parts are therefore before (newer than) the first C, everything between adjacent C's, and after the last C. If there is no conflict, then every hot part should consist only of one single letter, representing the site where a modification was recorded and broadcast conflict free. If a hot part consists of different letters, then a conflict has happened, perhaps in the past. If there is a more current C, then this conflict has been solved.
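The construction of the VSS could be sketched as follows. Version histories are reduced to lists of (TIM, CS) tuples here, and I interpret "common" as "the same checksum occurs in both histories"; both choices, like the function name, are assumptions.

```python
from itertools import groupby

def version_status_string(local_vh, remote_vh):
    """Build the VSS from two version histories given as lists of (tim, cs)."""
    tagged = [(tim, cs, "L") for tim, cs in local_vh]
    tagged += [(tim, cs, "R") for tim, cs in remote_vh]
    # Newest first; the checksum as second key keeps identical records adjacent.
    tagged.sort(key=lambda rec: (rec[0], rec[1]), reverse=True)
    sources = []
    for _, group in groupby(tagged, key=lambda rec: rec[1]):
        tags = {src for _, _, src in group}
        sources.append("C" if tags == {"L", "R"} else tags.pop())
    # Collapse runs of identical letters (CC -> C, RR -> R, LL -> L).
    return "".join(letter for letter, _ in groupby(sources))
```

With a common version at time 1, a local edit at time 3 and a remote edit at time 2, the result is "LRC": the hot part before the first C contains two different letters, so there is an unresolved conflict.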

Implementation

Abstractions

The algorithm described here may be applied to diverse situations, where several things can vary largely. This can be solved by object interfaces. These interfaces are either APIs, so that adaptation layers (functions) need to be written, or some method to specify the invocation by a configuration file, either by specifying command line templates (flexible, but not very efficient if object counts grow into 5 digit regions) or by selecting from a number of precompiled possibilities.

Objects

can range from a few files (greatly varying in size) on a directory tree to single records (or even fields) of large (millions, if not billions of records) databases. Each of these types of objects may need a different type of handling, e.g. how to determine the current modification date, how to determine the checksum, how to send an object and how to import a received update.

The modification date of a web page can easily be found by a directory listing, while this is less obvious for database records, so there needs to be a kind of API where separate ways of doing things can plug in. The following functions are needed:

walk through objects: "first", "next", returning an object_id or Null (when the last object has been reached)
checksum(oid), modif_time(oid)
file_out(fd, oid): write a character string representation of the object referenced by oid to the file descriptor. Needs delimiter rules to be able to write multiple objects into one file. Suggestion: the read-function file_in must be able to determine the end of one object. file_out(Null) must write a special string, such that a corresponding file_in also returns null and also does not read beyond the last character of that representation of null. This enables to implement a logical EOF mark. Beware that the result of file_out may be too large to keep it in RAM (e.g. video movies).
file_in(fd): reads one object from the file descriptor fd into the local object store. Terminates after reading exactly the last character of that object's representation, without even trying to read any character beyond that. It must be permissible and efficient to call file_in repeatedly on the same fd for a large number of objects.

The data type of oid might vary too, but a relatively short (hard limit of 16k characters acceptable) kind of string is likely to be the standard - this should be flexible enough to accommodate filenames or database key values.
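One possible delimiter rule is a length prefix: each object is preceded by a line stating its size in bytes, and a negative size serves as the logical EOF mark. The sketch below is only an illustration of that idea; unlike the API described above it takes the object content directly instead of an oid, and it holds each object in RAM, which a real implementation would avoid by copying in chunks.

```python
import io

def file_out(fd, data):
    """Write one object (bytes) to fd, or the logical EOF mark if data is None."""
    if data is None:
        fd.write(b"-1\n")
        return
    fd.write(b"%d\n" % len(data))   # length header, then the raw content
    fd.write(data)

def file_in(fd):
    """Read exactly one object from fd; return None at the logical EOF mark."""
    length = int(fd.readline())
    if length < 0:
        return None
    return fd.read(length)          # consumes exactly this object's bytes

# Round trip through an in-memory file
buf = io.BytesIO()
for obj in (b"<html>a</html>", b"", b"line1\nline2", None):
    file_out(buf, obj)
buf.write(b"unrelated trailing data")
buf.seek(0)
```

Repeated calls to file_in(buf) return the three objects in order and then None, leaving the trailing data unread.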

Transport Methods

A single web page file can comfortably be transmitted as a separate email message, while this is completely inefficient for a single database field.

Transport might be by various means, e.g.

TCP connection - hmm, this means, I'm trying to replace rsync, rather unlikely given the purpose and starting points of these thoughts.
Email, by various programs
physical transfer of file-carrying media, with different properties regarding size per medium (e.g. floppy vs. DAT), rewritability (an issue with CD-R's) and speed.
file exchange by common directory (e.g. a samba mount on a LAN)
file exchange by network protocol (e.g. ftp, http)

Additionally, a variety of coding methods may be used:

character representation: there are enough languages (and binary objects like images) that use character sets exceeding the 32..127 ASCII-range - I've had enough struggles with different representations of German umlauts, e.g. in Unix vs. DOS vs. Windows. The HTML character entities like &auml; for ä may be a way to go for human languages, while MIME-encoding may be useful for binary application data.
compression: zip vs. gzip vs. lha vs. whatever
error checking: checksums or the like are needed if the underlying transport mechanism can't guarantee stable contents (e.g. demagnetized floppies)

Security

Especially when transmitting through insecure channels (which e-mail certainly is), authentication is probably required, to avoid unauthorized introduction of data by the "bad guys". There are many algorithms to achieve this, one way to go might be to attach PGP (or GnuPG) signatures to each transmitted file.

If the data exchanged is confidential (i.e. the contents must not be readable to unauthorized people, e.g. in a distributed organization's intranet), encryption is needed. This needs flexibility on encryption algorithms and key management. The first thing I think of is PGP (or GnuPG), but there are many others...

Versions Information Storage

Depending on the number of objects to be managed and the available systems (not everybody, especially on pocket computers, has a database server around), the version information may be stored by different techniques. I imagine things like:

text files, 1 line per version record, fields separated by colons, read into RAM - useful if the number of objects is limited (say not much above 1000), might be o.k. for small to medium sized intranets parts of which are to be replicated on PDAs.
database files, accessed by something like the gdbm-API
database servers like PostgreSQL or Oracle

Thus an API is needed to abstract this layer:

get_vh(oid): returns the stored version list for object oid
put_vh(oid,vh_list): stores a (probably modified) version list for object oid
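For the text file variant, the two functions might be sketched like this. The class name, the field order oid:TIM:LOC:CS and the rewrite-everything strategy are assumptions; splitting from the right allows colons inside the oid, but not inside the other fields.

```python
import os

class FlatFileVH:
    """Version histories in a text file, one line per version record."""

    def __init__(self, path):
        self.path = path
        self.histories = {}   # oid -> list of (tim, loc, cs), oldest first
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                for line in f:
                    oid, tim, loc, cs = line.rstrip("\n").rsplit(":", 3)
                    self.histories.setdefault(oid, []).append((float(tim), loc, cs))

    def get_vh(self, oid):
        return list(self.histories.get(oid, []))

    def put_vh(self, oid, vh_list):
        self.histories[oid] = list(vh_list)
        # Simple but slow: rewrite the whole file on every change.
        with open(self.path, "w", encoding="utf-8") as f:
            for key, records in self.histories.items():
                for tim, loc, cs in records:
                    f.write(f"{key}:{tim!r}:{loc}:{cs}\n")
```

This is only reasonable for the small object counts mentioned above; a gdbm or database server backend would implement the same two methods.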

Checksum Algorithms

MD5 is good, but not the only one. The checksum algorithm must be pretty good; in particular it must be able to detect the existence of a difference between an original file and a copy that has just 2 characters swapped (tpyo corrected).

And then there is the question of how to call the checksum computation. An API would just contain one function:

checksum(oid)

This is identical to a portion of the object store API.
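To illustrate the swapped-characters requirement, here is a small comparison of a naive byte sum with MD5 from Python's hashlib. For simplicity both functions take the object content rather than an oid; the name weak_checksum is made up.

```python
import hashlib

def weak_checksum(data: bytes) -> int:
    """Plain sum of all bytes: blind to reordering, so not good enough."""
    return sum(data) % 65536

def checksum(data: bytes) -> str:
    """MD5 digest: swapping two characters changes the result."""
    return hashlib.md5(data).hexdigest()

original = b"typo corrected"
swapped = b"tpyo corrected"   # same bytes, two of them in swapped order
```

The byte sum is the same for both strings, while the MD5 digests differ.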

Another theoretical option would be to leave the checksum computation out of the object store API and to compute the checksum in the replication logic, using a standard algorithm on the result of file_out. But this can be inefficient for large object counts (always open a file...). If the inefficiency were countered by storing the object representation in RAM, trouble may turn up when really large objects (e.g. video movies) appear that won't fit into RAM.

Thought Chunk #3, 2001/03/24

Web Search for existing related solutions

Results of google search:

Using MySQL's Built-In Replication To Maximize Availability (article on PHPbuilder.com) - MySQL seems to have replication built in - but only with an online connection. PostgreSQL and other database products have similar features or thoughts.
Replication and Caching Position Statement of the W3C - very general, geared towards global load balancing
Web Distribution Systems : Caching and Replication - from Ohio State University, good overview of online mechanisms, protocols, and products for reducing network load for large collections of web pages.
LinkPro makes commercial software for replication between file servers
Replication in Ficus Distributed File Systems - by Gerald Popek and John Heidemann from UCLA. From the abstract: Ficus is a replicated general filing environment for Unix intended to scale to very large (nationwide) networks. The system employs an optimistic "one copy availability" model in which conflicting updates to the file system's directory information are automatically reconciled, while conflicting file updates are reliably detected and reported. The system architecture is based on a stackable layers methodology which permits a high degree of modularity and extensibility of file system services. Comes pretty close to my goal. Popek's assumption of frequently interrupted network connections and optimistic update (every location can update without prior need of connection to the master, conflicts are detected later) are quite exactly mine. Assumes quick (online) back-and-forth communication during connected stages, so email is something different. Constructs an entire file system on kernel level, the distributedness of which may be hidden from the user. My approach assumes a high level replicator process acting on an underlying file system.
Resolving File Conflicts in the Ficus File System - another paper from the same series, talking a lot about various categories of conflicts and data dependent methods to resolve them. Uses mostly perl. Provides conflict statistics - disconnected users create a lot more conflicts, but many of those are automatically solvable. Uses version vectors, looking similar to my version history lists.
Primarily Disconnected Operation: Experiences with Ficus - in the same series as above. Multi-Site replication done in a ring, because "any-to-any" would cause O(n*n) messages per update - too much. Don't exchange all files' versions, but only those with a recent modification timestamp; also recording times of last successful reconciliation. They saw the problems in high latency connections when using many back-and-forth messages - of which email based replication is an extreme case. Requires 45 minutes of modem time for 63 MB home directory plus 50 MB of mostly static system files. Ficus uses peer-to-peer approach vs. Coda project which is expressly client-server
rdist: Unix utility, needs intermittent online connection, transport can be done by any rsh compatible means, e.g. ssh. No attempt to resolve conflicts, assumes client side to be the master. Unix only.
rsync: rcp replacement, attempts to transport only the differences of the file contents. The user has to specify source and destination, so there is no conflict checking. rsync may be one (attractive) choice as a transport mechanism. Unix only

Choice Of Implementation Language

Goals

lightweight, no heavy burdens on hardware
availability on many platforms, including Linux, Win32, MacOS, EPOC (e.g. Psion handheld computers)
easy configuration for various purposes, e.g. different transport systems (floppy disk, email, narrowband pay-per-minute TCP/IP, broadband pay-per-minute TCP/IP, permanent pay-per-MB TCP/IP, permanent free TCP/IP), different types of application data (web pages, video files, database records), different types of version history database (flat files, various database servers, ...), ideally at runtime, but easy at compile/setup-time might be o.k. too.

How to plug together the various objects?

"include" on interpreter level?

Java?

What language to use for implementation? Java? Python? Perl? C?

Python

easy programming, good modularity
available on some platforms: Unix, Win32, Mac, EPOC (port in development), even partially for Palm OS
pickling enables a simple persistent object store for the version history
lots of modules available to enable various databases

Java

widespread
not very uniform implementations
somewhat more complicated programming than Python
very slow processing
reasonable modularity

C

very fast execution
difficult programming
very difficult modularization at run time
easy modularization at compile time

Result of Thoughts: Python

The ease of programming, wide availability, good modularity and relatively small footprint lead to the decision to use Python as the language of choice for implementing the system.

APIs

To enable a variety of combinations of application data, storage methods and communication methods, this architecture of modules is proposed:

                      Replication Controller
                                !
                                !
   +-----------------+----------+---------+-------------+
   !                 !                    !             !
   !                 !                    !             !
Administrative    Application Data     Message       Message
Database API      Local Storage API    Reception     Transmission
(ADBAPI)          (ADLSAPI)            API (MRAPI)   API (MTAPI)

Each module is likely to consist of more than one Python class.

The selection of modules is done by naming the corresponding Python source code file in the controller's configuration file. The chosen modules are imported upon startup of the controller. This can be done e.g. by
modname="mod2"
exec("from "+modname+" import *")
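In current Python versions the same effect can be had without exec, using importlib from the standard library. The following is only a minimal sketch: it assumes the configuration is a plain dictionary mapping an API name to a module name, the helper name load_api_module is hypothetical, and standard-library modules stand in for real API implementations so that the sketch runs anywhere.

```python
import importlib

def load_api_module(cfg, api_name):
    """Import the module named in cfg[api_name] and return the module object."""
    modname = cfg[api_name]
    return importlib.import_module(modname)

# Hypothetical configuration: real entries would name the adbapi/mrapi modules.
cfg = {"adbapi": "json", "mrapi": "poplib"}
adbapi = load_api_module(cfg, "adbapi")
mrapi = load_api_module(cfg, "mrapi")
```

Returning the module object keeps the imported names in their own namespace, so two API modules cannot silently overwrite each other's names as they could with "import *".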

For efficiency reasons, the data structures handled by the various APIs are likely to be specified in terms of lists and dictionaries, rather than classes.

Representation of Application Data

To enable a richer set of common functions performed by the controller, it is preferable to use a string representation of the entire content of application data, not just an implementation dependent pointer (e.g. a filename). This enables e.g. checksumming or encryption algorithms to be provided by the controller, thus reducing the effort of implementing this kind of function in the APIs.

This means, that the MRAPI and MTAPI have to be able to produce and accept such a string representation as a parameter for retrieving/constructing a message.

The drawback of this design choice is, that single application objects are assumed to be small enough, that 2 or 3 copies of the largest one fit into the python heap and thus into (virtual) RAM. This can be safely assumed for web page files and database rows. Multimedia files, especially long audio streams and video clips, are likely to break this assumption with current hardware - but files like this probably don't get transmitted through current email systems.

To cater for the case of databases, the complete list of application objects is assumed to be too large to fit into the python heap. The ADLSAPI has to provide methods to make sure to access all of them, e.g. during a consistency check.

To also cater for large collections, the list of all batched messages is assumed to not fit into the python heap. The MRAPI has to provide functions to retrieve messages one by one.

Data Structures

Each application object is identified by an object_id of type string. The same string is to be used in all APIs.

All locations (that can possibly modify objects) are identified by a location_id of type string.

All timestamps are of type string: YYYYMMDDHHMMSS, e.g. 20010324204759
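A useful property of this format is that plain string comparison orders timestamps chronologically. A minimal sketch of producing and parsing such strings follows; the function names are hypothetical, and the use of UTC is an assumption, since the time zone is not specified above.

```python
import calendar
import time

TS_FORMAT = "%Y%m%d%H%M%S"

def make_timestamp(epoch_seconds):
    """Format seconds since the epoch as YYYYMMDDHHMMSS (UTC is an assumption)."""
    return time.strftime(TS_FORMAT, time.gmtime(epoch_seconds))

def parse_timestamp(ts):
    """Inverse of make_timestamp: return seconds since the epoch."""
    return calendar.timegm(time.strptime(ts, TS_FORMAT))

example = "20010324204759"
roundtrip = make_timestamp(parse_timestamp(example))
```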

Versioning Information for each application object

is composed as follows:

version_database ::= object_version_history+
object_version_history ::= object_id version_list
version_list ::= version_info+
version_info ::=
- timestamp_of_this_measurement (the checksum is valid at this time, i.e. at least up to this moment; it may remain valid for much longer)
- location_id_of_this_measurement
- checksum of object contents (at the time given in timestamp_of_this_measurement)
- timestamp_of_object_modification (the object contents has definitely not changed since then, it may not have changed for a much longer period of time)
- location_id_of_object_modification
- administrative_modification_data (arbitrary string that could contain data like the modifier's user name and some user specified comment about the version)

Whenever the version list of a single object is retrieved or stored, this is done as a python list, composed of

one list item per version info
ordered by timestamp_of_this_measurement

each list item is stored/retrieved as a python dictionary:

timestamp_of_this_measurement: string as described above
location_id_of_this_measurement: string as described above
checksum: string as described above
timestamp_of_object_modification: string as described above
location_id_of_object_modification: string as described above
administrative_modification_data: string as described above
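The list-of-dictionaries layout described above might look like this in practice. This is a sketch only: the helper names are hypothetical, the object contents are assumed to be a bytes object (the Python 3 counterpart of the binary-safe strings discussed here), and SHA-256 is merely a stand-in, because the checksum algorithm is still listed as an open issue.

```python
import hashlib

def make_version_info(ts_measured, loc_measured, contents,
                      ts_modified, loc_modified, admin_data=""):
    """Build one version_info dictionary; contents is a bytes object."""
    return {
        "timestamp_of_this_measurement": ts_measured,
        "location_id_of_this_measurement": loc_measured,
        # Assumption: SHA-256 hex digest; the real algorithm is undecided.
        "checksum": hashlib.sha256(contents).hexdigest(),
        "timestamp_of_object_modification": ts_modified,
        "location_id_of_object_modification": loc_modified,
        "administrative_modification_data": admin_data,
    }

def add_version(version_list, info):
    """Return a new list that includes info, ordered by measurement timestamp."""
    return sorted(version_list + [info],
                  key=lambda v: v["timestamp_of_this_measurement"])

history = []
history = add_version(history, make_version_info(
    "20010324204759", "centre", b"<html>v2</html>",
    "20010324120000", "branch_a", "editor1: fixed typo"))
history = add_version(history, make_version_info(
    "20010323080000", "branch_a", b"<html>v1</html>",
    "20010322170000", "branch_a"))
```

Because the timestamps are fixed-width digit strings, sorting them as strings yields chronological order.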

Messages

msgs ::= msg*
msg ::= msg_admin_info (version_info_msg ! data_update_msg ! conflict_notification_msg)
msg_admin_info ::= msg_type_id msg_from_location_id msg_to_location_id
msg_type_id ::= "version" ! "data" ! "conflict"
version_info_msg ::= object_version_history (not necessarily complete)
data_update_msg ::= object_version_history (newest version is the transmitted one) application_object_contents
conflict_notification_msg ::= object_version_history

A source_location_id and an object_id are not needed, because the version info does contain this information.

A msg_id is not needed, because an incoming message is presented only once anyway.

Whenever a message is stored or retrieved, this is done as a nested python dictionary (one separate dictionary for each message), composed exactly as described above.
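One possible reading of "nested python dictionary, composed exactly as described above" is sketched here. The dictionary keys are taken from the grammar, but their exact spelling as keys, the helper name make_message and the rule that only "data" messages carry contents are assumptions of this sketch.

```python
MSG_TYPES = ("version", "data", "conflict")

def make_message(msg_type, from_loc, to_loc, object_id, version_list,
                 contents=None):
    """Build one message as a nested dictionary following the grammar."""
    if msg_type not in MSG_TYPES:
        raise ValueError("unknown msg_type_id: %r" % (msg_type,))
    if (msg_type == "data") != (contents is not None):
        raise ValueError("exactly the 'data' messages carry object contents")
    msg = {
        "msg_admin_info": {
            "msg_type_id": msg_type,
            "msg_from_location_id": from_loc,
            "msg_to_location_id": to_loc,
        },
        "object_version_history": {
            "object_id": object_id,
            "version_list": version_list,
        },
    }
    if contents is not None:
        msg["application_object_contents"] = contents
    return msg

data_msg = make_message("data", "branch_a", "centre", "docs/mod1.html",
                        [], b"<html></html>")
version_msg = make_message("version", "centre", "branch_a",
                           "docs/mod1.html", [])
```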

Locations

location ::= location_id network_address

one dictionary per location

Configuration File

All configuration data is read into an object of type cfg_handle when the controller is started. It contains this information:

adbapi: module name of the adbapi
implementation specific configuration information for the adbapi, e.g. name of db-host, username, password (should be encrypted...)
adlsapi: module name of the adlsapi
implementation specific configuration information for the adlsapi, e.g. name of db-host, username, password (should be encrypted...)
mrapi: module name of the mrapi
implementation specific configuration information for the mrapi, e.g. name of the POP3 host or the "incoming" directory
mtapi: module name of the mtapi
implementation specific configuration information for the mtapi

Administrative Database API (ADBAPI)

class adb, containing these methods:

adb(cfg_handle) (Constructor)

initializes internal data structures, e.g. open a database server connection, based on the adbapi section of cfg_handle. File or database handles, including cfg_handle, that may be required by the implementation can be stored in any instance variables of class adb.

adb_get_my_location()

returns the location_id of this process. Needed to feed the local version history.

adb_get_location_list()

returns a list of all location_ids.

adb_get_location(lid)

returns all stored information about the given location

adb_get_version_history(oid)

returns the entire version history of the (single) object specified by oid. Returns None if oid is not stored, which is a relatively normal case if e.g. a newly arrived create message is checked against the local database.

adb_put_version_history(oid,vh)

permanently stores the (probably enhanced/modified) entire version history of the object specified by oid

adb_get_first_oid()

returns the first object id stored in the database or None if there aren't any. Has to cooperate with adb_get_next_oid().

adb_get_next_oid()

returns the next object id stored in the database or None if all objects' records have been visited. Has to cooperate with adb_get_first_oid() in a way that this loop will touch every stored application object's version history exactly once:

aoid = adb_get_first_oid()
while aoid:
    vh = adb_get_version_history(aoid)
    check_and_modify_in_what_ever_way_needed(aoid,vh)
    adb_put_version_history(aoid,vh)
    aoid = adb_get_next_oid()

If adb_get/put_version_history are inefficient for arbitrary sequences of oids (e.g. because the implementation contains no equivalent of an index), then adb_get_next_oid should be implemented in such a way that the walk is reasonably efficient.

adb_close()

closes the internal data structures. Depending on the implementation, a call to adb_close may or may not be essential to save modified data to permanent (e.g. disk) storage. Failure to call adb_close may result in loss of vital versioning information.
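To make the contract of these methods concrete, here is a minimal sketch of an ADBAPI implementation that keeps all version histories in one pickle file. The layout of cfg_handle (a dictionary with an "adbapi" section holding "path", "my_location" and "locations") is an assumption, and the sketch deliberately ignores the requirement that the complete object list may not fit into memory, so it would only suit small collections.

```python
import os
import pickle
import tempfile

class adb:
    """Sketch of the ADBAPI backed by a single pickle file."""

    def __init__(self, cfg_handle):
        section = cfg_handle["adbapi"]          # hypothetical config layout
        self.path = section["path"]
        self.my_location = section["my_location"]
        self.locations = section.get("locations", {})
        self.histories = {}
        if os.path.exists(self.path):
            with open(self.path, "rb") as f:
                self.histories = pickle.load(f)
        self._walk = iter(())

    def adb_get_my_location(self):
        return self.my_location

    def adb_get_location_list(self):
        return list(self.locations)

    def adb_get_location(self, lid):
        return self.locations.get(lid)

    def adb_get_version_history(self, oid):
        return self.histories.get(oid)          # None if oid is unknown

    def adb_put_version_history(self, oid, vh):
        self.histories[oid] = vh

    def adb_get_first_oid(self):
        # Snapshot the keys so that puts during the walk cannot disturb it.
        self._walk = iter(sorted(self.histories))
        return self.adb_get_next_oid()

    def adb_get_next_oid(self):
        return next(self._walk, None)

    def adb_close(self):
        # In this sketch nothing is saved until adb_close is called.
        with open(self.path, "wb") as f:
            pickle.dump(self.histories, f)

cfg = {"adbapi": {"path": os.path.join(tempfile.mkdtemp(), "adb.pickle"),
                  "my_location": "centre"}}
db = adb(cfg)
db.adb_put_version_history("index.html", [])
db.adb_put_version_history("about.html", [])
visited = []
aoid = db.adb_get_first_oid()
while aoid:
    visited.append(aoid)
    aoid = db.adb_get_next_oid()
db.adb_close()
```

The walk at the end mirrors the loop given above; reopening the file with a second adb(cfg) instance returns the stored histories only because adb_close was called.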

Application Data Local Storage API (ADLSAPI)

class adls, containing these methods:

adls(cfg_handle) (Constructor)

initializes internal data structures, e.g. open database connections, based on the adlsapi section of cfg_handle.

adls_get_timestamp(oid)

returns the timestamp of the last modification of the application object, if the underlying implementation supports it, e.g. a Linux file system. Returns None if oid is not present as application object, which is a relatively normal case e.g. if a newly created object arrives as a message.

adls_get_contents(oid)

returns the contents of the specified application object as a (possibly very long) string. Returns None if oid is not present as application object, which is a relatively normal case e.g. if a newly created object arrives as a message.

adls_put_contents(oid,cont,timestamp)

sets the contents of the specified application object to be the (possibly very long) string cont. Sets the value returned by future calls to adls_get_timestamp(oid).

If oid is not yet present as an application object, it is silently created.

adls_get_first_oid()

returns the first object id stored in the application or None if there aren't any. Has to cooperate with adls_get_next_oid().

adls_get_next_oid()

returns the next object id stored in the application or None if all application objects have been visited. Has to cooperate with adls_get_first_oid() in a way that this loop will touch every application object exactly once:

aoid = adls_get_first_oid()
while aoid:
    check_and_modify_in_what_ever_way_needed(aoid)
    aoid = adls_get_next_oid()

If adls_get/put_contents are inefficient for arbitrary sequences of oids (e.g. because the implementation contains no equivalent of an index), then adls_get_next_oid should be implemented in such a way that the walk is reasonably efficient.

adls_close()

closes the internal data structures. Depending on the implementation, a call to adls_close may or may not be essential to save modified contents to permanent (e.g. disk) storage. Failure to call adls_close may result in loss of vital application object contents.
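For the original use case of mirroring a directory tree, an ADLSAPI implementation on top of the file system could look roughly like the sketch below. It assumes that the object id is the file's path relative to a configured root directory (with "/" as separator), that contents are bytes objects, that timestamps are UTC, and that cfg_handle has an "adlsapi" section with a "root" entry; none of this is fixed by the specification above. It does not guard against object ids containing "..".

```python
import calendar
import os
import tempfile
import time

TS_FORMAT = "%Y%m%d%H%M%S"

class adls:
    """Sketch of the ADLSAPI on top of a directory tree."""

    def __init__(self, cfg_handle):
        self.root = cfg_handle["adlsapi"]["root"]   # hypothetical layout
        self._walk = iter(())

    def _path(self, oid):
        return os.path.join(self.root, *oid.split("/"))

    def adls_get_timestamp(self, oid):
        try:
            mtime = os.path.getmtime(self._path(oid))
        except OSError:
            return None
        return time.strftime(TS_FORMAT, time.gmtime(mtime))

    def adls_get_contents(self, oid):
        try:
            with open(self._path(oid), "rb") as f:
                return f.read()
        except OSError:
            return None

    def adls_put_contents(self, oid, cont, timestamp):
        path = self._path(oid)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(cont)
        # Set the file's modification time to the given timestamp.
        mtime = calendar.timegm(time.strptime(timestamp, TS_FORMAT))
        os.utime(path, (mtime, mtime))

    def _iter_oids(self):
        # Generator: the complete list of oids is never held in memory.
        for dirpath, _dirnames, filenames in os.walk(self.root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                yield os.path.relpath(full, self.root).replace(os.sep, "/")

    def adls_get_first_oid(self):
        self._walk = self._iter_oids()
        return self.adls_get_next_oid()

    def adls_get_next_oid(self):
        return next(self._walk, None)

    def adls_close(self):
        pass    # files are written immediately, nothing to flush

store = adls({"adlsapi": {"root": tempfile.mkdtemp()}})
store.adls_put_contents("docs/mod1.html", b"<html>\x00</html>",
                        "20010324204759")
```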

Message Reception API (MRAPI)

Fully responsible for authentication/authorization.

class mr, containing these methods:

mr_open(cfg_handle): initializes internal data structures, e.g. open a POP3 server connection, based on the mrapi section of cfg_handle. File or database handles, including cfg_handle, that may be required by the implementation can be stored in any instance variables of class mr.
mr_get_message(): returns the next unread message from the incoming queue or None if there are no more unread messages. A call to mr_get_message is likely to permanently delete the message after it has been returned.
mr_close(): closes the internal data structures. Depending on the implementation, a call to mr_close may or may not cause read messages to be permanently deleted. Failure to call mr_close is likely to result in duplication of incoming messages.
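A sketch of an MRAPI implementation for the "incoming" directory case follows. It assumes one JSON file per message and a cfg_handle with an "mrapi" section naming the directory; both are illustrative choices, not part of the specification. Authentication, for which the MRAPI is responsible, is left out entirely here.

```python
import json
import os
import tempfile

class mr:
    """Sketch of the MRAPI reading one JSON file per message."""

    def mr_open(self, cfg_handle):
        self.incoming = cfg_handle["mrapi"]["incoming"]  # hypothetical layout
        # Only file names are held in memory, not the messages themselves.
        self.pending = sorted(os.listdir(self.incoming))
        self.done = []

    def mr_get_message(self):
        if not self.pending:
            return None
        name = self.pending.pop(0)
        path = os.path.join(self.incoming, name)
        with open(path, "r", encoding="utf-8") as f:
            message = json.load(f)
        self.done.append(name)
        return message

    def mr_close(self):
        # Messages are deleted only here, so skipping mr_close means the
        # same messages are delivered again on the next run.
        for name in self.done:
            os.remove(os.path.join(self.incoming, name))
        self.done = []

incoming = tempfile.mkdtemp()
for name, kind in [("0001.json", "version"), ("0002.json", "conflict")]:
    with open(os.path.join(incoming, name), "w", encoding="utf-8") as f:
        json.dump({"msg_admin_info": {"msg_type_id": kind}}, f)

receiver = mr()
receiver.mr_open({"mrapi": {"incoming": incoming}})
kinds = []
message = receiver.mr_get_message()
while message is not None:
    kinds.append(message["msg_admin_info"]["msg_type_id"])
    message = receiver.mr_get_message()
receiver.mr_close()
```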

Message Transmission API (MTAPI)

class mt, containing these methods:

mt_open(cfg_handle): initializes internal data structures, e.g. open a database file, based on the mtapi section of cfg_handle. File or database handles, including cfg_handle, that may be required by the implementation can be stored in any instance variables of class mt.
mt_put_message(message): sends an entire message. Addresses are passed as part of the message, so no separate parameter is needed.
mt_close(): closes the internal data structures. Depending on the implementation, a call to mt_close may or may not be required to actually send the messages. Failure to call mt_close is likely to result in messages being discarded without warning.
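For the email transport, an MTAPI implementation might wrap each message in a mail and queue it until mt_close. The sketch below only builds the mails with the standard email package and does not send anything; a real implementation would probably hand them to smtplib in mt_close. The cfg_handle layout, the JSON encoding of the message body and the fact that mt_close returns the queued mails are all assumptions made for illustration. Binary object contents would need an extra encoding step (e.g. base64) before JSON could carry them.

```python
import json
from email.message import EmailMessage

class mt:
    """Sketch of the MTAPI that turns messages into queued emails."""

    def mt_open(self, cfg_handle):
        section = cfg_handle["mtapi"]           # hypothetical config layout
        self.sender = section["sender"]
        self.addresses = section["addresses"]   # location_id -> email address
        self.queue = []

    def mt_put_message(self, message):
        admin = message["msg_admin_info"]
        mail = EmailMessage()
        mail["From"] = self.sender
        mail["To"] = self.addresses[admin["msg_to_location_id"]]
        mail["Subject"] = "mirror " + admin["msg_type_id"]
        mail.set_content(json.dumps(message))
        self.queue.append(mail)

    def mt_close(self):
        # Nothing leaves the queue before mt_close; a real implementation
        # would transmit the mails here instead of returning them.
        sent, self.queue = self.queue, []
        return sent

sender = mt()
sender.mt_open({"mtapi": {
    "sender": "mirror@centre.example",
    "addresses": {"branch_a": "mirror@branch-a.example"},
}})
original = {
    "msg_admin_info": {"msg_type_id": "version",
                       "msg_from_location_id": "centre",
                       "msg_to_location_id": "branch_a"},
    "object_version_history": {"object_id": "docs/mod1.html",
                               "version_list": []},
}
sender.mt_put_message(original)
outbox = sender.mt_close()
```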

Solved Issues

can strings be long enough to hold several megabytes of an application object? can strings contain (even extended sequences of) binary null characters? Yes! I successfully tried 40 MB
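In Python 3, text strings and binary data are separate types, so the equivalent check would be made with a bytes object. A small sketch, using 4 MB instead of 40 MB to keep it quick:

```python
NULLS = 4 * 1024 * 1024

# 4 MB of binary null characters followed by a short marker.
blob = b"\x00" * NULLS + b"end"
copy = bytes(bytearray(blob))   # force a real copy of the data
```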

Open Issues

how to handle conflicts? which API should be used to inform whom of the existence (and the precise location) of trouble? Idea: MRAPI/MTAPI get functions to notify about conflicts.
precise structure of config-file and associated global variable(s)
checksum algorithm and representation
where should locations be administered? is there a need to do that in the controller? probably, at least if


Last updated: 05.05.2007 17:43:12 Martin Stut, Marburg, Germany
URL: http://www.stut.de/linuxmailmirror.html