back to the LINUX-page
This is work in progress, so do not expect any comleteness.
There is a collection of hundreds of files, e.g. HTML-pages documenting software modules. Replicas of these pages are stored in several locations, e.g. branches of the organisation using the software. One of these locations is a centre where all changes to the software are to be coordinated.
At any location, these pages could be edited. These changes should automatically be propagated to the other locations (via the centre).
Unfortunately the locations are separated far enough that for cost reasons there is no permanent IP connectivity, so tools like rsync are not an option. Also, editing takes place at short notice (just invoke the editor), so cvs-like check-out/check-in sequences are too slow.
At each location there is a LINUX-system running the email service for the LAN users of that location.
The idea is, to utilize the existing email system to exchange the modified files.
In order to do this, several issues need to be addressed.
If a file was edited at different locations but simultaneously (close enough in time so that the update from location A didn't arrive at location B before B started to modify the file), then the system has to reliably detect the conflict. Resolution of the conflict will probably have to be done manually.
not only all files of one single directory should be mirrored, but also all files in all subdirectories (if the administrator so desires)
If a file arrives by email, the the system should unpack and integrate it into the local replica (and perhaps rebroadcast it from the centre to the other locations) without user intervention.
Email sender addresses can easily be forged, so without additional measures some bad guy could easily send a forged update. So there sould be something like a digital signature to ensure that the files really come from an authorized file updater.
The replicated files are likely part of an organisation's intranet, so not everybody sholud be able to see the contents of files in transit. So there should be some form of encryption. This might well be combined with the authentication requirement.
When a file was edited, it should be sent to the other participating locations without any human having to remember the fact.
Not every save of a (possibly large, hundreds of kB) file should trigger a new email message containing the entire file. The files are assumed to be edited by humans working by session. So e.g. there could be an hourly cron-job that checks whether any file has changed since the last transmission of that file. If the last change was less than, say, one hour ago, it is considered "under construction" and won't be transmitted now. If the last change was more than, say, one hour ago, it is considered "stable" and will be sent.
Additionally, compression should be used.
It should be possible to integrate other ways of transport (e.g. mailed tapes, ftp sessions) into the system. So there probably should be an "incoming" directory that is being scanned at suitable intervals.
Similarly, there should be an "outgoing" directory, that is being scanned at suitable intervals.
check whether the received version conflicts with a local edit (this will mean trouble anyway, but it must be detected)
A conflict exists, exactly if version 1 (assumed to be consistent across all replicas) of some file becomes modified by location A resulting in version 2.A and by location B resulting in version 2.B. This happens exactly if B starts its modification before the change made by A (version 2.A) has arrived at and been integrated into the replica at B.
This can be detected not before version 2.A arrives at B. At that time, B has to detect, that the file was modifed after the previous bunch of files was sent from A to B. So B has to keep record of "the newest file that has arrived from A without causing a conflict" (if that's to be enough, it has to be assumed, that a conflict free arrival implies the complete arrival of all previously changed files).
The above assumptions seem too risky to rely on. So we really need a (kind of) database, constisting of one table containing these columns:
Such a record states: "as far as this location knows, at [verification timestamp] the newest copy of [filename] at location [location-ID] was the version dated [modification timestamp] with checksum [checksum]".
Primary key is the combination of filename and location-ID, i.e. for every combination of filename and location-ID there is exactly one record.
The records describing the local location are assumed to be consistent with the local files.
Every outgoing package containing a data file should also contain an excerpt of the database. At the very minimum this should be the record for filename and the sending location. A confirmation of every arrived file would be nice too, so the sender knows that the file has arrived. To avoid unnecessary network traffic, re-sends should be delayed until the confirmation can be assumed to have arrived under normal conditions.
At first, exclude the odd cases:
These cases can occur:
If a file arrives from location A at location B, there are several records to be looked at:
Each of them contains 2 timestamps, so we have 3*2=6 timestamps (mod-B-at-B, ver-B-at-B, ...) to compare. There are n*(n-1)/2 comparisons, so 6*5/2 = 15 comparisons, resulting in at least one bit each. 2**15=32768 possibilities, which is too much, so at first exclude the obvious ones.
To ease the overview, I'll present each single comparison in a matrix. Each matrix element describes the case "row heading
mod-B-at-B | ver-B-at-B | mod-A-at-B | ver-A-at-B | mod-A-at-A | ver-A-at-A | |
---|---|---|---|---|---|---|
mod-B-at-B | = | clock trouble | recent edit at B, B should already have sent the file to A some time ago, A has not yet confirmed arrival of update from B. check A-at-A for possible conflict. | recent edit at B, B should already have sent the file to A some time ago, A has not yet confirmed arrival of update from B. check A-at-A for possible conflict. Also for =. | recent edit at B, B should already have sent the file to A some time ago, A has not yet confirmed arrival of update from B. A either resent an old version (compare A-at-A with A-at-B) or we have a conflict. Also for =. | recent edit at B, B should already have sent the file to A some time ago, A has not yet confirmed arrival of update from B. A either resent an old version (compare A-at-A with A-at-B) or we have a conflict. Also for =. |
ver-B-at-B | should be always true ( | = | should be always true ( | should be always true ( | should be always true ( | should be always true ( |
mod-A-at-B | B knew for quite a time that A has a more recent copy, but the copy itself has not yet arrived, perhaps it's in the currently arriving package. Should not happen unless database contents travels on different paths than file contents. | clock trouble | = | clock trouble | A has certainly resent an old version, B already has a newer one. B already knew that A has something more recent than the arriving file. | A has certainly resent an old version, B already has a newer one. B already knew that A has something more recent than the arriving file. |
ver-A-at-B | nobody knows... conflict can't be excluded due to propagation delay. | clock trouble | o.k., also = | = | no clarification. | A has certainly resent an old version, B already has a newer one. B already knew that A has something more recent than the arriving file. |
mod-A-at-A | the arriving file is more current than the local one. Normal case. Certainly needs update, perhaps there is a conflict. | clock trouble | the arriving file is more current than the local one. Normal case. Certainly needs update, perhaps there is a conflict. | the arriving file is more current than the local one. Normal case. Certainly needs update, perhaps there is a conflict. | = | clock trouble |
ver-A-at-A | nobody knows... conflict can't be excluded due to propagation delay. | clock trouble | nobody knows... conflict can't be excluded due to propagation delay. | nobody knows... conflict can't be excluded due to propagation delay. | o.k., also = | = |
A conflict happens when (at least) two locations start modifying a file "simultaneously", i.e. B starts modifying the file BEFORE the modification made by A has arrived at B.
So to detect whether a conflict exists, one has to know for each modification, FROM what version the modification took place.
Version v1 is subjected to modifications m1 which result in version v2. In signs: v2=modif(v1,m1)
v2 is distributed by email. The process of distribution takes a considerable amount of time.
If there exist both v2=modif(v1,mA) and v3=modif(v1,mB) then there is a conflict.
If a file containing v2 arrives at some location B, B has to ensure that B's local copy still contains v1, before B adopts v2 as it's local copy.
Timestamp alone is not enough, because edits may happen at several locations simultaneously.
So use time (of last edit) and location of storage. This eliminates accidental treatment of different versions as being identical. But this also introduces treating identical version differently.
So use time of last edit and location of last edit. That's it!
To make it especially sure, include a checksum - that may elleviate the need for the location ID.
Every transmitted file transmitted from location A to location B needs to be accompanied by TWO version informations:
If the receiving location B can prove that it's local copy is identical to A's FROM-version, then B can be certain to not suffer a conflict.
The problem is to identify the FROM-version.
The modifying location A has to provide enough information that the receiving location B can be assured "A started modifying the same version that B still had as it's local copy".
The question is, when to sample. If a too late ID is given, then false conflict alarms may be triggered. If a too early ID is given, then real conflicts will go undetected.
To avoid false alarms or undetected conflicts, the FROM-version should be a version that is known to be identical to all local copies of all other locations (unless the other location has edited the file).
The perfect solution would be to have a database as described in Try 1, storing for each location the ID of the version that's supposed (better: known) to be stored there. Whenever this database indicates "all locations (including me) have the same version", then this ID is sampled and stored separately (not just as the "current local version"), perhaps as an additional pseudo-location "local FROM-version" a.k.a. "last globally synchronized version".
Each location needs to maintain information about the versions stored in all other locations with whom it directly exchanges files.
one table, primary key: (location, filename); one additional location "last globally synchonized". Columns:
Even in simple setups (one centre, branches exchange files only with the centre), the branches won't get along without an explicit database, because if they edit files, they have to determine the FROM-version, which needs some type of storage.
For each file and each remote location (branches may consider the centre as the only one), check whether the local version differs from the remote version (timestamp, checksum). If it differs, then check whether it should be (re-)sent ([time now] minus [time of last attempt]
Special case: if the remote version is more current than the local version, then you missed something. Time to ask your friendly system administrator.
Before sending a file, there should be knowledge, which version of that file is already available at the recipient's location. Fortunately, that's already solved by the conflic detection database.
But if a file arrives for the first time, then it usually the confirmations of the other locations are not yet here, so the local system might think "o, I have a file that's newer than all the others have, so let's broadcast it".
Solution (probably not the only possible one): if a new version arrives from the outside, store the time of arrival into the "time of last transmission attempt"-field for every other location. This might add a lot of stability into the system, because if the direct route from the updater to one of the sites fails, other sites are likely to re-send the file.
This requires that, upon reception of every single file, all others pretty reliably get the information about this new version at this place. Failure to send the database update (a.k.a. confirmation) will result in the file being re-sent.
The timeouts should be pretty different for each participant, to avoid quasi-simultaneous resends by multiple locations. Ideally, not only the timeout, but also the difference of the timeouts between each pair of stations should be larger than the roundtrip delay of email messages (confirmations). In a "centre and branches" setup, the centre should get the shortest timeout.
yet to be defined:
see above. The FROM-version arriving together with the updated file must be the same as the version that was stored here locally. Otherwise there is a conflict.
Very important.
Detection of conflicts vs. determination of where to send modifications (locally generated ones vs. ones that have arrived from remote locations) to.
Each location has to maintain a list of objects (e.g. files) that are to be included in the replication logic.
For each object, a version history needs to be maintained. The version history per object consists of a reasonable (see below) number of past version information records.
For each object, store a reasonable (see below) number of version records
at least enough to have one common (to most other locations) version covered
Either triggered manually (after modifications to files have been made) or automatically in reasonable intervals (e.g. each night), or if object updates from other locations arrive.
Take a look at each object. Per object do:
It might be useful to never update VIRs but to always create new ones. At least every broadcast version should be recorded permanently in the version history.
Whenever it has been decided to broadcast an object, the following data has to be transmitted to the other participating locations
In the case of replicated web-page collections, this is likely to be implemented as an email message with 2 file attachments.
The propagation path of updates is not prescribed by the logic presented here. Suggestion:
For each incoming object, compare the remote (incoming, accompanying) VH with the locally stored VH using the following algorithm:
perform a local database update check (in order to update the local VH)
scan the remote VH, looking for a copy the local MCVIR (compare CS; better yet if TIM and LOC agree), recording the position (ordered by TIM, newest first) of the local MCVIR within the remote VH. (This is the simplest version; one could include other checks to detect past conflicts. See section on "More General Version Status Description")
If the local MCVIR occurs in the remote VH, and its position is the newest, then it's just a duplicate boradcast - discard the incoming update.
If the local MCVIR occurs in the remote VH, and its position is not the newest, then we've received a real good update - copy the incoming object and the incoming remote VH and rebroadcast it (in case other locations use us as a hub).
If the local MCVIR does not occur in the remote VH, the we definitely have a conflict (or a buggy remote VH handler) - see section on conflict handling.
If a conflict is detected, there have been multiple simultaneous updates on one object. Store both incoming and local stuff in a safe place and resolve the conflict manually, typically by giving one version priority above the other (thus discarding one location's changes) and reapply the discarded changes. To be able to do that, it is of help to store the diffs of local changes until we can be sure that no conflict has occurred.
(Virtually) append a field "source" to each VIR, fill it with "L" (local) and "R" (remote) for the local and remote VH respectively.
Sort (by TIM, newest first) and merge both (local and remote) VHs into one set of VIRs.
Check for adjacent identical (as determined by CS, regardless of source) VIRs and compress those into one record with source:="C" (common).
Build a string of the sequence of the values of the source fields, so it's a sequence of C's, L's and R's.
Compress every consecutive sequence of identical characters into one single occurence of that character, e.g. any CC into C, any RR into R and any LL into L, until there remain no pairs of identical characters.
Now this is a quite precise description of the version status, thus let's call it "Version Status String" (VSS).
Every C represents a common, synchronized state. The hot parts are therefore before (newer than) the the first C, everything between adjacent C's, and after the last C. If there is no conflict, then every hot part should consist only of one single letter, representing the site where a modification was recorded and broadcast conflict free. If a hot part consists of different letters, then a conflict has happened, perhaps in the past. If there is a more current C, then this conflict has been solved.
The algorithm described here may be applied to diverse situations, where several things can vary largely. This can be solved by object interfaces. These interfaces are either APIs, so that adaptation layers (functions) need to be written, or some method to specify the invocation by a configuration file, either by specifiying command line templates (flexible, but not very efficient if object counts grow into 5 digit regions) or be selecting from a number of precompiled possibilities.
can range from a few files (greatly varying in size) on a directory tree to single records (or even fields) of large (millions, if not billions of records) databases. Each of these types of objects may need a different type of handling, e.g. how to determine the current modification date, how to determine the checksum, how to send an object and how to import a received update.
The modification date of webpage can easily be found by a directory listing, while this is less obvious for database records, so there needs to be a kind of API where separate ways of doing things can plug in. The following functions are needed:
The data type of oid might vary too, but a relatively short (hard limit of 16k characters acceptable) kind of string is likely to be the standard - this should be flexible enough to accomodate filenames or database key values.
A single web page file can comfortably be transmitted as a separate email message, while this is completely inefficient for a single database field.
Transport might be by various means, e.g.
Additionally, a variety of coding methods may be used:
Especially when transmitting through insecure channels (which e-mail certainly is), authentication is probably required, to avoid unauthorized intoduction of data by the "bad guys". There are many algorithms to achieve this, one way to go might be to attach PGP (or GnuPG) signatures to each transmitted file.
If the data exchanged is confidential (i.e. the contents must not be readable to unathorized people, e.g. in a distributed organization's intranet), encryption is needed. This needs flexibility on encryption algorithms and key management. The first thing I think of is PGP (or GnuPG), but there are many others...
Depending on the number of objects to be managed and the available systems (not everybody, especially on pocket computers, has a database server around), the version information may be stored by different techniques. I imaginable stuff like:
Thus an API is needed to abtract this layer:
MD5 is good, but not the only one. The checksum algorithm must be pretty good, especially it must be able to the the existence of a difference between an original file and a copy that has just 2 characters swapped (tpyo corrected).
And then there is the question of how to call the checksum computation. An API would just contain one function:
This is identical to a portion of the object store API.
Another theoretical option would be, to leave the checksum computation out of object store API and to compute the checksum by the replication logic, using a standard algorithm on the result of file_out. But this can be inefficient for large object counts (always open a file...). If the inefficency would be coutered by storing the object representation in RAM, trouble may turn up when real large objects (e.g. video movies) appear that won't fit into RAM.
Results of google search:
How to plug together the various objects?
"include" on interpreter level?
Java?
What language to use for implementation? Java? Python? Perl? C?
The ease of programming, wide availability, good modularity and relatively small footprint lead to the decision to use Python as the language of choice for implementing the system.
To enable a variety of combinations of application data, storage methods and communication methods, this architecture of modules is proposed:
Replication Controller ! ! +-----------------+----------+---------+-------------+ ! ! ! ! ! ! ! ! Administrative Application Data Message Message Database API Local Storage API Reception Transmission (ADBAPI) (ADLSAPI) API (MRAPI) API (MTAPI)
Each module is likely to consist of more than one Python class.
The selection of modules is done by naming the corresponding python source
code file in the controller's configuration file. The chosen modules are being
imported upon startup of the controller. This can be done e.g. by
modname="mod2"
exec("from "+modname+" import *")
For efficiency reasons, the data structures handled by the various APIs are likely to be specified in terms of lists and dictionaries, rather than classes.
To enable a richer set of common functions perfomed by the controller, it is preferable to use a string representation of the entire content of application data, not just an implementation dependent pointer (e.g. a filename). This enables e.g. checksumming or encryption algorithms to be provided by the controller, thus reducing the effort of implementing this kind of functions in the APIs.
This means, that the MRAPI and MTAPI have to be able to produce and accept such a string representation as a parameter for retrieving/constructing a message.
The drawback of this design choice is, that single application objects are assumed to be small enough, that 2 or 3 copies of the largest one fit into the python heap and thus into (virtual) RAM. This can be safely assumed for web page files and database rows. Multimedia files, especially long audio streams and video clips, are likely to break this assumption with current hardware - but files like this probably don't get transmitted through current email systems.
To cater for the case of databases, the complete list of application objects is assumed to be too large to fit into the python heap. The ADLSAPI has to provide methods to make sure to access all of them, e.g. during a consistency check.
To also cater for large collections, the list of all batched messages is assumed to not fit into the python heap. The MRAPI has to provide functions to retrieve messages one by one.
Each application object is identified by an object_id of type string. The same string is to be used in all APIs.
All locations (that can possibly modify objects) are identified by a location_id of type string.
All timestamps are of type string: YYYYMMDDHHMMSS, e.g. 20010324204759
is composed as follows:
Whenever the version list of a single object is retrieved or stored, this is done as a python list, composed of
each list item is stored/retrieved as a python dictionary:
A source_location_id and an object_id are not needed, because the version info does contain this information.
A msg_id is not needed, because an incoming message is presented only once anyway.
Whenever a message is stored or retrieved, this is done as a nested python dictionary (one separate dictionary for each message), composed exactly as described above.
one dictionary per location
all configuration data is read into an object of type cfg_handle at the time when the controller is started. It contains this information:
implementation specific configuration information for the adbapi, e.g. name of db-host, username, password (should be encrypted...)
implementation specific configuration information for the adlsapi, e.g. name of db-host, username, password (should be encrypted...)
implementation specific configuration information for the mrapi, e.g. name of the POP3 host or the "incoming" directory
class adb, containing these methods:
aoid = adb_get_first_oid() while aoid: vh = adb_get_version_history(aoid) check_and_modify_in_what_ever_way_needed(aoid,vh) adb_put_version_history(aoid,vh) aoid = adb_get_next_oid()
If adb_get/put_version_history are inefficient (e.g. because the implementation contains no equivalent of an index) for arbitrary sequences of oids touched, then adb_get_next_oid should be implemented in such a way that the efficiency of the walk is reasonably well.
class adls, containing these methods:
returns the contents of the specified application object as a (possibly very long) string. Returns None if oid is not present as application object, which is a relatively normal case e.g. if a newly created object arrives as a message.
If oid is not yet present as an application object, it is silently created.
aoid = adls_get_first_oid() while aoid: check_and_modify_in_what_ever_way_needed(aoid) aoid = adls_get_next_oid()
If adb_get/put_version_history are inefficient (e.g. because the implementation contains no equivalent of an index) for arbitrary sequences of oids touched, then adls_get_next_oid should be implemented in such a way that the efficiency of the walk is reasonably well.
Fully responsible for authentication/authorization.
class mr, containing these methods:
class mt, containing these methods:
Home | Christian Missions | Christustreff Marburg | Pictures of Marburg | Job | Remote Communications | Linux OS | Psion page | Contact