Go-Live Time:
This topic is intended to be the Canonized Design of the Canonizer LLC networked node DB system.
As an example of what is needed, consider the current front "top 10" page. It works while there are only 40 or so users and 20 or so topics, each with just a few POV statements on average, and each of those with only a handful of supporters. But to "canonize" this top 10 list every time someone visits the page, each topic must be canonized. To do this, each statement in each topic must be analyzed; to do that, each supporter of each statement must be analyzed. For each supporter, the currently selected "canonizer" (each user can have a different canonizer configured) must be applied, potentially examining many of that supporter's personal attributes and calculating a "value" according to the specified canonizer algorithm. The system must then sum these canonized values for all supporters of a statement, sum those totals across all sub-camp statements and apply them to each statement, and sort the statement structure by these sums. Finally, the top 10 of all topics must be selected for display, sorted appropriately, with the canonized values shown for each topic and statement.
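The canonization pass described above can be sketched roughly as follows. All names here (Supporter, Statement, Topic, canonize_topic, top_topics) are illustrative, not the real schema; the canonizer is modeled as any function from a supporter's attributes to a numeric value.

```python
from dataclasses import dataclass

@dataclass
class Supporter:
    attributes: dict  # personal attributes the canonizer may weigh

@dataclass
class Statement:
    text: str
    supporters: list

@dataclass
class Topic:
    title: str
    statements: list

def canonize_topic(topic, canonizer):
    # Sum the canonized value of every supporter of each statement,
    # then sort the statements by those totals, highest first.
    scored = [(sum(canonizer(s.attributes) for s in st.supporters), st)
              for st in topic.statements]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored

def top_topics(topics, canonizer, n=10):
    # Rank topics by their top statement's canonized total and keep n.
    ranked = [(canonize_topic(t, canonizer), t) for t in topics]
    ranked.sort(key=lambda pair: pair[0][0][0] if pair[0] else 0,
                reverse=True)
    return ranked[:n]
```

With a one-person-one-vote canonizer (`lambda attrs: 1`) this reduces to simple support counting; richer canonizers only change the per-supporter function, not the aggregation.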
Of course, if there are thousands (millions and more?) of topics, each with ten (hundreds?) or so POV statements, each with potentially hundreds (some with millions?) of supporters, and the possibility of a canonizer that queries a large number of attributes for each supporter, then the system demand, just for this top 10 splash page, could grow in a terribly non-linear way if not designed properly. Canonizing the whole world this way will not always be required; canonizing an individual topic will be a much more common task. But eventually we want to be able to canonize topics with potentially millions of supporters, with an arbitrarily configured canonizer, as close to real time as possible.
We want to follow a strategy of using cheap, redundant, co-located systems, all networked together and working in a peer-to-peer fashion. This way we can throw ever more systems at the network as demand grows, always having more than enough capacity so that no one notices when individual systems fail.
We imagine an open, independently replicating node system, set up so that other companies or third parties can instantiate nodes on their own hardware. Such instantiations will link into this network of Canonizer nodes and replicate the parts of the data they would like to use for their own canonization services on their own sites, in a way that contributes to the performance of the whole network in a flexible peer-to-peer way.
There will be non-secure versions of nodes, for browsing and canonizing, that almost anyone can instantiate and link up to the network anywhere. There will also be secure nodes, likely tightly controlled by Canonizer LLC in secure environments. Submissions and maintenance of information will be done on these secure nodes via SSL. Information marked private or anonymous will never be shared with non-secure nodes. Third parties will only be allowed to host such secure servers if strict security can be proven and guaranteed.
For the most part, no record in the DB will ever be modified in place. Instead, each record will have chronological copies showing the history of its values. When a modification is made, a new version of the record, stamped with the current time, will be submitted. When browsing, the most recent version will be used, unless an "as_of" time is selected for historical purposes. We want to be able to look at the state of the Canonizer's data at any point in the Canonizer's history.
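The append-only versioning scheme above can be sketched as a store that only ever appends timestamped versions and answers "as_of" queries by scanning back to the newest version at or before the requested time. `VersionedStore` and its method names are hypothetical, chosen just to illustrate the behavior.

```python
import time

class VersionedStore:
    def __init__(self):
        # record id -> list of (timestamp, value), appended in time order
        self._versions = {}

    def submit(self, record_id, value, timestamp=None):
        # Never modify in place: append a new timestamped version.
        ts = time.time() if timestamp is None else timestamp
        self._versions.setdefault(record_id, []).append((ts, value))

    def get(self, record_id, as_of=None):
        # Most recent version wins, unless an "as_of" time is given
        # for historical browsing.
        versions = self._versions.get(record_id, [])
        if as_of is None:
            return versions[-1][1] if versions else None
        for ts, value in reversed(versions):
            if ts <= as_of:
                return value
        return None  # record did not exist yet at that time
```

A production version would index versions by timestamp rather than scanning, but the contract is the same: every historical state remains queryable.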
Each Canonizer node will have two sources of data. The first is a user submitting a change or a new record. The node will query a master control node to get a unique identifier for the new record (perhaps fetched previously as a batch and cached). The new record, with its unique identifier, will then be stored in the node's DB. Once this change is committed locally, the node will replicate it to a small number of "peers" in the network.
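The batched identifier scheme mentioned above might look like the following: a node requests a block of identifiers from the master control node in one round trip and hands them out locally until the block is exhausted. All class and method names here are assumptions for illustration only.

```python
import itertools
import threading

class MasterControlNode:
    def __init__(self):
        self._next = itertools.count(1)
        self._lock = threading.Lock()

    def allocate_block(self, size):
        # Hand out a contiguous block of unique ids in one request.
        with self._lock:
            return [next(self._next) for _ in range(size)]

class CanonizerNode:
    def __init__(self, master, block_size=100):
        self._master = master
        self._block_size = block_size
        self._cached = []  # ids fetched earlier but not yet used

    def new_record_id(self):
        # Use a cached id; contact the master only when the block is empty.
        if not self._cached:
            self._cached = self._master.allocate_block(self._block_size)
        return self._cached.pop(0)
```

Blocking out identifiers this way keeps the master off the critical path of most submissions while still guaranteeing global uniqueness.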
This replication process is the second source of data for each Canonizer node. Replication begins with a very efficient query to see if the peer already has the data. If not, the data is copied to the node that does not yet have it, and the process repeats from that peer to all of its peers as fast as possible, keeping the data in all nodes as up to date as possible.
Non-secure Canonizer nodes will not take data directly from users; they will only receive non-secure replicated data from the network.