Central hub

Here is my proposal to create a central (opt-in) hub of information relating to pods in our little social network. Please read the full proposal before judging it ;)
I would like to vote on this as soon as all constructive comments relating to the technical specifications have been processed. I am prepared to code it and host it (until official servers exist).
Full proposal here: https://wiki.diasporafoundation.org/Central_hub
P.S. I created this in the main community group, not a subgroup, since I think this is something that all community members should be allowed to vote on without the hassle of requesting subgroup rights.

Jason Robinson Sat 16 Nov 2013 1:20PM
@emmanouelkapernaro the central hub should not be required for a pod to run - so your pod would not be affected in any way if the central hub is down or disappears.
Some services which use it might be affected - but pods would not be affected in any way. Please read the proposal carefully :)

Jason Robinson Sat 16 Nov 2013 1:22PM
Maybe calling it a "hub" was wrong and confusing :P Infostore? Register? :)

Emmanouel Kapernaros Sat 16 Nov 2013 1:22PM
Yes, I understand this. If the central hub is down, then my pod stops having public tag federation. How is this not affecting my pod in any way?

Jason Robinson Sat 16 Nov 2013 1:25PM
@emmanouelkapernaro actually if you read the proposal for relays carefully, the central hub is not needed since pods would cache a list of relays. Anyway, that proposal does not relate to the hub directly - these are separate things. The relay is not even the only proposal concerning public posts.
Is your pod affected now by missing public post federation? I think it might be affected :)

Emmanouel Kapernaros Sat 16 Nov 2013 1:45PM
@jasonrobinson public post federation is currently a missing feature. It affects my pod, yes. I don't want something new that also affects my pod; that's why I prefer not to have anything centralized.
So, to better understand your proposal: the central hub is only needed to hold as much information as possible about what pods are out there, and to help them get in touch by distributing this info openly, right? Like podupti.me but more useful for features like federation.
If this is correct, then why not think about a decentralized method of distributing information? I am not an expert, but isn't that what a DHT is for? Am I way off here?

Brad Koehn Sat 16 Nov 2013 1:52PM
What happens if different pods use different hubs? Can pods use multiple hubs? Is there anything good or bad there?
Also, are the pods listed in the hub ("directory" might be a better term) curated in any way against bad actors (e.g., a pod that is spamming)? I understand that the hub doesn't move messages on its own, but I'm trying to get at the policy implications.

Jason Robinson Sat 16 Nov 2013 2:34PM
@emmanouelkapernaro because making a decentralized way of providing information on the network would be overkill. Imagine the data center is wiped off the earth one morning: until someone installs a new hub and restores a backup, you will not be able to see how many pods or users the diaspora network has. Will that affect anyone at all? No, because the hub (or directory) should not contain any information that pods need in order to work. That is stated in the proposal :)
Whatever system we built to sync all of this information to all pods would just be overkill. When building software, one should meet the needs as required, not spend time building things that don't offer enough benefit for the effort that goes into them (imho at least). It's not like we're overflowing with developers who have nothing to do ;)
So my question back is - what would be the real benefit from decentralizing opt-in information concerning the diaspora* network as a whole? A real use case or just because it's possible? ;)
Also I think you're missing the main addition by this - information. The main benefit is information on the network towards the rest of the internet. To show that diaspora* is not a dead project. To show the network is growing. To prove to ourselves the network is growing :)
@bradkoehn Similarly, I don't really see the point of multiple hubs - the benefit is really small. The code would be open source and, if the data is open, recovering from any disaster to the hub would just involve someone reassigning the subdomain's IP and setting up a new hub.
But then again, I don't think there is any reason to not include the hub address in configuration. Maybe multiple hubs are needed when we have 1 million pods, then they would just need to sync up together ;)
As for spamming, the hub collects the information; it doesn't accept information pushed to it. If someone goes to the effort of customizing their pod to give wrong information, or makes a dummy pod that talks like a pod, I'm sure those cases can be dealt with quite quickly with some manual admin work. The important part is that the hub calls the pods for info, not the other way around.
I'll add this to the proposal:
- End point to get the full data export from the hub (just a static link to an archive updated daily for example - this should of course not be abused). In a disaster case this export would be imported to a new hub. Thus no one needs to trust the server admin.
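A minimal Python sketch of the daily export/import round trip described above, assuming the hub keeps its pod records as plain JSON-serializable dicts. The function names and the `exported`/`pods` keys are hypothetical, not part of the proposal:

```python
import json
from datetime import date


def export_snapshot(pods: list[dict], snapshot_date: date) -> str:
    """Serialize all hub records into one dated JSON archive."""
    payload = {"exported": snapshot_date.isoformat(), "pods": pods}
    return json.dumps(payload, sort_keys=True, indent=2)


def import_snapshot(archive: str) -> list[dict]:
    """Load the records back, e.g. into a freshly installed hub."""
    return json.loads(archive)["pods"]
```

Because the archive is plain JSON, anyone can inspect it before importing it into a replacement hub.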
Also please remember this is just a suggestion and a draft, so please feel free to suggest changes. And btw, losing some of the resources we have centralized would really hurt the network. Like the wiki - if that just disappears, we will not have any new pods up since no one will know how to set one up. It would be really nice to have the articles synced somewhere :P

Jason Robinson Sat 16 Nov 2013 2:41PM
Oh yeah, I forgot my original idea that pods could optionally hide some of the dataset. For example, some pod might not want to report the number of users but would still want to register with the hub (directory).
I'll add this too.

Brad Koehn Sat 16 Nov 2013 2:59PM
@jasonrobinson There are many benefits to having multiple servers; we allow pods to configure the services they want to use (like Google's Pubsubhubbub provider) in case they want to isolate themselves from another group of pods, or simply prefer another provider's services. All centralized services should be selectable by configuring the pod.
Once you get into the business of curating the pod list, the ability to support multiple servers is critical.
I'm not sure I like using a different stack for the reference implementation; if everybody who implements a centralized service does it in the stack of their choice, we'll soon turn into a hodgepodge of disparate server technologies. I guess the API is so small that somebody could whip out a new one with minimal effort, but a new developer now has to learn another stack or write a completely new implementation from the ground up. I don't think Mongo buys you anything you can't get from a normal RDBMS anyway (this is pretty much some REST APIs in front of a single table, right?); my guess is that it's more a stack you already know than the right stack for the team as a whole.

Jason Robinson Sat 16 Nov 2013 3:06PM
@bradkoehn That is why I suggest the hub is configurable already.
The MEAN stack is not something new or unknown - it is becoming very popular, or at least its components are. Node is HUGE and really the best tool for things like this, imho. Someone is constantly saying "I would participate but Ruby..." so I don't think the answer is to just build everything in one language. Ruby is also very resource-hungry compared to a pure JavaScript implementation. There is a reason that Diaspora is made in Rails - there is no reason the hub needs to be ;) Also, there is no reason not to use something like MariaDB instead of Mongo.
Anyway, technical details can be decided on separately, that was just a suggestion. The main thing is to agree (or not agree) on the need for this.
But if it's in Ruby I cannot do it since it would take too much time - I don't have that much interest to learn more Ruby than I need since I want to concentrate on Python and JavaScript. I'll still offer to host it :)

RAM518 Sat 16 Nov 2013 6:42PM
This idea just flies in the face of the decentralized nature of D*, it seems.

Jason Robinson Sat 16 Nov 2013 6:48PM
@ram518 how?

Crazypedia Sun 17 Nov 2013 12:59AM
Couldn't much of this be done with an API (in the works, or so I've heard?), in that anyone, or any server, could ping any or all known pods' APIs to get this information, provided the admin has enabled this feature? This would allow continued federation of information, and the option of a hidden pod, or simply a pod that has decided to opt out for privacy reasons.

Jason Robinson Sun 17 Nov 2013 1:41AM
@jacobschleappi how would you know what pods to ping?
Also, this proposal does not force anything on anyone - please note the "opt-in" mentioned many times in the proposal.

[deactivated account] Sun 17 Nov 2013 8:23AM
Isn't the proposal a bit "fat"? Pods can't query unknown pods directly for obvious reasons; to address this, much less work is required: one only needs to let pods "register" with any number of trackers and ship the Diaspora server code with a customisable list of such trackers. Everything else can then be negotiated pod to pod.
A tracker doesn't need to store anything about a pod except for the hostname and the time of last ping (to allow for record expiry).
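A minimal Python sketch of such a tracker, storing nothing but the hostname and the time of last ping. The 7-day expiry window is an assumption; the comment above does not name one:

```python
from datetime import datetime, timedelta


class Tracker:
    """Keeps only hostname -> time of last ping, and forgets stale pods."""

    def __init__(self, ttl: timedelta = timedelta(days=7)) -> None:
        self.ttl = ttl  # assumed expiry window
        self.last_ping: dict[str, datetime] = {}

    def ping(self, hostname: str, now: datetime) -> None:
        # A pod (re)registers simply by pinging; only the timestamp is stored.
        self.last_ping[hostname] = now

    def expire(self, now: datetime) -> None:
        # Drop every pod that has not pinged within the expiry window.
        cutoff = now - self.ttl
        self.last_ping = {h: t for h, t in self.last_ping.items() if t >= cutoff}

    def hosts(self, now: datetime) -> list[str]:
        # Return the live pod hostnames, sorted for stable output.
        self.expire(now)
        return sorted(self.last_ping)
```

Everything else (version, registrations, and so on) would then be negotiated pod to pod, as suggested above.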
[deactivated account] Sun 17 Nov 2013 8:31AM
About pod registration: your proposal says this:
/register Pods will call with this initially when they want to register the pod. This call will be followed by the central hub calling back to check the pod is really a pod.
How would the hub be able to ensure that a) the party issuing the registration is a pod and b) speaks on behalf of the registered pod? Re a: is it a problem if a registering client behaves like a pod but is not? Re b: it should not be possible to register a pod you don't control. To demonstrate control, the creation of a DNS text record with a token could be sufficient to solve this problem.
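A minimal Python sketch of the DNS TXT check suggested above. The `diaspora-hub-verification=` record format is hypothetical, and the actual TXT lookup (which would be done with a DNS library) is left out; the records are simply passed in as strings:

```python
import secrets

# Hypothetical record format; the proposal does not define one.
TXT_PREFIX = "diaspora-hub-verification="


def issue_token() -> str:
    # Random token the hub hands to the podmin at registration time.
    return secrets.token_hex(16)


def expected_record(token: str) -> str:
    # The TXT record value the podmin is asked to publish on the pod's domain.
    return TXT_PREFIX + token


def owns_domain(txt_records: list[str], token: str) -> bool:
    # txt_records would come from a TXT lookup on the pod's domain (not shown).
    expected = expected_record(token)
    return any(record.strip() == expected for record in txt_records)
```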
This brings me to the next question: if this all only works for Diaspora pods and requires explicit consent, why use REST? Pods are capable of much more complex communication than simple HTTP requests (e.g. stateless vs stateful).

Jason Robinson Sun 17 Nov 2013 10:48AM
@rekado I think you (and everyone else who has commented) are missing the (main) point of the whole system - to gather metrics on the diaspora network. And metrics are not just about pods - they're about users. We cannot have reliable statistics to show the outside world without a reliable way of collecting them (from participating opt-in pods). Listing pods is one thing; collecting user counts is another.
Yes, we could do as you say and have a gazillion trackers. You can call this hub I propose a tracker, because that is what it would be. I'm happy to have many trackers - but I don't see how that makes the system any better.
I'm going to create a vote on the public metrics question - I should have started there. Clearly, proposing something technical only hid the whole point of the proposal, i.e. the end result.
On the other questions:
How would the hub be able to ensure that a) the party issuing the registration is a pod and b) speaks on behalf of the registered pod?
Simple - it doesn't save the pod until it calls back. If the pod "speaks pod" - it's a pod. If it replies "wtf I don't want to be registered go away" - then clearly it did not make a request. This is the same way "verify your account by clicking the link in the email" works to ensure ownership (well, other direction, but same principle).
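A minimal Python sketch of this callback-based registration. `fetch_pod_info` stands in for a hypothetical HTTP call from the hub back to the pod, and the `version` and `wants_listing` fields are assumptions used to illustrate "speaks pod" and "did ask to be registered":

```python
from typing import Callable, Optional


class Directory:
    """Saves a pod only after the hub has called it back successfully."""

    def __init__(self, fetch_pod_info: Callable[[str], Optional[dict]]) -> None:
        # fetch_pod_info(url) is a hypothetical call from the hub to the pod;
        # it returns the pod's self-description, or None if nothing answered.
        self.fetch_pod_info = fetch_pod_info
        self.pods: dict[str, dict] = {}

    def register(self, url: str) -> bool:
        # The pod asked to be registered; nothing is stored yet.
        # Call back to that URL and look at the answer.
        info = self.fetch_pod_info(url)
        if info is None or "version" not in info:
            return False  # did not answer like a pod ("speak pod")
        if not info.get("wants_listing", False):
            return False  # pod says it never asked to be listed
        # Only now is the pod saved in the directory.
        self.pods[url] = info
        return True
```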
To demonstrate control, the creation of a DNS text record with a token could be sufficient to solve this problem.
We're gathering statistics, not solving privacy or authentication issues - no need to shoot a fly with a cannon :) Making pod admins have to modify their DNS just to register is a bit too much imho. But of course, the way I indicated was only an example - it would be nicer to agree on whether this thing is needed.
if this all only works for Diaspora pods and requires explicit consent, why use REST? Pods are capable of much more complex communication than simple HTTP requests (e.g. stateless vs stateful).
What would that achieve that a simple POST/GET would not? To be honest, I'd agree to anything; the end result is the most important. I just don't see the point of building complicated things to make something simple. We should leave the complicated things for the important stuff, i.e. Diaspora itself.

Poll Created Sun 17 Nov 2013 10:55AM
Does the diaspora* network need statistics? Closed Mon 18 Nov 2013 7:48AM
Will do in my own repo
Ignoring all the technical details on how to accomplish this, please consider the following.
Would you want the diaspora* network to gather statistics from opt-in participating pods (with non-participation as the default in new pod configurations) about the following:
- Name of pod
- URL of pod
- Registrations open / closed
- Version
- TOS (when implemented)
- Number of local users
- Number of local users active in the last 6 months
The statistics would be fully open to anyone with full transparency and possibility to opt-out for pods who change their minds.
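As one illustration of how a pod might compute the last statistic on the list, here is a minimal Python sketch. It assumes each local user has a last-activity timestamp (None if never active) and approximates "6 months" as 183 days; both are assumptions, not part of the poll:

```python
from datetime import datetime, timedelta

# Assumption: "6 months" approximated as 183 days.
ACTIVE_WINDOW = timedelta(days=183)


def count_active_users(last_activity: list, now: datetime) -> int:
    """Count local users whose last activity is within ACTIVE_WINDOW of now.

    last_activity holds one datetime per user, or None for users never active.
    """
    cutoff = now - ACTIVE_WINDOW
    return sum(1 for ts in last_activity if ts is not None and ts >= cutoff)
```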
Results

| Option | % of points | Voters |
|---|---|---|
| Agree | 50.0% | 7 |
| Abstain | 14.3% | 2 |
| Disagree | 35.7% | 5 |
| Block | 0.0% | 0 |
| Undecided | 0% | 255 |

14 of 269 people have voted (5%)

Jason Robinson
Sun 17 Nov 2013 11:10AM
On the anniversary of diaspora* as a community project we did lots of promotion. TechWeekEurope came back with one question - "How many users are there?". We replied that we don't know and can only guess, and not surprisingly they never wrote back.

StarBlessed
Sun 17 Nov 2013 11:11AM
I think this is not the correct way to proceed. While I agree there is a need to show statistics to prospective and current users, this is not the way to do it. We must continue to decentralise our network, not centralise it. A step in the wrong direction.
avey
Sun 17 Nov 2013 11:20AM
Since this vote is NOT about whether the information gathering is central or distributed, I do agree that there is a general need for gathering this information on an opt-in basis to show that diaspora* is alive and growing.
[deactivated account]
Sun 17 Nov 2013 11:47AM
I don't think statistics are at all important. Pod-local stats are important for the pod admin; since there is no "network admin" in a decentralised network I don't see the need for stats at that level. Email didn't need usage stats either.

thomas
Sun 17 Nov 2013 12:19PM
opt-in statistics (default being not participating) - seems like a good enough compromise between privacy and interest for information about the diaspora network.

Jonne Haß
Sun 17 Nov 2013 12:36PM
I like that publishing your pod in a list is an action you need to take rather than something you need to turn off.
+1 on anonymous statistics about users and maybe known pods, but those shouldn't be linked to the submitter in any way.

Emmanouel Kapernaros
Sun 17 Nov 2013 12:43PM
I think it is important to push forward on a path where everyone has their own pod (1 seed/pod), where nobody should care about other pods' TOS or stats. Also, I don't believe that saying "we have a million users" will bring more awareness.

lnxwalt
Sun 17 Nov 2013 8:52PM
As I noted in my comments above, this would not be useful without 100% participation, and the information collected is likely to gradually expand over time.
A federated, decentralized network should avoid dependencies on centralized services.
arcke
Sun 17 Nov 2013 9:55PM
Gathering statistics about the diaspora network does not have to be done by the project itself. It can be done by a third party. I think diaspora should focus on building the core of distributed/de-centralized social network.

Sean Tilley
Mon 18 Nov 2013 6:40AM
IMO, a central hub run by the project in its current volunteer non-profit infrastructure could actually be a good thing, especially if it's only opt-in, doesn't show personal analytics of users, and respects privacy.

Jason Robinson Sun 17 Nov 2013 11:16AM
Please note, @starblessed, this vote is not about the way to do it, but about the WANT as a whole: do we want statistics?

StarBlessed Sun 17 Nov 2013 11:29AM
@jasonrobinson - I know, and understand. Please don't mistake my vote for being uninformed. My vote stands. I couldn't write out a complete answer as I had a character limit. I can't now, because I am trying to sleep. Suffice it to say, I think the basic concept of what you are asking us to vote on is flawed. Being a distributed network, and given what we stand for, gathering statistics (opt-in or not) goes against everything we have been striving to protect. At the end of the day, that's what it boils down to: privacy. If you can find a way of gathering COMPLETELY PRIVATE statistics (no URLs; no direct info about the pods, including TOS) then sure, go for it. Until then, podupti.me is adequate.
[deactivated account] Sun 17 Nov 2013 11:43AM
"(main) point of the whole system - to gather metrics on the diaspora network."
True. I did not see this anywhere in the proposal. I assumed (based on previous discussions here) that this was a proposed solution to the perceived problem of public post distribution. My bad.
[deactivated account] Sun 17 Nov 2013 11:44AM
"I just don’t see the point of building complicated things to make something simple"
Here's where you missed my point :)
What I proposed is simple. But then again, we're aiming to solve two different problems, so the point is moot.

Jason Robinson Sun 17 Nov 2013 11:52AM
@starblessed of course you are entitled to disagree; I did not mean to indicate otherwise. I just find it odd that it is better to force podmins who want to list their pod in a directory to register with a commercial company (Pingdom) to be able to do that. Why is that good? :)
@rekado I had to check the proposal; sure, it's a bit hidden, mainly because the proposal is too long, my bad.
While some technical proposals (like Relay servers for public posts and Tag aggregation) require or benefit from this kind of central hub, I think the real benefit is visibility into the network itself.

goob Sun 17 Nov 2013 2:15PM
@jasonrobinson, may we please have longer to discuss, consider and vote on this issue? A week isn't very long for a non-trivial issue. Hope that's not a problem.

goob Sun 17 Nov 2013 2:17PM
Alternatively, allow time for a discussion before opening the vote.

Jason Robinson Sun 17 Nov 2013 3:12PM
@goob sure, though I didn't expect anyone to object to the possibility of acquiring metrics about the diaspora* network as a whole. I understand privacy, but how can a network as a whole have privacy? I just don't get it; maybe I've understood diaspora* as a project wrongly since the beginning. I thought it was about building an alternative to things like Facebook :) Instead it seems that there is a strong sentiment towards being a non-network. Not quite sure why we need a foundation if we're not going to be a network :)
This vote is mainly to clarify this. If there is strong support for the non-network side, personally I don't see how I can help the project since I'm highly interested in building a network.

goob Sun 17 Nov 2013 6:11PM
Oops sorry, I only skim-read the proposal and thought it was also about forming a centralised service - but you've said 'regardless of the means of doing this' - my mistake. Yes, I'm sure that will be a lot less controversial! Sorry.

lnxwalt Sun 17 Nov 2013 6:38PM
Is it intended that every piece of information collected is optional? I realize that you're wanting something to present to news media, but without nearly 100% opt-in (on every piece of information), this centralized stats service is nearly useless, because its information will be very inaccurate.
And yet, an opt-in without detailed control is not much of an opt-in. The important question there is how, once automated data collection is authorized, one keeps it from expanding to encompass information not contained in the original proposal. This is especially so given the proposed centralized distribution points for public posts and hashtags. ["While some technical proposals (like Relay servers for public posts and Tag aggregation) require or benefit from this kind of central hub, I think the real benefit is visibility into the network itself." https://wiki.diasporafoundation.org/Central_hub]
One of my early CS instructors said something that has always stuck with me:
All information that you collect will eventually be stored. All information that you store will eventually be misused.
I would rather support an optional page on each site which collected site-specific statistics (which could also be available in a machine-readable form), assuming that the podmin would have direct control over which stats were gathered and displayed.
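A minimal Python sketch of the podmin-controlled page suggested above: the pod publishes, as JSON, only the statistics the podmin has enabled. The field names are hypothetical, and this only controls what is displayed; limiting what is gathered in the first place would have to happen earlier, where the stats are computed:

```python
import json


def public_stats(all_stats: dict, enabled_fields: set) -> str:
    """Return, as JSON, only the statistics the podmin chose to publish."""
    published = {key: value for key, value in all_stats.items() if key in enabled_fields}
    # sort_keys keeps the output stable for anything parsing the page.
    return json.dumps(published, sort_keys=True)
```

For example, `public_stats({"local_users": 120, "version": "0.2.0.1"}, {"version"})` returns `'{"version": "0.2.0.1"}'`, and an empty set of enabled fields publishes `'{}'`.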

lnxwalt Sun 17 Nov 2013 6:57PM
I should note that there is at least one other federated network doing this already. The same issues could arise there in the future.

arcke Sun 17 Nov 2013 9:57PM
Would the collected statistics have any "marketable" value? We would never know how many accounts/users reported would be real ones. There can be any number of pods filled with bot accounts or pods reporting wrong data on purpose.

Flaburgan Mon 18 Nov 2013 4:40AM
Hi Jason. Having information and statistics about the network is interesting. But as Jacob said, I don't think we need a central entity for that. We could simply add a route like /statistics which would, if enabled in the config file, return JSON containing anonymous statistics about the pod. Then, if we want a "global statistics" page, it could loop over the pod table once per day and ask every pod for its stats. We definitely don't need a central entity to do that.
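A minimal Python sketch of that daily "global statistics" loop, assuming each pod exposes the suggested /statistics route. `fetch_stats` is a hypothetical helper that returns the decoded JSON, or None when the route is disabled or the pod is unreachable; `local_users` is an assumed field name:

```python
from typing import Callable, Iterable, Optional


def global_stats(pod_urls: Iterable[str],
                 fetch_stats: Callable[[str], Optional[dict]]) -> dict:
    """Aggregate per-pod /statistics responses into network-wide totals."""
    totals = {"pods_reporting": 0, "local_users": 0}
    for url in pod_urls:
        stats = fetch_stats(url)
        if stats is None:
            continue  # opted out or unreachable: skipped, not counted as zero
        totals["pods_reporting"] += 1
        # A pod that hides its user count still counts as reporting.
        totals["local_users"] += stats.get("local_users", 0)
    return totals
```

Skipping silent pods rather than counting them as zero means `pods_reporting` shows how much of the network the totals actually cover.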

Sean Tilley Mon 18 Nov 2013 6:42AM
Actually, I could see this being potential for something along the lines of a more robust Pod Uptime. I wouldn't be against having it integrated somewhere into the Diaspora Foundation project site. It could be useful for providing end users with a pod to go to, if they don't want to self-host. We could probably integrate reviews and terms of service for different pods so that a user would have an easier time choosing where they would prefer to reside.

Jason Robinson Mon 18 Nov 2013 7:46AM
Finally someone agreeing with me - thank you @seantilleycommunit :)
I don't understand how having a third-party solution would benefit diaspora* network users more. Right now it's actually a lot worse. I love these services, but let me point something out.
diapod.net is opt-out, not opt-in. It will list your pod whether you want it to or not. AFAIK your pod could get listed even if you never make a single public post from it. How is that better than our own opt-in service?
podupti.me requires you to sign up for an account with Pingdom, which is a commercial company. How is that better than our own service, which doesn't require setting up an account anywhere?
@flaburgan we need a pod list, period. The /statistics route is a great idea, the pod directory could poll that indeed.
Some other comments - Please don't use the words decentralized and privacy, which are good words, to hide behind if you don't even know what is being proposed. It's ok to disagree (like Jonne and Rekado), just please don't mix up technical details. I realize it was wrong to even bring this subject to the main group now. I think I'll just start doing the hub part and then anyone who wants to join can patch their pod with the commits. In fact, I'll close the vote as it's clear there is just too much confusion being generated and I don't feel like battling windmills.

Jason Robinson Mon 18 Nov 2013 7:46AM
@jonnehass btw, I did write clearly opt-in for the vote, please read more carefully.

Flaburgan Mon 18 Nov 2013 8:54AM
@jasonrobinson don't get me wrong, having a better list of the pods than podupti.me, maintained by the foundation, is a good idea. My remark was about the approach you proposed: I'm not sure that hardcoding a URL inside the config file is a good idea. I think we could simply have a boolean inside that file to say "I want to share statistics about my pod" or "I don't want to share statistics", and that's it. After that, everybody who requests the correct URL will get the information, so it could be used by podupti.me, by the other pods, by whoever we want. And we can suggest that podmins register on diasporafoundation, as they do on podupti.me, and we will be able to list pods and their stats.

diasp_eu Mon 18 Nov 2013 7:39PM
Since 2011 we do exactly this at https://diasp.eu/stats :-) it would be nice if we can read the real amount of users for each pod instead of estimating it.

Maciek Łoziński Tue 19 Nov 2013 8:13PM
It's funny, I was just about to write about the idea of decentralized stats - in my opinion every hub should have a page like that, showing main connected hubs' statistics.
A big benefit is that we do not need to maintain the hub's codebase, and there is no need to host any additional service, which is always a problem.

Jason Robinson Tue 19 Nov 2013 8:24PM
And none of the stats would tell the whole truth :)

Maciek Łoziński Tue 19 Nov 2013 9:21PM
But do we really need the whole truth, or just a clue?

Jason Robinson Tue 19 Nov 2013 9:33PM
That is a matter of opinion. Besides, we would never have the whole truth, because it has to be opt-in of course. But even with only a majority of podmins opting in, it would give a nice window into how diaspora* is really doing.
But as said, matter of opinion and a pod list IS needed for many things mentioned here on loomio many times.

diasp_eu Wed 20 Nov 2013 8:51PM
Let's focus on the diaspora API.
Emmanouel Kapernaros · Sat 16 Nov 2013 1:16PM
Sorry if my comment is wrong, because I may not have completely understood your proposal, although I have read the wiki pages about the central hub and relays.
I have to tell you that, as I see it, diasporafoundation.org or the GitHub repository (which are central hubs in a way) are not the same.
For example, if diasporafoundation.org is down, I don't care :P nothing bad happens to my pod. If GitHub closes forever, I don't care either, because the source code is not dependent on GitHub; git is decentralized.
But if the central hub you are proposing is down, then we have a problem. One of the reasons diaspora is better is that it is not dependent on a central machine.
From diasporafoundation.org:
What is decentralization?