Loomio

Stopping indexing of profile pages

goob Wed 12 Jun 2013 6:22PM

As I understand it, profile pages are not supposed to be indexed by search engines and crawlers. However, that is not what is happening.

Profile pages are being indexed by at least some crawlers (Google, for example). The content of the pages is not indexed, but the existence of each profile and its URL are, and clicking the URL takes you to the profile page even if you're not logged in.

If I search Google for “Diaspora HQ”, I get:

Diaspora HQ
https://joindiaspora.com/u/diasporahq
A description for this result is not available because of this site's robots.txt – learn more.

The same happens for my own profile page.

So things are a bit confused: either we want search engines to index profile pages properly, or we don't want them indexed at all.

The robots.txt Disallow rules for /u/ and /people/ seem to stop crawlers from indexing the content of profile pages, but not their existence and URLs, and from there it's easy to view the content.
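
This matches what robots.txt is for: it tells a compliant crawler which URLs it may fetch, not which URLs a search engine may list. A minimal sketch using Python's standard `urllib.robotparser`, assuming the pod's robots.txt contains only the two Disallow rules mentioned above (the real file may contain more); `is_crawlable` is a hypothetical helper name:

```python
from urllib.robotparser import RobotFileParser

# Assumption: the pod's robots.txt contains only the two rules discussed here.
ROBOTS_TXT = """\
User-agent: *
Disallow: /u/
Disallow: /people/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    # can_fetch() only answers "may this crawler download the page?"
    # It cannot stop a search engine from listing a URL it found linked elsewhere.
    return parser.can_fetch(user_agent, url)


print(is_crawlable("https://joindiaspora.com/u/diasporahq"))  # False
print(is_crawlable("https://joindiaspora.com/"))              # True
```

The profile URL is blocked from fetching, which is why Google shows it with no description rather than leaving it out entirely.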

Would adding

to the header of each profile page prevent the crawlers from indexing profile pages completely? If so, could it easily be coded so that this line is added to profile pages whenever they’re created?
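
The line that belonged between "Would adding" and "to the header" is missing from this copy of the post. Assuming it was a robots meta tag such as `<meta name="robots" content="noindex">` (an assumption, not the original text), the sketch below shows how a compliant crawler might read that tag, using only Python's standard `html.parser`; `RobotsMetaParser` and `page_allows_indexing` are hypothetical names. One caveat: a crawler can only see the tag on pages it is allowed to fetch, so the tag has no effect on profile URLs that robots.txt still disallows.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; values may be None.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def page_allows_indexing(html: str) -> bool:
    """Return False if the page carries a 'noindex' (or 'none') robots directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return not ({"noindex", "none"} & parser.directives)


# Hypothetical profile page head with the tag added.
profile_head = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(page_allows_indexing(profile_head))  # False
```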

Erwan Guyader Wed 12 Jun 2013 6:45PM

As I said here, I believe that public profiles SHOULD be indexed, as people who make public posts probably want them to get some visibility.

However, I would support adding a setting to make a profile private. If it is enabled, the profile page shouldn't be accessible to anybody not sharing with that person.

goob Wed 12 Jun 2013 6:57PM

I think a possible solution would be to have an option in the user settings: 'Allow your public profile to be indexed by search engines?' But I just wanted to raise what seems to be a discrepancy at the moment.
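
One way such a setting could work is to drive a per-page robots meta tag rather than the pod-wide robots.txt. A minimal sketch, assuming a hypothetical `allow_search_indexing` flag on each profile (diaspora*'s real model and field names are not shown in this thread); the flag defaults to off, so a profile stays out of search results unless its owner opts in:

```python
from dataclasses import dataclass


@dataclass
class Profile:
    username: str
    # Hypothetical setting: "Allow your public profile to be indexed by search engines?"
    allow_search_indexing: bool = False


def robots_meta_for(profile: Profile) -> str:
    """Return the robots meta tag to place in the profile page's <head>."""
    content = "index, follow" if profile.allow_search_indexing else "noindex"
    return f'<meta name="robots" content="{content}">'


print(robots_meta_for(Profile("diasporahq", allow_search_indexing=True)))
print(robots_meta_for(Profile("goob")))
```

This only helps if crawlers are allowed to fetch the profile URLs in the first place, since a page blocked by robots.txt is never read.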

Jason Robinson Thu 13 Jun 2013 7:35AM

It would be nice to have a setting for making a profile public or private. We already have the "allow searching on Diaspora*" setting, but I'm not sure exactly what it controls. Maybe we could just modify that so that public profiles get more visibility on search engines and private ones less.

Flaburgan Thu 13 Jun 2013 8:57AM

This setting has to be added.

(By the way, this was the subject of the discussion I created, which Erwan linked above.)

goob Thu 13 Jun 2013 11:28AM

@jasonrobinson That would be the ideal situation, but I can foresee it being difficult to implement. At the moment there is one robots.txt file in the root of each pod, and if it disallows /u/ and /people/, that applies to both public and private profiles. The only way around this would be to move public profiles to separate paths such as /pu/ and /ppeople/, but that could get messy.
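
The directory split would work in principle because robots.txt rules match URL path prefixes, so `Disallow: /u/` does not cover `/pu/`. A sketch with Python's `urllib.robotparser`, where `/pu/` is the hypothetical public-profile path from the post and `pod.example` is a placeholder domain:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: private profiles stay under /u/ and /people/,
# public ones move to /pu/ and /ppeople/, which are not disallowed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /u/
Disallow: /people/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

private_url = "https://pod.example/u/alice"
public_url = "https://pod.example/pu/diasporahq"

print(parser.can_fetch("*", private_url))  # False: path starts with "/u/"
print(parser.can_fetch("*", public_url))   # True: "/pu/..." does not start with "/u/"
```

Part of the mess is that a profile's URL would change whenever its owner toggled between public and private, breaking existing links to it.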

I think for now it would be good to decide either to allow full indexing of profile pages or no indexing at all, because the current situation probably pleases no one: people who want a public profile will want it properly indexed, and people who want a private profile will want it not indexed at all.

@flaburgan your discussion was about public posts. This is similar but different, because it relates to the robots.txt file and the indexing of profiles.

Jason Robinson Thu 13 Jun 2013 12:01PM

@goob I wasn't talking about robots.txt. Why not just force a login for profiles that are not public, and then check whether the user can see them (i.e. is in the owner's contacts, etc.)?
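
This moves the decision out of robots.txt and into the application. A minimal sketch of that check, in which `User`, `is_public`, `contacts` and the "viewer is in the owner's contacts" rule are hypothetical stand-ins for whatever diaspora* actually uses:

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional


class Access(Enum):
    ALLOW = auto()
    REQUIRE_LOGIN = auto()  # e.g. redirect an anonymous visitor to the sign-in page
    DENY = auto()


@dataclass
class User:
    username: str
    is_public: bool = False                          # hypothetical "public profile" setting
    contacts: set[str] = field(default_factory=set)  # usernames this user shares with


def profile_access(owner: User, viewer: Optional[User]) -> Access:
    """Decide whether `viewer` (None when not logged in) may see `owner`'s profile."""
    if owner.is_public:
        return Access.ALLOW
    if viewer is None:
        return Access.REQUIRE_LOGIN
    if viewer.username == owner.username or viewer.username in owner.contacts:
        return Access.ALLOW
    return Access.DENY


alice = User("alice", contacts={"bob"})
bob = User("bob")
print(profile_access(alice, None))  # Access.REQUIRE_LOGIN
print(profile_access(alice, bob))   # Access.ALLOW
```

Since crawlers visit without logging in, they would be sent to the login page for non-public profiles and would never see the profile content.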

jonsger Fri 14 Jun 2013 8:02PM

By the way, I searched for Diaspora HQ and other diaspora users on a few search engines. DuckDuckGo, Bing and Yandex.ru either don't index profile pages or don't show them in their results.