Bug 159652

Summary: PR is not visible to search engines
Product: Documentation
Reporter: mwisnicki+freebsd
Component: Books & Articles
Assignee: freebsd-doc (Nobody) <doc>
Status: Closed FIXED
Severity: Affects Only Me
Priority: Normal
Version: Latest
Hardware: Any
OS: Any

Description mwisnicki+freebsd 2011-08-10 18:20:09 UTC
Currently www.freebsd.org/robots.txt disallows access to /cgi/, which includes query-pr.cgi; this makes bugs invisible to search engines :(

Please allow access to this and possibly other useful resources (man.cgi?), and do whatever is necessary for Google to pick it up.
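A minimal sketch of why the rule described above hides PRs from crawlers, checked with Python's standard urllib.robotparser. The robots.txt contents are an assumption: only the "Disallow: /cgi/" rule comes from this report, and the real 2011 file may have contained more. The example URLs and their query parameters are illustrative.

```python
from urllib.robotparser import RobotFileParser

# Assumed excerpt of www.freebsd.org/robots.txt at the time of this PR.
# Only the "Disallow: /cgi/" rule is taken from the report; the real
# file may contain further rules.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi/
"""


def crawlable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if a well-behaved crawler named `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Illustrative URLs; the query parameters are examples only.
    for url in (
        "https://www.freebsd.org/cgi/query-pr.cgi?pr=159652",  # a PR page
        "https://www.freebsd.org/cgi/man.cgi?query=ls",        # a manual page
        "https://www.freebsd.org/doc/",                        # static docs
    ):
        print(url, "->", "allowed" if crawlable(ROBOTS_TXT, url) else "blocked")
```

Both CGI URLs come out as blocked and only the static /doc/ path as allowed, which matches the complaint: one prefix rule covers every script under /cgi/.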
Comment 1 Remko Lodder freebsd_committer freebsd_triage 2011-08-10 20:30:55 UTC
State Changed
From-To: open->closed

I don't think this is a good idea. The CGI scripts are pretty
resource-intensive, and man.cgi has alternatives which can be used to get
this information. I will close this PR for those reasons, but I would like
to thank you for trying to make FreeBSD better! It's appreciated!
Comment 2 mwisnicki+freebsd 2011-08-10 21:21:01 UTC
On Wed, Aug 10, 2011 at 21:30,  <remko@freebsd.org> wrote:
> Synopsis: PR is not visible to search engines
>
> State-Changed-From-To: open->closed
> State-Changed-By: remko
> State-Changed-When: Wed Aug 10 19:30:55 UTC 2011
> State-Changed-Why:
> I don't think this is a good idea. The CGI scripts are pretty
> resource-intensive, and man.cgi has alternatives which can be used to get
> this information. I will close this PR for those reasons, but I would like
> to thank you for trying to make FreeBSD better! It's appreciated!

I disagree. It is a serious problem and should not be ignored just
because of infrastructure limitations.
Why bother having a web page if it is not findable on Google?
Besides, it could be solved with caching or by exporting content to
static pages with periodic updates.
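A minimal sketch of the "export content to static pages with periodic updates" idea, in Python. The PR record shape (number, summary and description keys), the file naming, and the function names render_pr and export_static are all hypothetical, and the sketch does not read from GNATS. Rerunning it periodically (for example from cron) would keep the pages current, and crawlers would then fetch cheap static files instead of running query-pr.cgi.

```python
import html
from pathlib import Path


def render_pr(pr: dict) -> str:
    """Render one PR record as a self-contained HTML page.

    The record shape ({"number", "summary", "description"}) is hypothetical.
    """
    title = html.escape(f"Bug {pr['number']}: {pr['summary']}")
    body = html.escape(pr["description"])
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{title}</title></head>\n"
        f"<body><h1>{title}</h1>\n<pre>{body}</pre></body></html>\n"
    )


def export_static(prs: list[dict], out_dir: Path) -> list[Path]:
    """Write one static page per PR into out_dir and return the paths written.

    Meant to be rerun periodically so the pages track the database.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pr in prs:
        path = out_dir / f"{pr['number']}.html"
        # Write to a temporary file, then rename it over the old page, so a
        # crawler never fetches a half-written file.
        tmp = path.with_suffix(".tmp")
        tmp.write_text(render_pr(pr), encoding="utf-8")
        tmp.replace(path)
        written.append(path)
    return written
```

The design choice is that all expensive work happens once per export run rather than once per crawler request, so crawl load no longer scales with search cost.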
Comment 3 Remko Lodder 2011-08-11 07:40:41 UTC
Hello Marcin,

> I disagree. It is a serious problem and should not be ignored just
> because of infrastructure limitations.

I disagree. This is not a serious problem. It is a perfectly valid choice
given the infrastructure we have, so we decided NOT to have this indexed
by Google via the CGI scripts. It is fine that you do not agree with that,
but as long as the FreeBSD system administrators do not give the green
light for this, it is not going to change, no matter what your opinion
about it is. It has been noted, though, that you disagree and would like
the behaviour changed. If it is changed, you will notice it soon enough.

> Why bother having a web page if it is not findable on Google?

Because it is interactive, for your purposes and not Google's?

> Besides, it could be solved with caching or by exporting content to
> static pages with periodic updates.

There are possibly many more alternatives; however, we decided to take
this route, and will keep taking it until someone who can decide on this,
weighing the possible impact, changes course.

Thank you for trying to make FreeBSD better.

-- 

/"\   Best regards,                      | remko@FreeBSD.org
\ /   Remko Lodder                       | remko@EFnet
 X    http://www.evilcoder.org/          | 
/ \   ASCII Ribbon Campaign              | Against HTML Mail and News
Comment 4 mwisnicki+freebsd 2011-08-14 14:49:14 UTC
On Sun, Aug 14, 2011 at 15:18, Mark Linimon <linimon@lonesome.com> wrote:
> On Sun, Aug 14, 2011 at 02:59:30PM +0200, Tilman Keskinöz wrote:
>> The gnats database used to be available via cvsup
>
> Still available.
>
> Also note that all the PR traffic to the various mailing lists is exposed
> to Google in the first place.
>

And to see how well this works, just try googling the subject of this PR.

Even with older PRs it is not exactly a pleasant experience (lots of
shady mailing list archives crammed with giant ads).
I assume that results from freebsd.org would come up first, or at least
high, as happens for other projects (this is also why setting up an
unofficial mirror of GNATS, as suggested above, would not help).
I'm not even sure if all PR categories are forwarded to mailing lists.

PS. My other gripe is that people routinely forget to include
bug-followup@ when responding to a PR.
Comment 5 Simon L. B. Nielsen freebsd_committer freebsd_triage 2011-08-21 10:05:37 UTC
On 10 Aug 2011, at 19:19, Marcin Wisnicki wrote:

> Currently www.freebsd.org/robots.txt disallows access to /cgi/ and that includes query-pr.cgi which makes bugs invisible to search engines :(

The problem is that query-pr.cgi allows searches which are quite heavy
resource-wise (taking minutes to finish). If we simply removed the
robots.txt restriction for that area and the query CGI, no reasonable
amount of hardware would be enough. We even have extra protection which
only allows two query-pr runs at once, to avoid killing www.FreeBSD.org.
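The thread does not say how the two-query limit is enforced. As a hypothetical illustration only, a non-blocking semaphore can admit at most two searches and turn the rest away immediately instead of letting them queue. Each CGI request normally runs in its own process, so a real deployment would need a cross-process mechanism (such as lock files); this in-process Python version using threads only shows the admission logic. QueryGate and its methods are invented names.

```python
import threading
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Limit quoted in the thread: at most two query-pr runs at once.
MAX_CONCURRENT_QUERIES = 2


class QueryGate:
    """Hypothetical admission gate: at most `limit` searches run concurrently.

    Extra requests are rejected immediately instead of queueing, so a burst
    of crawler hits cannot pile up behind slow searches.
    """

    def __init__(self, limit: int = MAX_CONCURRENT_QUERIES) -> None:
        self._slots = threading.BoundedSemaphore(limit)

    def try_acquire(self) -> bool:
        # Non-blocking: returns False at once when every slot is taken.
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        # BoundedSemaphore raises ValueError if released more than acquired.
        self._slots.release()

    def run(self, search: Callable[[], T]) -> Optional[T]:
        """Run `search` if a slot is free; return None when the gate is full.

        A CGI wrapper would translate None into, e.g., an HTTP 503 response.
        """
        if not self.try_acquire():
            return None
        try:
            return search()
        finally:
            self.release()


if __name__ == "__main__":
    gate = QueryGate()
    started = threading.Barrier(3)  # two running searches + main thread
    finish = threading.Event()

    def slow_search() -> str:
        started.wait()  # this search now holds a slot
        finish.wait()   # stand-in for a search that takes minutes
        return "results"

    workers = [threading.Thread(target=gate.run, args=(slow_search,)) for _ in range(2)]
    for w in workers:
        w.start()
    started.wait()                      # both slots are occupied here
    print(gate.run(lambda: "results"))  # third query is rejected: None
    finish.set()
    for w in workers:
        w.join()
    print(gate.run(lambda: "results"))  # a slot is free again: results
```

Rejecting rather than queueing is the choice that protects the host: a queue would still let slow searches accumulate, just one at a time.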

I think it would be very nice if we could allow this, but it requires
someone to figure out a way to make query-pr.cgi not do searches when
indexed by robots. The simple, and also rather clean, solution IMO would
probably be to separate viewing a single PR and searching for PRs into
different scripts; then we can allow robots to index the real PRs without
a problem. The problem with that is that somebody has to do it, and
somebody else has to review it before it's put into production.
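A sketch of the robots.txt that the proposed split would make possible, checked with Python's urllib.robotparser. The script names view-pr.cgi and search-pr.cgi are hypothetical placeholders for the two scripts, not existing FreeBSD scripts. Allow is listed before Disallow because Python's parser applies the first matching rule, while crawlers following RFC 9309 use the longest match; with this ordering both interpretations give the same answers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt after splitting the CGI into two scripts:
#   view-pr.cgi   shows a single PR (cheap, safe to crawl)
#   search-pr.cgi runs arbitrary searches (expensive, keep blocked)
# Both script names are placeholders, not existing FreeBSD scripts.
ROBOTS_TXT = """\
User-agent: *
Allow: /cgi/view-pr.cgi
Disallow: /cgi/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawlable(url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under ROBOTS_TXT."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "https://www.freebsd.org/cgi/view-pr.cgi?pr=159652",
        "https://www.freebsd.org/cgi/search-pr.cgi?text=robots",
        "https://www.freebsd.org/cgi/man.cgi?query=ls",
    ):
        print(url, "->", "allowed" if crawlable(url) else "blocked")
```

Only the single-PR URL is allowed; searches and the rest of /cgi/ stay blocked, which is exactly the separation the proposal relies on.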

-- 
Simon L. B. Nielsen
Hat: FreeBSD.org admins team