Bug 255405

Summary: EventStats macros: unknown error handler name 'fallback:iso-8859-1'
Product: Services
Component: Wiki
Reporter: Jethro Nederhof <jethro>
Assignee: Kubilay Kocak <koobs>
Status: In Progress
Severity: Affects Some People
CC: clusteradm, wiki-admin
Priority: ---
Version: unspecified
Hardware: Any
OS: Any
See Also: https://github.com/moinwiki/moin-1.9/issues/37
http://bugs.debian.org/811547
http://bugs.debian.org/867239
https://github.com/pallets/werkzeug/issues/1706
Attachments:
script to populate hitcount caches for pages (flags: none)
Updated cold start script. (flags: none)

Description Jethro Nederhof 2021-04-26 01:05:24 UTC
Moin has a feature to keep track of views of each page and store stats for them.
This enables producing reports like the following:
http://moinmo.in/EventStats/HitCounts
https://gcc.gnu.org/wiki/PageHits

On the FreeBSD wiki equivalent pages (https://wiki.freebsd.org/PageHits , https://wiki.freebsd.org/EventStats/HitCounts), the requests seem to time out and an error is produced:
<<StatsChart: execution failed [unknown error handler name 'fallback:iso-8859-1'] (see also the log)>> 

This data would be useful for populating the "Popular" section on the wiki homepage at https://wiki.freebsd.org/
Comment 1 Kubilay Kocak freebsd_committer freebsd_triage 2021-04-26 01:31:01 UTC
werkzeug behaviour change, fixed in moin (upstream didn't resolve it there [1]) 

[1] https://github.com/pallets/werkzeug/issues/1706#issuecomment-578552492
Comment 2 Kubilay Kocak freebsd_committer freebsd_triage 2021-04-26 02:25:28 UTC
Error is now sorted out (pkg upgrade on wiki instance).

When the macro is used, however, it results in gateway timeouts. Need to investigate potential workarounds/solutions (if there are any).

See Also: 

http://moinmo.in/MoinMoinBugs/HitsProblem
Comment 3 Kubilay Kocak freebsd_committer freebsd_triage 2021-04-26 02:43:40 UTC
See Also:

http://moinmo.in/HowTo/Tune%20Performance#Regular_maintenance

Relevant part:

The statistics stuff (EventStats, PageHits) is reading data/event-log. That file is growing over time and big event-logs slow down the statistics stuff. So if you are not interested in the stats from 2 years ago, you maybe want to rotate that log for performance reasons. You could even just truncate event-log to 0 bytes if you don't mind your statistics stuff starting from scratch. 

Our event-log is not small.

I wonder if there's a way to process this offline (cli)
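
A minimal sketch of what offline rotation could look like, run from the command line rather than through the wiki. It assumes every event-log line starts with an integer timestamp in microseconds since the epoch, followed by a tab; that layout is an assumption here and should be checked against the real file. `trim_event_log` and its parameters are hypothetical names, not part of moin.

```python
import time


def trim_event_log(src_path, dst_path, keep_days=90, now=None):
    """Copy only lines newer than keep_days from src_path to dst_path.

    Assumption: each line begins with an integer timestamp in
    microseconds, then a tab. Lines that do not parse are dropped.
    Returns a (kept, dropped) tuple of line counts.
    """
    now = time.time() if now is None else now
    cutoff_usecs = int((now - keep_days * 86400) * 1_000_000)
    kept = dropped = 0
    with open(src_path, "r", encoding="utf-8", errors="replace") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            stamp = line.split("\t", 1)[0]
            try:
                is_recent = int(stamp) >= cutoff_usecs
            except ValueError:
                is_recent = False  # malformed timestamp, treat as stale
            if is_recent:
                dst.write(line)
                kept += 1
            else:
                dropped += 1
    return kept, dropped
```

Writing to a separate file leaves the original event-log untouched, so the result can be inspected before anything is swapped in.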
Comment 4 Jethro Nederhof 2021-04-26 12:10:52 UTC
Created attachment 224440
script to populate hitcount caches for pages

Haven't been able to test it properly, but if I'm reading the moin code right, something like this should do it.
Comment 5 Kubilay Kocak freebsd_committer freebsd_triage 2021-05-05 03:41:00 UTC
The stats gathering mechanism processes the hits/stats file backward in time until the last cached timestamp is reached. We ran a test of attachment 224440 and it failed at the aggregation stage. It needs a tweak so we can complete a full run.
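
The backward walk described above can be sketched roughly as follows. This is an illustration only, not moin's actual code: `entries_since` is a hypothetical helper, and it assumes each log line begins with an integer microsecond timestamp followed by a tab.

```python
def entries_since(lines, last_cached_usecs):
    """Yield (timestamp_usecs, line) newest first, stopping at the cache mark.

    `lines` is a sequence of log lines in chronological order (oldest first).
    Assumption: each line starts with an integer timestamp, then a tab.
    """
    for line in reversed(lines):
        stamp = line.split("\t", 1)[0]
        try:
            usecs = int(stamp)
        except ValueError:
            continue  # skip malformed lines rather than abort the walk
        if usecs <= last_cached_usecs:
            break  # everything older is already covered by the cache
        yield usecs, line
```

With no cache yet (a mark of 0 in this sketch), the walk covers the entire log, which would be consistent with the timeouts seen on a large event-log.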
Comment 6 Jethro Nederhof 2021-05-17 10:14:26 UTC
Created attachment 225020
Updated cold start script.

- Updated to skip until it's processing just the last 90 days
- Only count stats for pages that still exist
- Handle the error case where the TSV row randomly has additional columns
- Print some progress messages
- Force some garbage collection in case that helps the memory issue
- Add a report of popular pages that aren't found any more (dead links = good redirect candidates)
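
For readers without the attachment to hand, the changes listed above could look roughly like the sketch below. It is not the attached script. It assumes tab-separated rows of timestamp, event type and URL-encoded key/value pairs, with page views recorded as `VIEWPAGE` and the page name under `pagename`; `count_hits` and its parameters are hypothetical names.

```python
import gc
from collections import Counter
from urllib.parse import parse_qs


def count_hits(lines, existing_pages, cutoff_usecs=0, gc_every=100_000):
    """Count page views per page; return (hits, missing) Counters.

    hits holds pages found in existing_pages, missing holds the rest.
    Assumed row layout: timestamp_usecs, event type, key=value&... pairs.
    """
    hits, missing = Counter(), Counter()
    for n, line in enumerate(lines, 1):
        if n % gc_every == 0:
            print("processed %d lines" % n)  # progress message
            gc.collect()  # forced collection, in case it helps memory use
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # too short to be an event row
        stamp, event_type, kvpairs = fields[:3]  # ignore extra columns
        if event_type != "VIEWPAGE" or not stamp.isdigit():
            continue
        if int(stamp) < cutoff_usecs:
            continue  # older than the reporting window
        pagename = parse_qs(kvpairs).get("pagename", [""])[0]
        if not pagename:
            continue
        target = hits if pagename in existing_pages else missing
        target[pagename] += 1
    return hits, missing
```

Calling `missing.most_common(20)` on the second Counter would give the popular pages that no longer exist, i.e. the redirect candidates.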

Totally forgot about this, apologies for the delay!
Comment 7 Kubilay Kocak freebsd_committer freebsd_triage 2021-05-17 10:43:16 UTC
No apologies necessary, thank you Jethro :)