254 commits

Author SHA1 Message Date
Jakob Ackermann
37fda043d4 [http] use public node api for getting the response content-length 2020-08-17 11:59:18 +01:00
Brian Gough
e5747fefd2 remove package-lock.json (not needed for library) 2020-08-06 14:52:07 +01:00
Brian Gough
5b39d49358 use scoped package name @overleaf/metrics 2020-08-05 11:27:56 +01:00
Brian Gough
d415ae0cbe remove gruntfile 2020-07-17 16:23:01 +01:00
Brian Gough
a0f856cff2 fix tests 2020-07-17 16:17:18 +01:00
Brian Gough
747a80b545 decaffeinate 2020-07-17 16:01:58 +01:00
Brian Gough
e31a819636 remove statsd 2020-07-17 15:36:37 +01:00
Eric Mc Sween
e197e4dc11 2.7.0 2020-07-06 08:20:02 -04:00
Eric Mc Sween
d69195eaa9 Log requests that don't have a route property
The v1 history service has its routes set up via swagger-tools, which
doesn't write a route property on the request. This prevents us from sending
request metrics based on the route, but we can still log the request.
2020-07-03 16:38:29 -04:00
Jakob Ackermann
ea767920fc [misc] fix unit tests 2020-04-01 16:40:15 +02:00
Eric Mc Sween
8131c5ac91 2.6.2 2020-03-25 09:53:54 -04:00
Eric Mc Sween
ef7e3b0e7a Fix HTTP request timing metrics
The conversion between high resolution time and milliseconds was
incorrect.
2020-03-25 09:39:21 -04:00
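
A minimal sketch of the conversion this fix refers to, assuming timings come from `process.hrtime()` tuples of `[seconds, nanoseconds]`. The commit does not say what the original mistake was, and the helper names `hrtimeToMs` and `timeSync` are hypothetical, not taken from the library.

```javascript
// Convert a process.hrtime() tuple [seconds, nanoseconds] to milliseconds.
// Seconds are scaled up by 1e3, nanoseconds are scaled down by 1e6.
function hrtimeToMs([seconds, nanoseconds]) {
  return seconds * 1e3 + nanoseconds / 1e6
}

// Hypothetical helper: time a synchronous function in milliseconds.
function timeSync(fn) {
  const start = process.hrtime()
  fn()
  // process.hrtime(start) returns the elapsed time as a new tuple.
  return hrtimeToMs(process.hrtime(start))
}
```

Mixing up the two scale factors (for example dividing nanoseconds by 1e3) is an easy way to get timings that are off by orders of magnitude.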
Eric Mc Sween
a17843f3bf 2.6.1 2020-03-20 07:47:05 -04:00
Eric Mc Sween
25448bfef4 Downgrade all request logs to INFO
Commit 9056143fe36f1347a1ff985ef8592a1de7d798dd added logic to log
requests with different error levels depending on the status code. The
intention was to make the 5xx and 4xx requests stand out in Stackdriver.
Unfortunately, this also creates a lot of noise in Sentry since we log
the errors separately from the requests.

This commit brings back the former behaviour of logging all requests at
the INFO level. We can revisit this if we integrate the strategy
implemented in filestore of logging once per request.
2020-03-18 08:41:11 -04:00
Eric Mc Sween
62e82d7469 Send the status code with the HTTP request size metric 2020-03-12 06:24:46 -04:00
Eric Mc Sween
4a92be80ea Send HTTP request size metric
The metric is a "summary" called http_request_size_bytes.
2020-03-11 16:39:49 -04:00
Brian Gough
03e81153db avoid step effects in summary metrics
reduce the window size from 10 minutes to 1 minute, so that short
spikes do not cause a 10 minute long "table" graph.
2020-03-10 15:01:09 +00:00
Simon Detheridge
74c2afc12d Bump package version 2020-03-04 10:31:12 +00:00
Simon Detheridge
9a8cddbbb6 Don't set UV_THREADPOOL_SIZE if already set 2020-03-03 17:09:35 +00:00
Brian Gough
2273978e7b fix gauge usage 2019-12-16 11:42:10 +00:00
Brian Gough
3a5374c6f9 increase minor version for backwards compatible addition 2019-12-16 10:22:50 +00:00
Brian Gough
93df87eff3 allow options for count 2019-12-16 10:18:30 +00:00
Simon Detheridge
53aa2490f5 Merge pull request #20 from overleaf/spd-metrics-ttl
Add mechanism to expire old prometheus metrics
2019-10-28 17:21:57 +00:00
Simon Detheridge
feecda8ea8 Use map instead of hash for metrics 2019-10-28 14:39:53 +00:00
Simon Detheridge
e0cf10a886 Fix typo, gague -> gauge 2019-10-28 12:34:04 +00:00
Eric Mc Sween
7b7b6d0793 2.3.0 2019-10-25 07:17:43 -04:00
Eric Mc Sween
166211b278 Stackdriver logging
When the environment variable STACKDRIVER_LOGGING is set to true, send
request logs in a format that Stackdriver knows how to interpret. Also,
set the log level according to the status code. 4xx responses are logged
as warnings and 5xx responses are logged as errors.
2019-10-24 22:05:12 -04:00
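
The status-to-level mapping described in this commit could look roughly like the sketch below. The function name `logLevelForStatus` and the level strings are assumptions for illustration; only the 4xx/5xx rule comes from the commit message. Note that commit 25448bfef4 later reverted to logging all requests at INFO.

```javascript
// Pick a log level from an HTTP status code:
// 5xx -> error, 4xx -> warn, everything else -> info.
function logLevelForStatus(statusCode) {
  if (statusCode >= 500) return 'error'
  if (statusCode >= 400) return 'warn'
  return 'info'
}
```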
Simon Detheridge
07e4eb4dfb Add mechanism to expire old prometheus metrics
Adds a wrapper around the prometheus client, which keeps track of the
last time a metric was accessed, and removes old ones once they have
not been accessed for a period of time.
2019-10-23 17:07:45 +01:00
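
A rough sketch of the expiry idea, independent of the Prometheus client: each metric is stored with its last access time, and a periodic sweep drops entries that have been idle longer than a TTL. The class name `ExpiringMetrics`, its methods and the injectable clock are hypothetical and do not reflect the actual wrapper.

```javascript
// Keeps metrics in a Map together with the time they were last accessed.
class ExpiringMetrics {
  constructor(ttlMs, now = Date.now) {
    this.ttlMs = ttlMs
    this.now = now // injectable clock, useful for testing
    this.entries = new Map() // key -> { metric, lastAccess }
  }

  // Return the metric for `key`, creating it on first use,
  // and record the access time.
  get(key, create) {
    let entry = this.entries.get(key)
    if (!entry) {
      entry = { metric: create(), lastAccess: 0 }
      this.entries.set(key, entry)
    }
    entry.lastAccess = this.now()
    return entry.metric
  }

  // Remove every metric not accessed within the last ttlMs milliseconds.
  // Returns the number of removed entries.
  sweep() {
    const cutoff = this.now() - this.ttlMs
    let removed = 0
    for (const [key, entry] of this.entries) {
      if (entry.lastAccess < cutoff) {
        this.entries.delete(key)
        removed++
      }
    }
    return removed
  }
}
```

In a real service `sweep()` would presumably run on a timer, and removal would also have to unregister the metric from the client's registry.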
Brian Gough
286eb747ad add status label to gauges 2019-06-06 10:34:55 +01:00
Henry Oswald
816c49daf2 bump metrics to 2.1.2 2019-04-12 12:36:47 +01:00
Henry Oswald
d4faaaa60e use console.log not logger.log
we don't have verbose logging enabled that early in the app's lifecycle
also best not to require modules before enabling profiler
2019-04-12 12:34:17 +01:00
Henry Oswald
15d14d8e2b add injectMetricsRoute into statsd so it doesn't blow up 2019-02-07 09:47:29 +00:00
Henry Oswald
bf18c6e513 MVP for running both statsd and prom side by side
statsd code is from v1.8.1
2019-01-30 11:11:37 +00:00
Henry Oswald
85011ed0e7 add path into inc options 2019-01-28 14:37:54 +00:00
Christopher Hoskin
02907fd2e7 Fix Register II 2019-01-15 16:16:12 +00:00
Christopher Hoskin
20e45b7a2e Fix Register 2019-01-15 16:12:15 +00:00
Christopher Hoskin
d4caa48118 Bump package version 2019-01-15 15:44:01 +00:00
Christopher Hoskin
93bef54c39 Make register public so we can support other servers e.g. hapi 2019-01-15 15:36:35 +00:00
Henry Oswald
dfead32d69 Update package.json 2018-12-13 09:19:05 +00:00
Henry Oswald
1a34f3db1b Update package.json 2018-12-13 08:46:58 +00:00
Henry Oswald
287effb139 use ENABLE_PROFILE_AGENT 2018-12-12 21:17:12 +00:00
Henry Oswald
db4ae84bda require logger 2018-12-12 21:05:33 +00:00
Henry Oswald
a86c4d8abd add profiler 2018-12-12 20:11:40 +00:00
Henry Oswald
9eac49ad84 add some logging in 2018-12-11 16:07:34 +00:00
Henry Oswald
c257482e15 add ENABLE_DEBUG_AGENT and don't require modules unless they are enabled 2018-12-11 15:46:29 +00:00
Christopher Hoskin
6f82309829 Bump version to 2.0.10 2018-12-11 12:52:38 +00:00
Christopher Hoskin
61e6cf0493 Add host label to timing metrics 2018-12-11 12:01:22 +00:00
Henry Oswald
425a6f55ff set build version via env var and bump build version 2018-12-10 22:02:25 +00:00
Henry Oswald
7a227adaeb add if statement around trace agent and bump to 2.0.8 2018-12-05 13:58:40 +00:00
Henry Oswald
02071584ae bump to 2.0.7 2018-12-05 12:31:16 +00:00
Henry Oswald
b2f49351c0 few tidy up changes 2018-12-05 11:03:40 +00:00
Henry Oswald
93f4a7eeaf small cleanup 2018-12-04 17:01:30 +00:00
Henry Oswald
85d4b03bcb bump to 2.0.6 2018-12-04 16:29:09 +00:00
Henry Oswald
2d4283fdf0 add DEBUG_METRICS env var 2018-12-04 16:20:52 +00:00
Henry Oswald
31fa5cef51 add logging 2018-12-04 16:09:09 +00:00
Henry Oswald
27e6db1e51 inc process_startup on init 2018-12-04 15:57:19 +00:00
Henry Oswald
752541b7f1 remove logging and bump to 2.0.4 2018-11-29 16:10:55 +00:00
Henry Oswald
9a737bef2f bump to 2.0.3 2018-11-28 10:45:11 +00:00
Henry Oswald
917afa3edc have default opts in inc 2018-11-27 16:12:12 +00:00
Henry Oswald
8adfc49af7 v 2.0.1 2018-11-27 15:48:17 +00:00
Henry Oswald
39f924f73c bump to 2.0.0 2018-11-27 15:15:45 +00:00
Henry Oswald
4e370ef24d clean up 2018-11-27 12:07:26 +00:00
Henry Oswald
a8cfa97463 use modified prefix and method name 2018-11-27 11:55:38 +00:00
Henry Oswald
2a6839f48c roll back async method for moment 2018-11-27 11:27:16 +00:00
Henry Oswald
fdd4db25a3 big refactor removing statsd and converting keys to prom keys 2018-11-27 10:36:57 +00:00
Henry Oswald
b9f3a3f987 improve mongo metrics to be more Prometheus-like 2018-11-26 13:31:21 +00:00
Henry Oswald
4806e6fd87 use labels 2018-11-26 09:46:26 +00:00
Henry Oswald
f7deba6de9 Merge branch 'master' into ho-stackdriver2 2018-11-23 15:29:06 +00:00
Christopher Hoskin
725abdce3b Remove spurious blank lines 2018-11-21 12:42:40 +00:00
Christopher Hoskin
00aaa3f3d4 Merge branch 'master' into ho-stackdriver 2018-11-21 12:41:34 +00:00
Henry Oswald
7f3cd3c77c just do debugging and tracing 2018-11-21 10:30:18 +00:00
Henry Oswald
f63b84983d add count to prom 2018-11-21 10:28:32 +00:00
Henry Oswald
d01ff95b46 bump to 1.9.1 2018-11-21 10:20:58 +00:00
Henry Oswald
32da84a1d7 proper null check 2018-11-21 10:20:33 +00:00
Henry Oswald
70a75113cb refactor key building 2018-11-21 10:15:43 +00:00
Henry Oswald
c36e3d74b5 replace - with _ 2018-11-21 08:57:00 +00:00
Henry Oswald
cf920a86f7 fix metrics.this 2018-11-21 08:50:33 +00:00
Henry Oswald
5599521c09 consolidate on metrics.timing 2018-11-20 21:52:09 +00:00
Henry Oswald
7fc49d1eb5 metrics of different types can clash, share promMetrics 2018-11-20 17:50:54 +00:00
Henry Oswald
bd8fd1141e client -> prom 2018-11-20 17:25:37 +00:00
Henry Oswald
6854a64b73 fix this.key 2018-11-20 17:20:49 +00:00
Henry Oswald
6f708fd26a add summaries 2018-11-20 17:17:02 +00:00
Henry Oswald
5039287ee4 don't default to adding underscore to name 2018-11-20 16:28:36 +00:00
Henry Oswald
816c9348f6 don't put dots in key name 2018-11-20 16:13:40 +00:00
Henry Oswald
f4af82282f use buildKey for prefixing name and host 2018-11-20 16:06:02 +00:00
Henry Oswald
1a3b427315 bump to 1.9.0 2018-11-20 14:54:09 +00:00
Henry Oswald
c576a86c14 Update metrics.coffee 2018-11-20 13:50:04 +00:00
Christopher Hoskin
65fccf8abe Ensure gauge values are numeric, not string 2018-11-07 16:08:31 +00:00
Christopher Hoskin
d0e1324dba Sanitize metric keys for Prometheus 2018-11-07 12:44:10 +00:00
Christopher Hoskin
bb3cff5744 Add prom-client to package.json 2018-11-07 11:21:45 +00:00
Christopher Hoskin
56eaae89f9 Add Gauges 2018-11-06 16:15:41 +00:00
Christopher Hoskin
6d4d05957e Get counter metric working 2018-11-06 14:22:03 +00:00
Christopher Hoskin
f604fb92e5 Get default Prometheus metrics working 2018-11-06 11:14:26 +00:00
Henry Oswald
7e359c9df2 add trace and debug agent to metrics 2018-10-16 17:19:21 +01:00
Henry Oswald
4b075db038 untested Prometheus counters added 2018-10-16 16:47:12 +01:00
Christopher Hoskin
90b6e5afce Release version v1.8.1 2018-07-12 11:07:43 +01:00
Christopher Hoskin
06e8450694 Allow remote statsd to be specified by environment variable 2018-07-06 13:57:52 +01:00
Brian Gough
00fdea922d allow a global gauge not specific to a host 2018-05-18 15:09:11 +01:00
Brian Gough
c8cc1e1bfe handle undefined logger in event loop monitor 2018-05-10 10:10:34 +01:00
Shane Kilkelly
d0eaa235a3 bump package version 2017-03-23 15:32:12 +00:00
Shane Kilkelly
6dbcc34df6 reduce UV thread-pool size to 16 2017-03-23 15:31:05 +00:00
Shane Kilkelly
7bdba3756b Don't bother checking for error properties on error object 2017-03-21 14:19:12 +00:00
Shane Kilkelly
f2ebdd1662 Increment stats for success/failure of wrapped async calls 2017-03-20 16:54:40 +00:00
Shane Kilkelly
1f77cc0fd3 If function is called without callback, apply with original args 2017-03-20 16:37:53 +00:00
Shane Kilkelly
fbe19cd97d Don't return early in case where last arg is not a function 2017-03-20 16:25:10 +00:00
Shane Kilkelly
31235beee5 Don't throw error if the function is not invoked with callback.
Instead, log the error and return early.
2017-03-20 16:17:22 +00:00
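
The wrapping behaviour discussed in this commit and its follow-ups (fbe19cd97d, 1f77cc0fd3, f2ebdd1662) can be sketched as below. This is an illustration under assumptions, not the library's `timeAsyncMethod`: the name `wrapCallbackMethod`, its parameters and the shape of the timing record are all hypothetical. It follows the later behaviour, where a call without a callback is logged and passed through with its original arguments.

```javascript
// Replace obj[methodName] with a version that times callback-style calls.
// onTiming receives { method, ms, failed } once the callback fires.
function wrapCallbackMethod(obj, methodName, onTiming, log = console.error) {
  const original = obj[methodName]
  if (typeof original !== 'function') {
    log(`cannot wrap ${methodName}: not a function`)
    return
  }
  obj[methodName] = function (...args) {
    const callback = args[args.length - 1]
    if (typeof callback !== 'function') {
      // No callback: log and fall through to the original, untimed.
      log(`${methodName} called without a callback; not timing this call`)
      return original.apply(this, args)
    }
    const start = Date.now()
    // Swap in a callback that records duration and success or failure.
    args[args.length - 1] = function (err, ...results) {
      onTiming({ method: methodName, ms: Date.now() - start, failed: Boolean(err) })
      callback.call(this, err, ...results)
    }
    return original.apply(this, args)
  }
}
```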
Shane Kilkelly
9846703be5 Bump version 2017-03-20 10:17:09 +00:00
Shane Kilkelly
be2c6a96af Log args and their indexes, if they look like object ids 2017-03-17 14:20:13 +00:00
Shane Kilkelly
e7f71a25d8 Use an explicit prefix 2017-03-17 11:47:38 +00:00
Shane Kilkelly
f397678589 Clean up, don't allocate an extra date 2017-03-16 15:07:25 +00:00
Shane Kilkelly
5ea83947dd remove stray compiled js file 2017-03-16 10:10:40 +00:00
Shane Kilkelly
1f9d4950a2 Update gitignore 2017-03-16 10:09:16 +00:00
Shane Kilkelly
772f950d7c Test failure when wrapped method is not async/callback 2017-03-16 10:07:52 +00:00
Shane Kilkelly
a5aec5b812 Test failure to wrap method 2017-03-16 10:02:19 +00:00
Shane Kilkelly
40b238271d Add tests for methods producing errors, and logger 2017-03-16 09:55:07 +00:00
Shane Kilkelly
c3b18618bf Add unit tests 2017-03-16 09:49:45 +00:00
Shane Kilkelly
e99a7f6a87 refactor 2017-03-15 16:07:36 +00:00
Shane Kilkelly
3cb0ab2784 Add a 'timeAsyncMethod' helper 2017-03-15 15:06:54 +00:00
James Allen
caeac717fc Set UV_THREADPOOL_SIZE to 128 for all processes 2016-10-24 10:50:44 +01:00
Brian Gough
2df5845444 updated version to 1.5.0 2016-03-17 09:40:40 +00:00
Brian Gough
622bbe3123 return timeSpan from timers
to allow additional calculation
2016-03-15 13:52:40 +00:00
Brian Gough
29177f8de8 add support for statsd count method 2016-03-15 13:52:32 +00:00
James Allen
8db30020ae Monitor event loop by looking for skew
If we monitor with setImmediate, we miss big blocking loops. For example,
suppose we have 1000 1ms loops then a single bad 1000ms loop. setImmediate
will only be called at the right time 1/1000 of the time (it has to be the
loop just before the bad one). So this monitoring method gives a good average
if the std dev is low, but doesn't pick up spikes.

Instead, we can monitor the skew from the expected time between setIntervals.
In the case above, with a setInterval for 1000ms, we will pick up a skew
proportional to the amount of time that it overlaps the bad loop. So 50%
chance of picking up skew > 500ms, and thus getting a good sense of any
spikes.
2015-12-03 16:32:20 +00:00
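
The skew approach described above can be sketched as follows. The function names `computeSkew` and `monitorEventLoop`, the injectable clock and the `onSkew` callback are assumptions for illustration and are not the library's actual code.

```javascript
// Skew is how much later than expected a tick fired, never negative.
function computeSkew(expectedMs, previousTick, currentTick) {
  return Math.max(0, currentTick - previousTick - expectedMs)
}

// Fire onSkew(skewMs) every intervalMs. A blocked event loop delays the
// interval callback, so the measured gap exceeds intervalMs.
// Returns a function that stops the monitor.
function monitorEventLoop(intervalMs, onSkew, now = Date.now) {
  let previous = now()
  const timer = setInterval(() => {
    const current = now()
    onSkew(computeSkew(intervalMs, previous, current))
    previous = current
  }, intervalMs)
  timer.unref() // do not keep the process alive just for monitoring
  return () => clearInterval(timer)
}
```

A caller might run `const stop = monitorEventLoop(1000, (ms) => console.log('skew', ms))` and call `stop()` on shutdown.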
James Allen
738363a6de Set maxSockets to Infinity for all services 2015-08-31 14:02:03 +01:00
Brian Gough
0215b12a5f log memory usage every minute 2015-08-20 17:03:58 +01:00
Brian Gough
f237f7c3cc remove bug in optional argument handling 2015-08-18 11:23:10 +01:00
Brian Gough
27c382416a fix whitespace 2015-08-17 15:19:13 +01:00
Brian Gough
e3e8d80466 remove non-working metrics from graphite 2015-08-17 15:19:07 +01:00
Brian Gough
d2af7b24a0 remove randomisation to avoid shutdown problems 2015-08-17 15:18:18 +01:00
Brian Gough
577a3759c0 bugfix for memory chunk size 2015-08-14 15:44:24 +01:00
Brian Gough
46ec20ef9c add memory check and periodic gc 2015-08-14 14:38:24 +01:00
Brian Gough
175e3efd5f update package version to 1.2.0 2015-08-06 09:40:58 +01:00
Brian Gough
349b499f85 add compatibility with v2 mongo driver 2015-07-30 08:57:43 +01:00
Brian Gough
ffa523bced added monitoring of event loop time
should indicate if node is blocking on libuv threads
as described in https://nodejs.org/api/dns.html#dns_dns_lookup
2015-06-23 10:51:48 +01:00
Henry Oswald
9329249bc9 Revert "reduce memory capture in http logger"
This reverts commit fc2e043b20204e04f240814d4efc05762db7df96.

Had to revert this because req.route.path is not set until a matching
route has been hit, so it was always null inside res.end meaning
statsd data was never sent over.

This commit did not actually stop the memory leak so reverting it
has no short-term repercussions
2015-05-14 16:14:24 +01:00
Brian Gough
1e0a991fcd reduce memory capture in http logger
only capture the properties of 'req' that we need, to avoid leaking
the whole req object for responses that never call res.end()
2015-05-05 10:50:59 +01:00
Brian Gough
fa7e068ebb update minor version, due to addition of new close() method 2015-01-05 16:46:44 +00:00
Brian Gough
30070f23b8 add a close() method to terminate the module cleanly
closes the statsd connection and cancels registered interval timers
prevents express from hanging when trying to shutdown
2015-01-05 16:45:32 +00:00
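
A minimal sketch of the clean-shutdown idea: keep track of every interval timer that gets registered so a single `close()` can cancel them all. The class name `Monitors` and its methods are hypothetical, and closing the statsd connection is left out here.

```javascript
// Tracks interval timers so they can all be cancelled at shutdown.
class Monitors {
  constructor() {
    this.timers = new Set()
  }

  // Register a repeating task and remember its timer handle.
  every(ms, fn) {
    const timer = setInterval(fn, ms)
    this.timers.add(timer)
    return timer
  }

  // Cancel every registered timer so nothing keeps the process alive.
  close() {
    for (const timer of this.timers) clearInterval(timer)
    this.timers.clear()
  }
}
```

Pending interval timers keep the Node.js event loop alive, which is why an uncancelled timer can stop a server process from exiting.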
James Allen
60857982b6 Create LICENSE 2014-09-08 09:19:39 +01:00
James Allen
9acca85843 Release version 1.0.0 2014-08-19 13:32:41 +01:00
Henry Oswald
c8fae26995 changed type to query_type as it conflicts with logstash 2014-05-13 15:24:23 +01:00
James Allen
1dea55d8f2 Check that req.route.path is set 2014-05-12 15:28:09 +01:00
James Allen
284d8fb588 Namespace gauge keys correctly 2014-05-09 13:54:33 +01:00
James Allen
00c0036ca5 Add open socket monitoring 2014-05-09 13:30:12 +01:00
James Allen
86f220a2d2 Add collection into graphite key 2014-05-07 11:50:09 +01:00
James Allen
45ec60d8a6 Fix broken key building in timer 2014-05-07 11:43:46 +01:00
James Allen
873297b02e Namespace http request keys 2014-05-07 11:13:21 +01:00
James Allen
befb48a20f Use correct key in Metrics.timing 2014-05-07 11:08:46 +01:00
James Allen
f4895fb04f Add in http monitoring 2014-05-07 10:58:52 +01:00
James Allen
4da7fa43fa Scope name argument properly 2014-05-06 17:33:09 +01:00