Commit graph

336 commits

Author SHA1 Message Date
Brian Gough
d0e08039da don't modify expiry for temporary packs 2016-04-07 15:16:50 +01:00
Brian Gough
fd49601716 preserve existing history when user upgrades 2016-04-07 15:16:38 +01:00
Brian Gough
e292de5eb0 fix to avoid ever appending permanent changes to expiring packs 2016-04-06 17:00:16 +01:00
Brian Gough
8b7bdd345b consider all packs for archiving 2016-04-06 15:17:31 +01:00
Brian Gough
ef47337c78 remove additional fields 2016-04-06 15:17:20 +01:00
Brian Gough
0b9a0730c0 mark temporary packs with a last_checked date in the far future
they do not need to be checked for archiving
2016-04-06 14:29:49 +01:00
Brian Gough
08fc151eee avoid unnecessary call to insert packs into index 2016-04-06 14:29:21 +01:00
Brian Gough
719e0291aa consider all packs for processing
to allow finalisation of old head packs
2016-04-06 14:27:44 +01:00
Brian Gough
79baa99634 clean up logging 2016-04-06 14:26:54 +01:00
Brian Gough
6ab75795a2 archive head packs after sufficient time 2016-04-06 13:30:09 +01:00
Brian Gough
6e18d49736 support archiving from list of project_ids/doc_ids 2016-04-04 17:00:19 +01:00
Brian Gough
31348141d8 increase logging for discarded updates and version mismatch 2016-03-24 11:55:29 +00:00
Brian Gough
181cebecef avoid call to fetch packs unnecessarily 2016-03-24 11:55:29 +00:00
Brian Gough
98683de3ae temporarily disable ttl behaviour
allow existing packs without temporary flag to expire
2016-03-24 11:38:59 +00:00
Brian Gough
3f388fb0ac only change ttl on cached packs, not temporary ones
temporary = without versioning feature enabled
cached = permanent versioned retrieved from s3
2016-03-24 11:38:09 +00:00
Brian Gough
8d900013d9 record whether a pack is temporary in the pack itself
using the expiresAt field no longer determines if the pack is
temporary because archived packs have an expiresAt field added when
they are retrieved from s3
2016-03-24 11:02:58 +00:00
Brian Gough
98738d1344 fix for acceptance test 2016-03-10 15:15:29 +00:00
Brian Gough
f01bf99682 acceptance tests - work in progress 2016-03-09 16:56:49 +00:00
Brian Gough
f6367e21b8 give separate error for archive in progress vs completed 2016-03-09 14:44:59 +00:00
Brian Gough
7350ab531d exclude already cached packs from archival 2016-03-09 14:44:59 +00:00
Brian Gough
28b184e0ca fix incorrect use of _.union (argument must be array) 2016-03-09 14:44:59 +00:00
Brian Gough
8922b97bd7 avoid duplicate filling of UserInfo in getDocUpdates 2016-03-09 14:44:59 +00:00
Brian Gough
7e6ea2793b remove startup dependency on s3 settings 2016-03-09 13:28:02 +00:00
Brian Gough
1419d20b1f fix indentation 2016-03-04 15:43:32 +00:00
Brian Gough
3175f6d3a6 handle case where index does not exist 2016-03-03 14:36:16 +00:00
Henry Oswald
e8b3fb5be6 added more logging to failed health checks 2016-03-03 10:50:55 +00:00
Brian Gough
795f717bab added index definitions 2016-03-01 11:38:23 +00:00
Brian Gough
3d9dfeccc3 remove pack worker
remove the op-specific code

remove tests for ops, now only packing

remove unused packing code

work in progress

store index for completed packs only

support archiving and unarchiving of individual packs

remove support for archiving whole document history

split out ArchiveManager, IndexManager

remove old DocArchive code

remove docHistoryStats collection

comment about archiving

added method to look at index when last pack has been archived

added start of iterator for project results

use a proper iterator

added heap module

getting it working

increase pack size since bulk operations no longer needed

remove unused MongoAWSexternal

cleanup

added doc iterator

remove old query code

added missing files

cleanup

clean up

started adding pack worker for archiving

work in progress

work in progress

getting pack worker working

updating worker

getting packworker working

added lock

use correct key name for track changes aws access

use correct key name for track changes aws access

always send back users array

fix up comparison of retrieved objects

handle op ids inside packs

log when s3 download completes

comments

cleanup, remove finalisation idea

remove logging
2016-03-01 10:10:02 +00:00
Brian Gough
a23ddf31c0 allow packing of temporary ops 2016-01-29 12:36:03 +00:00
Brian Gough
77cafa36af support continuing from last packed doc 2016-01-28 16:40:20 +00:00
Brian - Work
666a07e5ba move lock check into HealthChecker
to avoid dependency of HttpController on LockManager in unit tests
2016-01-27 16:04:55 +00:00
Brian Gough
199d2aaa92 script to pack existing docs 2016-01-27 15:14:23 +00:00
Brian Gough
b44a7b9aa6 reject very large ops 2016-01-26 14:52:40 +00:00
Brian Gough
b7a4c72f9c avoid compressing updates if the result would be too big 2016-01-26 12:23:21 +00:00
Brian Gough
ed0aaa189d add test for non-overlapping insert-delete case 2016-01-26 12:13:43 +00:00
Brian Gough
b3ddd839e6 add logging of raw updates 2016-01-26 11:28:02 +00:00
Brian Gough
29c7c5e249 enable packs by default for new docs 2016-01-25 09:55:55 +00:00
Brian Gough
d10123d3c4 include n parameter when packing 2016-01-25 09:45:25 +00:00
Brian Gough
9b2cd11cd4 don't try to append to packs when using the old op code 2016-01-22 10:45:24 +00:00
Brian Gough
84ace7f4c7 use packs only for temporary ops 2016-01-20 14:22:13 +00:00
Brian Gough
78b3412ca8 decrease delay when packing 2016-01-19 15:58:09 +00:00
Brian Gough
679a81564e respect mongo 3 limit of 1000 bulk operations 2016-01-19 15:58:09 +00:00
Brian Gough
f592611cac always create a new pack, never keep as op 2016-01-19 15:58:09 +00:00
Brian Gough
c6be12f3d5 set v_end on pack creation 2016-01-19 15:58:09 +00:00
Brian - Work
f64969c784 added comment about query memory usage for toArray 2016-01-19 15:58:09 +00:00
Brian Gough
0532a4daaa use compound index to replace separate index for packs 2016-01-19 15:56:09 +00:00
Brian Gough
0ba00a9eb7 expire temporary packs and roll over to a new pack each day 2016-01-19 15:56:09 +00:00
Brian Gough
5e830cbbdb put all new ops into packs 2016-01-19 15:56:09 +00:00
Brian Gough
dc564fd5d0 archiving document history now sends all changes to s3 2016-01-15 15:54:46 +00:00
Brian Gough
5153ed8217 make peekLastUpdate always return lastVersion when available 2016-01-15 15:54:44 +00:00
Brian Gough
8e53d66079 log the key for lock timeouts 2016-01-12 10:47:15 +00:00
Brian Gough
6199532d08 increase logging on s3 operations 2016-01-12 10:36:00 +00:00
Brian Gough
ca1f1dc944 handle exception in parsing retrieved json from aws 2016-01-12 09:26:29 +00:00
Brian Gough
b8862ca5af switch to node-byline module to avoid buffering problem with readline-stream
for lines > 64k the readline-stream module is affected by
https://github.com/jahewson/node-byline/issues/30 which is fixed in
node-byline (readline-stream was an earlier fork of the byline module)
2016-01-11 16:51:35 +00:00
Brian Gough
cb109a27a6 allow PackWorker to shut down cleanly 2016-01-06 09:43:10 +00:00
Brian Gough
ffe30962c9 add a close() method to LockManager to allow clean shutdown 2016-01-06 09:34:39 +00:00
Brian Gough
05163837cb add sentry error reporting to PackWorker 2016-01-05 16:00:52 +00:00
Brian Gough
6754bdca1c log timestamp in human-readable form for inconsistent ops 2016-01-05 11:30:24 +00:00
Brian Gough
e1aa436286 respect mongo bulk operations limit of 1000 operations 2016-01-05 11:13:13 +00:00
Brian Gough
bb7153c6c1 workaround for mongojs db.close issue
https://github.com/mafintosh/mongojs/issues/224
2015-12-22 15:36:15 +00:00
Brian Gough
d3583b4ef6 respect limit of 1000 ops in bulk operation with mongojs 1.x 2015-12-22 14:38:04 +00:00
Brian Gough
c7b4062412 remove unsupported options argument in count() method of mongojs 1.x 2015-12-22 14:20:34 +00:00
Brian Gough
d49997d9f3 fix usage of BSON module 2015-12-21 16:56:49 +00:00
Brian Gough
b7de6f2f71 don't try to compress updates across point of broken history 2015-12-21 13:52:26 +00:00
Brian Gough
4a6374efe8 fix read order when retrieving diffs 2015-12-18 12:38:42 +00:00
Brian Gough
9f69c95192 Merge branch 'upgrade-mongojs' 2015-12-17 16:31:04 +00:00
Brian Gough
4a82dfe618 add setting trackchanges.continueOnError to allow recovery from missing ops 2015-12-17 16:28:02 +00:00
Brian Gough
b84a9e6e91 upgrade mongojs 2015-12-17 14:11:44 +00:00
Brian Gough
54d1036e37 skip ops marked as broken in database 2015-12-09 15:13:37 +00:00
Brian Gough
2a7c33d7ca added /check endpoint for documents 2015-12-09 14:57:04 +00:00
Brian Gough
23c43b8042 skip any broken ops when viewing history diffs 2015-12-04 15:17:28 +00:00
Brian Gough
be2136de7c fix update-in-place bug for array ops 2015-12-04 15:17:28 +00:00
Brian Gough
3842f0d1cc Merge pull request #9 from sharelatex/only-delete-applied-ops
Only delete applied ops
2015-11-27 12:45:51 +00:00
Brian Gough
8ebc069ddb modify last compressed op in place 2015-11-26 16:17:18 +00:00
Brian Gough
3432d9e91a added comments for redis delete 2015-11-26 15:16:54 +00:00
Brian Gough
e65549099c only delete the applied ops from redis 2015-11-25 16:01:07 +00:00
Brian Gough
992857d6a2 added redis write check to healthcheck 2015-10-29 10:52:23 +00:00
Brian Gough
c44d5b1b3d added healthcheck 2015-10-19 10:59:39 +01:00
Brian Gough
ad144371d0 gracefully handle updates marked as broken
set update.broken == true to allow the user to view history without a
crash
2015-10-16 11:24:50 +01:00
Brian Gough
8961e23954 enhance LockManager to avoid accidental unlocking 2015-10-14 14:42:17 +01:00
Brian Gough
b6dae59655 fix callback logic in compressAndSaveRawUpdates 2015-10-08 16:39:13 +01:00
Brian Gough
8226bf3be4 increase lock time to 5 minutes 2015-10-08 16:11:39 +01:00
Brian Gough
add6a68fe1 add missing callback in compressAndSaveRawUpdates 2015-10-08 10:53:25 +01:00
James Allen
1a4b8f4269 API/service layout deprecation warning 2015-10-07 13:44:40 +01:00
James Allen
2a03591030 Stub out noisy/slow logger-sharelatex and mongojs modules in tests 2015-09-25 13:46:20 +01:00
James Allen
23dfe68cb8 Don't error when rewinding an insert op which is beyond the length of the document.
ShareJS will accept an op where p > content.length when applied,
and it applies as though p == content.length. However, the op is
passed to us with the original p > content.length. Detect if that
is the case with this op, and shift p back appropriately to match
ShareJS if so.
2015-09-25 13:44:44 +01:00
Brian Gough
92e0b0f04c add logging to each stage of archiving 2015-09-24 09:10:06 +01:00
Brian Gough
e683b0275a bug fix for clear archive in progress flag 2015-09-24 09:09:49 +01:00
Brian Gough
692e8c657c Revert to the default lock timeout now we have write barriers
Revert "increase lock timeouts for archiving"

This reverts commit 9eee1b383772adf058130d6e5eab409f57ce03cd.
2015-09-24 08:53:09 +01:00
Brian Gough
2ab1778dd9 move default value of lastVersion into function body 2015-09-23 16:31:33 +01:00
Brian Gough
dc0044020f only archive entries older than the current update
to avoid a stale version of the current update ever being pulled back
from S3
2015-09-23 14:33:40 +01:00
Brian Gough
696a866b67 pause the stream of ops, not the download
the download is buffered in the lineStream so a lot comes out even
after pausing the S3 download.
2015-09-23 13:38:57 +01:00
Brian Gough
847a553344 prevent double archiving by checking if any inS3 field is already present 2015-09-23 13:29:32 +01:00
Brian Gough
e49f260507 allow rollback/locking by setting inS3:false when starting the archive process 2015-09-23 13:28:07 +01:00
Brian Gough
551e8334cf compressedUpdates are now never inserted with inS3
it is now always added later, and a new update is forced for any
addition to an archived update
2015-09-23 13:25:10 +01:00
Brian Gough
d6b827426c support forcing new compressed update in popLastCompressedUpdate
callback with a null update, passing the version as an additional
argument
2015-09-23 13:22:38 +01:00
Brian Gough
a10dc4f898 Merge pull request #6 from heukirne/s3-archive
Add S3 archive track changes feature
2015-09-21 11:25:06 +01:00
Brian Gough
0e627c92d8 avoid clobbering global _ in loop 2015-09-18 16:26:05 +01:00
Henrique Dias
aa66c5ee8c improve size function 2015-09-17 10:41:53 -03:00
Henrique Dias
3f712c452a add size bulk limit 2015-09-17 09:23:13 -03:00
Brian Gough
7af5050370 add lock to unarchive doc 2015-09-16 16:18:36 +01:00
Brian Gough
18f06a3daf increase lock timeouts for archiving 2015-09-16 16:09:38 +01:00
Brian Gough
b4ffa7d57e share the document lock between archiving and packing 2015-09-16 16:03:55 +01:00
Brian Gough
9d39012b49 add error handler to each stage of download pipeline 2015-09-16 16:00:37 +01:00
Brian Gough
d9085a5e5e add error handler to each stage of upload pipeline 2015-09-16 16:00:25 +01:00
Brian Gough
1c1b1d9595 log the case where there are no entries in the document history 2015-09-16 15:34:30 +01:00
Brian Gough
82d0f4fce8 make unarchive more responsive by downloading documents in parallel
unarchive is triggered interactively so we should try to make it
reasonably fast
2015-09-16 15:33:59 +01:00
Brian Gough
dfa0036507 pause stream while writing to mongo 2015-09-16 15:32:36 +01:00
Brian Gough
70200a9cf1 only log document ids, not document content
avoid filling the log with large documents
2015-09-16 15:31:43 +01:00
Brian Gough
d3dff28bea Merge remote-tracking branch 'origin/master' into heukirne-s3-archive 2015-09-15 15:19:43 +01:00
Brian Gough
092f98d3ad suppress error in normal shutdown case 2015-09-12 11:07:54 +01:00
Shane Kilkelly
eab8b4b6c8 Null safe access of id property, needed as user can be null. 2015-09-11 14:07:06 +01:00
Shane Kilkelly
0ad374556d Add a comment for clarity. 2015-09-10 16:43:40 +01:00
Shane Kilkelly
8387383cb4 In _summarizeUpdates, allow null users through.
A null value represents a deleted or otherwise missing user record.
2015-09-10 14:32:47 +01:00
Shane Kilkelly
810bddb2cb Log a message when the web api produces a 404 response. 2015-09-10 14:32:35 +01:00
Shane Kilkelly
522786d45e Produce a null value, rather than crashing when the user info service returns 404. 2015-09-09 15:48:22 +01:00
Henry Oswald
18d817ee0a added some missing error handling 2015-09-08 16:33:45 +01:00
Henry Oswald
17b0d99a65 rework the archiveDocChangesWithLock function
make it a bit more readable for me, struggle to trust indentation
based calls in coffeescript
2015-09-08 16:26:01 +01:00
Henry Oswald
0b3ebcff06 remove if statements checking if s3 is a backend
if it's not enabled then it can crash. In prod it should always be there
or not used at all
2015-09-08 16:23:15 +01:00
Henrique Dias
c5a8a249c6 add unarchive acceptance tests 2015-09-03 08:36:32 -03:00
Henrique Dias
da9e7dc7e1 init archive acceptance tests 2015-09-02 18:47:34 -03:00
Henrique Dias
d2b1243701 split MongoAWS files 2015-09-02 15:45:29 -03:00
Henrique Dias
1abcea1a66 add some unit test 2015-08-31 18:13:18 -03:00
Henrique Dias
efff026a79 handle easier propagation 2015-08-25 16:52:28 -03:00
Henrique Dias
f910e63e90 fix null case 2015-08-24 12:22:17 -03:00
Henrique Dias
fcbe4aa925 fix inS3 propagation 2015-08-24 12:19:19 -03:00
Henrique Dias
1ccba422c8 remove unused function 2015-08-24 10:55:27 -03:00
Henrique Dias
98ce03b2f2 replace docs collection to DocstoreHandler 2015-08-24 10:38:31 -03:00
Henrique Dias
04ec45529f restore updates from S3 when exists
fix: avoid rearchiving
2015-08-18 17:11:19 -03:00
Henrique Dias
20c3e15f93 fix bulk insert limit 2015-08-14 19:58:38 -03:00
Henrique Dias
26c8048729 change mongo stream method (still have a bug in bulk insert limit) 2015-08-14 19:19:54 -03:00
Henrique Dias
fd4afb3574 Archive changes, care about: version, expiresAt and Lock 2015-08-14 15:07:16 -03:00
Henrique Dias
6bc9c9010a handle auto unarchive track changes 2015-08-09 19:52:32 -03:00
Henrique Dias
3bc5380468 handle inS3 flag 2015-08-09 17:50:15 -03:00
Henrique Dias
daa42bcea0 change s3Stream lib 2015-08-09 15:47:47 -03:00
Henrique Dias
bca48ac117 add unarchive doc track from s3 2015-08-06 17:09:36 -03:00
Henrique Dias
438c4f4d0c using mongoexport for s3 archive 2015-08-06 15:46:44 -03:00
Henrique Dias
028fe2fa03 archive docChanges list to s3 2015-08-06 11:11:43 -03:00
Henrique Dias
ae047ecf76 init s3 feature 2015-08-06 10:00:09 -03:00
Brian Gough
775f5ebbe1 add configurable limit, delay and timeout to /pack via query string 2015-06-05 13:38:47 +01:00
Brian Gough
23d2518ebb added a /pack endpoint to fork a pack worker 2015-06-04 16:36:56 +01:00
Brian Gough
289616d3db added a /doc/:doc_id/pack endpoint 2015-06-04 16:23:49 +01:00
Brian Gough
3f2e4b0c11 move pack script functionality into PackManager 2015-06-04 16:21:40 +01:00
Brian Gough
27a3511b37 update docHistoryStats after packing ops 2015-06-03 10:36:07 +01:00
Brian Gough
19d812734e make PackManager parameters configurable 2015-06-03 10:35:36 +01:00
Brian Gough
66bca8d05c include the current date in the updates to docHistoryStats 2015-05-26 11:00:55 +01:00
Brian Gough
1811ac2145 added support for cleaning old expired ops in packs 2015-05-22 15:40:02 +01:00
Brian Gough
5c4afd5303 add docHistoryStats collection to keep track of updates to docs 2015-05-22 15:40:01 +01:00
Brian Gough
78f0bdbae3 fix name of temporary parameter to match other methods 2015-05-22 15:40:01 +01:00
Brian Gough
adc2866a7d add check to exclude temporary ops from packs 2015-05-22 15:40:01 +01:00