Commit graph

24 commits

Author SHA1 Message Date
Jakob Ackermann
aa812a066a Merge pull request #11296 from overleaf/jpa-pdf-caching-ignore-not-found
[clsi] ignore file not found error when unlinking stale chunk

GitOrigin-RevId: d7b78f7c2773102d7133f710ad7851542417b472
2023-01-18 15:08:20 +00:00
Jakob Ackermann
1a434558d7 Merge pull request #10239 from overleaf/jpa-pdf-caching-backend-tweaks
[clsi] pdf-caching: emit partial set of processed ranges on timeout

GitOrigin-RevId: 0038b5a30ac33fcdcab523d7ddc72fea9f2f5be9
2022-11-02 09:04:43 +00:00
Jakob Ackermann
956cacaef7 Merge pull request #10139 from overleaf/jpa-split-test-min-chunk-size
[misc] add split test for a per request pdfCachingMinChunkSize

GitOrigin-RevId: 6a8a3c6267501789f2047a67b03db6ac6df427c3
2022-10-26 08:03:39 +00:00
Jakob Ackermann
18768d3f1b Merge pull request #8950 from overleaf/bg-use-qpdf-xref-table
[clsi] parse xref table from qpdf output

GitOrigin-RevId: cc400c65082c7c8c28c21037b97b4fde66a85835
2022-08-03 08:04:10 +00:00
Jakob Ackermann
dd5128c1d7 Merge pull request #8952 from overleaf/jpa-pdf-caching-tweaks
[clsi] server side pdf caching tweaks

GitOrigin-RevId: 758cbcc45b5a7ca0fe3dbf31bc43d9b0ef36e599
2022-07-21 08:04:41 +00:00
Jakob Ackermann
ac4d56403a Merge pull request #8933 from overleaf/jpa-pdf-caching-dark
[misc] enable pdf caching in dark mode for 1MB chunks

GitOrigin-RevId: 1d0f7eadeff14a047548b4778d4789829bad5d48
2022-07-21 08:03:13 +00:00
Jakob Ackermann
97624d0c6c Merge pull request #8847 from overleaf/jpa-emit-start-of-xref-table
[clsi] try to emit the start of the xref table

GitOrigin-RevId: 6d8348a349572cc997ac5924664428228c00fed1
2022-07-18 08:04:17 +00:00
Jakob Ackermann
56a3b0dcde Merge pull request #4819 from overleaf/jpa-pdf-caching-new-event-loop
[clsi] pdf-caching: move cpu intensive work onto a new event loop

GitOrigin-RevId: 4cb5cd4528fa1c5df6a8e91f9caa38cb64d94463
2021-08-25 08:03:38 +00:00
Jakob Ackermann
f285e503b4 [misc] run format_fix and lint:fix 2021-07-13 12:04:48 +01:00
Jakob Ackermann
64551f0198 [misc] switch from settings-sharelatex to @overleaf/settings 2021-07-12 17:47:21 +01:00
Jakob Ackermann
743bfced64 [misc] ContentCacheManager: apply review feedback
- count stages
- lower bound is 1s

Co-Authored-By: Brian Gough <brian.gough@overleaf.com>
2021-06-23 14:42:39 +01:00
Jakob Ackermann
b09e52510f [misc] bail out from pdf caching processing after 10s or earlier
...for fast compiles.
2021-06-23 14:20:04 +01:00
Jakob Ackermann
294088fb27 [ContentCacheManager] use PDF.js Xref table instead of stream detection (#242)
* make the content cache manager tests configurable

* extend stream content in unit tests

* [ContentCacheManagerTests] prepare for full object caching

* filesystem stream for pdfjs

* working??

* cleaning up

* handle overflow

* [misc] install pdfjs-dist

* [misc] move pdfjs code into app/lib/ and scripts/, also use CamelCase

* [misc] abstract the file loading and parsing of xRef tables into helper

* [misc] pdfjsTests: add snapshot based tests for the Xref table parser

* [misc] FSStream: throw proper error and drop commented code

* [misc] FSStream: integrate throwing of MissingDataException into getter

* [misc] pdfjs: fix eslint errors

* [misc] pdfjs: run format_fix

* [misc] pdfjs: allocate very small non empty dummy buffers explicitly

* [misc] install @overleaf/o-error

* [ContentCacheManager] use PDF.js Xref table instead of stream detection

Co-Authored-By: Brian Gough <brian.gough@overleaf.com>

* [pdfjs] parseXrefTable: handle empty PDF files gracefully

Co-authored-by: Brian Gough <brian.gough@overleaf.com>
2021-05-31 09:20:25 +01:00
Jakob Ackermann
f361cdfaca [ContentCacheManager] deeply integrate the HashFileTracker
- Update the tracker contents as we hash ranges
- Let the tracker be authoritative for already written ranges
  This is saving a stat call per newly written range.
2021-05-18 18:15:58 +01:00
Jakob Ackermann
7aeeb5a5a9 [ContentCacheManager] finish tracking of ranges across builds 2021-05-18 18:06:15 +01:00
Brian Gough
d493238eaf wip expire old hash files 2021-05-18 16:25:24 +01:00
Jakob Ackermann
23dd93ae50 Merge pull request #236 from overleaf/jpa-skip-duplicate-writes
[ContentCacheManager] skip writing of duplicate streams
2021-05-18 11:44:23 +02:00
Jakob Ackermann
e787106eed Merge pull request #235 from overleaf/jpa-atomic-writes
[ContentCacheManager] write streams to disk atomically
2021-05-18 11:44:16 +02:00
Jakob Ackermann
9b8763aed4 Merge pull request #234 from overleaf/jpa-stream-detection-across-chunks
[ContentCacheManager] add support for stream detection across chunks
2021-05-18 11:44:09 +02:00
Jakob Ackermann
bc1ed82c6c [ContentCacheManager] skip writing of duplicate streams 2021-05-18 10:22:05 +01:00
Jakob Ackermann
224ae0c254 [ContentCacheManager] add support for stream detection across chunks
Retain a small part (6 or 9 bytes) of each chunk in memory for providing
 the next iteration with enough context for finding the start/end marker
 of a range.
2021-05-17 17:36:05 +01:00
Jakob Ackermann
6b9c8bced6 [ContentCacheManager] write streams to disk atomically
Use an intermediate file for writing to disk, then rename to the target.
2021-05-17 14:22:37 +01:00
Brian Gough
ff2175e727 add validation for express :content_id parameter 2021-05-13 15:48:17 +01:00
Jakob Ackermann
b456ea726d [misc] merge pdf caching into main (#226)
* wip generate directory for hash content

* cleanup, remove console logging

* add content caching module

* Return PDF stream ranges with compile response

* Return the PDF file size in the compile response

* PDF range endpoint

* [misc] WIP: pdf caching: preserve the m-time on static content files

* [misc] WIP: pdf caching: improve browser caching, emit caching headers

* [misc] WIP: pdf caching: do not emit very small chunks <1kB

* [misc] keep up with moving output files into a separate directory

* [OutputCacheManager] add global feature flag for enabling pdf caching

* [misc] add contentId into the URL for protecting PDF stream contents

* [misc] support PDF stream caching for anonymous users

* [misc] add per-request feature flag for enabling PDF stream caching

* [misc] enable pdf caching in CI and emit metrics at the end of run

* [misc] expose compile stats and timings to the frontend

* [misc] log an error in case saving output files fails

* [misc] add metrics for pdf bandwidth and pdf caching performance

* [misc] add a dark mode to the pdf caching for computing ranges only

* [misc] move pdf caching metrics into ContentCacheMetrics

* [misc] add a config option for the min chunk size of pdf ranges

Co-authored-by: Brian Gough <brian.gough@overleaf.com>
Co-authored-by: Eric Mc Sween <eric.mcsween@overleaf.com>
2021-05-13 14:07:54 +01:00