yt-dlp

mirror of https://github.com/yt-dlp/yt-dlp.git synced 2024-12-19 07:11:55 -05:00

Author	SHA1	Message	Date
Jody Bruchon	a45e861918	Switch from binary search tree to Python sets Signed-off-by: Jody Bruchon <jody@jodybruchon.com>	2020-09-18 21:18:23 -04:00
Jody Bruchon	fd87f42378	Randomize the ArchiveTree the proper Python way Signed-off-by: Jody Bruchon <jody@jodybruchon.com>	2020-09-18 14:22:42 -04:00
Jody Bruchon	2459b6e1cf	Style revisions	2020-09-18 09:35:21 -04:00
Jody Bruchon	a4d834fb3e	Fix wrong variable in position swap corrupting archive list It's always a simple error in the end, you know? Signed-off-by: Jody Bruchon <jody@jodybruchon.com>	2020-09-18 00:11:36 -04:00
Jody Bruchon	fda63a4e87	Randomize archive order before populating search tree This doesn't result in an elegant, perfectly balanced search tree, but it's absolutely good enough. This commit completely mitigates the worst-case scenario where the archive file is sorted. Signed-off-by: Jody Bruchon <jody@jodybruchon.com>	2020-09-17 21:45:40 -04:00
Jody Bruchon	1d74d8d9f6	Try to mitigate the problem of loading a fully sorted archive Sorted archives turn the binary tree into a linked list and make things horribly slow. This is an incomplete mitigation for this issue.	2020-09-17 17:28:22 -04:00
Jody Bruchon	1de7ea76f8	Remove recursion in at_insert()	2020-09-17 15:08:33 -04:00
Jody Bruchon	a5029645ae	Remove debugging print statements	2020-09-17 14:46:11 -04:00
Jody Bruchon	ecdec1913f	Keep download archive in memory for better performance The old behavior was to open and scan the entire archive file for every single video download. This resulted in horrible performance for archives of any remotely large size, especially since all new video IDs are appended to the end of the archive. For anyone who uses the archive feature to maintain archives of entire video playlists or channels, this meant that all such lists with newer downloads would have to scan close to the end of the archive file before the potential download was rejected. For archives with tens of thousands of lines, this easily resulted in millions of line reads and checks over the course of scanning a single channel or playlist that had been seen previously. The new behavior in this commit is to preload the archive file into a binary search tree and scan the tree instead of constantly scanning the file on disk for every file. When a new download is appended to the archive file, it is also added to this tree. The performance is massively better using this strategy over the more "naive" line-by-line archive file parsing strategy. The only negative consequence of this change is that the archive in memory will not be synchronized with the archive file on disk. Running multiple instances of the program at the same time that all use the same archive file may result in duplicate archive entries or duplicated downloads. This is unlikely to be a serious issue for the vast majority of users. If the instances are not likely to try to download identical video IDs then this should not be a problem anyway; for example, having two instances pull two completely different YouTube channels at once should be fine. Signed-off-by: Jody Bruchon <jody@jodybruchon.com>	2020-09-17 14:22:07 -04:00
Tom-Oliver Heidel	acdb1a4ec6	Merge branch 'arbitrary-merges' of https://github.com/fstirlitz/youtube-dlc	2020-09-14 13:04:24 +02:00
felix	d03cfdce1b	Support arbitrary stream merges With this change, the merge operator may join any number of media streams, video or audio. The streams are downloaded in the order specified. Also, fix the metadata post-processor so that it doesn't leave out any streams.	2020-09-14 12:53:09 +02:00
Unknown	f791b41970	flake8	2020-09-13 11:08:02 +02:00
Unknown	57df9f53e0	[youtube] convert subtitles with --skip-download	2020-09-13 11:03:48 +02:00
Unknown	cefecac12c	[skip travis] renaming to avoid using same folder when using pip install for example	2020-09-02 20:25:25 +02:00

14 commits
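
The archive commits above (ecdec1913f through a45e861918) describe reading the download archive into memory once, checking each candidate against it, and adding every new download to both the file and the in-memory copy. Below is a minimal sketch of that idea using a Python set, the structure named in the last commit. It is not the project's actual code: the class name `DownloadArchive`, its `record` method, the `demo` helper, and the sample entry strings are all hypothetical, and one entry per line is an assumption about the file layout.

```python
import os
import tempfile


class DownloadArchive:
    """Hypothetical in-memory view of an archive file (assumed: one entry per line)."""

    def __init__(self, path):
        self.path = path
        self.entries = set()
        # Read the file once at startup instead of rescanning it for every video.
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                for line in f:
                    entry = line.strip()
                    if entry:
                        self.entries.add(entry)

    def __contains__(self, entry):
        # Set membership is a hash lookup, so the order of the archive file
        # (sorted or not) has no effect on lookup cost.
        return entry in self.entries

    def record(self, entry):
        # Append to the file on disk and mirror the change in memory.
        if entry in self.entries:
            return
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(entry + "\n")
        self.entries.add(entry)


def demo():
    # Illustrative entries only; the strings are made up for this sketch.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "archive.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("youtube aaa111\nyoutube bbb222\n")
        archive = DownloadArchive(path)
        seen_before = "youtube aaa111" in archive
        new_before = "youtube ccc333" in archive
        archive.record("youtube ccc333")
        new_after = "youtube ccc333" in archive
        with open(path, encoding="utf-8") as f:
            lines = f.read().splitlines()
        return seen_before, new_before, new_after, lines
```

As the ecdec1913f message notes, the in-memory copy is not synchronized with the file on disk, so in this sketch two processes sharing one archive file would not see each other's `record` calls.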