mirror of https://github.com/Brandon-Rozek/website.git
synced 2025-10-31 13:51:13 +00:00

New posts

commit 87992450c3 (parent a02606f3c8)
2 changed files with 123 additions and 0 deletions

content/blog/blogroll-from-subscriptions.md (new file, 58 lines)

@@ -0,0 +1,58 @@
---
title: "Blogroll From Subscriptions"
date: 2022-12-18T13:05:49-05:00
draft: false
tags: []
math: false
---

While I was browsing around personal websites, I found a fun little piece of code from Jake Bauer's [links page](https://www.paritybit.ca/links).

```bash
grep "xmlUrl" static/subscriptions.opml |\
sed 's/.*text=\"\(.*\)\" xmlUrl=\"\(https\?:\/\/[^\/]*\/\)\(.*\)\" .*/<li><a href=\"\2\">\1<\/a> (<a href=\"\2\3\">feed<\/a>)<\/li>/g'
```
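To see what this does, here's a quick run against a single OPML entry. The entry below is my own mock-up (yarr's actual export may differ slightly), and the `\?` escape relies on GNU sed:

```bash
# Mock OPML entry piped through Jake's snippet (GNU sed assumed)
printf '%s\n' '<outline text="Jake Bauer" xmlUrl="https://www.paritybit.ca/feed.xml" htmlUrl="https://www.paritybit.ca/"/>' |
grep "xmlUrl" |
sed 's/.*text=\"\(.*\)\" xmlUrl=\"\(https\?:\/\/[^\/]*\/\)\(.*\)\" .*/<li><a href=\"\2\">\1<\/a> (<a href=\"\2\3\">feed<\/a>)<\/li>/g'
# <li><a href="https://www.paritybit.ca/">Jake Bauer</a> (<a href="https://www.paritybit.ca/feed.xml">feed</a>)</li>
```

Note that the second capture group grabs only the scheme and host, so `\2` alone is the site link while `\2\3` rebuilds the full feed URL.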

This takes the subscriptions exported from [yarr](https://github.com/nkanaev/yarr) and generates an HTML list which you can then include in a blogroll page.

Running this script on my export from [Feedbin](https://feedbin.com/) yielded some extra metadata in the HTML. For example: `Jake Bauer type="rss" (Feed)`. So let's edit the code snippet above so that it works for my subscription export.

Here's an example entry from my `subscriptions.xml`:

```xml
<outline text="Jake Bauer" title="Jake Bauer" type="rss" xmlUrl="https://www.paritybit.ca/feed.xml" htmlUrl="https://www.paritybit.ca/"/>
```

It looks like I need to extract the `title`, `xmlUrl`, and `htmlUrl` attributes in that specific order. I'll use the same technique from a previous post on [capturing quoted strings](/capturing-quoted-string-sed).

```bash
grep "xmlUrl" subscriptions.xml |\
sed 's/.*title=\"\([^\"]*\)\".*xmlUrl=\"\([^\"]*\)\".*htmlUrl=\"\([^\"]*\)\".*/<li><a href=\"\3\">\1<\/a> (<a href=\"\2\">feed<\/a>)<\/li>/g'
```
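As a sanity check, piping the example entry from above through the new expression:

```bash
# Feed the sample <outline> entry through the updated pipeline
printf '%s\n' '<outline text="Jake Bauer" title="Jake Bauer" type="rss" xmlUrl="https://www.paritybit.ca/feed.xml" htmlUrl="https://www.paritybit.ca/"/>' |
grep "xmlUrl" |
sed 's/.*title=\"\([^\"]*\)\".*xmlUrl=\"\([^\"]*\)\".*htmlUrl=\"\([^\"]*\)\".*/<li><a href=\"\3\">\1<\/a> (<a href=\"\2\">feed<\/a>)<\/li>/g'
# <li><a href="https://www.paritybit.ca/">Jake Bauer</a> (<a href="https://www.paritybit.ca/feed.xml">feed</a>)</li>
```

No stray attributes this time, since each capture group stops at the closing quote.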

We can then clean this up into a `create_blogroll` script saved within [`~/.local/bin`](/blog/customexec/).

```bash
#!/bin/sh

set -o errexit
set -o nounset

show_usage() {
    echo "Usage: create_blogroll [subscriptions.xml]" >&2
    exit 1
}

# Check argument count
if [ "$#" -ne 1 ]; then
    show_usage
fi

# A quoted attribute value, with the value itself as a capture group
QUOTED_STR="\"\([^\"]*\)\""
XML_EXPR=".*title=$QUOTED_STR.*xmlUrl=$QUOTED_STR.*htmlUrl=$QUOTED_STR.*"
HTML_EXPR="<li><a href=\"\3\">\1<\/a> (<a href=\"\2\">feed<\/a>)<\/li>"
REPLACE_EXPR="s/$XML_EXPR/$HTML_EXPR/g"

grep "xmlUrl" "$1" | sed "$REPLACE_EXPR"
```
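If you want to try the script without touching your real export, here's a self-contained dry run. The two entries and their URLs are made up for illustration; the sed expression is the same one the script builds:

```bash
# Hypothetical demo: two mock entries stand in for a real Feedbin export
tmp=$(mktemp -d)
cat > "$tmp/subscriptions.xml" <<'EOF'
<outline text="Jake Bauer" title="Jake Bauer" type="rss" xmlUrl="https://www.paritybit.ca/feed.xml" htmlUrl="https://www.paritybit.ca/"/>
<outline text="Lobsters" title="Lobsters" type="rss" xmlUrl="https://lobste.rs/rss" htmlUrl="https://lobste.rs/"/>
EOF

# Same expression the script assembles from QUOTED_STR
QUOTED_STR="\"\([^\"]*\)\""
XML_EXPR=".*title=$QUOTED_STR.*xmlUrl=$QUOTED_STR.*htmlUrl=$QUOTED_STR.*"
HTML_EXPR="<li><a href=\"\3\">\1<\/a> (<a href=\"\2\">feed<\/a>)<\/li>"
blogroll=$(grep "xmlUrl" "$tmp/subscriptions.xml" | sed "s/$XML_EXPR/$HTML_EXPR/g")
echo "$blogroll"    # one <li> per subscription
rm -r "$tmp"
```

Since `sed` works line by line, every matching `<outline>` becomes its own `<li>`, ready to paste into a blogroll page.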

content/blog/capturing-quoted-string-sed.md (new file, 65 lines)

@@ -0,0 +1,65 @@
---
title: "Capturing Quoted Strings in Sed"
date: 2022-12-18T12:55:32-05:00
draft: false
tags: []
math: false
---

*Disclaimer: This post assumes some knowledge about regular expressions.*

Recently I was trying to capture an HTML attribute in `sed`. For example, let's say I want to extract the `href` attribute in the following example:

```
<a href="https://brandonrozek.com" rel="me"></a>
```

Advice you commonly see on the Internet is to use a capture group to grab anything between the quotes of the `href`.

In regular expression land, we can represent any sequence of characters as `.*` and define a capture group around some regular expression `X` as `\(X\)`.

```bash
sed "s/.*href=\"\(.*\)\".*/\1/g"
```

What does this look like for our input?

```bash
echo \<a href=\"https://brandonrozek.com\" rel=\"me\"\>\</a\> |\
sed "s/.*href=\"\(.*\)\".*/\1/g"
```

```
https://brandonrozek.com" rel="me
```

It matches all the way until the second `"`! That's because `.*` is greedy. What we want is not to match *any* character within the quotations, but to match any character that is not the quotation mark itself: `[^\"]*`.

```bash
sed "s/.*href=\"\([^\"]*\)\".*/\1/g"
```

This then works for our example:

```bash
echo \<a href=\"https://brandonrozek.com\" rel=\"me\"\>\</a\> |\
sed "s/.*href=\"\([^\"]*\)\".*/\1/g"
```

```
https://brandonrozek.com
```
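The negated-class trick isn't specific to `href`; the same pattern pulls out any quoted attribute. For instance, pointing it at `rel` instead (my own variation, not part of the original advice):

```bash
# Same pattern, swapped to capture the rel attribute
echo '<a href="https://brandonrozek.com" rel="me"></a>' |
sed "s/.*rel=\"\([^\"]*\)\".*/\1/g"
# me
```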

Within a bash script, we can make this a little more readable by using multiple variables.

```bash
QUOTED_STR="\"\([^\"]*\)\""
BEFORE_TEXT=".*href=$QUOTED_STR.*"
AFTER_TEXT="\1"
REPLACE_EXPR="s/$BEFORE_TEXT/$AFTER_TEXT/g"

INPUT="<a href=\"https://brandonrozek.com\" rel=\"me\"></a>"

echo "$INPUT" | sed "$REPLACE_EXPR"
```