mirror of
https://github.com/Brandon-Rozek/website.git
synced 2024-11-25 09:36:31 -05:00
New posts
parent
a02606f3c8
commit
87992450c3
2 changed files with 123 additions and 0 deletions
58
content/blog/blogroll-from-subscriptions.md
Normal file
@@ -0,0 +1,58 @@
---
title: "Blogroll From Subscriptions"
date: 2022-12-18T13:05:49-05:00
draft: false
tags: []
math: false
---

While I was browsing around personal websites, I found a fun little piece of code from Jake Bauer's [links page](https://www.paritybit.ca/links).

```bash
grep "xmlUrl" static/subscriptions.opml |\
    sed 's/.*text=\"\(.*\)\" xmlUrl=\"\(https\?:\/\/[^\/]*\/\)\(.*\)\" .*/<li><a href=\"\2\">\1<\/a> (<a href=\"\2\3\">feed<\/a>)<\/li>/g'
```

This takes the subscriptions exported from [yarr](https://github.com/nkanaev/yarr) and generates an HTML list which you can then include in a blogroll page.

Running this script on my export from [Feedbin](https://feedbin.com/) left some extra metadata in the generated HTML. For example: `Jake Bauer type="rss" (Feed)`. So let's edit the code snippet above so that it works for my subscription export.

Here's an example entry from my `subscriptions.xml`:

```xml
<outline text="Jake Bauer" title="Jake Bauer" type="rss" xmlUrl="https://www.paritybit.ca/feed.xml" htmlUrl="https://www.paritybit.ca/"/>
```

It looks like I need to extract the `title`, `xmlUrl`, and `htmlUrl` attributes in that specific order. I'll use the same technique from a previous post on [capturing quoted strings](/capturing-quoted-string-sed).

```bash
grep "xmlUrl" subscriptions.xml |\
    sed 's/.*title=\"\([^\"]*\)\".*xmlUrl=\"\([^\"]*\)\".*htmlUrl=\"\([^\"]*\)\".*/<li><a href=\"\3\">\1<\/a> (<a href=\"\2\">feed<\/a>)<\/li>/g'
```
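
As a quick sanity check, here is a small sketch that feeds the example entry through the same substitution. The variable names `ENTRY` and `RESULT` are made up for illustration and are not part of the original snippet.

```bash
# Sample entry copied from the Feedbin export shown earlier.
ENTRY='<outline text="Jake Bauer" title="Jake Bauer" type="rss" xmlUrl="https://www.paritybit.ca/feed.xml" htmlUrl="https://www.paritybit.ca/"/>'

# Same substitution as above. Inside single quotes the double quotes
# need no backslashes, which keeps the expression a little shorter.
RESULT=$(printf '%s\n' "$ENTRY" |
    sed 's/.*title="\([^"]*\)".*xmlUrl="\([^"]*\)".*htmlUrl="\([^"]*\)".*/<li><a href="\3">\1<\/a> (<a href="\2">feed<\/a>)<\/li>/')

printf '%s\n' "$RESULT"
# <li><a href="https://www.paritybit.ca/">Jake Bauer</a> (<a href="https://www.paritybit.ca/feed.xml">feed</a>)</li>
```

The `type="rss"` metadata no longer leaks into the link text, since each capture group stops at its own closing quote.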

We can then clean this up into a `create_blogroll` script saved within [`~/.local/bin`](/blog/customexec/).

```bash
#!/bin/sh

set -o errexit
set -o nounset

show_usage() {
    echo "Usage: create_blogroll [subscriptions.xml]"
    exit 1
}

# Check argument count
if [ "$#" -ne 1 ]; then
    show_usage
fi

# Capture group matching the contents of a double-quoted string
QUOTED_STR="\"\([^\"]*\)\""
# \1 = title, \2 = xmlUrl, \3 = htmlUrl
XML_EXPR=".*title=$QUOTED_STR.*xmlUrl=$QUOTED_STR.*htmlUrl=$QUOTED_STR.*"
HTML_EXPR="<li><a href=\"\3\">\1<\/a> (<a href=\"\2\">feed<\/a>)<\/li>"
REPLACE_EXPR="s/$XML_EXPR/$HTML_EXPR/g"

grep "xmlUrl" "$1" | sed "$REPLACE_EXPR"
```
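
The script depends on `title`, `xmlUrl`, and `htmlUrl` appearing in exactly that order. If another reader exports them differently, one possible workaround is to pull out each attribute on its own. This is only a sketch: `get_attr` and `outline_to_li` are hypothetical helper names, and it assumes one `<outline>` element per line with double-quoted attributes.

```bash
# Hypothetical helper: print the value of attribute $1 from the line in $2.
# The leading space in the pattern keeps a name from matching the tail of
# a longer attribute name.
get_attr() {
    printf '%s\n' "$2" | sed -n "s/.* $1=\"\([^\"]*\)\".*/\1/p"
}

# Turn one <outline> line into a list item, whatever the attribute order.
outline_to_li() {
    title=$(get_attr title "$1")
    xml_url=$(get_attr xmlUrl "$1")
    html_url=$(get_attr htmlUrl "$1")
    printf '<li><a href="%s">%s</a> (<a href="%s">feed</a>)</li>\n' \
        "$html_url" "$title" "$xml_url"
}

# Attributes deliberately listed in a different order than the Feedbin export.
outline_to_li '<outline htmlUrl="https://www.paritybit.ca/" xmlUrl="https://www.paritybit.ca/feed.xml" title="Jake Bauer"/>'
# <li><a href="https://www.paritybit.ca/">Jake Bauer</a> (<a href="https://www.paritybit.ca/feed.xml">feed</a>)</li>
```

This runs `sed` three times per entry, so it trades speed for flexibility. For a blogroll-sized export that is unlikely to matter.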
65
content/blog/capturing-quoted-string-sed.md
Normal file
@@ -0,0 +1,65 @@
---
title: "Capturing Quoted Strings in Sed"
date: 2022-12-18T12:55:32-05:00
draft: false
tags: []
math: false
---

*Disclaimer: This post assumes some knowledge about regular expressions.*

Recently I was trying to capture an HTML attribute in `sed`. For example, let's say I want to extract the `href` attribute in the following example:

```
<a href="https://brandonrozek.com" rel="me"></a>
```

Advice you commonly see on the Internet is to use a capture group for anything between the quotes of the href.

In regular expression land, we can represent anything as `.*` and define a capture group of some regular expression `X` as `\(X\)`.

```bash
sed "s/.*href=\"\(.*\)\".*/\1/g"
```

What does this look like for our input?

```bash
echo \<a href=\"https://brandonrozek.com\" rel=\"me\"\>\</a\> |\
    sed "s/.*href=\"\(.*\)\".*/\1/g"
```

```
https://brandonrozek.com" rel="me
```

Because `.*` is greedy, it matches all the way to the last `"` on the line! What we want is to match not *any* character within the quotation marks, but any character that is not a quotation mark itself: `[^\"]*`.

```bash
sed "s/.*href=\"\([^\"]*\)\".*/\1/g"
```

This then works for our example:

```bash
echo \<a href=\"https://brandonrozek.com\" rel=\"me\"\>\</a\> |\
    sed "s/.*href=\"\([^\"]*\)\".*/\1/g"
```

```
https://brandonrozek.com
```
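
To see the two patterns side by side, here is a small sketch that runs both against the same line. `sed`'s basic regular expressions have no lazy quantifier, so the negated character class is the usual workaround. The variable names `LINE`, `GREEDY`, and `EXACT` are only for illustration.

```bash
LINE='<a href="https://brandonrozek.com" rel="me"></a>'

# Greedy: .* runs to the last double quote on the line.
GREEDY=$(printf '%s\n' "$LINE" | sed 's/.*href="\(.*\)".*/\1/')

# Negated class: [^"]* stops at the first double quote after href=.
EXACT=$(printf '%s\n' "$LINE" | sed 's/.*href="\([^"]*\)".*/\1/')

printf 'greedy: %s\n' "$GREEDY"   # greedy: https://brandonrozek.com" rel="me
printf 'exact:  %s\n' "$EXACT"    # exact:  https://brandonrozek.com
```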

Within a bash script, we can make this a little more readable by using multiple variables.

```bash
QUOTED_STR="\"\([^\"]*\)\""
BEFORE_TEXT=".*href=$QUOTED_STR.*"
AFTER_TEXT="\1"
REPLACE_EXPR="s/$BEFORE_TEXT/$AFTER_TEXT/g"

# Inside double quotes only the inner quotes need escaping, not < or >
INPUT="<a href=\"https://brandonrozek.com\" rel=\"me\"></a>"

echo "$INPUT" | sed "$REPLACE_EXPR"
```
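
One thing to keep in mind when feeding a whole file through this expression: `sed` prints lines that do not match unchanged. A possible way around that, sketched below, is to combine the `-n` option with the `p` flag so that only lines where the substitution happened are printed. The two-line `INPUT` and the names `ALL` and `ONLY` are made up for the example.

```bash
# Two input lines; only the first has an href attribute.
INPUT='<a href="https://brandonrozek.com" rel="me"></a>
<p>No link here</p>'

# Without -n, the second line is printed unchanged because nothing matched.
ALL=$(printf '%s\n' "$INPUT" | sed 's/.*href="\([^"]*\)".*/\1/')

# With -n and the p flag, only lines where the substitution happened are printed.
ONLY=$(printf '%s\n' "$INPUT" | sed -n 's/.*href="\([^"]*\)".*/\1/p')

printf '%s\n' "$ONLY"   # https://brandonrozek.com
```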