Regular expression

Regular Expressions: A Beginner's Guide

Regular expressions (often shortened to "regex" or "regexp") are sequences of characters that define a search pattern. They are a powerful tool for manipulating text. While they might appear intimidating at first, understanding the basics can significantly improve your ability to search, extract, and modify text data. This article introduces regular expressions for beginners, with examples drawn from MediaWiki editing that also apply in other tools and languages.

1. What are Regular Expressions Used For?

Regular expressions are used in a vast array of applications, including:

**Text Searching:** Finding specific patterns within large bodies of text. Consider searching for all email addresses in a document, or all instances of a particular word.
**Text Validation:** Ensuring that user input conforms to a specific format, such as email addresses, phone numbers, or postal codes. This is crucial for data integrity.
**Text Extraction:** Pulling specific data from text, like extracting all dates from a news article.
**Text Replacement:** Replacing occurrences of a pattern with another string. Useful for renaming variables in code or standardizing text formatting.
**Data Cleaning:** Removing unwanted characters or formatting from data.
**Parsing:** Breaking down complex text into manageable components.
**Programming Languages:** Regex support is built into most modern programming languages (Python, JavaScript, Java, PHP, etc.), making them invaluable for developers.
**Text Editors:** Many text editors (like VS Code, Sublime Text, Notepad++) have built-in regex search and replace functionality.
**MediaWiki Editing:** Within MediaWiki, regular expressions can be used for complex search and replace operations, especially when dealing with large amounts of wikitext. You'll find them particularly useful in advanced searches and when maintaining templates.
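
A few of these uses (searching, validation, and replacement) can be sketched with Python's built-in `re` module. The sample text and the deliberately simplified email pattern are assumptions made for illustration; the pattern is not a complete email validator.

```python
import re

text = "Contact alice@example.com or bob@example.org before 2024-05-01."

# Simplified email pattern (illustrative assumption, not a full validator).
EMAIL = r"[\w.+-]+@[\w-]+\.[\w.-]+"

# Searching / extraction: collect every substring that looks like an email.
emails = re.findall(EMAIL, text)

# Validation: fullmatch succeeds only if the whole input fits the pattern.
is_valid = re.fullmatch(EMAIL, "alice@example.com") is not None

# Replacement: mask every email address in the text.
masked = re.sub(EMAIL, "<email>", text)

print(emails)
print(is_valid)
print(masked)
```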

1. Basic Regex Components

Let's break down the fundamental building blocks of regular expressions.

1. 1. Literals

The simplest regex consists of literal characters. For example, the regex `cat` will match the exact string "cat". This seems trivial, but it's the foundation upon which more complex patterns are built.

1. 1. Metacharacters

Metacharacters are special characters that have a specific meaning in regular expressions. They allow you to define more complex patterns. Here are some of the most common:

`.` (Dot): Matches any single character except a newline.
`^` (Caret): Matches the beginning of a string or line.
`$` (Dollar): Matches the end of a string or line.
`*` (Asterisk): Matches the preceding element (a character, character class, or group) zero or more times.
`+` (Plus): Matches the preceding element one or more times.
`?` (Question Mark): Matches the preceding element zero or one time.
`[]` (Square Brackets): Defines a character class, matching any single character within the brackets.
`()` (Parentheses): Groups characters together and captures the matched text.
`\` (Backslash): Escapes a metacharacter, treating it as a literal character. For example, `\.` will match a literal dot.
`|` (Pipe): Acts as an "or" operator, matching either the expression before or after the pipe.
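
The behaviour of several of these metacharacters can be checked with a short Python sketch. The sample strings and variable names are made up for illustration.

```python
import re

# "." matches any single character except a newline.
dot_matches_letter = re.fullmatch(r"c.t", "cat") is not None
dot_matches_newline = re.fullmatch(r"c.t", "c\nt") is not None

# "\." matches only a literal dot.
literal_dot = re.search(r"3\.14", "pi is 3.14") is not None
literal_dot_miss = re.search(r"3\.14", "3014") is not None

# "|" chooses between alternatives.
animals = re.findall(r"cat|dog", "cat, dog, bird")

# "^" and "$" tie the match to the start and end of the string.
starts_with_cat = re.search(r"^cat", "catalog") is not None
ends_with_cat = re.search(r"cat$", "catalog") is not None

print(dot_matches_letter, dot_matches_newline)  # True False
print(literal_dot, literal_dot_miss)            # True False
print(animals)                                  # ['cat', 'dog']
print(starts_with_cat, ends_with_cat)           # True False
```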

1. 1. Character Classes

Character classes allow you to match any one of a set of characters.

`[abc]`: Matches either 'a', 'b', or 'c'.
`[a-z]`: Matches any lowercase letter from 'a' to 'z'.
`[A-Z]`: Matches any uppercase letter from 'A' to 'Z'.
`[0-9]`: Matches any digit.
`[a-zA-Z0-9]`: Matches any alphanumeric character.
`[^abc]`: Matches any character *except* 'a', 'b', or 'c'. The `^` inside square brackets negates the character class.
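
As a small illustration, the following Python sketch applies three of these classes to a made-up sample string.

```python
import re

sample = "Room 42B, floor 3"

# Every digit in the sample.
digits = re.findall(r"[0-9]", sample)

# Every uppercase letter from A to Z.
uppercase = re.findall(r"[A-Z]", sample)

# Negated class: every character that is NOT alphanumeric.
not_alnum = re.findall(r"[^a-zA-Z0-9]", sample)

print(digits)     # ['4', '2', '3']
print(uppercase)  # ['R', 'B']
print(not_alnum)  # [' ', ',', ' ', ' ']
```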

1. 1. Quantifiers

Quantifiers specify how many times a preceding character or group should be matched.

`a*`: Matches 'a' zero or more times (e.g., "", "a", "aa", "aaa").
`a+`: Matches 'a' one or more times (e.g., "a", "aa", "aaa").
`a?`: Matches 'a' zero or one time (e.g., "", "a").
`a{n}`: Matches 'a' exactly *n* times. For example, `a{3}` matches "aaa".
`a{n,}`: Matches 'a' *n* or more times. For example, `a{2,}` matches "aa", "aaa", "aaaa", etc.
`a{n,m}`: Matches 'a' between *n* and *m* times (inclusive). For example, `a{2,4}` matches "aa", "aaa", and "aaaa".
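
The quantifiers above can be compared side by side. In this Python sketch, the helper `matches` is a hypothetical name introduced for illustration; it keeps only the candidate strings that a pattern matches in full.

```python
import re

def matches(pattern, candidates):
    """Return the candidates that the pattern matches from start to end."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

candidates = ["", "a", "aa", "aaa", "aaaa", "aaaaa"]

star = matches(r"a*", candidates)            # zero or more
plus = matches(r"a+", candidates)            # one or more
optional = matches(r"a?", candidates)        # zero or one
exactly_3 = matches(r"a{3}", candidates)     # exactly three
at_least_2 = matches(r"a{2,}", candidates)   # two or more
two_to_four = matches(r"a{2,4}", candidates) # between two and four

for name, result in [("a*", star), ("a+", plus), ("a?", optional),
                     ("a{3}", exactly_3), ("a{2,}", at_least_2),
                     ("a{2,4}", two_to_four)]:
    print(name, result)
```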

1. 1. Anchors

Anchors don't match characters themselves, but rather positions within the string.

`^`: Matches the beginning of the string. For example, `^hello` matches strings that start with "hello".
`$`: Matches the end of the string. For example, `world$` matches strings that end with "world".
`\b`: Matches a word boundary. For example, `\bword\b` matches the word "word" but not "sword" or "wordy".
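
A brief Python sketch of the anchors follows. The sample strings are illustrative. The last two lines show an extra detail: with Python's `re.MULTILINE` flag, `^` also matches at the start of each line rather than only at the start of the string.

```python
import re

starts_with_hello = bool(re.search(r"^hello", "hello world"))
hello_not_at_start = bool(re.search(r"^hello", "say hello"))
ends_with_world = bool(re.search(r"world$", "hello world"))

# \b keeps "word" from matching inside "sword" or "wordy".
whole_words = re.findall(r"\bword\b", "a word, a sword, wordy word")

# Without MULTILINE, ^ matches only at the very start of the string.
first_only = re.findall(r"^\w+", "one\ntwo")
each_line = re.findall(r"^\w+", "one\ntwo", flags=re.MULTILINE)

print(starts_with_hello, hello_not_at_start, ends_with_world)  # True False True
print(whole_words)  # ['word', 'word']
print(first_only)   # ['one']
print(each_line)    # ['one', 'two']
```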

1. 1. Grouping and Capturing

Parentheses `()` are used to group parts of a regular expression. This serves two primary purposes:

**Grouping:** Allows you to apply quantifiers to entire groups of characters. For example, `(ab)+` matches "ab", "abab", "ababab", etc.
**Capturing:** Captures the matched text within the group, allowing you to access it later. This is useful for extracting specific parts of a string. Captured groups are numbered starting from 1.
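
Both purposes can be sketched in Python. The date string is an invented example; `group(0)` is the whole match and `group(1)` onwards are the captured groups.

```python
import re

# Grouping: the + quantifier applies to the whole group "ab".
repeated = re.fullmatch(r"(ab)+", "ababab") is not None

# Capturing: groups are numbered from 1, in order of their opening parenthesis.
m = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Released on 2024-05-01.")
whole = m.group(0)
year, month, day = m.group(1), m.group(2), m.group(3)

print(repeated)          # True
print(whole)             # 2024-05-01
print(year, month, day)  # 2024 05 01
```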

1. Regular Expressions in MediaWiki

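In MediaWiki 1.40, regular expressions are available through search and replace tools, and the syntax and features depend on the context. Server-side code is written in PHP, whose `preg_*` functions use PCRE (Perl Compatible Regular Expressions), a widely supported and powerful regex engine. Tools that run in the browser, such as an editor's search and replace dialog, use JavaScript's regex engine instead, which is similar but not identical. Lua modules are a notable exception: Lua's string functions, such as `string.gsub`, use Lua patterns, a simpler syntax that lacks features such as alternation (`|`), so a PCRE pattern may need rewriting before it works in a module.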

1. 1. Example: Finding and Replacing Links

Let's say you want to replace all instances of a specific external link with a different one. In wikitext, an external link with a label is written as `[URL label]`, so you could use a regex like this:

``` \[https?:\/\/example\.com\/page1 (.*?)\] ```

This regex would:

`\[`: Match a literal opening square bracket.
`https?:\/\/`: Match "http://" or "https://". The `?` makes the 's' optional.
`example\.com\/page1`: Match the specific URL. The `\.` matches a literal dot.
A single space: Match the space that separates the URL from the link text.
`(.*?)`: Capture the link text. `.` matches any character, `*` matches zero or more times, and `?` makes the quantifier non-greedy (matching the shortest possible string), so the capture stops at the first closing bracket.
`\]`: Match the literal closing square bracket.

You could then replace this entire pattern with:

``` [https://newexample.com/newpage1 $1] ```

`$1` refers to the first captured group (the link text). The replacement is a plain string rather than a pattern, so it needs no escaping and cannot contain regex syntax such as `s?`.
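
The same find-and-replace idea can be sketched in Python. This sketch assumes the links use wikitext's external-link syntax, `[URL label]`, and the URLs and sample text are placeholders. Note that Python's `re.sub` writes the first captured group as `\1` in the replacement string rather than `$1`.

```python
import re

wikitext = (
    "See [https://example.com/page1 the old page] "
    "and [http://example.com/page1 a mirror]."
)

# Match the old URL (http or https) and capture the link label.
pattern = r"\[https?://example\.com/page1 (.*?)\]"

# The replacement is a plain string; \1 inserts the captured label.
replacement = r"[https://newexample.com/newpage1 \1]"

updated = re.sub(pattern, replacement, wikitext)
print(updated)
```

Testing a replacement on a small sample like this before running it across many pages makes it easier to spot a pattern that matches more than intended.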

1. 1. Example: Removing Specific HTML Tags

You might want to remove unwanted HTML tags from wikitext. For example, to remove all `<br>` tags:

``` <br\s*\/?> ```

`<br`: Matches the opening of the `<br>` tag.
`\s*`: Matches zero or more whitespace characters.
`\/?`: Matches an optional forward slash.
`>`: Matches the closing `>` character.
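
As a minimal sketch, the pattern can be tried in Python on an invented sample. The case-insensitive variant at the end is an addition that is not part of the pattern above; it uses Python's `re.IGNORECASE` flag to catch uppercase tags such as `<BR>` as well.

```python
import re

wikitext = "Line one<br>Line two<br />Line three<br/>End"

# Matches <br>, <br/>, and <br /> (lowercase only).
BR_TAG = re.compile(r"<br\s*/?>")
cleaned = BR_TAG.sub("", wikitext)

# Assumption for illustration: also remove uppercase variants.
BR_TAG_ANY_CASE = re.compile(r"<br\s*/?>", re.IGNORECASE)
cleaned_any_case = BR_TAG_ANY_CASE.sub("", "first<BR>second<Br />third")

print(cleaned)           # Line oneLine twoLine threeEnd
print(cleaned_any_case)  # firstsecondthird
```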

1. 1. Using Regex in Advanced Searches

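MediaWiki's search can accept regular expressions on wikis whose search backend supports them, and how this is done depends on the wiki's configuration. On wikis running the CirrusSearch extension, for example, a pattern is written between slashes after the `insource:` keyword, as in `insource:/colou?r/`. This allows you to search for patterns more flexibly than with simple keyword searches. Regex searches are comparatively slow, so it helps to combine them with ordinary search terms that narrow down the set of pages first.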

1. Advanced Concepts

1. 1. Backreferences

Backreferences allow you to refer to previously captured groups within the same regular expression. This is useful for finding repeated patterns. For example, `(\w+)\s+\1` matches a word followed by whitespace and then the same word again (e.g., "hello hello"). `\1` refers to the first captured group.
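
The following Python sketch finds and repairs doubled words in an invented sentence. The `\b` anchors are an addition to the pattern above: without them, `(\w+)\s+\1` would also match the "is is" that spans the end of "This" and the following "is".

```python
import re

text = "This is is a test of the the parser."

# \1 must repeat exactly what group 1 captured.
DOUBLED = r"\b(\w+)\s+\1\b"

doubled_words = re.findall(DOUBLED, text)

# In the replacement string, \1 inserts the captured word once.
fixed = re.sub(DOUBLED, r"\1", text)

print(doubled_words)  # ['is', 'the']
print(fixed)          # This is a test of the parser.
```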

1. 1. Lookarounds

Lookarounds are zero-width assertions that match a position in the string based on what precedes or follows it, without including the preceding or following characters in the match.

**Positive Lookahead:** `X(?=Y)` matches 'X' only if it's followed by 'Y'.
**Negative Lookahead:** `X(?!Y)` matches 'X' only if it's *not* followed by 'Y'.
**Positive Lookbehind:** `(?<=Y)X` matches 'X' only if it's preceded by 'Y'.
**Negative Lookbehind:** `(?<!Y)X` matches 'X' only if it's *not* preceded by 'Y'.
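
The four lookarounds can be compared on one invented sample string in Python. The extra `\d` inside the two negative assertions is an addition for this sketch: it stops the engine from backtracking to, or starting inside, part of a longer number.

```python
import re

text = "cost: $40, weight: 40kg, discount: 15%"

# Positive lookahead: digits followed by "kg" ("kg" is not part of the match).
before_kg = re.findall(r"\d+(?=kg)", text)

# Negative lookahead: digits not followed by "%" (or by another digit).
not_percent = re.findall(r"\d+(?![\d%])", text)

# Positive lookbehind: digits preceded by "$".
after_dollar = re.findall(r"(?<=\$)\d+", text)

# Negative lookbehind: digits not preceded by "$" (or by another digit).
not_after_dollar = re.findall(r"(?<![\d$])\d+", text)

print(before_kg)         # ['40']
print(not_percent)       # ['40', '40']
print(after_dollar)      # ['40']
print(not_after_dollar)  # ['40', '15']
```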

1. 1. Non-Greedy Matching

By default, quantifiers like `*` and `+` are greedy, meaning they match as much as possible. Adding a `?` after the quantifier makes it non-greedy, matching as little as possible. For example, in the string "aXbXb", the greedy pattern `a.*b` matches the whole string "aXbXb" because `.*` extends to the last 'b', while the non-greedy pattern `a.*?b` matches only "aXb" because `.*?` stops at the first 'b'.
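
The difference is easy to see when matching tags. This Python sketch uses an invented HTML fragment.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the last ">" in the string, giving one long match.
greedy = re.findall(r"<.*>", text)

# Non-greedy: .*? stops at the first ">", giving one match per tag.
lazy = re.findall(r"<.*?>", text)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```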

1. Resources for Learning More

**Regular-Expressions.info:** [1](https://www.regular-expressions.info/) - A comprehensive guide to regular expressions.
**Regex101:** [2](https://regex101.com/) - An online regex tester and debugger.
**Regexr:** [3](https://regexr.com/) - Another online regex tester.
**PCRE Documentation:** [4](https://www.pcre.org/) - Documentation for Perl Compatible Regular Expressions.
**Help:Search**: MediaWiki's help page on searching, including information on regular expressions.
**Help:Editing**: MediaWiki's help page on editing, which touches upon search and replace.
**Lua**: Learn how to use Lua modules for more complex regex operations within MediaWiki.

Regular expressions are a powerful tool that takes time and practice to master. Don't be discouraged if you don't grasp everything immediately. Start with the basics, experiment with different patterns, and use online resources to help you learn. With practice, you'll be able to leverage the power of regular expressions to solve a wide range of text-processing tasks.
