Regular Expression Documentation

Regular expressions (regex or regexp) are sequences of characters that define a search pattern. They are an incredibly powerful tool for manipulating text, used extensively in programming, data analysis, and text editing. In the context of MediaWiki, understanding regular expressions is crucial for advanced search and replace operations, template creation, and data validation. This article provides a comprehensive introduction to regular expressions within the MediaWiki environment, focusing on practical applications and beginner-friendly explanations.

1. Why Use Regular Expressions in MediaWiki?

MediaWiki’s built-in search and replace functionality, while useful for simple tasks, is limited. Regular expressions unlock a much broader range of capabilities, including:

**Complex Search:** Finding patterns that aren't literal strings, such as all dates in a specific format, email addresses, or URLs.
**Advanced Replacement:** Modifying text based on patterns, such as changing the format of dates, adding or removing HTML tags, or correcting common spelling errors.
**Data Validation:** Ensuring that user input or data within wiki pages conforms to specific rules. This is particularly useful in Form extensions or when creating complex templates.
**Template Development:** Creating more dynamic and flexible templates that can adapt to different input data. Understanding regular expressions can enable more sophisticated Template logic.
**Automated Tasks:** Using tools like AutoWikiBrowser with regex capabilities to perform large-scale edits consistently and efficiently.
**Parsing WikiText:** Extracting specific information from wiki markup, although this is more complex and often requires external tools in conjunction with MediaWiki’s API.

1. Basic Concepts & Syntax

Let's start with the fundamental building blocks of regular expressions.

1. 1. Literals

The simplest regex consists of literal characters. For example, the regex `hello` matches the exact string "hello". Matching is case-sensitive by default, so `hello` does *not* match "Hello". MediaWiki's search functions may offer case-insensitive options (see the MediaWiki Specifics section).
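As a minimal sketch of literal matching and case sensitivity, the snippet below uses Python's `re` module purely for illustration (its syntax is close to PCRE but not identical, and the sample text is made up):

```python
import re

text = "Hello world, hello wiki"

# The literal pattern matches only the exact lowercase spelling.
case_sensitive = re.findall(r"hello", text)

# re.IGNORECASE is Python's switch for case-insensitive matching.
case_insensitive = re.findall(r"hello", text, flags=re.IGNORECASE)

print(case_sensitive)    # ['hello']
print(case_insensitive)  # ['Hello', 'hello']
```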

1. 1. Metacharacters

Metacharacters are special characters that have a specific meaning in regular expressions. They allow you to define more complex patterns. Here are some of the most common:

`.` (Dot): Matches any single character except a newline.
`^` (Caret): Matches the beginning of the string, or the beginning of each line in multiline mode.
`$` (Dollar): Matches the end of the string, or the end of each line in multiline mode.
`*` (Asterisk): Matches the preceding character or group zero or more times.
`+` (Plus): Matches the preceding character or group one or more times.
`?` (Question Mark): Matches the preceding character or group zero or one time (optional).
`[]` (Square Brackets): Defines a character class, matching any single character within the brackets. For example, `[aeiou]` matches any vowel.
`()` (Parentheses): Creates a capturing group, which allows you to extract specific parts of the matched text.
`|` (Pipe): Represents the "or" operator, matching either the expression before or after the pipe.
`\` (Backslash): Escapes a metacharacter, treating it as a literal character. For example, `\.` matches a literal dot.
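The metacharacters listed above can be tried out one at a time. The following is a small sketch in Python's `re` module, chosen only for illustration; the sample strings are invented:

```python
import re

# Dot: any single character except a newline.
dot = re.findall(r"c.t", "cat cot c\nt cut")

# Caret and dollar anchor the match to the start and end of the string.
starts_with_wiki = re.search(r"^Wiki", "Wiki page") is not None
ends_with_wiki = re.search(r"Wiki$", "Wiki page") is not None

# Star, plus, and question mark control repetition of the preceding item.
star = re.findall(r"go*gle", "ggle gogle google")
plus = re.findall(r"go+gle", "ggle gogle google")
optional = re.findall(r"colou?r", "color colour")

# Pipe chooses between alternatives; backslash makes the dot literal.
either = re.findall(r"cat|dog", "cat, dog, bird")
literal_dots = re.findall(r"\.", "v1.2.3")

print(dot)       # ['cat', 'cot', 'cut']
print(star)      # ['ggle', 'gogle', 'google']
print(plus)      # ['gogle', 'google']
print(optional)  # ['color', 'colour']
```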

1. 1. Character Classes

Character classes allow you to match a set of characters.

`[abc]`: Matches 'a', 'b', or 'c'.
`[a-z]`: Matches any lowercase letter.
`[A-Z]`: Matches any uppercase letter.
`[0-9]`: Matches any digit.
`[^abc]`: Matches any character *except* 'a', 'b', or 'c'. The `^` inside square brackets negates the character class.
`\d`: Matches any digit (equivalent to `[0-9]`).
`\w`: Matches any word character (letters, numbers, and underscore – equivalent to `[a-zA-Z0-9_]`).
`\s`: Matches any whitespace character (space, tab, newline).
`\D`: Matches any non-digit character.
`\W`: Matches any non-word character.
`\S`: Matches any non-whitespace character.
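A short sketch of these character classes, again using Python's `re` module for illustration only. The sample sentence is hypothetical. Note that in Python 3 `\d` and `\w` also match non-ASCII digits and letters unless `re.ASCII` is passed, so the `[0-9]` equivalence holds here only because the input is plain ASCII:

```python
import re

sample = "Page_42 was edited on 2024-01-15!"

digits = re.findall(r"\d+", sample)            # runs of digits
words = re.findall(r"\w+", sample)             # runs of word characters
uppercase = re.findall(r"[A-Z]", sample)       # single uppercase letters
non_word = re.findall(r"[^\w\s]", sample)      # neither word char nor whitespace
consonant_runs = re.findall(r"[^aeiou\s]+", "wiki text")

print(digits)          # ['42', '2024', '01', '15']
print(uppercase)       # ['P']
print(non_word)        # ['-', '-', '!']
print(consonant_runs)  # ['w', 'k', 't', 'xt']
```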

1. 1. Quantifiers

Quantifiers specify how many times a preceding character or group should be matched.

`{n}`: Matches exactly *n* times. For example, `\d{3}` matches exactly three digits.
`{n,}`: Matches *n* or more times. For example, `\d{2,}` matches two or more digits.
`{n,m}`: Matches between *n* and *m* times (inclusive). For example, `\d{1,3}` matches one to three digits.
`*?`: Non-greedy version of `*`. Matches as few times as possible.
`+?`: Non-greedy version of `+`. Matches as few times as possible.
`??`: Non-greedy version of `?`. Matches as few times as possible.
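The difference between greedy and non-greedy quantifiers is easiest to see side by side. This is an illustrative sketch in Python's `re` module with made-up input, not a MediaWiki-specific recipe:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the last '>' in the string, giving one long match.
greedy = re.findall(r"<.*>", html)

# Non-greedy: .*? stops at the first '>' it can, giving one match per tag.
lazy = re.findall(r"<.*?>", html)

numbers = "7 42 123 4567"
exactly_three = re.findall(r"\b\d{3}\b", numbers)
two_or_more = re.findall(r"\b\d{2,}\b", numbers)
one_to_three = re.findall(r"\b\d{1,3}\b", numbers)

print(greedy)         # ['<b>bold</b> and <i>italic</i>']
print(lazy)           # ['<b>', '</b>', '<i>', '</i>']
print(exactly_three)  # ['123']
print(two_or_more)    # ['42', '123', '4567']
print(one_to_three)   # ['7', '42', '123']
```

The `\b` word boundaries keep `\d{3}` from matching the first three digits of a longer number.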

1. 1. Grouping and Capturing

Parentheses `()` are used to group parts of a regex and also to *capture* the matched text. Captured groups can be referred to later in the regex (backreferences) or extracted for use in replacements.

`(abc)` captures the string "abc".
`\1` refers to the first captured group. `\2` refers to the second, and so on.
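As a sketch of capturing groups and backreferences, the example below finds accidentally doubled words and reorders a date. It uses Python's `re` module for illustration; Python writes replacement backreferences as `\1` where many other tools use `$1`. The sample text is invented:

```python
import re

text = "This is is a test of the the parser"

# (\w+) captures a word; \1 requires the same word to appear again.
doubled = re.findall(r"\b(\w+) \1\b", text)

# In the replacement, \1 inserts the captured word once.
fixed = re.sub(r"\b(\w+) \1\b", r"\1", text)

# Three groups capture year, month, and day; the replacement reorders them.
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "2024-01-15")

print(doubled)    # ['is', 'the']
print(fixed)      # This is a test of the parser
print(reordered)  # 15/01/2024
```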

1. Practical Examples

Let's illustrate these concepts with some practical examples relevant to MediaWiki editing.

1. **Finding all dates in the format YYYY-MM-DD:** `\d{4}-\d{2}-\d{2}`
2. **Finding all email addresses:** `\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b` (This is a simplified email regex; a truly robust one is very complex.)
3. **Replacing all occurrences of "old text" with "new text" (case-insensitive):** This requires a case-insensitivity flag (see MediaWiki Specifics below). With that flag enabled, the pattern would be `old text` and the replacement `new text`.
4. **Removing all HTML tags:** `<.*?>` replaced with an empty string. (Use with caution; this is a very broad pattern that also matches any other text enclosed in angle brackets, so it may remove unintended content.)
5. **Extracting the title from a URL:** If you have URLs like `https://en.wikipedia.org/wiki/Regular_expression`, you could use `https?://[^/]+/wiki/([^?#\s]+)` to capture the title "Regular_expression" in the first capturing group (the underscore stands for the space in the page title).
6. **Replacing multiple spaces with a single space:** `\s+` replaced with ` ` (a single space). Note that `\s` also matches tabs and newlines; use ` +` to collapse space characters only.
7. **Converting Markdown bolding to Wiki markup:** `\*\*(.*?)\*\*` replaced with `'''$1'''` (using capturing groups). This takes text enclosed in double asterisks and wraps it in triple apostrophes, the wiki markup for bold.
8. **Finding all instances of a specific category:** `\[\[Category:[^\]]+\]\]`
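Several of these patterns can be checked against a small sample before use. The sketch below runs them in Python's `re` module, used for illustration only. The sample wikitext is hypothetical, Python spells the replacement backreference `\1` rather than `$1`, and the bold conversion assumes the replacement wraps the captured text in `'''`, which is wiki markup for bold:

```python
import re

wikitext = (
    "Created on 2023-11-05, revised 2024-01-15.\n"
    "[[Category:Regular expressions]] [[Category:Help]]"
)

# Dates in YYYY-MM-DD format.
dates = re.findall(r"\d{4}-\d{2}-\d{2}", wikitext)

# Category links: everything up to the first closing bracket.
categories = re.findall(r"\[\[Category:[^\]]+\]\]", wikitext)

# Markdown bold (**text**) to wiki bold ('''text''').
bold = re.sub(r"\*\*(.*?)\*\*", r"'''\1'''", "This is **important** text.")

# Runs of whitespace collapsed to a single space.
collapsed = re.sub(r"\s+", " ", "too   many    spaces")

print(dates)       # ['2023-11-05', '2024-01-15']
print(categories)  # ['[[Category:Regular expressions]]', '[[Category:Help]]']
print(bold)        # This is '''important''' text.
print(collapsed)   # too many spaces
```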

1. MediaWiki Specifics

MediaWiki’s regex engine is based on PCRE (Perl Compatible Regular Expressions), but there are some important considerations:

**Search and Replace:** The "Search & Replace" functionality in the editing interface uses regular expressions when the "Regular expression" checkbox is checked.
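**Flags and replacement syntax:** Several constructs modify how a pattern matches or how the replacement text is built:

   *   `$1`, `$2`, etc.: Backreferences to captured groups in the replace text. These are replacement syntax rather than flags.
   *   `\u`: Converts the next character to uppercase. This is a Perl-style escape that PCRE itself does not implement, so it works only in tools that add it; test before relying on it.
   *   `\l`: Converts the next character to lowercase, with the same caveat as `\u`.
   *   `(?i)`:  Enables case-insensitive matching.  Place this at the beginning of the regex. For example: `(?i)old text`
   *   `(?s)`:  Allows the dot (`.`) to match newline characters.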

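**Limitations:** Individual tools and extensions may expose only part of the PCRE feature set, and PHP enforces backtracking and recursion limits (`pcre.backtrack_limit`, `pcre.recursion_limit`), so very complex regexes might fail or return no match instead of working as expected.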
**Performance:** Complex regular expressions can be computationally expensive. Avoid overly complex patterns if possible, especially when performing large-scale edits. Consider using simpler patterns combined with multiple passes if necessary.
**Escaping:** Be mindful of escaping special characters correctly. The backslash `\` is used for escaping, but it may need to be escaped itself in certain contexts (e.g., within strings in a template).
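The inline flags and the escaping advice above can be sketched as follows. Python's `re` module is used here only for illustration: it accepts `(?i)` and `(?s)` at the start of a pattern much as PCRE does, and `re.escape` is Python's own helper for escaping metacharacters. The sample strings are invented:

```python
import re

# (?i) at the start of the pattern enables case-insensitive matching.
ci = re.findall(r"(?i)old text", "Old Text, OLD TEXT, old text")

# Without (?s) the dot stops at a newline, so a two-line comment is missed.
comment = "<!-- first line\nsecond line -->"
without_s = re.search(r"<!--.*?-->", comment)
with_s = re.search(r"(?s)<!--.*?-->", comment)

# re.escape backslash-escapes metacharacters such as '+', '(' and ')',
# so a page title is matched as literal text.
title = "C++ (programming language)"
found = re.search(re.escape(title), "See C++ (programming language) for details")

print(ci)                 # ['Old Text', 'OLD TEXT', 'old text']
print(without_s)          # None
print(with_s is not None) # True
print(found.group(0))     # C++ (programming language)
```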

1. Testing and Debugging

Testing your regular expressions is crucial before applying them to a large number of pages. Here are some useful tools:

**Regex101:** [regex101.com](https://regex101.com/) – An online regex tester with detailed explanations and debugging features. Allows you to select the PCRE flavor.
**Regexr:** [regexr.com](https://regexr.com/) – Another online regex tester with a visual interface.
**MediaWiki Sandbox:** Create a sandbox page in your wiki and test your regexes there before applying them to live pages.
**Small-Scale Testing:** Start with a small sample of pages and carefully review the results before performing a large-scale edit.
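Alongside these tools, a pattern can be checked against a handful of known inputs before it touches any page. The following is one possible sketch in Python, not a MediaWiki feature. `DATE_PATTERN`, `CASES`, and `run_cases` are hypothetical names, and since Python's `re` is not PCRE the final pattern should still be re-tested in the target tool:

```python
import re

# Pattern under test: a YYYY-MM-DD date that is not part of a longer number.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

# (sample text, expected matches) pairs, including inputs that must NOT match.
CASES = [
    ("Released 2024-01-15.", ["2024-01-15"]),
    ("No date here", []),
    ("Ticket 12024-01-150", []),
    ("2023-12-31 and 2024-01-01", ["2023-12-31", "2024-01-01"]),
]

def run_cases(pattern, cases):
    """Return the cases whose actual matches differ from the expected ones."""
    failures = []
    for text, expected in cases:
        actual = pattern.findall(text)
        if actual != expected:
            failures.append((text, expected, actual))
    return failures

failures = run_cases(DATE_PATTERN, CASES)
print("all cases passed" if not failures else failures)
```

Keeping negative cases in the list is the useful part: they catch patterns that are too broad, which is the more common failure in bulk edits.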

1. Advanced Topics

Once you’re comfortable with the basics, you can explore these more advanced topics:

**Lookarounds:** Assertions that match a position based on what precedes or follows it, without including those characters in the match. (e.g., positive lookahead, negative lookbehind).
**Backreferences and Named Capturing Groups:** More sophisticated ways to refer to captured groups.
**Conditional Expressions:** Using conditional logic within your regex.
**Recursion:** Matching nested structures.
**Unicode Support:** Handling Unicode characters correctly.
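As a brief taste of two of these topics, the sketch below shows a lookahead, a lookbehind, and named capturing groups. It uses Python's `re` module for illustration only; Python spells named groups `(?P<name>...)`, while PCRE also accepts `(?<name>...)`. The sample strings are made up:

```python
import re

# Positive lookahead: digits only when followed by " USD"; " USD" is not consumed.
usd_amounts = re.findall(r"\d+(?= USD)", "10 USD, 20 EUR, 30 USD")

# Negative lookbehind: whole numbers that are not preceded by a dollar sign.
plain_numbers = re.findall(r"(?<!\$)\b\d+\b", "$5 and 7")

# Named groups make the parts of a match self-describing.
date = re.search(
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})",
    "Updated 2024-01-15",
)
parts = date.groupdict()

print(usd_amounts)    # ['10', '30']
print(plain_numbers)  # ['7']
print(parts)          # {'year': '2024', 'month': '01', 'day': '15'}
```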

1. Resources and Further Learning

**Regular-Expressions.info:** [regular-expressions.info](https://www.regular-expressions.info/) – A comprehensive online resource for learning about regular expressions.
**PCRE Documentation:** [pcre.org](https://www.pcre.org/) – The official documentation for PCRE.
**Wikipedia's Regular Expression Article:** [Regular expression](https://en.wikipedia.org/wiki/Regular_expression)
**Stack Overflow:** [regex tag](https://stackoverflow.com/questions/tagged/regex) – A valuable resource for finding answers to specific regex questions.
**Regex Golf:** [regexgolf.com](https://www.regexgolf.com/) - A fun site to test your regex skills and see how concisely you can solve problems.
