Regex Filter
Filters the text columns of a dataset by removing every substring that matches one of a list of regular expression patterns.
Interfaces: Transformer
Data Type Compatibility: Categorical
Parameters
| # | Name | Default | Type | Description |
|---|---|---|---|---|
| 1 | patterns | | array | A list of regular expression patterns used to filter the text columns of the dataset. |
Example
use Rubix\ML\Transformers\RegexFilter;
$transformer = new RegexFilter([
    RegexFilter::URL,
    RegexFilter::MENTION,
    '/(?<me>.+)/',
    RegexFilter::EXTRA_CHARACTERS,
]);
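To illustrate what filtering does to a single sample, here is a minimal plain-PHP sketch that does not use the library at all. It assumes that matches are removed from string values and that non-string values are left untouched. The two patterns and the `filterSample` helper are hypothetical stand-ins for illustration, not the library's built-in constants or internals.

```php
<?php

// Hypothetical stand-in patterns, not the library's own definitions.
$patterns = [
    '/@\w+/', // Twitter-style mentions
    '/#\w+/', // Twitter-style hashtags
];

/**
 * Remove every pattern match from the string values of one sample.
 * Non-string values (e.g. numbers) pass through unchanged.
 */
function filterSample(array $sample, array $patterns): array
{
    foreach ($sample as $column => $value) {
        if (is_string($value)) {
            // preg_replace applies each pattern in order when given an array.
            $sample[$column] = preg_replace($patterns, '', $value);
        }
    }

    return $sample;
}

$sample = ['thanks @alice for the #php tip', 42];

print_r(filterSample($sample, $patterns));
```

Note that in this sketch the whitespace surrounding a removed match is kept, so a further whitespace-normalizing step may be useful afterwards.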
Predefined Regex Patterns
| Class Constant | Description | 
|---|---|
| URL | An alias for the default URL matching pattern. | 
| GRUBER_1 | The faster original Gruber URL matching pattern. | 
| GRUBER_2 | The more universal improved Gruber URL matching pattern. | 
| EMAIL | A pattern to match any email address. |
| MENTION | A pattern that matches Twitter-style mentions (@example). | 
| HASHTAG | Matches Twitter-style hashtags (#example). | 
| EXTRA_CHARACTERS | Matches extra non word or number characters such as repeated punctuation and special characters. | 
| EXTRA_WORDS | Matches extra (consecutively repeated) words. | 
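As an illustration of what "consecutively repeated words" can mean in practice, the following plain-PHP sketch removes the repeats while keeping the first occurrence. The pattern and the `dropRepeatedWords` helper are assumptions made for this example only. They are not the definition behind `EXTRA_WORDS`, which may behave differently.

```php
<?php

/**
 * Remove consecutive repeats of a word, keeping the first occurrence.
 *
 * Hypothetical pattern: capture a word, reset the match start with \K so the
 * first occurrence is kept, then match one or more repeats of the same word.
 */
function dropRepeatedWords(string $text): string
{
    return preg_replace('/\b(\w+)\K(?:\s+\1\b)+/i', '', $text);
}

echo dropRepeatedWords('the the the cat sat sat'), PHP_EOL;
```

A custom pattern like this one can be passed in the `patterns` array alongside the predefined constants, as the inline pattern in the example above is.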
Additional Methods
This transformer does not have any additional methods.
Last update: 2021-03-03