MultiValued Field Extension for RegexTransformer

in #solr, 6 years ago

Solr has several kinds of transformers that are used while importing data into the search engine. For the full list of built-in transformers, see the transformer wiki.

This post focuses on RegexTransformer, which uses patterns to process the incoming data. It is quite useful when you need to process data before indexing, but it has a shortcoming: multivalued fields are not supported.

For instance, assume that you have a field for storing a document's language. For single values, it is pretty straightforward. But what if we need multiple values for this field? You might have some serialized format in your db, and you may need to extract the language-related info. For instance, you may have something like

t:9:{u:5;s:7:"fr";k:1;y:2:"jp";k:2;y:2:"bg";}

With the built-in transformer, you can get "fr,jp,bg" into a single field. But what if you need to put each of these values into a multiValued field?
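To make the goal concrete, here is a minimal sketch in plain Java (no Solr involved) of pulling every language code out of the sample string above. The class name, the helper `extractAll`, and the pattern `"([a-z]{2})"` are assumptions made for illustration; they are not taken from the extension itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LanguageExtractDemo {

    // Collects capture group 1 of every match of the pattern in the input.
    static List<String> extractAll(String regex, String input) {
        List<String> values = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String serialized = "t:9:{u:5;s:7:\"fr\";k:1;y:2:\"jp\";k:2;y:2:\"bg\";}";
        // Hypothetical pattern: a two-letter lowercase code between double quotes.
        List<String> languages = extractAll("\"([a-z]{2})\"", serialized);
        System.out.println(languages); // prints [fr, jp, bg]
    }
}
```

The point is the `while (m.find())` loop: each match becomes its own list element instead of being joined into one string, which is the shape a multiValued field needs.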

The solution is to extend RegexTransformer to support it ;) The code of the extension is available separately.

Once you get the code and build the jar, simply add the following attribute to the entity tag in your import script:

transformer="com.hcetavaj.MultiValuedRegex"

As this is an extension of RegexTransformer, all of its properties are supported. To enable multiValued field support, simply add the following property to your field mapping:

multiRegex="pattern comes here"

Then simply start importing and watch ;)
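For readers curious about what such a transformer might do with `multiRegex`, here is a rough sketch of the row-level logic, written against plain maps so that it runs without Solr on the classpath. It is not the extension's actual code. The semantics shown (every match of the pattern becomes one value, read from `sourceColName` when present, otherwise from `column`) are an assumption, and `MultiRegexSketch` and its `transformRow` helper are hypothetical names. In a real extension, logic like this would presumably live in the `transformRow` method of a class extending RegexTransformer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultiRegexSketch {

    // Applies every field definition that carries a "multiRegex" attribute.
    // "fields" stands in for the per-field attribute maps of the entity.
    static Map<String, Object> transformRow(Map<String, Object> row,
                                            List<Map<String, String>> fields) {
        for (Map<String, String> field : fields) {
            String regex = field.get("multiRegex");
            if (regex == null) {
                continue; // not a multiRegex field; leave it alone
            }
            String column = field.get("column");
            // Fall back to the target column when no source column is given.
            String source = field.getOrDefault("sourceColName", column);
            Object raw = row.get(source);
            if (raw == null) {
                continue;
            }
            List<String> values = new ArrayList<>();
            Matcher m = Pattern.compile(regex).matcher(raw.toString());
            while (m.find()) {
                // Use group 1 when the pattern defines one, else the whole match.
                values.add(m.groupCount() >= 1 ? m.group(1) : m.group());
            }
            row.put(column, values); // a List value maps to a multiValued field
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("lang_raw", "t:9:{u:5;s:7:\"fr\";k:1;y:2:\"jp\";k:2;y:2:\"bg\";}");

        Map<String, String> field = new HashMap<>();
        field.put("column", "language");          // hypothetical target field
        field.put("sourceColName", "lang_raw");   // hypothetical source column
        field.put("multiRegex", "\"([a-z]{2})\""); // hypothetical pattern

        List<Map<String, String>> fields = new ArrayList<>();
        fields.add(field);

        System.out.println(transformRow(row, fields).get("language"));
    }
}
```

Storing a `List` under the target column mirrors how the stock transformer's `splitBy` option hands multiple values to a multiValued field, so the rest of the import pipeline should not need to change.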
