Problem statement
In my current release of the product we had to do an Elasticsearch (ES) upgrade. The product was on ES 1.x and we had to move to ES 6.x. Since ES supports upgrades only from the previous major version, the only option we had was to re-index the data. We used the following approach to re-index the data:
- Implement a Java executable jar to directly read the index files of ES 1.x using Lucene APIs and write each document to stdout.
- Use logstash to invoke the above jar and read its stdout.
- Create a logstash filter configuration to transform the data from ES 1.x into an ES 6.x compatible form.
- Once the data is transformed, write it to the ES 6.x instance.
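The pipeline above can be sketched as a logstash configuration. This is only an illustration: the jar name, index path, and script file name are assumptions, not the actual project artifacts. The `pipe` input runs the command and reads its stdout, and the `ruby` filter's `path` option points at the transformation script.

```
input {
  pipe {
    # Hypothetical command line; the real jar and index path differ.
    command => "java -jar es1-index-reader.jar /path/to/es1/index"
    codec => "json"
  }
}

filter {
  ruby {
    # The script must define register(params) and filter(event).
    path => "/path/to/transform.rb"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
```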
This worked well for us except for the third step. The transformation, written in a logstash configuration file, was neither readable nor maintainable. Editors interpret the configuration file as plain text, so with no highlighting or indentation support it was hard to read at all. Also, we had to write transforms for almost all the data (events) we had, which led to long if-elsif-elsif...else chains. This not only hampered readability but would also have hurt event transformation performance: running many if-else checks for every event is definitely not good. What we wanted was a clean, readable, and maintainable transformation.
Solution
The logstash ruby filter was the way to go, so I rewrote the entire transformation as a ruby script:
- Wrote a single ruby script file for the transformation.
- Avoided if-else statements and instead used lookups to determine which transformation to apply per event type.
- Invoked the logstash-provided filters like mutate, date, etc.
- On top of these, wrote product-specific transformations.
- Added logging to the script.
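The lookup-based dispatch can be sketched as follows. The field names and event types here are made up for illustration; in the real logstash ruby filter the lambdas would operate on the event API (`event.get`/`event.set`), whereas this standalone sketch uses plain hashes so it runs outside logstash.

```ruby
# Map each event type to its transformation. A single hash lookup replaces
# a long if-elsif-elsif...else chain, and adding a new event type means
# adding one entry here rather than another branch.
TRANSFORMS = {
  'login'  => ->(doc) { doc['user'] = doc.delete('userName'); doc },
  'logout' => ->(doc) { doc['session_end'] = doc.delete('endTime'); doc }
}

# Events with no registered transform pass through unchanged.
IDENTITY = ->(doc) { doc }

def transform(doc)
  TRANSFORMS.fetch(doc['event_type'], IDENTITY).call(doc)
end
```

The same idea extends naturally: the looked-up value can be a list of steps, or an object combining field renames with calls into the product-specific transformations.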
Integrating with maven build
While implementing the transformation in ruby made it more readable and maintainable, one thing was missing: it was not testable outside logstash. One can write inline tests for the ruby filter, but they have to be executed within logstash. What I wanted was to integrate the tests with our maven build so that any test failure would also fail the build.
Since I was new to ruby, I had to spend some time understanding ruby gems etc., and finally I was able to run the ruby transform script in both the maven build and logstash. This was achieved as follows:
- Copied the logstash deployment folder locally alongside our maven project.
- In the maven pom.xml, used the exec-maven-plugin to execute the ruby script.
- For the script to run properly, set the paths correctly for the jars as well as the ruby gems.
- Quickly implemented a small testing framework in ruby which would:
  - Scan for test classes.
  - Execute the test methods. Test methods were expected to have names starting with test_, e.g. test_event_types.
  - If a test fails, log the error.
  - Execute all the tests and then print a summary: total tests executed, tests passed/failed, etc.
  - If there is even a single test failure, fail the maven build.
- Once integrated, we wrote many tests around the transformation per event type.
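The testing framework described above can be sketched roughly like this. The class and method names are invented for illustration, not taken from the actual project; the runner returns the failure count, which the launching script would pass to `exit` so that exec-maven-plugin sees a non-zero exit code and fails the build.

```ruby
# Minimal test runner sketch: finds methods named test_*, runs each one,
# logs failures, and prints a summary.
class TestRunner
  def self.run(test_class)
    passed = failed = 0
    instance = test_class.new
    # instance_methods(false) lists only methods defined on the class itself.
    test_class.instance_methods(false).grep(/^test_/).each do |name|
      begin
        instance.send(name)
        passed += 1
      rescue => e
        failed += 1
        puts "FAIL #{name}: #{e.message}"
      end
    end
    puts "Total: #{passed + failed}, passed: #{passed}, failed: #{failed}"
    # Returning the failure count lets the caller exit non-zero on failure,
    # which is what makes the maven build fail.
    failed
  end
end

# Hypothetical test class; a real suite would assert on the transform output.
class SampleTests
  def test_event_types
    raise 'unexpected event type' unless ['login', 'logout'].include?('login')
  end
end
```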
Implementation
A working sample of the above approach is available at: https://github.com/ajeydudhe/java-pocs/tree/master/logstash-ruby-transform