Particle physics - Exercise 1b (Reconstructed event filtering, remembering C++ and VIM)

effofex (62)in #utopian-io • 6 years ago (edited)

Repository

https://github.com/BFuks/mad5-utopian-exercises

Introduction

I have attempted Exercise 1b in @lemouth's ongoing project (roadmap here) wherein we implement a Large Hadron Collider (LHC) particle physics analysis by modifying C⁺⁺ code generated by their MadAnalysis5 toolkit.

(Image is a composite from (left-to-right) 1,2,3 respectively authored by by Energy.gov,Benjamin.doe, & OC and released under public domain, CC3, & CC3.

The specific goals of the exercise were to:

identify baseline particles from a set of reconstructed detector events
remove spurious particles from the identified baselines using a partial implementation of CERN's ATLAS overlap identification criteria.

Additional personal goals were:

Re-familiarize myself with C⁺⁺
- Specifically, learn how C⁺⁺11 differs from the last C⁺⁺ flavor I coded in: pre-std, pre-boost VC⁺⁺ 6.0
Improve my efficiency in VIM
Become comfortable with git branching, forking, tagging, and merging

Post main();

Exercise 1b was more challenging than Exercise 1a, which was the equivalent of 'Hello, world!' for this project. The challenge arose for three reasons.

It required more domain knowledge of particle physics, including understanding a technical document which, to an outsider, was somewhat cryptic.
The algorithm implemented required a moderate degree of care to implement. While no new data structures were required, it was more involved than looping through a vector and writing to the console. It also required some attention to detail to safely alter the existing data structures while iterating.
I am extremely rusty with C⁺⁺ and out of date on the current standards. I have been programming in essentially Python/R for the past 9 years, and my previous experience was on legacy code. Boost was just beginning to become popular the last time I touched C⁺⁺. The implications are that I had to remember a lot of lost knowledge about making the type system happy and had to play catch up on techniques such as the current idioms for filtering vectors.

To reflect on this experience, I'd like to organize this post around three topics; first discussing the physical system, followed by my implementation notes, and finally reflecting on how well I achieved both the exercise and my personal goals.

System analyzed

For those not familiar with it, MadAnalysis5 is a particle physics framework for analyzing reconstructed detector events. The workflow we've been following is to use MadAnalysis5 to generate C⁺⁺ skeleton code which we then modify to analyze simulated LHC detector events from the Public Analysis Database (PAD), specifically the events described by this file.

As I understand it, the physics occurring within a detector cause events requiring the conditioning of raw signals. While @lemouth's given explanation is more canonical than mine, I will attempt to paraphrase to help myself understanding what's going on.

Essentially, collisions of non-elementary particles (such as photons) result in a morass of interactions. This morass is known as a jet and its physical properties give some clues as to what's going on.

What's important for this exercise is to know that not all jets are interesting and that multiple detected jets and other events can have the same 'fingerprint' and thus overlap. To handle the first case, we must generate a baseline list of interesting jets, muons, and electrons. To handle the second case, we must follow a specific order of criteria to 'weed out' overlapping baseline members.

Schematically, we are performing a sequential filtering on data, to arrive at the subset of signal particles illustrated below:

Sets and subsets of reconstructed events for this analysis.

(Image my own, released under CC3)

Baseline identification

The criteria for baseline identification of all three object types are based on threshold physical properties. Specifically, they must have both a transverse momentum above a minimum threshold and a pseudorapidity whose magnitude is below a maximum threshold. The values of those thresholds depend on the object type, as described in the CERN document and the exercise description. Beyond that, it is fairly easy to understand how to apply them, even without insight into why those criteria are useful or how the specific numbers are derived.

Overlap Removal

Although the actual calculation of overlap criteria is not particularly more complex than that for baseline identification (discipline-specific notation differences aside), overlap removal is more difficult to understand. Although I believe have more domain expertise would help, the major confounding factors were as follows:

The combination of overlap criteria is a complex system. There are multiple steps, and each step consists of both a matching criterion and match condition.
For this exercise, we are performing a partial implementation. This simplifies some aspects, but it also introduces ambiguity.
The document itself can be hard to parse (beyond particle physics jargon). As examples, the CERN doc refers to a 'matching requirement', yet the the criteria table (Table 3, p. 15 of the CERN doc) only has 'criteria' and 'condition', leading to some ambiguity. There are other issues as well with the text, but I don't believe it is the authors' fault. Any textual description of these criteria, even with a table, is essentially problematic. More concrete examples, or better yet, a reference implementation, would help. The figure below is my flowchart used to figure out what I believe the correct approach should be for this exercise.

Logical flow of partial overlap removal for this analysis.

(Image my own, released under CC3)

Implementation

Because of the challenges I faced understanding overlap removal, I decided to perform the implementation in two steps. First, I wanted to get the overlap removal implemented. Then, and only then, would I worry about modifying the implementation to use modern C⁺⁺ or refactoring.

First step (Overlap Removal)

It required a few trials to get the overlap removal to a point where I was was satisfied that it was probably correct, and if not correct, then understandably wrong. The code consists of old-style for loops over indices into vectors, with this snippet being representative:

    vector<RecLeptonFormat> baseLineElecs;
    
    // Looking through the reconstructed electron collection
    for (MAuint32 i=0;i<event.rec()->electrons().size();i++)
    {
      const RecLeptonFormat& elec = event.rec()->electrons()[i];
      
      const double momentumBound  = 5; //GeV
      const double psuedoRapidityBound = 2.47; 
      if(momentumBound < elec.pt() && psuedoRapidityBound > abs(elec.eta())){
        baseLineElecs.push_back(elec);
      }
    }

The major issues with this implementation were, as expected, that it didn't take advantage of modern vector operations or refactoring. This lead to it being verbose and brittle. Most of the logic was also tied up in managing details, like indices and safe removal, rather than the 'business logic', making it hard to grasp the code's intentions. The commit which best represents this stage is in my personal repository here.

Second attempt (modern C⁺⁺)

At its heart, this exercise is essentially generating a few vectors and filtering them based on various criteria. Using the operations made available by C⁺⁺11 really helped abstract away many implementation details, making the code much more intentional. For example, the above snippet can be extracted into a method:

void test_analysis::filterBaseLine(const vector<RecLeptonFormat> &from, vector<RecLeptonFormat> &to, 
                    double min_pt, double maxAbsEta){
  copy_if(from.begin(),from.end(),
    back_inserter(to),
        [&min_pt,&maxAbsEta](const RecLeptonFormat& l){return (l.pt() > min_pt ) && (abs(l.eta())< maxAbsEta);});
}

and the entire baseline filtering portion becomes:

   /***************************************************
    * Create baseline vectors
    ***************************************************/
    filterBaseLine(event.rec()->electrons(),elecs,5,2.47);
    filterBaseLine(event.rec()->muons(),muons,4,2.7);
    filterBaseLine(event.rec()->jets(),jets,25,2.5);

Similar benefits were realized with the overlap removal code. Further safety was insured by using the built in remove_if algorithm, rather than managing vector indices. The full version of my code, after the pull request, can be viewed at the main repository (as ex1b-effofex.*) and also on the task 1b branch of my personal repository.

This is the relevant output from my current version:

        => progress: [===>                               ]
Base and signal vector sizes
Particle        Base    Removed Signal
        Jets:   5       0       5
        Elecs:  0       0       0
        Muons:  0       0       0

        => progress: [======>                            ]
Base and signal vector sizes
Particle        Base    Removed Signal
        Jets:   6       0       6
        Elecs:  0       0       0
        Muons:  0       0       0

        => progress: [==========>                        ]
Base and signal vector sizes
Particle        Base    Removed Signal
        Jets:   8       8       0
        Elecs:  2       0       2
        Muons:  1       0       1

        => progress: [=============>                     ]
Base and signal vector sizes
Particle        Base    Removed Signal
        Jets:   8       0       8
        Elecs:  0       0       0
        Muons:  1       1       0

        => progress: [=================>                 ]
Base and signal vector sizes
Particle        Base    Removed Signal
        Jets:   7       0       7
        Elecs:  0       0       0
        Muons:  0       0       0

        => progress: [====================>              ]
Base and signal vector sizes
Particle        Base    Removed Signal
        Jets:   6       0       6
        Elecs:  0       0       0
        Muons:  1       1       0

        => progress: [========================>          ]
Base and signal vector sizes
Particle        Base    Removed Signal
        Jets:   4       0       4
        Elecs:  0       0       0
        Muons:  0       0       0

        => progress: [===========================>       ]
Base and signal vector sizes
Particle        Base    Removed Signal
        Jets:   6       6       0
        Elecs:  1       0       1
        Muons:  0       0       0

        => progress: [===============================>   ]
Base and signal vector sizes
Particle        Base    Removed Signal
        Jets:   8       0       8
        Elecs:  0       0       0
        Muons:  0       0       0

        => progress: [==================================>]
Base and signal vector sizes
Particle        Base    Removed Signal
        Jets:   7       7       0
        Elecs:  1       0       1
        Muons:  0       0       0

        => progress: [===================================]

Goal analysis

I'd like to reflect briefly on the degree to which I achieved the goals for this exercise.

Identify baseline particles from a set of reconstructed detector events

I believe that I'm doing this correctly. It was a good entry into the event record data structures and helped me brush the cobwebs off some basic C⁺⁺ syntax issues.

Overlap removal

I don't know if I did this correctly. I think my approach makes sense, at the very least, I do think it was intelligible enough for @lemouth to tell me what went wrong if there was an error.

Relearning C⁺⁺

This was a big goal for me. I'd forgotten how different it is to program in a more strictly typed, compiled system, after coming from lots of Python and R. While I don't think I'll ever enjoy managing header files, I do have to say that C⁺⁺11 was refreshing. I had a vague idea that iteration through vectors was going to be nicer than I'd remembered, but I was unaware of how much the syntax has improved - especially the auto keyword and the ability to use lambda functions. Although the documentation on some of these was still closer to a man page than a tutorial, it wasn't too rough to learn from.

By far, the bigger obstacle was in remembering things I'd forgotten - both explicit memory management and casting/polymorphism come to mind. In fact, you might notice that I avoided doing anything too ambitious with passing references to vectors around or taking advantage of RecParticleFormat, the base to the two major classes used in the code, to remove some code duplication. I decided not to pursue those right now so that I'd finish in time. It is also for those reasons that I decided to not pursue unit testing, although I was happy to see that cppunit is still around.

VIM and git goals

I've used VIM a bit, but not much and realized that since I'm running this software in non-gui Unix environment (specifically the Windows 10 WSL ), that this would be a good time to brush up. After shaking some cobwebs out, I'm fairly comfortable doing basic navigation and editing tasks. I'm getting better at doing multi-line yanking instead of retyping and have flirted a bit with regexen and multiple buffers. The biggest issue is reflexively hitting Ctrl-Z to undo - this kicks me out of VIM and creates a .swp file. I plan on checking out some vim interactive tutorials 1, 2 and also the vim adventures game in a bit to practice.

I've used git for a long time, but almost exclusively for solo projects. Basically, lots of commits on a single branch. This has forced me to get, if not comfortable with, at least exposed to using branches to track my progress on exercises and better understanding forkss, pull requests, origins, and such for working with multiple people.

Resources

A note on repositories

Because this series features a set of exercises on code generated by another code base, there's a few layers of repositories.

The canonical base for the completed exercises is maintained by @lemouth
I maintain my own fork with code to be pulled into the main repo
I keep a 'working directory' with the full generated codebase, separated by branches for each exercise, such as the 1b branch
MadAnalysis5 itself is kept on launchpad, as that's what most people working in that discipline are used to, but @lemouth has mirrored it on Github

Useful C++ stuff

Wikepedia article on changes to C++ since I last used it
I found cppreference to be particularly useful
- Info on modern iteration through vectors
- Lambda functions
- Erase-remove idiom

VIM stuff

Why bother
Vim adventures
Online interactive tutorials 1, 2

Documentation

CERN ATLAS detector reference pdf
MadAnalysis5 Sample Analyzer code documentation
MadAnalysis5 main site

Series Backlinks

As I understand it, this section is mainly for links within my own series contributions. Below that list, I will also include the relevant links to other's posts within this series.

Installing MadAnalysis5 under WSL and Task1a
Further exercises to follow

@lemouth

@irelandscape

@mactro

#blog #steemstem #science #physics

6 years ago in #utopian-io by effofex (62)

$114.69

Sort:

Trending

[-]

irelandscape (62) 6 years ago

Hi @effofex.

Well done in doing the exercise.
I was a bit in the same situation as I haven't done any c++ programming in 20 years...

I think there might be a bug in the code. I'll have a look later on today when I have a chance.

$3.79

[-]

lemouth (72) 6 years ago

I found two of them. You can check my comment below.

$0.00

[-]

utopian-io (71) 6 years ago

Hey @irelandscape
Here's a tip for your valuable feedback! @Utopian-io loves and incentivises informative comments.

Contributing on Utopian
Learn how to contribute on our website.

Want to chat? Join us on Discord https://discord.gg/h52nFrV.

Vote for Utopian Witness!

$0.00

[-]

flyyingkiwi (55) 6 years ago

My programming is too weak to comment on the details here, but I do think it's a win for steemstem that some members ( outside the physics majors) can conduct a LHC particle physics analysis

$0.00

[-]

effofex (62) 6 years ago

Yeah, I'm really digging how it's helping everyone grow their skills outside their comfort zones.

$0.00

[-]

iauns (50) 6 years ago

Qucik tip on using Ctrl-Z in the terminal. Ctrl-Z sends the foreground process to the background. To recover vim in the case you mentioned above, type fg in the terminal and hit enter. If you also want to get rid of the .swp files, add set noswapfileto your .vimrc. Though I'd recommend reading :help swap-file prior.

$0.00

[-]

effofex (62) 6 years ago

That explains the message I got when trying to edit the file again. I've sent stuff the background from the command line before but never thought about being able to do it from within a session. This tip will likely save me a lot of time in the future, thanks!

$0.00

[-]

irelandscape (62) 6 years ago

Hi @effofex,

I had a chance to review your code.
I must say it is very elegant.
I had no idea C++ had support for lambda functions now.
You also reminded me of some functionalities in the STL which I forgot about.

All this makes your code nice and compact.
I will definitely follow suit in the next exercises.

Thanks for that!

$0.00

[-]

effofex (62) 6 years ago

I'm glad it helped. I'm actually pleasantly surprised with how much the language has evolved over the years. I still don't have a personal use case for switching, but it's nice to know that if I ever need to, it won't be quite so painful.

$0.00

[-]

roj (61) 6 years ago

Thank you for your contribution to the great project and open source community.

The format of the post makes it more than just an exercise result. There is a slight touch of a personal blog and manner of a programming article. It is really appreciated in the category.

Your solution with intentional programming also inspired me, I would love to see you publish more blog posts for the upcoming exercises and analysis.

There is only one point I should correct that, the repository to be shown at the start of the post should be the repository of the project. Therefore the one containing exercise files is not a proper repository, instead, you should use the [repository for MadAnalysis 5].(https://github.com/BFuks/madanalysis-utopian).

There is not much to say for such a post, I just had a good time reading. Thank you again!

Your contribution has been evaluated according to Utopian policies and guidelines, as well as a predefined set of questions pertaining to the category.

To view those questions and the relevant answers related to your post, click here.

Need help? Write a ticket on https://support.utopian.io/.
Chat with us on Discord.
[utopian-moderator]

$0.00

[-]

effofex (62) 6 years ago

I'm glad you liked it and I certainly intend to post more. I'm particularly glad that you liked that it was more than just an exercise post - I intentionally tried to make it worthwhile to people reading for different reasons.

$0.00

[-]

utopian-io (71) 6 years ago

Hey @effofex
Thanks for contributing on Utopian.
We’re already looking forward to your next contribution!

Contributing on Utopian
Learn how to contribute on our website or by watching this tutorial on Youtube.

Want to chat? Join us on Discord https://discord.gg/h52nFrV.

Vote for Utopian Witness!

$0.00

[-]

terrylovejoy (62) 6 years ago

It's obvious there was an exceptional amount of work put into this! Well done!

I'd like to have a shot at it myself, but want to read and understand a bit more about particle physics first (and that might have to wait until next month!).

$0.00

[-]

effofex (62) 6 years ago

Thanks for the compliment.

I understand wanted to get a grip on the physics first, but I'd urge you to dive in. So far, the physics has been minimal (all I've had to learn is that both muons and electrons are leptons), the rest can be muddled through. For example, all you need to know about pseudorapidity is that it is a 'thing' with a 'value' and 'stuff' which is above/below certain values gets filtered.

The benefits to starting now are threefold. First, the longer you wait, the bigger a task it will seem to dive into. Second, if you want to learn the particle physics stuff, the best way is to be 'forced' to by actively coding it up. Third, if you get in sync with all of us, we can support each other while being on the same mental page.

$0.04

[-]

momoriso (64) 6 years ago

very intiristing

$0.00

STEEM 0.20

TRX 0.14

JST 0.029

BTC 66603.07

ETH 3335.94

USDT 1.00

SBD 2.70

Particle physics - Exercise 1b (Reconstructed event filtering, remembering C++ and VIM)

Repository

Introduction

Post main();

System analyzed

Baseline identification

Overlap Removal

Implementation

First step (Overlap Removal)

Second attempt (modern C++)

Goal analysis

Identify baseline particles from a set of reconstructed detector events

Overlap removal

Relearning C++

VIM and git goals

Resources

A note on repositories

Useful C++ stuff

VIM stuff

Documentation

Series Backlinks

Coin Marketplace

Second attempt (modern C⁺⁺)

Relearning C⁺⁺