
Completeness: 70%

Drell-Yan Analysis Procedure

This twiki reviews the most important parts of the Drell-Yan cross section measurement. It is intended to familiarize you with the technical aspects of the analysis procedure. At the end of each section there is a set of questions and exercises. Note that the software and samples evolve frequently, so some parts of this page will become obsolete soon!

step1: PAT/ntuples

  • Samples
  • MC samples change frequently (usually we have a new generation of MC samples every season). Even though the GEN level stays the same, the reconstruction keeps improving from release to release, so it is recommended to use the most up-to-date MC. Other things that change constantly are the vertex information (tuned to data) and the pile-up.
  • MC: 311X Monte Carlo samples (this list shows all the signals and backgrounds we consider, but it is already superseded by Summer11 42X samples)
  • Drell-Yan:
    • /DYToMuMu_M-10To20_TuneZ2_7TeV-pythia6/Spring11-PU_S1_START311_V1G1-v1/GEN-SIM-RECO
    • /DYToMuMu_M-20_CT10_TuneZ2_7TeV-powheg-pythia/Spring11-PU_S1_START311_V1G1-v1/GEN-SIM-RECO
  • QCD
    • /QCD_Pt-15to20_MuPt5Enriched_TuneZ2_7TeV-pythia6/Spring11-PU_S1_START311_V1G1-v1/GEN-SIM-RECO
    • /QCD_Pt-20to30_MuPt5Enriched_TuneZ2_7TeV-pythia6/Spring11-PU_S1_START311_V1G1-v1/GEN-SIM-RECO
    • /QCD_Pt-30to50_MuPt5Enriched_TuneZ2_7TeV-pythia6/Spring11-PU_S1_START311_V1G1-v1/GEN-SIM-RECO
    • /QCD_Pt-50to80_MuPt5Enriched_TuneZ2_7TeV-pythia6/Spring11-PU_S1_START311_V1G1-v1/GEN-SIM-RECO
    • /QCD_Pt-80to120_MuPt5Enriched_TuneZ2_7TeV-pythia6/Spring11-PU_S1_START311_V1G1-v1/GEN-SIM-RECO
    • /QCD_Pt-120to150_MuPt5Enriched_TuneZ2_7TeV-pythia6/Spring11-PU_S1_START311_V1G1-v1/GEN-SIM-RECO
    • /QCD_Pt-150_MuPt5Enriched_TuneZ2_7TeV-pythia6/Spring11-PU_S1_START311_V1G1-v1/GEN-SIM-RECO
  • DY tautau
    • /DYToTauTau_M-20_CT10_TuneZ2_7TeV-powheg-pythia-tauola/Spring11-PU_S1_START311_V1G1-v2/GEN-SIM-RECO
  • Dibosons
    • /WWtoAnything_TuneZ2_7TeV-pythia6-tauola/Spring11-PU_S1_START311_V1G1-v1/GEN-SIM-RECO
    • /WZtoAnything_TuneZ2_7TeV-pythia6-tauola/Spring11-PU_S1_START311_V1G1-v1/GEN-SIM-RECO
    • /ZZtoAnything_TuneZ2_7TeV-pythia6-tauola/Spring11-PU_S1_START311_V1G1-v1/GEN-SIM-RECO
  • Wmunu
    • /WtoMuNu_TuneD6T_7TeV-pythia6/Spring11-PU_S1_START311_V1G1-v1/GEN-SIM-RECO

The latest MC generation is currently Summer11 42X.

  • DATA:
  • We use SingleMu and DoubleMu PDs, PromptReco (and ReReco when it becomes available)
  • JSONs are constantly updated in this directory as we certify more and more data:
    • "Golden" */afs/cern.ch/cms/CAF/CMSCOMM/COMM_DQM/certification/Collisions11/7TeV/*Prompt/Cert_160404-161312_7TeV_PromptReco_Collisions11_JSON.txt
    • Calo-free /afs/cern.ch/cms/CAF/CMSCOMM/COMM_DQM/certification/Collisions11/7TeV/Prompt/Cert_160404-161312_7TeV_PromptReco_Collisions11_JSON_MuonPhys.txt
  • Use the DBS tool to retrieve info about the most up-to-date MC and DATA samples
  • Relevant software: CMSSW_4_1_3; see the twiki page for information about the CMSSW releases used for data taking as they evolve
    • Global tag
      • Data: GR_R_311_V2 for reprocessing and GR_P_V14 for prompt reco
      • MC: MC_311_V2, START311_v2
    • ElectroWeakAnalysis/Skimming, ElectroWeakAnalysis/DYskimming, UserCode
  • How to run PAT-tuple production: see the corresponding twiki. Note that a lot of the code in CVS is outdated (we keep updating it):
cmsrel CMSSW_4_1_3
cd CMSSW_4_1_3/src
cmsenv
addpkg ElectroWeakAnalysis/Skimming
cvs co -d ElectroWeakAnalysis/DYskimming -r V00-00-05 UserCode/Purdue/DYAnalysis/Skimming
scram b
cmsRun ElectroWeakAnalysis/DYskimming/PATTuple/test_cfg.py

Exercise1:

Q1: Do you think the list of signal and background samples mentioned above is complete for the Drell-Yan cross section measurement in the 15-600 GeV invariant mass region? (You need to think not only about the physics processes but also about the kinematic region in which each sample was generated.)

Q2: Using the DBS tool mentioned above or a command-line equivalent, find all the MC samples relevant for the Drell-Yan cross-section measurement in the Summer11 production. Example:

find file where file.tier=GEN-SIM-RECO and site=srm-dcache.rcac.purdue.edu and dataset like=*DYToMuMu*Summer11*

Q3: (might be advanced; if you fail, ask me) Port the above skimming code to 42X: you need this because the 41X and 42X releases are incompatible

Q4: Produce a PAT-tuple

More PAT-tuples can be found here: /store/user/asvyatko/DYstudy/dataAnalysis11//PATtuple/ (use only the signal and data samples for now)

step2: ntuple

Since the event size is quite big even for PAT, we usually perform one more skimming step and store our data in the form of ntuples. The input for the ntuple-maker is a PAT-tuple (however, with some gymnastics you can run on GEN-SIM-RECO or AODSIM).

cvs co -d Phys/DimuonAnalyzer UserCode/Purdue/DYAnalysis/NtupleMaker
scram b
cd Phys/DimuonAnalyzer
cmsRun test/test_cfg.py

Exercise2:

Q1: Try to produce an ntuple

Hint: spot all the errors; they might be related to the global tag or the REDIGI version (for MC)

Q2: Find the invariant mass branch and inspect it
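
Hint: below is a minimal sketch of how one might do this interactively in ROOT; the file, tree, and branch names are hypothetical placeholders, so use TTree::Print() to find the actual names in your ntuple.

// inspectMass.C -- open an ntuple and look at the invariant mass branch.
// File, tree, and branch names are placeholders; adjust to your ntuple.
#include "TFile.h"
#include "TTree.h"

void inspectMass() {
    TFile *f = TFile::Open("ntuple.root");      // hypothetical file name
    TTree *t = (TTree*)f->Get("dimuonTree");    // hypothetical tree name
    t->Print();                                 // list all branches
    t->Draw("invMass", "invMass > 15");         // hypothetical branch name
}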

step3: Event Selection

Once the ntuples are ready, one can proceed to the actual physics analysis. The first step of every analysis is the event selection. Currently, we use the so-called cut-based approach to discriminate between signal and background. For more on event selection please read chapter 3 in the analysis note CMS-AN-11-013. Before starting to run a macro, set up the working area:

1. Copy the ntuples into the rootfiles directory, split by the trigger path used at the skimming level (as we might use a combination!), e.g. rootfiles/HLT_Mu15/

2. Modify the createChain.pl file:

cd rootfiles
vim createChain.pl

Change $dir to the full path of your rootfiles directory:
$dir = "/work/asvyatko/Work/DYanalysis/dataAnalysis10/analysis/rootfiles";
and choose the trigger scenario you wish to use:
$trig = "HLT_Mu15"; # trigger name (directory name)

Run the shell script:
./Chains.csh

All chain_* files will then be created in the rootfiles directory.

Before running the macros, we need to fix a few things which change frequently in our analysis:

  • Mass range and binning:
    • for the early stage of the 2011 analysis we keep the 2010 binning [15,20,30,40,50,60,76,86,96,106,120,150,200,600]; see the histogram-booking sketch after the checkout commands below

  • Trigger selection:
    • See the presentation on event selection for 2011
    • Thus, for 2011 we use a combination of double-muon triggers; a combination of single isolated-muon triggers can be used as a cross-check. Use the three following combinations:
      • HLT_Mu15, HLT_Mu24, HLT_Mu30
      • HLT_IsoMu15, HLT_IsoMu17, HLT_IsoMu24
      • DoubleMu6, DoubleMu7, Mu13_Mu8
  • Offline selection: the baseline event selection has not changed compared to the 2010 analysis
    • we will consider moving to PF muons and PF isolation: this study is in progress right now
  • Relevant software: ROOT macros for the control plots and the invariant mass plot are here: UserCode/Purdue/DYAnalysis/AnalysisMacros/ControlPlots, check it out as usual
  • When you have produced the ntuples, set the macros up:
cd ..
cvs co -d ControlPlots UserCode/Purdue/DYAnalysis/AnalysisMacros/ControlPlots
cd ControlPlots
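
As a reference for the binning item above, here is a minimal sketch of booking a histogram with the variable-width 2010 mass bins (the histogram name is arbitrary):

// massBinning.C -- book an invariant-mass histogram with the 2010 binning.
#include "TH1D.h"

void massBinning() {
    // bin edges in GeV, as quoted in the binning section above
    const double bins[] = {15, 20, 30, 40, 50, 60, 76, 86, 96, 106, 120, 150, 200, 600};
    const int nbins = sizeof(bins) / sizeof(bins[0]) - 1;   // 13 mass bins
    TH1D *hMass = new TH1D("hMass", "Dimuon invariant mass;M(#mu#mu) [GeV];Events",
                           nbins, bins);
}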

To produce the invariant mass plot do:

root -l InvMass.C++

To produce dimuon kinematic distributions run

root -l TransMomentum.C++
root -l Rapidity.C++

To produce the other control plots (for all the event selection variables used in the analysis, as documented in the note), use:

 root -l ControlPlots.C++

There are a few macros that help us optimize the cuts. These macros calculate the statistical significance and the uncertainty on the cross section. The statistical significance is defined as

S = N_sig/sqrt(N_sig+N_bkg)

and is normally determined from MC. As you can infer, it scales with luminosity as ~sqrt(Lumi). Other definitions of significance are sometimes used in analyses (see for instance the CMS TDR).
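
As an illustration, a minimal sketch of the per-bin significance calculation; the yields here are made-up placeholders, in practice they come from MC normalized to the target luminosity:

// significance.C -- per-bin statistical significance S = N_sig / sqrt(N_sig + N_bkg).
#include <cmath>
#include <cstdio>

void significance() {
    const int nbins = 3;                       // toy example with 3 mass bins
    double nSig[nbins] = {1200., 800., 90.};   // placeholder signal yields
    double nBkg[nbins] = {300., 150., 40.};    // placeholder background yields
    for (int i = 0; i < nbins; ++i) {
        double s = nSig[i] / std::sqrt(nSig[i] + nBkg[i]);
        // both yields scale linearly with luminosity L, so S grows as sqrt(L)
        printf("bin %d: S = %.2f\n", i, s);
    }
}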

cvs co UserCode/Purdue/DYAnalysis/NtupleMacro/AsymmetricCutStudy
root -l OfflineSelection.C++
root -l graphStats.C++

The first macro creates a text file with per-mass-bin values of signal and background. The second macro histograms the output. These macros are set up to optimize the acceptance cuts, but with minimal changes they can optimize any other cut we use, and the style can be changed to conform with the rest of the plots in the note.

Q1: Check data/MC agreement for each plot, look for discrepancies.

Checkpoint1 With the macros described above you should be able to *reproduce* the following plots from CMS-AN-11-013: 1, 3-14, 17-29, 51.

Note: for plots 23 and 25-29 the macros have a different style and were produced with a PU sample.

Note: plots 20-22 are reproducible with the optimization macros but have a different style.

step4: Acceptance

Another constituent of the cross-section measurement is the acceptance.

  • Acceptance is determined using GEN-level information, so which MC generation is used is not crucial
  • Use the acceptance numbers documented in the note to save time: CMS-AN-11-013

Below are some additional details on the acceptance calculation, which might be necessary should the procedure change.

We use the POWHEG NLO acceptance; see all the relevant formulas in this talk (the updated version), slide 18 for the acceptance definitions, and refer to earlier talks. Due to the intrinsic discrepancies in modeling between POWHEG and FEWZ we correct the Z kinematics. For that, we extract weight maps from POWHEG at NLO and from FEWZ at NNLO (to be more specific, at NNLO below 40 GeV and NLO otherwise, as the effect of higher-order corrections is negligible at the higher masses). The weight map is essentially the ratio of the double-differential cross sections extracted from POWHEG and FEWZ per PT-Y bin (PT and Y refer to the Z kinematics, which here is identical to the dimuon kinematics, our final aim). Details on the reweighting technique are also in the linked presentation.
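
For orientation, a minimal sketch of how such a weight map could be formed, assuming the POWHEG and FEWZ cross sections have already been histogrammed in PT-Y with identical binning (the histogram name hPtY is a hypothetical placeholder; the file names follow the scripts below):

// makeWeightMap.C -- per PT-Y bin ratio of FEWZ to POWHEG cross sections.
#include "TFile.h"
#include "TH2D.h"

void makeWeightMap() {
    TFile *fPowheg = TFile::Open("powheg_xs_full.root");
    TFile *fFewz   = TFile::Open("fewz_xs_full.root");
    TH2D *hPowheg = (TH2D*)fPowheg->Get("hPtY");   // hypothetical histogram name
    TH2D *hFewz   = (TH2D*)fFewz->Get("hPtY");

    // the weight map is the bin-by-bin ratio FEWZ / POWHEG
    TH2D *hWeight = (TH2D*)hFewz->Clone("hWeight");
    hWeight->Divide(hPowheg);

    TFile out("fewz_powheg_weights.root", "RECREATE");
    hWeight->Write();
    out.Close();
}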

How to run: the code mostly makes use of Adam's tools. The current working area is here: /home/ba01/u115/asvyatko/PATAdam/CMSSW_3_8_7/src

To set up the area, do:

cmsrel CMSSW_3_8_7
cd CMSSW_3_8_7/src
cmsenv
cvs co -r V00-09-11 SUSYBSMAnalysis/Zprime2muAnalysis
SUSYBSMAnalysis/Zprime2muAnalysis/setup.csh # This last command checks
                                            # out other packages and
                                            # executes scramv1 b to
                                            # compile everything.
cvs co -d Acceptance /UserCode/ASvyatkovskiy/AcceptanceAdam

To calculate the acceptance as a function of invariant mass:

 python AcceptanceFromPAT_a2_mapcfg.py

One needs to configure the pT and eta cuts for each muon within the script. Next, produce the histograms of invariant mass and Z/lepton kinematics that you will use to calculate the acceptance:

 root -l macroAcceptance.C

To draw the plots execute:

root -l macroDrawHwidongAcc.C

The weight map producer uses the FEWZ acceptance as input; find all the macros here: /UserCode/ASvyatkovskiy/FEWZ_scripts.

To set it up, one needs to do:

cd $RCAC_SCRATCH/
cp -r /home/ba01/u115/asvyatko/TEST_FEWZ/FEWZ_2.0.1.a3/ YOUR_FEWZ_WORKING_DIR
cvs co -d TMP /UserCode/ASvyatkovskiy/FEWZ_scripts
cp TMP/* .
rm -r TMP

To run, use whichever config you need:

python nu_create_*.py

To retrieve the output of the jobs, use the corresponding get-output scripts:

python nu_get_*.py

After the maps are ready and saved in the root file, you can proceed with the corrected acceptance calculation. Before that, review the maps to check that there is nothing strange:

root -l Map2D.C

If everything looks reasonable (no very large weights, no pronounced asymmetries), continue with the corrected acceptance:

python countxsec.py DYM1020_single_tuple.root DYM20_single_tuple.root
python makeweights.py powheg_xs_full.root fewz_xs_full.root
python countcorracc.py --sg=DYM1020_single_tuple.root,DYM20_single_tuple.root -r fewz_powheg_weights.root

Each step uses the output of the previous one: first we calculate the PT-Y dimuon kinematics maps, then the weights for the FEWZ-POWHEG reweighting, and finally the corrected acceptance. Note: the k-factors are equal to 1 by default but should be readjusted in case one uses NLO FEWZ maps (rather than NNLO).
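
To make the last step concrete, here is a minimal sketch of a weighted (corrected) acceptance calculation; the tree and branch names and the cut values are hypothetical placeholders, not the analysis code:

// correctedAccSketch.C -- acceptance with per-event FEWZ/POWHEG weights.
#include "TFile.h"
#include "TTree.h"
#include "TH2D.h"
#include <cmath>
#include <cstdio>

void correctedAccSketch() {
    TFile *fMap = TFile::Open("fewz_powheg_weights.root");
    TH2D *hW = (TH2D*)fMap->Get("hWeight");         // hypothetical map name

    TFile *fGen = TFile::Open("gen_tuple.root");    // hypothetical GEN-level tuple
    TTree *t = (TTree*)fGen->Get("genTree");        // hypothetical tree/branch names
    double zPt, zY, pt1, eta1, pt2, eta2;
    t->SetBranchAddress("zPt",  &zPt);
    t->SetBranchAddress("zY",   &zY);
    t->SetBranchAddress("pt1",  &pt1);
    t->SetBranchAddress("eta1", &eta1);
    t->SetBranchAddress("pt2",  &pt2);
    t->SetBranchAddress("eta2", &eta2);

    double sumAll = 0., sumAcc = 0.;
    for (Long64_t i = 0; i < t->GetEntries(); ++i) {
        t->GetEntry(i);
        // per-event weight from the PT-Y map (k factor = 1 by default)
        double w = hW->GetBinContent(hW->FindBin(zPt, zY));
        sumAll += w;
        // placeholder fiducial cuts; use the cuts configured in the scripts
        if (pt1 > 20. && pt2 > 10. && std::fabs(eta1) < 2.4 && std::fabs(eta2) < 2.4)
            sumAcc += w;
    }
    printf("corrected acceptance = %.4f\n", sumAll > 0. ? sumAcc / sumAll : 0.);
}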

Finally, there are various plots used to estimate the FEWZ-POWHEG discrepancy for various trigger scenarios, as well as the NNLO-NLO discrepancy within a given generator and a given PDF set. These plots can be reproduced using the following macro:

 root -l Acceptance.C

Inside this macro you can uncomment what you need to plot, and also adjust the input PDF sets as well as the orders.

Checkpoint2 With the macros and scripts described in the step4 section you should be able to *reproduce* the following plots from CMS-AN-11-013: 2, 30-38, 61-63 and Tables 5-10, 21-24.

step5: Efficiencies

The details on the factorization and the application of correction factors are documented here and in this talk. With the current factorization scheme we measure the four following efficiencies (a sketch of the factorized product follows this list):

  • Trigger, Reconstruction+ID, isolation, tracking:
    • We use the official TagAndProbe package
  • How to run (on top of CMSSW 4_1_4):
    addpkg MuonAnalysis/MuonAssociators V01-14-00
    addpkg PhysicsTools/PatUtils
    addpkg PhysicsTools/TagAndProbe V04-00-06
    cvs co -A MuonAnalysis/TagAndProbe
    
  • The procedure goes in two steps: 
  • T&P tree production -> rerun seldom (ideally once), it depends only on the definitions of the tag and probe
cd TagAndProbe
cmsRun tp_from_aod.py
  • fitting: separate jobs for the trigger and all the muon-ID related efficiencies -> rerun frequently and usually interactively (changing binning, definitions)
cmsRun fitMuonID.py
  • All the latest macros/configs can be found here: UserCode/ASvyatkovskiy/TagAndProbe
  • Isolation: RandomCone - currently the code is private and not available for use.
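
As a sketch of what the factorization means in practice: the total per-muon efficiency is the product of the four measured pieces, each looked up in the muon's (eta, pT) bin. The map names and the axis convention below are hypothetical placeholders, not the TagAndProbe output format:

// factorizedEff.C -- per-muon efficiency as a product of factorized terms.
#include "TH2D.h"

double muonEff(double pt, double eta,
               TH2D *hTrk, TH2D *hRecoID, TH2D *hIso, TH2D *hTrig) {
    // each map is assumed binned in (eta, pT); adjust to the actual convention
    double eTrk  = hTrk->GetBinContent(hTrk->FindBin(eta, pt));
    double eID   = hRecoID->GetBinContent(hRecoID->FindBin(eta, pt));
    double eIso  = hIso->GetBinContent(hIso->FindBin(eta, pt));
    double eTrig = hTrig->GetBinContent(hTrig->FindBin(eta, pt));
    // factorized scheme: tracking x (reco+ID) x isolation x trigger
    return eTrk * eID * eIso * eTrig;
}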

After familiarizing yourself with the TagAndProbe package, you need to produce the muon efficiencies as a function of pT and eta. You do not need these in the analysis as such, but rather to check that everything you are doing is correct. After you are done with that, produce the 2D pT-eta efficiency map (it is already produced in one go when running fitMuonID.py). To do that, use these simple ROOT macros:

root -l idDataMC_4xy.C
root -l triggerMuonDataMC_4xy.C

And to produce 2D efficiency maps and correction factors do:

 root -l perbinTable.C

The final step here is to produce the efficiency and the efficiency correction factor as functions of invariant mass.

cvs co UserCode/Purdue/DYAnalysis/AnalysisMacros/Correction
root -l effCorrProd.C++
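
A minimal sketch of the idea behind the correction factor: the per-bin data/MC efficiency ratio, which then multiplies the MC prediction in that mass bin (file and histogram names are hypothetical placeholders):

// effCorrSketch.C -- data/MC efficiency correction factor vs. invariant mass.
#include "TFile.h"
#include "TH1D.h"

void effCorrSketch() {
    TFile *f = TFile::Open("efficiencies.root");   // hypothetical input
    TH1D *hEffData = (TH1D*)f->Get("effData");     // efficiency in data
    TH1D *hEffMC   = (TH1D*)f->Get("effMC");       // efficiency in MC

    // correction factor rho = eps_data / eps_MC, bin by bin
    TH1D *hRho = (TH1D*)hEffData->Clone("hRho");
    hRho->Divide(hEffMC);
    // the MC yield in each mass bin is then multiplied by rho for that bin
}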

Checkpoint3 With the macros described in the step5 section it is possible to reproduce the following plots from the CMS-AN-11-013 note: 15-16, 39-42.

Note: plot 40 was produced with the LKTC method, the code for which is currently not public and cannot be retrieved from the authors. Currently (2011 data) the result is consistent with that obtained with Tag-and-Probe.

step6: Background estimation

There are various methods employed to estimate the QCD background in a data-driven way (QCD is currently the only background not estimated from MC). The most important are the template fit method and the weight map method: carefully read chapter 6 of CMS-AN-11-013 for more details on the methods.

Reweighting method. First of all, read the quick description (in addition to what is written in the note).

 cvs co -d QCDEstimation/WeightMap UserCode/Purdue/DYAnalysis/AnalysisMacros/QCDEstimation

There are a few steps in this method.

step7: Unfolding

Unfolding is applied to correct for the migration of entries between bins caused by mass resolution effects (the FSR correction is applied as a separate step). For the Drell-Yan analysis, the chosen unfolding method is matrix inversion. It provides a common interface between channels, for symmetry and for ease in combination and systematic studies.

Doing any unfolding with MC requires three things (a bare-algebra sketch follows the list):

  • response matrix
  • histogram of measured events
  • true histogram (clearly not used/available when unfolding data)
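
A minimal sketch of the matrix-inversion method with a toy 3-bin response matrix, using ROOT's TMatrixD; this is not the analysis's unfolding class, just the underlying linear algebra:

// unfoldSketch.C -- unfolding by direct inversion of the response matrix.
// R(i,j) = probability that an event generated in true bin j is measured
// in bin i, so measured = R * true and therefore true = R^{-1} * measured.
#include "TMatrixD.h"
#include "TVectorD.h"

void unfoldSketch() {
    const int n = 3;                               // toy example: 3 mass bins
    TMatrixD R(n, n);
    R(0,0) = 0.9; R(0,1) = 0.1; R(0,2) = 0.0;      // mostly diagonal migration
    R(1,0) = 0.1; R(1,1) = 0.8; R(1,2) = 0.1;
    R(2,0) = 0.0; R(2,1) = 0.1; R(2,2) = 0.9;

    TVectorD measured(n);
    measured(0) = 950.; measured(1) = 500.; measured(2) = 120.;  // toy yields

    TMatrixD Rinv(R);
    Rinv.Invert();                                 // the matrix inversion step
    TVectorD unfolded = Rinv * measured;           // estimate of the true spectrum
    unfolded.Print();
}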

A script that demonstrates how the unfolding/forward-folding object works:

root -l test/testExpo.C++

The following demonstrates how to get back the pulls.

  • extra requirement: TH2 for true/measured event generation
root -l test/testPulls.C++
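
For reference, the pull in each bin is (unfolded - true) / sigma(unfolded); over many toys it should be approximately Gaussian with mean 0 and width 1 if the unfolding and its error propagation are unbiased. A minimal sketch (histogram names hypothetical):

// pullSketch.C -- fill a pull histogram from unfolded and true spectra.
#include "TH1D.h"

void fillPulls(TH1D *hUnfolded, TH1D *hTrue, TH1D *hPulls) {
    for (int i = 1; i <= hUnfolded->GetNbinsX(); ++i) {
        double sigma = hUnfolded->GetBinError(i);
        if (sigma > 0.)
            hPulls->Fill((hUnfolded->GetBinContent(i)
                          - hTrue->GetBinContent(i)) / sigma);
    }
}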

MORE:

/afs/cern.ch/user/h/hdyoo/public/DYstudy/macros/Unfold

1. unfoldingObs.C
to produce the unfolding table and the 2D response matrix

2. yield.C
to produce the yield plot

Checkpoint7 With these macros one should be able to reproduce plot 49 from the note and Tables 17-18.

step8: FSR correction

The effect of FSR is manifested by photon emission off the final-state muons. This changes the dimuon invariant mass, so the dimuon invariant mass is distinct from the propagator (Z/gamma*) mass.

For our analysis we estimate the effect of FSR and the corresponding correction bin by bin in invariant mass. This is done by comparing the pre-FSR and post-FSR spectra. The pre-FSR spectrum is obtained by requiring the mother of the muon to be the Z/gamma*; for the post-FSR spectrum the mother can be anything. The corresponding plots in the note are 52-55; they can all be produced with the information available in the ntuple using:

root -l InvMass.C++
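
A minimal sketch of the bin-by-bin correction itself, assuming the pre-FSR and post-FSR mass spectra have been filled as described above (file and histogram names are hypothetical placeholders):

// fsrCorrSketch.C -- bin-by-bin FSR correction in invariant mass.
#include "TFile.h"
#include "TH1D.h"

void fsrCorrSketch() {
    TFile *f = TFile::Open("ntuple_hists.root");     // hypothetical input
    TH1D *hPreFSR  = (TH1D*)f->Get("massPreFSR");    // mother required to be Z/gamma*
    TH1D *hPostFSR = (TH1D*)f->Get("massPostFSR");   // no requirement on the mother

    // per-mass-bin correction: ratio of pre-FSR to post-FSR yields
    TH1D *hCorr = (TH1D*)hPreFSR->Clone("hFsrCorr");
    hCorr->Divide(hPostFSR);
    // the measured yield in each bin is multiplied by hCorr to undo FSR
}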

step9: Systematic uncertainty estimation

step10: Plotting the results

The main result of the measurement is the cross-section ratio, or R shape. We distinguish R and r shapes (see chapter 9 of the note for details on the definitions, and also Figure 64). Figure 64 shows the R shape for theory and measurement (for two independent trigger scenarios). It relies on the theoretical cross section (1-2 GeV bin), the final numbers for the acceptance correction, and the final numbers for the cross-section measurement. To give a clearer feeling of what this plot depends on, here are the tables used to produce the numbers in plot 64:

Tables 21-24: Theoretical predictions

Tables 25-26: Measurement

Tables 5-10: Acceptance-efficiency corrections

To run the code one simply needs:

 root -l theory_plot.C++

A lot of interesting information can be retrieved from the Zprime JTERM SHORT and LONG exercises (which are constructed along the same lines as this tutorial).
