
Completeness: 70%

Drell-Yan analysis Procedure

This twiki reviews the most important parts of the Drell-Yan cross section measurement. It is intended to familiarize you with the technical aspects of the analysis procedure. At the end of each section there is a set of questions and exercises. Note that the software and samples evolve very frequently, so some parts of this page will become obsolete quickly!

step1: PAT/ntuples

  • Samples
  • MC samples change frequently (usually there is a new generation of MC samples every season). Even though the GEN level stays the same, the reconstruction keeps improving from release to release, so it is recommended to use the most up-to-date MC. Other things that constantly change are the vertex information (tuned to data) and the pile-up.
  • MC: 311X Monte Carlo samples (this list shows all the signals and backgrounds we consider, but it is already superseded by Summer11 42X samples)
      • Drell-Yan:
        • /DYToMuMu_M-10To20_TuneZ2_7TeV-pythia6/Spring11-PU_S1_START311_V1G1-v1/GEN-SIM-RECO
        • /DYToMuMu_M-20_CT10_TuneZ2_7TeV-powheg-pythia/Spring11-PU_S1_START311_V1G1-v1/GEN-SIM-RECO
      • QCD
        • /QCD_Pt-15to20_MuPt5Enriched_TuneZ2_7TeV-pythia6/Spring11-PU_S1_START311_V1G1-v1/GEN-SIM-RECO
        • /QCD_Pt-20to30_MuPt5Enriched_TuneZ2_7TeV-pythia6/Spring11-PU_S1_START311_V1G1-v1/GEN-SIM-RECO
        • /QCD_Pt-30to50_MuPt5Enriched_TuneZ2_7TeV-pythia6/Spring11-PU_S1_START311_V1G1-v1/GEN-SIM-RECO
        • /QCD_Pt-50to80_MuPt5Enriched_TuneZ2_7TeV-pythia6/Spring11-PU_S1_START311_V1G1-v1/GEN-SIM-RECO
        • /QCD_Pt-80to120_MuPt5Enriched_TuneZ2_7TeV-pythia6/Spring11-PU_S1_START311_V1G1-v1/GEN-SIM-RECO
        • /QCD_Pt-120to150_MuPt5Enriched_TuneZ2_7TeV-pythia6/Spring11-PU_S1_START311_V1G1-v1/GEN-SIM-RECO
        • /QCD_Pt-150_MuPt5Enriched_TuneZ2_7TeV-pythia6/Spring11-PU_S1_START311_V1G1-v1/GEN-SIM-RECO
      • DY tautau
        • /DYToTauTau_M-20_CT10_TuneZ2_7TeV-powheg-pythia-tauola/Spring11-PU_S1_START311_V1G1-v2/GEN-SIM-RECO
      • Dibosons
        • /WWtoAnything_TuneZ2_7TeV-pythia6-tauola/Spring11-PU_S1_START311_V1G1-v1/GEN-SIM-RECO
        • /WZtoAnything_TuneZ2_7TeV-pythia6-tauola/Spring11-PU_S1_START311_V1G1-v1/GEN-SIM-RECO
        • /ZZtoAnything_TuneZ2_7TeV-pythia6-tauola/Spring11-PU_S1_START311_V1G1-v1/GEN-SIM-RECO
      • Wmunu
        • /WtoMuNu_TuneD6T_7TeV-pythia6/Spring11-PU_S1_START311_V1G1-v1/GEN-SIM-RECO
  • Current latest MC generation is 42X Summer11
  • DATA
    • We use SingleMu and DoubleMu PDs, PromptReco (and ReReco when it becomes available)
    • JSONs are constantly updated in this directory as we certify more and more data:
      • "Golden" /afs/cern.ch/cms/CAF/CMSCOMM/COMM_DQM/certification/Collisions11/7TeV/Prompt/Cert_160404-161312_7TeV_PromptReco_Collisions11_JSON.txt
      • Calo-free /afs/cern.ch/cms/CAF/CMSCOMM/COMM_DQM/certification/Collisions11/7TeV/Prompt/Cert_160404-161312_7TeV_PromptReco_Collisions11_JSON_MuonPhys.txt
  • Use DBS tool to retrieve info about the most up-to-date MC and DATA samples
  • Relevant software: CMSSW_4_1_3: see the twiki page to get the information about CMSSW releases used for data taking as they evolve
    • Global tag
      • Data: GR_R_311_V2 for reprocessing and GR_P_V14 for prompt reco
      • MC: MC_311_V2, START311_v2
    • ElectroWeakAnalysis/Skimming, ElectroWeakAnalysis/DYskimming, UserCode
  • How to run: PAT-tuple production: see the corresponding twiki. Note that a lot of the code in CVS is outdated (we keep updating it)
    cmsrel CMSSW_4_1_3
    cd CMSSW_4_1_3/src
    cmsenv
    addpkg ElectroWeakAnalysis/Skimming
    cvs co -d ElectroWeakAnalysis/DYskimming -r V00-00-05 UserCode/Purdue/DYAnalysis/Skimming
    scram b
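The certification JSON files listed under DATA map each run number to a list of certified luminosity-section ranges. The sketch below shows how such a mask could be read and queried in plain Python. It is only an illustration: the function names and the toy run and lumi values are made up for this example, and in CMSSW jobs the mask is normally applied by the framework or the job submission tools.

```python
import json


def parse_lumi_mask(text):
    """Parse certification JSON text of the form {"run": [[first, last], ...]}."""
    raw = json.loads(text)
    # JSON object keys are strings, so convert run numbers to int for lookups.
    return {int(run): [tuple(r) for r in ranges] for run, ranges in raw.items()}


def load_lumi_mask(path):
    """Read a certification JSON file from disk."""
    with open(path) as handle:
        return parse_lumi_mask(handle.read())


def is_certified(mask, run, lumi):
    """True if (run, lumi) falls inside any certified lumi range of the mask."""
    return any(first <= lumi <= last for first, last in mask.get(run, []))


# Toy mask with made-up content (not copied from a real certification file).
toy_mask = parse_lumi_mask('{"160431": [[19, 218]], "160577": [[254, 306]]}')
print(is_certified(toy_mask, 160431, 100))  # lumi inside a certified range
print(is_certified(toy_mask, 160431, 5))    # run present, lumi outside
```

For a real file, `load_lumi_mask` would be called with one of the JSON paths quoted above.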

Exercise1:

Q1: Do you think the list of signal and background samples mentioned above is complete for the Drell-Yan cross section measurement in the region 15-600 GeV invariant mass? (You need to think not only about physics processes but also about kinematic region of the sample generated)

Q2: Using the DBS tool mentioned above or a command line equivalent find all the MC samples relevant for the Drell-Yan cross-section measurement for Summer11 production, example:

find file where file.tier=GEN-SIM-RECO and site=srm-dcache.rcac.purdue.edu and dataset like=*DYToMuMu*Summer11*

Q3: (may be advanced; if you fail, ask me) Port the above skimming code to 42X. You need this because the 41X and 42X releases are incompatible.

Q4: Produce a PAT-tuple

More PAT-tuples can be found here: /store/user/asvyatko/DYstudy/dataAnalysis11//PATtuple/ (you can use signal and Data only for now)

step2: ntuple

Since the event size is quite big even for PAT, we usually perform one more skimming step and store our data in the form of ntuples. The input for the ntuple-maker is a PAT-tuple (however, with some gymnastics you can run on GEN-SIM-RECO or AODSIM).

cvs co -d Phys/DimuonAnalyzer UserCode/Purdue/DYAnalysis/NtupleMaker
scram b

Exercise2:

Q1: Try to produce an ntuple

Hint: spot all the errors; they might be related to the global tag or the REDIGI version (for MC)

Q2: Find the invariant mass branch and inspect it
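When inspecting the invariant mass branch it helps to be able to recompute the quantity from the two muon kinematics. The following is a minimal sketch, assuming each muon is given as a (pt, eta, phi) tuple in GeV and radians; the function names are illustrative and are not the branch names of the ntuple.

```python
import math

MUON_MASS = 0.1056584  # GeV


def four_vector(pt, eta, phi, mass=MUON_MASS):
    """Return (E, px, py, pz) for a particle given in collider coordinates."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz


def invariant_mass(mu1, mu2):
    """Invariant mass of two muons, each given as a (pt, eta, phi) tuple."""
    # Sum the two four-vectors component by component.
    e, px, py, pz = (a + b for a, b in zip(four_vector(*mu1), four_vector(*mu2)))
    m2 = e * e - px * px - py * py - pz * pz
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(m2, 0.0))


# Two back-to-back central muons with pt = 45 GeV each.
print(invariant_mass((45.0, 0.0, 0.0), (45.0, 0.0, math.pi)))
```

Comparing this value with the stored branch for a few events is a quick sanity check of the ntuple content.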

step3: Event Selection

Currently we use the so-called cut-based approach to discriminate between signal and background. For more on event selection please read chapter 3 in the analysis note CMS-AN-11-013.

  • Mass range and binning:
    • for the early stage of the 2011 analysis we keep the 2010 binning (in GeV) [15,20,30,40,50,60,76,86,96,106,120,150,200,600]
    • Based on the analysis of trigger thresholds and acceptance/significance curves we might reoptimize the lower mass edge.
    • The upper mass edge will be determined by the highest-mass dimuons available in the dataset
    • It is possible that statistics will allow us to make the bins finer and move to 2D mass-rapidity bins
  • Trigger selection: HLT_Mu15_* || HLT_Mu24_* (by run range), HLT_DoubleMu7_*
  • Offline selection: Baseline event selection has not changed compared to 2010 analysis, see
    • we will consider moving to PF muons and PF isolation: this study is in progress right now
  • Relevant software: ROOT macros for the control plots and the invariant mass plot are here: UserCode/Purdue/DYAnalysis/AnalysisMacros/ControlPlots, check it out as usual
  • When you have produced the ntuples, you need to set them up:
1. copy the Ntuples in rootfiles directory: rootfiles/HLT_Mu15/*

2. modify the createChain.pl file

cd rootfiles
vi createChain.pl

$dir = "/work/hdyoo/Work/DYanalysis/dataAnalysis10/analysis/rootfiles"; # correct path
$trig = "HLT_Mu15"; # trigger name (directory name)

./Chains.csh

then all chain_* files will be created.

3. go to ControlPlots directory
cd ../ControlPlots

and run the macro in ROOT

> root -l InvMass.C++

Q1: try to run InvMass.C macro to produce the invariant mass plot

Hint: you need to change the trigger path names
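The variable-width mass binning quoted above can be reproduced outside ROOT for quick cross-checks of the invariant mass plot. The sketch below fills a histogram with the 2010 bin edges in plain Python; the helper names are illustrative, and the choice of half-open bins [low, high) is an assumption matching the usual histogram convention.

```python
from bisect import bisect_right

# 2010 mass bin edges in GeV, as listed in the event selection section.
MASS_BINS = [15, 20, 30, 40, 50, 60, 76, 86, 96, 106, 120, 150, 200, 600]


def mass_bin(mass, edges=MASS_BINS):
    """Index of the bin [low, high) containing mass, or None if out of range."""
    if mass < edges[0] or mass >= edges[-1]:
        return None
    return bisect_right(edges, mass) - 1


def fill_histogram(masses, edges=MASS_BINS):
    """Count the entries per mass bin, ignoring values outside the range."""
    counts = [0] * (len(edges) - 1)
    for mass in masses:
        index = mass_bin(mass, edges)
        if index is not None:
            counts[index] += 1
    return counts


# Toy dimuon masses in GeV (made-up values).
print(fill_histogram([16.0, 91.0, 92.5, 700.0]))
```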

step4: Acceptance

  • Acceptance is determined using GEN level information, which has not changed compared to the 2010 analysis
  • Use the acceptance numbers documented in the note: AN-11-013

We use the POWHEG NLO acceptance; see all the relevant formulas in this talk (the updated version), slide 18 for the acceptance definitions, and refer to earlier talks. Due to the intrinsic discrepancies in the modeling between POWHEG and FEWZ, we correct the Z kinematics. For that, we extract weight maps from POWHEG at NLO and from FEWZ at NNLO (more specifically, NNLO below 40 GeV and NLO otherwise, as the effect of higher-order corrections is negligible at higher masses). The weight map is essentially the ratio of the double inclusive cross-sections extracted from POWHEG and FEWZ per PT-Y bin (PT and Y refer to the Z kinematics, which are identical here to the dimuon kinematics, our final aim). Details on the re-weighting technique are also in the linked presentation.
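The weight map idea can be sketched in a few lines. The example below is not the analysis code: the bin edges and cross-section values are hypothetical, and the ratio is taken as FEWZ over POWHEG on the assumption that the weights are applied to POWHEG events to bring them to the FEWZ prediction. Check the linked presentation for the actual convention.

```python
from bisect import bisect_right

# Hypothetical bin edges, chosen only for illustration.
PT_EDGES = [0.0, 20.0, 100.0]  # Z pT in GeV
Y_EDGES = [0.0, 1.0, 2.4]      # |Y| of the Z


def weight_map(fewz_xsec, powheg_xsec):
    """Per-bin ratio FEWZ/POWHEG; bins with no POWHEG content get weight 1."""
    return [
        [f / p if p > 0 else 1.0 for f, p in zip(fewz_row, powheg_row)]
        for fewz_row, powheg_row in zip(fewz_xsec, powheg_xsec)
    ]


def find_bin(edges, value):
    """Index of the bin [low, high) containing value, or None if outside."""
    if value < edges[0] or value >= edges[-1]:
        return None
    return bisect_right(edges, value) - 1


def event_weight(weights, pt, rapidity):
    """Look up the weight for one event; events outside the map are unweighted."""
    i = find_bin(PT_EDGES, pt)
    j = find_bin(Y_EDGES, abs(rapidity))
    if i is None or j is None:
        return 1.0
    return weights[i][j]


# Made-up cross-sections per (pT, |Y|) bin, in arbitrary units.
weights = weight_map([[2.0, 1.0], [3.0, 0.5]], [[1.0, 1.0], [2.0, 0.0]])
print(event_weight(weights, 50.0, -0.5))
```

Falling back to a weight of 1 for empty or out-of-range bins is a design choice of this sketch, so that sparsely populated regions are left unchanged rather than producing divisions by zero.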

How to run: The code mostly uses Adam's tools. The current working area is here: /home/ba01/u115/asvyatko/PATAdam/CMSSW_3_8_7/src

To set up the area, do:

cmsrel CMSSW_3_8_7
cd CMSSW_3_8_7/src
cmsenv
cvs co -r V00-09-11 SUSYBSMAnalysis/Zprime2muAnalysis
SUSYBSMAnalysis/Zprime2muAnalysis/setup.csh # This last command checks
                                            # out other packages and
                                            # executes scramv1 b to
                                            # compile everything.
cvs co -d Acceptance /UserCode/ASvyatkovskiy/AcceptanceAdam

Not yet tested under CMSSW 4XY because there was no need: the GEN level info did not change. Follow the README to calculate the weights; the same procedure also calculates the 1D acceptance in mass bins.
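As an illustration of a 1D acceptance in mass bins, the sketch below takes generator-level events and computes, per bin, the weighted fraction that falls inside the acceptance. The acceptance decision itself is passed in as a flag because the actual cuts are defined in the note and the talk, not here; the event tuple layout and function name are assumptions of this example.

```python
from bisect import bisect_right

# 2010 mass bin edges in GeV, as listed in the event selection section.
MASS_BINS = [15, 20, 30, 40, 50, 60, 76, 86, 96, 106, 120, 150, 200, 600]


def acceptance_per_bin(events, edges=MASS_BINS):
    """Weighted acceptance per mass bin.

    events: iterable of (gen_mass, in_acceptance, weight) tuples, where
    in_acceptance is a boolean computed elsewhere from the GEN-level muons.
    Returns a list with one entry per bin (None for empty bins).
    """
    n_bins = len(edges) - 1
    total = [0.0] * n_bins
    accepted = [0.0] * n_bins
    for mass, in_acceptance, weight in events:
        if mass < edges[0] or mass >= edges[-1]:
            continue  # outside the measured mass range
        index = bisect_right(edges, mass) - 1
        total[index] += weight
        if in_acceptance:
            accepted[index] += weight
    return [a / t if t > 0 else None for a, t in zip(accepted, total)]


# Toy generator-level events with made-up masses, flags and weights.
toy_events = [(17.0, True, 1.0), (18.0, False, 3.0), (91.0, True, 2.0)]
print(acceptance_per_bin(toy_events))
```

The per-event weight is where a PT-Y re-weighting factor would enter, if one is used.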

The weight map producer uses the FEWZ acceptance as input. All the macros can be found here: /UserCode/ASvyatkovskiy/FEWZ_scripts

To get it working, do:

cd $RCAC_SCRATCH/
cp -r /home/ba01/u115/asvyatko/TEST_FEWZ/FEWZ_2.0.1.a3/ YOUR_FEWZ_WORKING_DIR
cvs co -d TMP /UserCode/ASvyatkovskiy/FEWZ_scripts
cp TMP/* .
rm -r TMP

To run, use whichever config you need: python nu_create_*.py

step5: Efficiencies

The details on the factorization and the application of correction factors are documented here, and can also be found in this talk

  • Trigger, Reconstruction: TagAndProbe, official package
  • How to run (on top of CMSSW 414):
    addpkg MuonAnalysis/MuonAssociators V01-14-00
    addpkg PhysicsTools/PatUtils
    addpkg PhysicsTools/TagAndProbe V04-00-06
    cvs co -A MuonAnalysis/TagAndProbe
    
  • The procedure goes in two steps:
    • T&P tree production -> rerun seldom (ideally once); it depends only on the definitions of the tag and the probe
    • Fitting: a separate job for the trigger and for all the muon ID related efficiencies -> rerun frequently and usually interactively (to change binning or definitions)
  • All the latest macros/configs can be found here: UserCode/ASvyatkovskiy/TagAndProbe
  • Isolation: RandomCone - currently, code is not available for public use.
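The quantity extracted by tag-and-probe is a pass fraction of the probes. The sketch below shows only the simplest counting version with a binomial uncertainty, which is meaningful when the probe sample is essentially background-free. The official TagAndProbe package instead fits the tag-probe mass spectrum for passing and failing probes, so this example is an illustration of the idea and not a replacement for it.

```python
import math


def efficiency(n_pass, n_fail):
    """Counting efficiency n_pass / (n_pass + n_fail) with a binomial error."""
    total = n_pass + n_fail
    if total <= 0:
        raise ValueError("need at least one probe")
    eff = n_pass / total
    # Simple binomial (normal approximation) uncertainty; it is unreliable
    # when the efficiency is very close to 0 or 1 or when total is small.
    err = math.sqrt(eff * (1.0 - eff) / total)
    return eff, err


# Toy probe counts (made-up numbers).
eff, err = efficiency(90, 10)
print("efficiency = %.3f +- %.3f" % (eff, err))
```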

step6: Background estimation

There are various methods employed to estimate the QCD background in a data-driven way (QCD is currently the only background that is not estimated from MC). The most important are:

  • The template method
    • Isolation variable has distinct shape for signal and background
    • Create templates for shapes using data
    • Same sign events for background
    • Signal from opposite sign events in the Z peak region with one muon highly isolated (shape taken from second muon)
    • Extract estimate of background contamination in signal region using the fit results
  • Weight map method
    • Produce  pT-Y weight map
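The last step of the template method, extracting the background contamination from a fit of the two isolation shapes to the data, can be sketched as follows. This is a deliberately simplified example: it uses an unweighted least-squares fit with a closed-form solution instead of a likelihood fit, and the template and data values are made up. In the analysis the background template would come from same-sign events and the signal template from opposite-sign events in the Z peak region.

```python
def normalise(histogram):
    """Scale a histogram (list of bin contents) to unit area."""
    area = float(sum(histogram))
    if area <= 0:
        raise ValueError("empty template")
    return [x / area for x in histogram]


def background_fraction(data, signal_template, background_template):
    """Least-squares estimate of the background fraction f_b in the model
    data_i = N * ((1 - f_b) * s_i + f_b * b_i), with unit-area templates."""
    s = normalise(signal_template)
    b = normalise(background_template)
    n = float(sum(data))
    # Minimising sum_i (d_i - N*s_i - f_b*N*(b_i - s_i))^2 over f_b gives:
    num = sum((d - n * si) * (bi - si) for d, si, bi in zip(data, s, b))
    den = n * sum((bi - si) ** 2 for si, bi in zip(s, b))
    if den == 0:
        raise ValueError("templates are identical, the fit is undefined")
    # Keep the result in the physical range [0, 1].
    return min(max(num / den, 0.0), 1.0)


# Toy isolation histograms (made-up numbers): signal peaks at low isolation.
signal_shape = [70, 20, 10]
background_shape = [20, 30, 50]
data_hist = [650, 210, 140]
print(background_fraction(data_hist, signal_shape, background_shape))
```

With real, statistically limited histograms a binned likelihood fit with per-bin uncertainties would be the more appropriate choice.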

A lot of interesting information can be retrieved from the Zprime JTERM SHORT and LONG exercises (which are constructed along the same lines as this tutorial).

And you can always ask me: asvyatko@purdue.edu
