RSA Documentation

Release History

The latest release is 1.2 (Jun 14, 2008).

Release 1.1 (Dec 7, 2007).
The R implementation in release 1.0 (Jun 23, 2007) has a bug that, depending on the R version and settings, may cause it to always run in "--r" mode.

Installation

Perl Users

To use the Perl package, you must first install the Data::Table Perl module. The Data::Table module is available from CPAN. RSA version 1.2 requires Data::Table version 1.53.

The HGPValue.pm package must be in a path searchable by RSA.pl; by default it is in the same directory. HGPValue carries out fast hypergeometric p-value calculations.
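For readers unfamiliar with the statistic, the sketch below computes the hypergeometric upper-tail probability P(X >= k) in plain Python. It is not the HGPValue.pm code. The function names (log_choose, hypergeom_tail) are hypothetical, and the sketch simply sums the tail terms in log space with math.lgamma so that large plates do not overflow the binomial coefficients.

```python
import math


def log_choose(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), computed with lgamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def hypergeom_tail(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N items, K of them marked, n drawn).

    Hypothetical helper illustrating the kind of quantity HGPValue computes;
    it is not the HGPValue.pm implementation.
    """
    # X can only take values between max(0, n - (N - K)) and min(n, K).
    lo = max(k, 0, n - (N - K))
    hi = min(n, K)
    if lo > hi:
        return 0.0
    log_total = log_choose(N, n)
    log_terms = [log_choose(K, x) + log_choose(N - K, n - x) - log_total
                 for x in range(lo, hi + 1)]
    # Log-sum-exp: factor out the largest term so tiny terms do not underflow.
    m = max(log_terms)
    return math.exp(m) * sum(math.exp(t - m) for t in log_terms)
```

For example, hypergeom_tail(10, 3, 2, 2) is the chance that both of 2 wells drawn from 10 belong to a gene with 3 wells: C(3,2)/C(10,2) = 3/45.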

Usage of RSA.pl

    RSA.pl [options] fileName
      -l: lower bound, defaults to 0
      -u: upper bound, defaults to 1
      -r: reverse hit picking, the higher the score the better
          if -r flag is off, the lower the score the better
      -f: input file format: PC, UNIX or MAC, UNIX by default
      -o: output file name, STDOUT if not specified
      -h: help, this message
  
      fileName: the input file must be in CSV format
      the spreadsheet must contain at least three columns:
        Gene_ID: the gene identifier for the well
        Well_ID: the well identifier
        Score: numerical value for hit picking
      You can supply your own column names using the following three options

      -g: column name for gene ID, default "Gene_ID"
      -w: column name for well ID, default "Well_ID"
      -s: column name for score used for sorting, default "Score"
        E.g., RSA.pl -g myGene -w myWell
        will instruct RSA.pl to look for "myGene" instead of "Gene_ID",
        "myWell" instead of "Well_ID", but it will still look for a column
        named "Score" (since -s is not used).

      Notice:
        1) the order of these three columns can be arbitrary
        2) wells that share the same Gene_ID are considered independent siRNAs for the same gene
        3) wells are ignored if Gene_ID or Score is not defined
   Examples
      RSA.pl -l 0.2 -u 0.8 -f PC -o output.csv input.csv
        wells with lower scores are considered more active
        wells <=0.2 are guaranteed hits, wells >0.8 are guaranteed non-hits
        wells (0.2,0.8] are determined by RSA algorithm
        input CSV file is in Windows (PC) format
        output results in output.csv file
  
      RSA.pl -l 1.2 -u 2.0 -f PC -r -o output.csv input.csv
        wells with higher scores are considered more active (specified by -r flag)
        wells >=2.0 are guaranteed hits, wells <1.2 are guaranteed non-hits
        wells [1.2,2.0) are determined by RSA algorithm
        input CSV file is in Windows (PC) format
        output results in output.csv file
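The input rules and the -l/-u/-r bounds above can be summarized in a short Python sketch. It is an illustration under stated assumptions, not the RSA.pl parser. load_wells and classify are hypothetical names. str.splitlines() stands in for the -f option by accepting PC, UNIX and old Mac line endings. Only an empty Gene_ID or Score field is treated as "not defined".

```python
import csv


def load_wells(text, gene_col="Gene_ID", well_col="Well_ID", score_col="Score"):
    """Return (gene, well, score) tuples from CSV text.

    Hypothetical helper: the column names mirror the -g/-w/-s defaults, and a
    well is skipped when its gene ID or score field is empty (assumed to be
    what "not defined" means). splitlines() accepts PC, UNIX and Mac endings.
    """
    wells = []
    for row in csv.DictReader(text.splitlines()):
        gene = (row.get(gene_col) or "").strip()
        score = (row.get(score_col) or "").strip()
        if not gene or not score:
            continue
        wells.append((gene, (row.get(well_col) or "").strip(), float(score)))
    return wells


def classify(score, lb=0.0, ub=1.0, reverse=False):
    """Return "hit", "non-hit" or "rsa" (left for the RSA algorithm to decide)."""
    if reverse:                 # -r: the higher the score the better
        if score >= ub:
            return "hit"
        if score < lb:
            return "non-hit"
    else:                       # default: the lower the score the better
        if score <= lb:
            return "hit"
        if score > ub:
            return "non-hit"
    return "rsa"
```

With -l 0.2 -u 0.8, classify(0.2, 0.2, 0.8) returns "hit" and classify(0.8, 0.2, 0.8) returns "rsa", matching the (0.2,0.8] interval in the first example.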

R Users


   Usage of RSA.R
    --l: lower bound, defaults to 0
    --u: upper bound, defaults to 1
    --r: reverse hit picking, the higher the score the better
         if --r flag is off, the lower the score the better
    --i: input file name
    --o: output file name, STDOUT if not specified

   Examples (see the Perl examples above for explanations)
    R CMD BATCH --vanilla --slave --args --l=0.2 --u=0.8 --i=input.csv --o=output.csv RSA.R
    R CMD BATCH --vanilla --slave --args --l=1.2 --u=2.0 --r --i=input.csv --o=output.csv RSA.R

Input and Output Format

Gene_ID,Well_ID,Score: columns from the input spreadsheet
LogP: RSA p-value on a log10 scale, e.g., -2 means 0.01;
RSA_Hit: whether the well is a hit, 1 means yes, 0 means no;
#hitWell: number of hit wells for the gene
#totalWell: total number of wells for the gene
For example, if gene A has three wells, w1, w2 and w3, and w1 and w2 are hits, then
#totalWell is 3, #hitWell is 2, w1 and w2 have RSA_Hit set to 1,
and w3 has RSA_Hit set to 0.
RSA_Rank: ranking column to sort all wells for hit picking
Cutoff_Rank: ranking column to sort all wells based on Score in the simple activity-based method

Note: a rank value of 999999 means the well is not a hit. We use a large rank number here
for convenience when sorting in a spreadsheet.
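A data row of output.csv can be decoded as shown below. This sketch assumes the columns appear in the order listed above (Gene_ID, Well_ID, Score, LogP, RSA_Hit, #hitWell, #totalWell, RSA_Rank, Cutoff_Rank), which matches the example rows that follow. parse_output_row is a hypothetical helper, and the header line should be skipped before calling it.

```python
OUTPUT_COLUMNS = ["Gene_ID", "Well_ID", "Score", "LogP", "RSA_Hit",
                  "#hitWell", "#totalWell", "RSA_Rank", "Cutoff_Rank"]
NOT_A_HIT = 999999  # sentinel rank used for wells that are not hits


def parse_output_row(line):
    """Decode one data line of output.csv (hypothetical helper; assumes the
    column order above and no quoted commas in Gene_ID or Well_ID)."""
    fields = dict(zip(OUTPUT_COLUMNS, line.strip().split(",")))
    rsa_rank = int(fields["RSA_Rank"])
    return {
        "gene": fields["Gene_ID"],
        "well": fields["Well_ID"],
        "score": float(fields["Score"]),
        "logp": float(fields["LogP"]),
        "p_value": 10 ** float(fields["LogP"]),   # LogP = log10(p): -2 -> 0.01
        "is_hit": fields["RSA_Hit"] == "1",
        "hit_wells": int(fields["#hitWell"]),
        "total_wells": int(fields["#totalWell"]),
        "rsa_rank": None if rsa_rank == NOT_A_HIT else rsa_rank,
        "cutoff_rank": int(fields["Cutoff_Rank"]),
    }
```

For instance, parse_output_row("3620,44_I17,0.7335,-2.344,0,1,2,999999,4113") reports is_hit False, rsa_rank None and cutoff_rank 4113.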

Example A in output.csv:
------------------------
1221200,7_O20,0.0541,-6.810,1,3,3,1,33
1221200,18_A21,0.0626,-6.810,1,3,3,2,43
1221200,41_A21,0.0765,-6.810,1,3,3,3,72

Gene ID 1221200 has three wells, 7_O20, 18_A21 and 41_A21, and all three have good (low) scores.
Therefore 3 out of 3 wells are hits (#totalWell=3, #hitWell=3, RSA_Hit=1 for all three wells).
LogP is -6.810. These three wells are ranked as the top three wells by RSA.
However, they are ranked only 33rd, 43rd and 72nd by the traditional cutoff method.
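The gap between the two rankings comes from what each one sorts on. The sketch below shows one plausible way to derive both columns. It is an assumption, not taken from RSA.pl:

- Cutoff_Rank is taken as the position of each well when all wells are sorted by Score alone.
- RSA_Rank orders the RSA hits by LogP, breaking ties by Score, and gives non-hits 999999.

assign_ranks is a hypothetical name.

```python
NOT_A_HIT = 999999  # rank given to wells that are not RSA hits


def assign_ranks(wells, reverse=False):
    """Add 'cutoff_rank' and 'rsa_rank' (1-based) to each well dict in place.

    Each dict needs 'score', 'logp' and 'is_hit'. Hypothetical helper; the
    sort keys are assumptions, not taken from RSA.pl.
    """
    sign = -1 if reverse else 1  # -r: higher scores sort first
    # Cutoff_Rank: position when every well is sorted by Score alone.
    by_score = sorted(range(len(wells)), key=lambda i: sign * wells[i]["score"])
    for rank, i in enumerate(by_score, start=1):
        wells[i]["cutoff_rank"] = rank
    # RSA_Rank: hits only, most significant LogP first, ties broken by Score.
    hits = sorted((i for i, w in enumerate(wells) if w["is_hit"]),
                  key=lambda i: (wells[i]["logp"], sign * wells[i]["score"]))
    for w in wells:
        w["rsa_rank"] = NOT_A_HIT
    for rank, i in enumerate(hits, start=1):
        wells[i]["rsa_rank"] = rank
    return wells
```

Consider three wells of one gene sharing LogP -6.81 (scores 0.0541, 0.0626, 0.0765) and a single stronger well of another gene with LogP -2.344 (score 0.0537). The sketch gives the three wells RSA ranks 1 to 3 and the single well RSA rank 4, even though that single well has Cutoff_Rank 1.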

Example B in output.csv:
------------------------
3620,21_I17,0.0537,-2.344,1,1,2,162,31
3620,44_I17,0.7335,-2.344,0,1,2,999999,4113

Gene ID 3620 has two wells: 21_I17 is active, while 44_I17 is relatively inactive.
RSA decides that only 1 of the 2 wells is a hit. Therefore one well has RSA_Hit set to 1
and the other has it set to 0; #totalWell=2, but #hitWell=1.
The first well is ranked 162nd by RSA and 31st by the cutoff method.
The second well is not an RSA hit (RSA_Rank=999999) and is ranked 4113th by the cutoff method.
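Only one of the two 3620 wells is a hit because of how RSA chooses a per-gene cutoff. The sketch below is a simplified reading of the published RSA idea, not a transcription of RSA.pl:

- All wells are ranked best first.
- For a gene with K wells whose i-th best well sits at rank r_i, the p-value of keeping its top i wells is the hypergeometric tail P(X >= i), with N total wells, K gene wells and r_i draws.
- LogP is the minimum log10 p-value over the allowed i, and that i is #hitWell.
- The allowed i runs from the number of guaranteed hits (at least 1) up to the number of wells not beyond the upper bound.

The names rsa_scores and log10_hypergeom_tail, the tie handling, and the LogP of 0 for a gene with no eligible wells are assumptions.

```python
import math
from collections import defaultdict


def log10_hypergeom_tail(N, K, n, k):
    """log10 P(X >= k) for X ~ Hypergeometric(N wells, K wells of the gene, n drawn)."""
    def log_choose(a, b):
        return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)

    lo, hi = max(k, 0, n - (N - K)), min(n, K)
    if lo > hi:
        return -math.inf  # the tail is empty, so p = 0
    terms = [log_choose(K, x) + log_choose(N - K, n - x) - log_choose(N, n)
             for x in range(lo, hi + 1)]
    m = max(terms)  # log-sum-exp for numerical stability
    return (m + math.log(sum(math.exp(t - m) for t in terms))) / math.log(10)


def rsa_scores(wells, lb=0.0, ub=1.0, reverse=False):
    """Simplified RSA sketch. wells: list of (gene, score) pairs.

    Returns {gene: (LogP, number_of_hit_wells)}; hypothetical helper.
    """
    sign = -1 if reverse else 1  # -r: higher scores rank first
    N = len(wells)
    order = sorted(range(N), key=lambda i: sign * wells[i][1])
    by_gene = defaultdict(list)  # gene -> [(global rank, score)], best first
    for rank, i in enumerate(order, start=1):
        gene, score = wells[i]
        by_gene[gene].append((rank, score))

    result = {}
    for gene, gw in by_gene.items():
        K = len(gw)
        if reverse:
            n_sure = sum(s >= ub for _, s in gw)  # guaranteed hits
            n_max = sum(s >= lb for _, s in gw)   # wells not ruled out
        else:
            n_sure = sum(s <= lb for _, s in gw)
            n_max = sum(s <= ub for _, s in gw)
        # Try every cutoff i: keep the gene's i best wells, score with P(X >= i).
        candidates = [(log10_hypergeom_tail(N, K, gw[i - 1][0], i), i)
                      for i in range(max(n_sure, 1), n_max + 1)]
        # Smallest LogP wins; on an exact tie, min() keeps the smaller i.
        result[gene] = min(candidates) if candidates else (0.0, 0)
    return result
```

Take the wells [("A", 0.1), ("A", 0.2), ("B", 0.3), ("B", 0.9)] with default bounds. Gene A keeps both wells with p = 1/6 (LogP about -0.778), because P(X >= 2) for the top 2 of 4 wells is 1/C(4,2).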

Credits

Perl version: Yingyao Zhou, yzhou_at_gnf_dot_org, April 30, 2007
R version: Bin Zhou, bzhou_at_gnf_dot_org, May 3, 2007

Redundant siRNA Activity (RSA) Analysis
Probabilistic Hit Selection Algorithm
