Release History

The latest release is 1.2 (Jun 14, 2008).

Release 1.1 (Dec 7, 2007).
The R implementation of release 1.0 (Jun 23, 2007) has a bug that, depending on the R version and settings, may cause it to always run in "--r" mode.

Installation

Perl Users

    To use the Perl package, you must first install the Data::Table Perl module, which is available from CPAN. RSA version 1.2 requires Data::Table version 1.53.

    The HGPValue.pm package must be in a path searchable by RSA.pl; by default it is in the same directory. HGPValue carries out fast hypergeometric p-value calculation.
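
    For readers unfamiliar with the term, the following Python sketch shows one way to compute a hypergeometric tail probability. It is not the HGPValue.pm code; the function names and the choice of an upper-tail sum are assumptions made for illustration only.

```python
import math

def log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) when n items are drawn without replacement from N items,
    K of which are marked. Illustrative only, not the HGPValue algorithm."""
    lo = max(k, 0, n + K - N)   # smallest feasible count that is >= k
    hi = min(n, K)              # largest feasible count
    log_denom = log_choose(N, n)
    total = 0.0
    for i in range(lo, hi + 1):
        total += math.exp(log_choose(K, i) + log_choose(N - K, n - i) - log_denom)
    return min(total, 1.0)      # guard against rounding slightly above 1

# Drawing 3 of 10 items and getting all 3 marked ones: 1 / C(10, 3)
p = hypergeom_upper_tail(10, 3, 3, 3)
```

    Working in log space with lgamma avoids overflow in the binomial coefficients when the counts are large.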

    Usage of RSA.pl

        RSA.pl [options] fileName
          -l: lower_bound, defaults to 0
          -u: upper bound, defaults to 1
          -r: reverse hit picking, the higher the score the better
              if -r flag is off, the lower the score the better
          -f: input file format: PC, UNIX or MAC, UNIX by default
          -o: output file name, STDOUT if not specified
          -h: help, this message
      
          fileName: input file must be in CSV format
          the spreadsheet must contain at least three columns:
            Gene_ID: the gene identifier for the well
            Well_ID: the well identifier
            Score: numerical value for hit picking
          You can supply your own column names using the following three options
    
          -g: column name for gene ID, default "Gene_ID"
          -w: column name for well ID, default "Well_ID"
          -s: column name for score used for sorting, default "Score"
            E.g., RSA.pl -g myGene -w myWell
            will instruct RSA.pl to look for "myGene" instead of "Gene_ID",
            "myWell" instead of "Well_ID", but it will still look for a column
            named "Score" (since -s is not used).
    
          Notice:
            1) the order of these three columns can be arbitrary
            2) wells that share the same Gene_ID are considered independent siRNAs for the same gene
            3) wells are ignored if Gene_ID or Score is not defined
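
    The input rules above can be sketched in Python as follows. This is not part of RSA.pl; the function name load_wells is hypothetical, the sample rows are made up for illustration, and treating an empty field as "not defined" is an assumption.

```python
import csv
import io
from collections import defaultdict

def load_wells(handle, gene_col="Gene_ID", well_col="Well_ID", score_col="Score"):
    """Group (well, score) pairs by gene; columns are looked up by name,
    so their order in the file does not matter."""
    wells_by_gene = defaultdict(list)
    for row in csv.DictReader(handle):
        gene = (row.get(gene_col) or "").strip()
        score = (row.get(score_col) or "").strip()
        if not gene or not score:
            continue  # wells without a Gene_ID or Score are ignored
        wells_by_gene[gene].append((row.get(well_col), float(score)))
    return dict(wells_by_gene)

# Made-up sample: columns in a different order, one well with no score
sample = io.StringIO(
    "Well_ID,Score,Gene_ID\n"
    "7_O20,0.0541,1221200\n"
    "18_A21,0.0626,1221200\n"
    "9_B02,,3620\n"
    "21_I17,0.0537,3620\n"
)
groups = load_wells(sample)
```

    Passing gene_col="myGene" and well_col="myWell" would mirror the -g and -w options described above.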
       
    Examples
    RSA.pl -l 0.2 -u 0.8 -f PC -o output.csv input.csv
      wells with lower scores are considered more active
      wells <=0.2 are guaranteed hits, wells >0.8 are guaranteed non-hits
      wells (0.2,0.8] are determined by RSA algorithm
      input CSV file is a Windows format
      output results in output.csv file

    RSA.pl -l 1.2 -u 2.0 -f PC -r -o output.csv input.csv
      wells with higher scores are considered more active (specified by -r flag)
      wells >=2.0 are guaranteed hits, wells <1.2 are guaranteed non-hits
      wells [1.2,2.0) are determined by RSA algorithm
      input CSV file is a Windows format
      output results in output.csv file
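
    The bound semantics in these two examples can be written out as a small Python sketch. The function name classify and its return labels are hypothetical; only the interval boundaries are taken from the examples above.

```python
def classify(score, lower=0.0, upper=1.0, reverse=False):
    """Return 'hit', 'non-hit', or 'rsa' (left to the RSA algorithm)."""
    if reverse:
        # with -r, higher scores are more active
        if score >= upper:
            return "hit"
        if score < lower:
            return "non-hit"
    else:
        # without -r, lower scores are more active
        if score <= lower:
            return "hit"
        if score > upper:
            return "non-hit"
    return "rsa"

# First example: -l 0.2 -u 0.8
first = [classify(s, 0.2, 0.8) for s in (0.2, 0.5, 0.8, 0.81)]
# Second example: -l 1.2 -u 2.0 -r
second = [classify(s, 1.2, 2.0, reverse=True) for s in (2.0, 1.2, 1.19)]
```

    Note that the boundary values fall on different sides in the two modes: 0.8 is still decided by RSA in the first example, while 2.0 is already a guaranteed hit in the second.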

R Users

    Usage of RSA.R
      --l: lower_bound, defaults to 0
      --u: upper bound, defaults to 1
      --r: reverse hit picking, the higher the score the better
           if --r flag is off, the lower the score the better
      --i: input file name
      --o: output file name, STDOUT if not specified
    Examples (see above Perl Examples for explanations)
    R CMD BATCH --vanilla --slave --args --l=0.2 --u=0.8 --i=input.csv --o=output.csv RSA.R
    R CMD BATCH --vanilla --slave --args --l=1.2 --u=2.0 --r --i=input.csv --o=output.csv RSA.R

Input and Output Format

      Gene_ID,Well_ID,Score: columns from input spreadsheet
      LogP: RSA p-value in log10, e.g., -2 means 0.01;
      RSA_Hit: whether the well is a hit, 1 means yes, 0 means no;
      #hitWell: number of hit wells for the gene
      #totalWell: total number of wells for the gene
        if gene A has three wells w1, w2 and w3, and w1 and w2 are hits,
        #totalWell should be 3, #hitWell should be 2, w1 and w2 should have RSA_Hit set as 1
        and w3 should have RSA_Hit set as 0.
      RSA_Rank: ranking column to sort all wells for hit picking
      Cutoff_Rank: ranking column to sort all wells based on Score in the simple activity-based method
    
      Note: a rank value of 999999 means the well is not a hit. We put a large rank number here
      for the convenience of spreadsheet sorting.
    
      Example A in output.csv:
      -------------------------
      1221200,7_O20,0.0541,-6.810,1,3,3,1,33
      1221200,18_A21,0.0626,-6.810,1,3,3,2,43
      1221200,41_A21,0.0765,-6.810,1,3,3,3,72
    
      Gene ID 1221200 has three wells: 7_O20, 18_A21 and 41_A21. All show good scores.
      Therefore 3 out of 3 wells are hits (#totalWell=3, #hitWell=3, RSA_Hit=1 for all three wells).
      LogP is -6.810. These three wells are ranked as the best three wells by RSA.
      However, they are ranked as the 33rd, 43rd and 72nd wells by the traditional cutoff method.
    
      Example B in output.csv:
      -------------------------
      3620,21_I17,0.0537,-2.344,1,1,2,162,31
      3620,44_I17,0.7335,-2.344,0,1,2,999999,4113
    
      Gene ID 3620 has two wells: 21_I17 is active, while 44_I17 is relatively inactive.
      RSA decides that only 1 out of the 2 wells is a hit. Therefore one well has RSA_Hit set to 1,
      and the other to 0. #totalWell=2, but #hitWell=1.
      The first well is the 162nd hit by RSA and the 31st by the cutoff method.
      The second well is not a hit by RSA and is the 4113th by the cutoff method.
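
      As an illustration of how the output might be consumed downstream, the Python sketch below parses the rows from Examples A and B, converts LogP back to a p-value, and lists the RSA hits in rank order. The column order is inferred from the example rows and the column list above, and the optional header check is an assumption; parse_rows is a hypothetical helper, not part of RSA.

```python
import csv
import io

COLUMNS = ["Gene_ID", "Well_ID", "Score", "LogP", "RSA_Hit",
           "#hitWell", "#totalWell", "RSA_Rank", "Cutoff_Rank"]
NOT_A_HIT = 999999  # rank value used for wells that are not hits

def parse_rows(handle):
    """Parse output rows by position into dicts with typed values."""
    rows = []
    for fields in csv.reader(handle):
        if not fields or fields[0] == COLUMNS[0]:
            continue  # skip blank lines and a header line, if one is present
        row = dict(zip(COLUMNS, fields))
        row["Score"] = float(row["Score"])
        row["LogP"] = float(row["LogP"])
        for name in COLUMNS[4:]:
            row[name] = int(row[name])
        row["p_value"] = 10 ** row["LogP"]  # LogP is log10 of the p-value
        rows.append(row)
    return rows

output = io.StringIO(
    "1221200,7_O20,0.0541,-6.810,1,3,3,1,33\n"
    "1221200,18_A21,0.0626,-6.810,1,3,3,2,43\n"
    "1221200,41_A21,0.0765,-6.810,1,3,3,3,72\n"
    "3620,21_I17,0.0537,-2.344,1,1,2,162,31\n"
    "3620,44_I17,0.7335,-2.344,0,1,2,999999,4113\n"
)
rows = parse_rows(output)
ranked = sorted(rows, key=lambda r: r["RSA_Rank"])
hits = [r["Well_ID"] for r in ranked if r["RSA_Rank"] != NOT_A_HIT]
```

      Because non-hits carry the rank 999999, a plain ascending sort on RSA_Rank already places them last; the explicit filter only drops them from the list.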
    
    

Credits

Perl version: Yingyao Zhou, yzhou_at_gnf_dot_org, April 30, 2007
R version: Bin Zhou, bzhou_at_gnf_dot_org, May 3, 2007