Release History
The latest release is 1.2 (Jun 14, 2008).
Release 1.1 (Dec 7, 2007). The R implementation of release 1.0 (Jun 23, 2007) has a bug that may cause it to always run in "--r" mode, depending on the R version and settings.
Installation
Perl Users
To use the Perl package, you must first install the Data::Table Perl module, which is available from CPAN. To use RSA version 1.2, you need Data::Table version 1.53.
The HGPValue.pm package must be in a path searchable by RSA.pl; by default it is in the same directory. HGPValue carries out fast hypergeometric p-value calculation.
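For readers unfamiliar with the term, the following Python sketch shows one common form of a hypergeometric p-value, reported in log10 like the LogP output column. It is not the HGPValue.pm code: the function name `log10_hypergeom_tail` and its parameters are hypothetical, and the choice of the upper tail, as well as how the counts map onto wells and genes inside RSA, is an assumption that this README does not confirm.

```python
import math

def log10_hypergeom_tail(N, K, n, k):
    """log10 of P(X >= k) when n items are drawn without replacement
    from a population of N items, K of which are marked.
    Hypothetical helper for illustration only."""
    upper = min(K, n)
    # Count the draws that contain at least k marked items.
    tail = sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, upper + 1))
    if tail == 0:
        return float("-inf")  # the event is impossible
    # Work with exact integers, then take logs, to avoid float overflow.
    return math.log10(tail) - math.log10(math.comb(N, n))
```

With N=10, K=3, n=3 and k=3 there is exactly one qualifying draw out of 120, so the function returns log10(1/120), roughly -2.08.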
Usage of RSA.pl
RSA.pl [options] fileName
  -l: lower_bound, defaults to 0
  -u: upper bound, defaults to 1
  -r: reverse hit picking, the higher the score the better
      if the -r flag is off, the lower the score the better
  -f: input file format: PC, UNIX or MAC, UNIX by default
  -o: output file name, STDOUT if not specified
  -h: help, this message
  fileName: the input file must be in CSV format
    the spreadsheet must contain at least three columns:
      Gene_ID: the gene identifier for the well
      Well_ID: the well identifier
      Score: numerical value for hit picking
    You can supply your own column names using the following three options
  -g: column name for gene ID, default "Gene_ID"
  -w: column name for well ID, default "Well_ID"
  -s: column name for score used for sorting, default "Score"
E.g., RSA.pl -g myGene -w myWell
will instruct RSA.pl to look for "myGene" instead of "Gene_ID" and "myWell" instead of "Well_ID", but it will still look for a column named "Score" (since -s is not used).
Notice:
  1) the order of these three columns can be arbitrary
  2) wells that share the same Gene_ID are considered independent siRNAs for the same gene
  3) wells are ignored if Gene_ID or Score is not defined
Examples
RSA.pl -l 0.2 -u 0.8 -f PC -o output.csv input.csv
  wells with lower scores are considered more active
  wells <=0.2 are guaranteed hits, wells >0.8 are guaranteed non-hits
  wells (0.2,0.8] are determined by the RSA algorithm
  input CSV file is in Windows format
  output results go to the output.csv file
RSA.pl -l 1.2 -u 2.0 -f PC -r -o output.csv input.csv
  wells with higher scores are considered more active (specified by -r flag)
  wells >=2.0 are guaranteed hits, wells <1.2 are guaranteed non-hits
  wells [1.2,2.0) are determined by the RSA algorithm
  input CSV file is in Windows format
  output results go to the output.csv file
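The way the two bounds partition the wells in these examples can be sketched in Python as follows. This only restates the interval rules given above; it is not taken from RSA.pl, and the function name `classify_well` and its return labels are hypothetical.

```python
def classify_well(score, lower=0.0, upper=1.0, reverse=False):
    """Return 'hit', 'non-hit', or 'rsa' (left to the RSA algorithm).
    Hypothetical helper mirroring the -l, -u and -r options."""
    if reverse:
        # -r given: higher scores are more active.
        if score >= upper:
            return "hit"
        if score < lower:
            return "non-hit"
    else:
        # Default: lower scores are more active.
        if score <= lower:
            return "hit"
        if score > upper:
            return "non-hit"
    return "rsa"
```

For the first example, `classify_well(0.2, 0.2, 0.8)` gives "hit" and `classify_well(0.8, 0.2, 0.8)` gives "rsa"; for the second, `classify_well(1.2, 1.2, 2.0, reverse=True)` gives "rsa".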
R Users
Usage of RSA.R
  --l: lower_bound, defaults to 0
  --u: upper bound, defaults to 1
  --r: reverse hit picking, the higher the score the better
       if the --r flag is off, the lower the score the better
  --i: input file name
  --o: output file name, STDOUT if not specified
Examples (see above Perl Examples for explanations)
R CMD BATCH --vanilla --slave --args --l=0.2 --u=0.8 --i=input.csv --o=output.csv RSA.R
R CMD BATCH --vanilla --slave --args --l=1.2 --u=2.0 --r --i=input.csv --o=output.csv RSA.R
Input and Output Format
Gene_ID,Well_ID,Score: columns from input spreadsheet
LogP: RSA p-value in log10, i.e., -2 means 0.01;
RSA_Hit: whether the well is a hit, 1 means yes, 0 means no;
#hitWell: number of hit wells for the gene
#totalWell: total number of wells for the gene
if gene A has three wells w1, w2 and w3, and w1 and w2 are hits,
#totalWell should be 3, #hitWell should be 2, w1 and w2 should have RSA_Hit set as 1
and w3 should have RSA_Hit set as 0.
RSA_Rank: ranking column to sort all wells for hit picking
Cutoff_Rank: ranking column to sort all wells based on Score in the simple activity-based method
Note: a rank value of 999999 means the well is not a hit. We put a large rank number here
for the convenience of spreadsheet sorting.
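Two of the conventions above, the log10 p-value and the 999999 placeholder rank, are easy to handle when post-processing the output. A minimal Python sketch follows; the names `NOT_A_HIT_RANK`, `p_value_from_logp` and `is_ranked_hit` are hypothetical and are not part of RSA.pl or RSA.R.

```python
NOT_A_HIT_RANK = 999999  # placeholder rank described in the note above

def p_value_from_logp(logp):
    """Convert a LogP value (log10 of the p-value) back to a p-value."""
    return 10.0 ** logp

def is_ranked_hit(rsa_rank):
    """True when RSA_Rank is a real rank rather than the placeholder."""
    return int(rsa_rank) != NOT_A_HIT_RANK
```

For instance, `p_value_from_logp(-2)` is 0.01, matching the LogP description.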
Example A in output.csv:
-------------------------
1221200,7_O20,0.0541,-6.810,1,3,3,1,33
1221200,18_A21,0.0626,-6.810,1,3,3,2,43
1221200,41_A21,0.0765,-6.810,1,3,3,3,72
Gene ID 1221200 has three wells, 7_O20, 18_A21 and 41_A21. All show good scores.
Therefore 3 out of 3 wells are hits (#totalWell=3, #hitWell=3, RSA_Hit=1 for all three wells)
LogP is -6.810. These three wells are ranked as the best three wells by RSA.
However, they are ranked as the 33rd, 43rd and 72nd wells by the traditional cutoff method.
Example B in output.csv:
-------------------------
3620,21_I17,0.0537,-2.344,1,1,2,162,31
3620,44_I17,0.7335,-2.344,0,1,2,999999,4113
Gene ID 3620 has two wells: 21_I17 is active, while 44_I17 is relatively inactive.
RSA decides that only 1 out of the 2 wells is a hit. Therefore one well has RSA_Hit set as 1,
and the other 0. #totalWell=2, but #hitWell=1.
The first well is the 162nd hit by RSA, 31st by cutoff method.
The second well is not a hit by RSA, 4113th by cutoff method.
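The relationship between RSA_Hit, #hitWell and #totalWell in the two examples can be checked with a short Python sketch. It assumes the output columns appear in the order listed in the format description, and it passes the column names explicitly because this README does not say whether output.csv carries a header row. The helper names `load_rows` and `genes_with_bad_counts` are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Assumed column order, taken from the format description.
COLUMNS = ["Gene_ID", "Well_ID", "Score", "LogP", "RSA_Hit",
           "#hitWell", "#totalWell", "RSA_Rank", "Cutoff_Rank"]

# The five rows shown in Examples A and B.
EXAMPLE_ROWS = """\
1221200,7_O20,0.0541,-6.810,1,3,3,1,33
1221200,18_A21,0.0626,-6.810,1,3,3,2,43
1221200,41_A21,0.0765,-6.810,1,3,3,3,72
3620,21_I17,0.0537,-2.344,1,1,2,162,31
3620,44_I17,0.7335,-2.344,0,1,2,999999,4113
"""

def load_rows(text):
    """Parse headerless CSV text into dicts keyed by COLUMNS."""
    return list(csv.DictReader(io.StringIO(text), fieldnames=COLUMNS))

def genes_with_bad_counts(rows):
    """Return gene IDs whose per-well flags disagree with the per-gene counts."""
    hits, wells = defaultdict(int), defaultdict(int)
    for r in rows:
        wells[r["Gene_ID"]] += 1
        hits[r["Gene_ID"]] += int(r["RSA_Hit"])
    return {r["Gene_ID"] for r in rows
            if int(r["#hitWell"]) != hits[r["Gene_ID"]]
            or int(r["#totalWell"]) != wells[r["Gene_ID"]]}

rows = load_rows(EXAMPLE_ROWS)
# Hit picking order: keep RSA hits only, sorted by RSA_Rank.
picked = sorted((r for r in rows if r["RSA_Hit"] == "1"),
                key=lambda r: int(r["RSA_Rank"]))
```

Here `genes_with_bad_counts(rows)` is empty, and `picked` lists the three wells of gene 1221200 followed by 21_I17.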
Credits
Perl version: Yingyao Zhou, yzhou_at_gnf_dot_org, April 30, 2007
R version: Bin Zhou, bzhou_at_gnf_dot_org, May 3, 2007