Wednesday, 13 December 2017

SDTM Mapping Scenarios to Master

SDTM Mapping Scenarios to Master
                 One of the most puzzling programming problems involves SDTM mapping: mapping datasets from a non-CDISC structure to CDISC SDTM structure.

                  If CDISC standards haven’t been implemented from the beginning of the data collection process (an approach that by now should be considered best practice, but isn’t always possible), there’s going to be some rejigging involved when the time comes to map the data to SDTM.

                 We’re assuming that if you’re mapping and converting data to SDTM, you have a basic knowledge of SDTM and how it works. But if that’s not the case, the SDTM Implementation Guide (SDTMIG) is an essential resource for anyone working in mapping, or programming SDTM datasets. It is a detailed, high-level overview of the specifications and metadata for all SDTM domains, and it includes guidance for producing SDTM domain files. Make yourself familiar with the SDTMIG before you start - it’ll make the process considerably smoother.
A typical mapping scenario has five basic steps to work through:
  1. Identify the datasets you’re planning to map. (This one is easy)
  2. Identify the SDTM datasets that correspond to those datasets.
  3. Gather the metadata of the datasets, and the corresponding SDTM metadata.
  4. Map variables in the datasets from step one to their SDTM domain variables.
  5. Create custom domains for other datasets that don’t have corresponding SDTM datasets.
Arun Raj Vidhyadharan and Sunil Mohan Jairath’s 2014 PharmaSUG paper identifies nine scenarios that can crop up in the actual mapping process. Master these, and SDTM mapping will become that little bit less problematic.

The direct carry forward

Variables that are SDTM-compliant can be directly carried forward to the SDTM dataset and don’t need to be modified.

The variable rename

Some variables need to be renamed in order to map to the corresponding SDTM variable. An example: if the original variable is GENDER, it should be renamed SEX as per SDTM guidelines.

The variable attribute change

As well as variable names, variable attributes have to be mapped. Attributes such as label, type, length and format must comply with the SDTM attributes.

The reformat

The value that is represented does not change, but the format in which it is stored does. Example: converting a SAS date to an ISO8601 format character string.

The combine

In some cases, multiple variables must be combined to form a single SDTM variable.

The split

Conversely, a non-SDTM variable might need to be split into two or more SDTM-compliant variables

The derivation

Some SDTM variables are obtained by deriving a conclusion from data in the non-SDTM dataset. For example, using date of birth and study start date to derive a patient’s age, as opposed to manually entering the age upfront.

The variable value map and new codelist application

Some variable values need to be recoded or mapped to match with the values of a corresponding SDTM variable. This mapping is recommended for variables with a codelist attached that has non-extensible controlled terminology. It’s also advised to map for all values in the controlled terminology than just for the values present in the dataset. This would cover for values that are not in the dataset currently but may come in during future dataset updates.

The horizontal-to-vertical data structure transpose

In situations where the structure of the non-CDISC dataset is completely different to its corresponding SDTM dataset, it might be necessary to transform its structure to one that is SDTM-compliant. The Vital Signs dataset is a prime example: when data is collected in wide form, every test and recorded value is stored in separate variables. As SDTM requires data to be stored in lean form, the dataset must be transported to have the tests, values and unit under three variables. If there are variables that cannot be mapped to an SDTM variable, they would go into supplemental qualifiers.

                                                                                                          Source
Read More »

Friday, 21 April 2017

Constructing Iterative Do Loops

To build an iterative DO loop, you need to begin with a DO command, then incorporate some action command, and then end with an END command. Here's what a simple iterative DO loop should look like:

DO index-variable = start TO stop BY increment;
       
action statements;

END;

Here are brief explanation about each commands
  • DO, index-variable, start, TO, stop, and END are required in every iterative DO loop.
  • Index-variable, which stores the value of the present iteration of the DO loop, can be any valid SAS variable name. It is common, however, to use a single letter, with i and j being the most used.
  • Start is the value of the index variable which tells SAS to start the loop.
  • Stop is the value of the index variable which tells SAS to stop the loop.
  • Increment is by how much you want SAS to alter the index variable after each iteration. The most commonly used increment is 1. In fact, if you don't specify a BY clause, SAS uses the default increment of 1.

For example let us see this DO statement:

         do name= 1 to 5;

The above statement SAS to generate an index variable called name, start at 1, increment by 1, and end at 5, so that the values of name from iteration to iteration are 1, 2, 3, 4, and 5. 

          do var= 2 to 12 by 2;

The above statement tells SAS to generate an index variable called var, start at 2, increment by 2, and end at 12, so that the values of var from iteration to iteration are 2, 4, 6, 8, 10, and 12.

The following program uses a DO loop to tell SAS to determine what four times three (4 × 3) equals:

Constructing iterative loops in SAS Base


The key to understanding the DATA step here is to recall that multiplication is just repeated addition. That is, four times three (4 × 3) is the same as adding three for four times. That's all that the iterative DO loop in the DATA step is telling SAS to do. After having initialized answer to 0, add 3 to answer, then add 3 to answer again, and add 3 to answer again, and add 3 to answer again. After SAS has added 3 to the answer variable four times, SAS exits the DO loop, and since that's the end of the DATA step, SAS moves onto the next process and gives the result.
Read More »

Wednesday, 12 April 2017

SAS : Arrays

SAS Arrays : Introduction

It provides a simple, appropriate way to process a group of variables in a SAS DATA step.

Syntax
Array array-name {number-of-elements} list-of-variables;
Note: You can use [ ] or { } or ( ) for defining number of elements in the ARRAY statement.

Examples

1.  ARRAY ABC[5] a b c d e; 

In the example above, ABC is an array-name, 5 implies the number of variables in array and "a b c d e" are the fields that make up the array.

2. ARRAY ABC[*] a b c d e; 

In the example above, SAS would automatically calculate the number of variables in array.

3. ARRAY ABC[*] X1-X10;

Where the X1 variable contains the X1 value, X2 contains the X2 value, etc.

4. ARRAY ABC[*]  $ X1-X10;

If the variables are of character type then use $ sign before specifying list of variables.

Sample Data

SAS Array : Example
data temp;
input x1 x2 x3 x4$ x5$;
cards;
1 2 3 AA BB
2 3 4 AB CC
3 4 5 AC DD
4 5 6 AD EE
5 6 7 AE FF
6 7 8 AF GG
;
run;
Example I : Numeric variables having value greater than 3 need to be replaced with missing value

data test;
set temp;
array nvars {3} x1-x3;
do i = 1 to 3;
if nvars{i} > 3 then nvars{i} =.;
end;
run;  
Output : Array Statement
Why i is 4 in the output data set?

The first time the loop processes, the value of count is 1; the second time, 2; and the third time, 3. At the beginning of the fourth iteration, the value of count is 4, which is found to be greater than the stop value of 3 so the loop stops. However, the value of i is now 4 and not 3, the last value before it would be greater than 3 as the stop value.
Note : We can drop variable "i" with drop statement or drop data set option.
Improvised version of the above code 
data test;
set temp;
array nvars (*) _numeric_;
do i = 1 to dim(nvars);
if nvars{i} > 3 then nvars{i} =.;
end;
drop i;
run;
Notes - 
  1. The "_numeric_" is used to specify all the numeric variables.
  2. The DIM function returns the number of elements (variables).

Example II. : Extract first letter of all the character variables
data test;
set temp;
array cvars (*) _character_;
do i = 1 to dim(cvars);
cvars{i} = substr(cvars{i},1,1);
end;
drop i;
run;
 Note - The "_character_" is used to specify all the character variables.

Example III. : Extract first letter and fill in the new character variables
data test;
set temp;
array cvars (*) _character_;
array dvars (*) $ x6 X7;
do i = 1 to dim(cvars);
dvars{i} = substr(cvars{i},1,1) ;
end;
drop i;
run;
Example IV : Assign Initial Values in a SAS Array
data abcd;
set temp;
array nvars (*) _numeric_;
array pvars (*) px1 px2 px3;
array pctinc {3} _temporary_ (1.1 , 1.2 ,1.3); do i = 1 to dim(nvars);
pvars{i} = nvars{i} * pctinc{i};
end;
drop i;
run;  
Notes -
  1. In the above example, we are multiplying variables' values with different numbers.
  2. When the key word _TEMPORARY_ is used in a ARRAY statement, data elements are created but are not stored in the data file.

Example V : Calculate Percentage Growth
data abcd;
set temp;
array nvars(*) _numeric_;
array diff{2} _temporary_;
array percent{2};
do i = 1 to 2;
diff{i} = nvars{i +1} - nvars{i};
percent{i} = diff{i} / nvars{i} ;
end;
drop i;
run;

Using the OF Operator in a SAS Array

Examples :

array gnp (*) x y z;

The following two statements are equivalent.
sumgnp = sum(of gnp(*)) ;
sumgnp = sum(x,y,z); 
*Calculate the mean;
  mean_score = mean(of gnp(*));

* Calculate the minimum;
   min_score = min(of gnp(*));

DO OVER LOOP

The DO OVER loop is one of the most useful DO loops. It can be used with an array when indexing of the array is not needed.
data test;
set temp;
array nvars _numeric_;
do over nvars;
if nvars > 3 then nvars = .;
end;
run;
                     
     SOURCE  
Read More »

Friday, 7 April 2017

SAS : Cartesian Product

1) Cartesian Product

A simplest join in proc sql where the final out is each row of first dataset combined with each row of the second. Its syntax is:

Proc sql;
      create table company as
            select a.empid, b.name, b.salary
            from employee as a , salary as b
            ;
Quit;

Here the final output will be product of all rows from the employee dataset with all rows in the salary dataset.
Practically this looks wrong as everyone’s salary will be everyone others’ salary, perfect Socialism J

But surprisingly this Cartesian product is the basis of all joins. Whenever you specify a join a Cartesian product is done and the output rows are restricted by certain conditions. So knowing this is necessary.

2) Cartesian Product through a datastep

This technique is not very intuitive but is asked in a lot of interviews so I am including it here:

data every_combination;
  /* Set one of your data sets, usually the larger data set */
  set one;
  do i=1 to n;
    /* For every observation in the first data set,    */
    /* read in each observation in the second data set */
    set two point=i nobs=n;
    output;
  end;
 run;

This technique creates a product of all rows.
So above is the hint for answering the question in merging SAS datasets with different id variables names. Let’s see whether you can find the answer.

for more click here or click here
Read More »

Tuesday, 4 April 2017

SAS :Convert all Numeric Variables to Character using proc SQL

The task is to create a successor to a SAS data set, replacing each numeric variable in the original with a character variable. Here is a method which preserves variable order and variable attributes. First create a data set for the demonstration:


proc sql;
create table mixedtype as
 select *
  from sashelp.class;


quit;

Result:


                  Age as

                    of 1
 First             March
 Name      Sex      2008    Height  Weight
 -----------------------------------------
 Alfred    M          14        69  112.50
 Alice     F          13      56.5   84.00
 Barbara   F          13      65.3   98.00
 Carol     F          14      62.8  102.50
 Henry     M          14      63.5  102.50
 James     M          12      57.3   83.00
 Jane      F          12      59.8   84.50
 Janet     F          15      62.5  112.50
 Jeffrey   M          13      62.5   84.00
 John      M          12        59   99.50
 Joyce     F          11      51.3   50.50
 Judy      F          14      64.3   90.00
 Louise    F          12      56.3   77.00
 Mary      F          15      66.5  112.00
 Philip    M          16        72  150.00
 Robert    M          12      64.8  128.00
 Ronald    M          15        67  133.00
 Thomas    M          11      57.5   85.00
 William   M          15      66.5  112.00

Next, this somewhat tricky query which uses DICTIONARY.COLUMNS as a source of metadata and creates the bulk of another query:



proc sql;
select case type
        when 'num'  then catx (   ' '
                                , 'left( put ('
                                , name
                                , ','
                                , case
                                   when format is null then 'best12.'
                                   else format
                                   end
                                , ') ) as'
                                , name
                                , 'label ='
                                , quote( strip(label) )                              
                              )
        when 'char' then name
        else             catx (   ' '
                                , quote('Error on type')
                                , 'as'
                                , name
                              )
        end
 into : selections separated by ' , '
 from dictionary.columns
 where libname='WORK' and memname='MIXEDTYPE';



To adapt for other tables change only the literals in the WHERE clause.

Result:


 Name

 Sex
 left( put ( Age , best12. ) ) as Age label = "Age as of 1 March 2008"
 left( put ( Height , best12. ) ) as Height label = ""
 left( put ( Weight , 6.2 ) ) as Weight label = ""

The INTO clause takes these expressions and concatenates them into a comma-separated string which it stores in the macro variable SELECTIONS, which in turn is used in the statement which actually does the work:



proc sql;
create table allchar as select &selections from mixedtype;


quit;

Result:


 First          Age as of 1

 Name      Sex  March 2008    Height        Weight
 -------------------------------------------------
 Alfred    M    14            69            112.50
 Alice     F    13            56.5          84.00
 Barbara   F    13            65.3          98.00
 Carol     F    14            62.8          102.50
 Henry     M    14            63.5          102.50
 James     M    12            57.3          83.00
 Jane      F    12            59.8          84.50
 Janet     F    15            62.5          112.50
 Jeffrey   M    13            62.5          84.00
 John      M    12            59            99.50
 Joyce     F    11            51.3          50.50
 Judy      F    14            64.3          90.00
 Louise    F    12            56.3          77.00
 Mary      F    15            66.5          112.00
 Philip    M    16            72            150.00
 Robert    M    12            64.8          128.00
 Ronald    M    15            67            133.00
 Thomas    M    11            57.5          85.00
 William   M    15            66.5          112.00

Another query against DICTIONARY.COLUMNS compares the attributes of the original and replacement variables:



 select   name    format=$10.
        , memname format=$10.
        , varnum  format=10.
        , type    format=$8.
        , length  format=6.
        , format  format=$8.
        , label   format=$30.
  from dictionary.columns
  where libname='WORK' and memname in ('ALLCHAR','MIXEDTYPE')
  order by varnum, memname desc;


 quit;

Result:



                                          Column


 Column      Member       Number in  C     olumn        Column     Column

 Name        Name             Table  Type      Length  Format    Column Label
 --------------------------------------------------------------------------------------

 Name        MIXEDTYPE             1  char            8            First Name
 Name        ALLCHAR                 1  char            8            First Name
 Sex          MIXEDTYPE             2  char            1  $1.
 Sex          ALLCHAR                 2  char            1  $1.
 Age          MIXEDTYPE             3  num            8            Age as of 1 March 2008
 Age          ALLCHAR                 3  char           12            Age as of 1 March 2008
 Height       MIXEDTYPE            4  num            8
 Height      ALLCHAR                 4  char           12
 Weight      MIXEDTYPE            5  num            8  6.2
 Weight      ALLCHAR                5  char             6

This demonstrates that variable order and variable attributes are preserved.
Read More »