Sunday, 15 July 2012

Translating Stata code into R

I am a general newbie when it comes to time-series data analysis in R. I am having a problem translating a bit of Stata code into R code for a replication project I am doing.

The Stata code (from the original analysis) is the following:

#### delete yearc observations with different wartypes #####
drop if yearc==yearc[_n+1] & wartype!="civil"
drop if yearc==yearc[_n-1] & wartype!="civil"

So, translated: keep the rows in which a country is having a civil war, and delete the rows in which there is an interstate war during the same years.

I have named the data object (i.e., the data set) mywar in R.

I am assuming I need some sort of conditional ifelse statement, or something similar, such as:

invisible(mywar$yearc <- ifelse(mywar$yearc==n-1 | mywar$yearc==n+1 | mywar$wartype!="civil", NA, mywar$yearc))
# I am assuming you cannot write ifelse statements like this, but this is how I imagine it; then:
mywar <- mywar[!is.na(mywar$yearc),]
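For reference, here is a minimal sketch of how that two-step idea (set to NA, then drop) could be made to work, using the sample data from the edit below. R has no _n, so Stata's yearc[_n+1] and yearc[_n-1] are imitated with shifted copies of the year column; the names next_b, prev_b and flag are just illustrative. One assumption: both neighbour checks are made against the original row order in a single pass, whereas the two Stata drop commands run one after the other, so results could differ in unusual cases.

```r
# Sample data from the edit below (b = year, c = war type)
df <- data.frame(
  b = c(1970, 1970, 1970, 1971, 1982, 1999, 1999, 2000, 2001, 2002),
  c = c("inter", "civil", "intra", "civil", "civil", "inter", "civil", "civil", "civil", "civil"),
  stringsAsFactors = FALSE
)

n <- nrow(df)
# Like Stata's b[_n+1]: shift b up one row; the last row has no successor
next_b <- c(df$b[-1], NA)
# Like Stata's b[_n-1]: shift b down one row; the first row has no predecessor
prev_b <- c(NA, df$b[-n])

# Guard against the NA padding so the comparisons are plain TRUE/FALSE
same_as_next <- !is.na(next_b) & df$b == next_b
same_as_prev <- !is.na(prev_b) & df$b == prev_b
# Assumption: both checks use the original row order (one pass)
flag <- (same_as_next | same_as_prev) & df$c != "civil"

# Step 1: overwrite flagged years with NA
df$j <- ifelse(flag, NA, df$b)
# Step 2: drop the rows that were set to NA
df <- df[!is.na(df$j), ]
```

On this sample, rows 1, 3 and 6 are flagged and removed, leaving the seven civil-war rows.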

Edit: perhaps an example will help:

> b <- c(1970, 1970, 1970, 1971, 1982, 1999, 1999, 2000, 2001, 2002)
> c <- c("inter", "civil", "intra", "civil", "civil", "inter", "civil", "civil", "civil", "civil")
> df <- data.frame(b,c)
> df$j <- ifelse(df$b==n-1 & df$b==n+1 & df$c!="civil", NA, df$b)
> df
      b     c    j
1  1970 inter 1970
2  1970 civil 1970
3  1970 intra 1970
4  1971 civil 1971
5  1982 civil 1982
6  1999 inter 1999
7  1999 civil 1999
8  2000 civil 2000
9  2001 civil 2001
10 2002 civil 2002

So, I am trying to create NAs in rows 1, 3, and 6, because they are duplicate years in a logistic regression on the onset of civil war (I am not interested in inter- or intra-state wars, as defined), so that I can delete these rows from the data set. Here, I recreated the year variable as b. (Note that the made-up data are missing country IDs. Assume these 10 entries represent the same country, for instance Somalia.) So, I am interested in how to delete these types of rows in a data set of 28,000 rows.
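Since the real data set stacks many countries, a sketch that looks for duplicates within each country-year may be closer to what is needed. The country column and its values below are hypothetical, since the made-up example has no country IDs. duplicated() on a two-column data frame flags repeated country-year pairs, and combining it with fromLast = TRUE also flags the first occurrence of each pair.

```r
# Hypothetical data: the column name `country` and its codes are assumptions
df <- data.frame(
  country = c("SOM", "SOM", "SOM", "SOM", "SDN", "SDN"),
  b       = c(1970, 1970, 1971, 1972, 1970, 1971),
  c       = c("inter", "civil", "civil", "civil", "inter", "civil"),
  stringsAsFactors = FALSE
)

key <- df[, c("country", "b")]
# TRUE for every row whose country-year occurs more than once
repeated <- duplicated(key) | duplicated(key, fromLast = TRUE)

# Drop non-civil rows only when they share a country-year with another row
df_clean <- df[!(repeated & df$c != "civil"), ]
```

Here Somalia's 1970 "inter" row is dropped, while the hypothetical SDN 1970 "inter" row is kept because no other SDN row has that year. Keying on country also means the last year of one country is never compared with the first year of the next, which adjacent-row comparisons could do if the data are stacked country by country.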

You're focusing on Stata's if qualifier, but it sounds like you want to subset the data frame, hence the use of the drop command in Stata. I learned Stata before R and was similarly confused, since I relied heavily on the if qualifier in Stata and so pursued ifelse in R. But I later realized that the more relevant technique in R revolves around subsetting. There is a subset() command, but many people prefer subsetting using brackets (see code below).
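To illustrate, here is a small sketch (on the same sample data used below) of one row filter written both ways. subset() evaluates its condition inside the data frame, so the column can be named without the df$ prefix; the result names via_brackets and via_subset are just illustrative.

```r
df <- data.frame(
  b = c(1970, 1970, 1970, 1971, 1982, 1999, 1999, 2000, 2001, 2002),
  c = c("inter", "civil", "intra", "civil", "civil", "inter", "civil", "civil", "civil", "civil"),
  stringsAsFactors = FALSE
)

# Brackets: the logical vector picks rows; the empty slot after the comma keeps all columns
via_brackets <- df[df$c == "civil", ]
# subset(): the condition refers to the column c by its bare name
via_subset <- subset(df, c == "civil")

# Both keep the same seven rows with their original row names
print(rownames(via_brackets))
print(rownames(via_subset))
```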

In your original question, you ask how to do two things: how to delete the observations (i.e., rows) coded "inter" or "intra" in column c, and how to mark them as missing.

Sample data

b <- c(1970, 1970, 1970, 1971, 1982, 1999, 1999, 2000, 2001, 2002)
c <- c("inter", "civil", "intra", "civil", "civil", "inter", "civil", "civil", "civil", "civil")
df <- data.frame(b,c)
df
      b     c
1  1970 inter
2  1970 civil
3  1970 intra
4  1971 civil
5  1982 civil
6  1999 inter
7  1999 civil
8  2000 civil
9  2001 civil
10 2002 civil

1. Dropping observations

If you want to delete the observations that are not "civil" in column c, you can subset the data frame to keep only the cases that are "civil":

df2 <- df[df$c=="civil",]
df2
      b     c
2  1970 civil
4  1971 civil
5  1982 civil
7  1999 civil
8  2000 civil
9  2001 civil
10 2002 civil

The above code creates a new data frame, df2, that is a subset of df. Alternatively, you can overwrite the original data frame:

df <- df[df$c=="civil",]

Or, you can create a new one and remove the old one, if you don't want your workspace cluttered with lots of data frames:

df2 <- df[df$c=="civil",]
rm(df)

2. Marking observations as missing

If you want to mark the observations that are not "civil" in column c as missing, you can do so by overwriting them with NA:

df$c[df$c != "civil"] <- NA
df
      b     c
1  1970  <NA>
2  1970 civil
3  1970  <NA>
4  1971 civil
5  1982 civil
6  1999  <NA>
7  1999 civil
8  2000 civil
9  2001 civil
10 2002 civil
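One caution, shown as a small sketch: once column c contains NA, the bracket form df[df$c == "civil", ] no longer drops those rows cleanly. The comparison evaluates to NA for them, and R returns a row of NAs for each one. Wrapping the condition in which() keeps only the positions where it is TRUE (subset() also discards NA conditions). The names bad and good are illustrative.

```r
df <- data.frame(
  b = c(1970, 1970, 1970, 1971, 1982, 1999, 1999, 2000, 2001, 2002),
  c = c("inter", "civil", "intra", "civil", "civil", "inter", "civil", "civil", "civil", "civil"),
  stringsAsFactors = FALSE
)
df$c[df$c != "civil"] <- NA

# NA comparisons turn into all-NA rows in the result
bad <- df[df$c == "civil", ]
# which() returns only the row positions where the comparison is TRUE
good <- df[which(df$c == "civil"), ]

nrow(bad)   # 10: seven civil rows plus three all-NA rows
nrow(good)  # 7
```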

You can then use listwise deletion (see the na.omit() command) to remove these cases for whatever analyses you're doing.
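A minimal sketch of that listwise-deletion step, with na.omit() and the closely related complete.cases(), on the same sample data. Both drop a row if any column is NA, so this assumes no other column holds NAs you want to keep; the names complete and same_rows are illustrative.

```r
df <- data.frame(
  b = c(1970, 1970, 1970, 1971, 1982, 1999, 1999, 2000, 2001, 2002),
  c = c("inter", "civil", "intra", "civil", "civil", "inter", "civil", "civil", "civil", "civil"),
  stringsAsFactors = FALSE
)
df$c[df$c != "civil"] <- NA

# na.omit() removes every row with an NA in any column
complete <- na.omit(df)
# complete.cases() gives the same rows as a reusable logical index
same_rows <- df[complete.cases(df), ]
```

For a logistic regression with glm(..., family = binomial), this usually happens automatically, since the default na.action option is na.omit.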

Side note: the original Stata code seeks to subset when column b is a duplicate and column c is "inter" or "intra". However, given the way the sample data are presented, that seemed a redundant concern, which is why the solution above looks only at column c. If you want to match the Stata code as closely as possible, you can do it by

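# sort by year, then alphabetically by war type ("civil" sorts before "inter"/"intra")
df <- df[order(df$b, df$c),]
# TRUE for the second and later rows of each year
df$duplicate <- duplicated(df$b)
# keep civil wars in years not already seen
df2 <- df[df$c=="civil" & !df$duplicate,]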

which:

1. orders the data chronologically by year, and then alphabetically by war type,
2. creates a new variable that specifies whether column b is a duplicate year, and
3. subsets the data frame to remove the undesirable cases.
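If the goal is to mirror the Stata rule more literally (drop a non-civil row only when its year also appears in another row), a sketch like the following flags every row whose year repeats and then drops only the non-civil ones among them. To show the difference, the last row is changed to a hypothetical "inter" war in 2002, a year with no other row: this version keeps it, whereas the df$c=="civil" filter above would drop it. It does not rely on row order, so no sorting is needed; it assumes a single country (see the country-year idea for stacked data).

```r
# Sample data, with a hypothetical change: 2002 is coded "inter"
df <- data.frame(
  b = c(1970, 1970, 1970, 1971, 1982, 1999, 1999, 2000, 2001, 2002),
  c = c("inter", "civil", "intra", "civil", "civil", "inter", "civil", "civil", "civil", "inter"),
  stringsAsFactors = FALSE
)

# TRUE for every row whose year occurs more than once, including the first occurrence
repeated <- duplicated(df$b) | duplicated(df$b, fromLast = TRUE)

# Drop a row only if its year repeats AND it is not a civil war
df2 <- df[!(repeated & df$c != "civil"), ]
```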
