Comparing two vectors one value at a time without using WHILE
I have two tables: df.author and df.post, which are related by a
one-to-many relation. Now I changed the primary key of df.author and I
want df.post to mirror the change. In the following R script I use match()
in a while loop to compare the foreign key of each row of df.post with the
old primary key of df.author and, when they match, replace the foreign key
with the new one (from a different column of df.author). Please consider
the following:
foreignkey <- c("old_pk1","old_pk2","old_pk3","old_pk4","old_pk5")
df.post <- data.frame(foreignkey,stringsAsFactors=FALSE)
rm(foreignkey)
primarykey_old <- c("old_pk1","old_pk2","old_pk3","old_pk4","old_pk5")
primarykey_new <- c("new_pk1","new_pk2","new_pk3","new_pk4","new_pk5")
df.author <- data.frame(primarykey_old, primarykey_new,
                        stringsAsFactors=FALSE)
rm(primarykey_old); rm(primarykey_new)
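# Walk through df.post row by row and look up each foreign key
i <- 1; N <- length(df.post$foreignkey)
while (i <= N) {
  # Position of this foreign key among the old primary keys (NA if absent)
  idx <- match(df.post$foreignkey[i], df.author$primarykey_old)
  if (!is.na(idx)) {
    df.post$foreignkey[i] <- df.author$primarykey_new[idx]
  }
  i <- i + 1
}
rm(N); rm(i); rm(idx)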
The script works, but because of the while loop it doesn't scale well to a
large dataset. I have read that using apply() (in my case after converting
to a matrix) is usually better than using while, and I wonder whether that
also applies to my case. As the loop shows, I need to go through every
single row of the data frame to get the foreign key and then search all of
df.author for a match(). Can I reduce the computation time by not using
while?
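For reference, here is a minimal sketch of what a loop-free version might look like. It relies on the fact that match() accepts a whole vector as its first argument, so all foreign keys can be looked up in one call. The data below is a shortened, hypothetical variant of the example above, and the value "unmatched_pk" is an assumption added only to show that rows without a match are left unchanged.

```r
# Hypothetical example data mirroring the question; "unmatched_pk" is an
# added value (not in the original data) to exercise the no-match case.
df.author <- data.frame(
  primarykey_old = c("old_pk1", "old_pk2", "old_pk3"),
  primarykey_new = c("new_pk1", "new_pk2", "new_pk3"),
  stringsAsFactors = FALSE
)
df.post <- data.frame(
  foreignkey = c("old_pk3", "old_pk1", "unmatched_pk", "old_pk1"),
  stringsAsFactors = FALSE
)

# One vectorized call: for every foreign key, the position of its match
# among the old primary keys, or NA where there is none.
idx <- match(df.post$foreignkey, df.author$primarykey_old)

# Replace only the rows that found a match; the others keep their value.
found <- !is.na(idx)
df.post$foreignkey[found] <- df.author$primarykey_new[idx[found]]
```

This does the same job as the while loop without iterating in R code. Whether it is fast enough for a particular dataset would still need to be measured, for example with system.time().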