Write a function that removes duplicate entries from any given XML document. A node is considered a duplicate when the value of the provided "key" field matches that of an earlier node.

Example XML:

<Products>
  <Product>
    <Name>Milk</Name>
    <Amount>4</Amount>
  </Product>
  <Product>
    <Name>Milk</Name>
    <Amount>0.5</Amount>
  </Product>
  <Product>
    <Name>Coffee</Name>
    <Amount>0.5</Amount>
  </Product>
</Products>

Based on the "Name" field, nodes 1 and 2 are duplicates, but based on the "Amount" field, nodes 2 and 3 are duplicates. So the task is to write a function:


string DeDup(string xml, string keyNode, string rootPath)


Possible solution:


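using System;
using System.Collections.Generic;
using System.Xml;

private static string RemoveDuplicates(string xml, string key, string rootXPath)
{
    XmlDocument doc = new XmlDocument();
    // Key values seen so far; HashSet gives O(1) lookups instead of List.Contains
    HashSet<string> seen = new HashSet<string>();
    // Duplicates are collected first and removed after the scan,
    // so the node list is not modified while it is being enumerated
    List<XmlNode> duplicates = new List<XmlNode>();
    try
    {
        // LoadXml parses a string; Load expects a file path, URL or stream
        doc.LoadXml(xml);
        XmlElement root = doc.DocumentElement;
        XmlNodeList nodes = root.SelectNodes(rootXPath);
        foreach (XmlNode item in nodes)
        {
            XmlNode keyNode = item.SelectSingleNode(key);
            if (keyNode == null)
                continue; // no key field, so it cannot be a duplicate
            // Add returns false when the value is already in the set
            if (!seen.Add(keyNode.InnerText))
                duplicates.Add(item);
        }
        foreach (XmlNode dup in duplicates)
            dup.ParentNode.RemoveChild(dup);
        return doc.OuterXml;
    }
    catch (Exception ex)
    {
        // Log exception...
        throw; // rethrow without resetting the stack trace
    }
}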
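The same keep-the-first-occurrence logic can also be written directly against the DeDup(string xml, string keyNode, string rootPath) signature from the question. The sketch below swaps XmlDocument for LINQ to XML (XDocument). It assumes rootPath is an XPath expression selecting the records (for example "/Products/Product") and keyNode is the name of a direct child element of each record (for example "Name"); the class name LinqDeDup is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class LinqDeDup
{
    // Assumptions (hypothetical interpretation of the parameters):
    // rootPath is an XPath selecting the record elements, and
    // keyNode is the name of a direct child element of each record.
    public static string DeDup(string xml, string keyNode, string rootPath)
    {
        XDocument doc = XDocument.Parse(xml);
        var seen = new HashSet<string>();

        // ToList() snapshots the matches so removing elements is safe
        foreach (XElement record in doc.XPathSelectElements(rootPath).ToList())
        {
            XElement key = record.Element(keyNode);
            if (key == null)
                continue; // records without the key field are kept

            // HashSet.Add returns false if the value was already seen,
            // so every occurrence after the first is removed
            if (!seen.Add(key.Value))
                record.Remove();
        }
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

On the example XML, LinqDeDup.DeDup(xml, "Name", "/Products/Product") keeps the first Milk record and the Coffee record; passing "Amount" instead keeps both Milk records and drops Coffee, whose Amount of 0.5 was already seen.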




This solution works well for small XML files, but it is not a good fit for de-duplicating large ones, because XmlDocument loads the entire document into memory. So the bonus question is to use a SAX parser in C# to remove duplicates from large XML files.
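.NET does not ship a SAX parser; the closest built-in equivalent is XmlReader, a forward-only pull parser that likewise streams the document instead of building a tree. Below is a minimal sketch of the bonus question under these assumptions: records are identified by element name (for example "Product") rather than by XPath, since XmlReader does not evaluate XPath; the key is a direct child element of each record; and whitespace, processing instructions and any DOCTYPE are not preserved. Each record is buffered on its own with XNode.ReadFrom, so memory is bounded by one record plus the set of distinct keys seen so far. The class name StreamingDeDup is hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class StreamingDeDup
{
    // Assumes records are elements named recordName with a child keyName.
    public static void DeDup(TextReader input, TextWriter output,
                             string recordName, string keyName)
    {
        var seen = new HashSet<string>();
        var readerSettings = new XmlReaderSettings { IgnoreWhitespace = true };
        var writerSettings = new XmlWriterSettings { OmitXmlDeclaration = true };

        using (XmlReader reader = XmlReader.Create(input, readerSettings))
        using (XmlWriter writer = XmlWriter.Create(output, writerSettings))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == recordName)
                {
                    // Buffer just this record; ReadFrom leaves the reader on the next node
                    XElement record = (XElement)XNode.ReadFrom(reader);
                    XElement key = record.Element(keyName);
                    // Keep records without a key, and the first record for each key value
                    if (key == null || seen.Add(key.Value))
                        record.WriteTo(writer);
                    continue;
                }

                // Copy everything outside the records through unchanged
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                        writer.WriteAttributes(reader, true);
                        if (reader.IsEmptyElement)
                            writer.WriteEndElement();
                        break;
                    case XmlNodeType.EndElement:
                        writer.WriteFullEndElement();
                        break;
                    case XmlNodeType.Text:
                        writer.WriteString(reader.Value);
                        break;
                    case XmlNodeType.CDATA:
                        writer.WriteCData(reader.Value);
                        break;
                    case XmlNodeType.Comment:
                        writer.WriteComment(reader.Value);
                        break;
                    // Other node types (declaration, PIs, DOCTYPE) are skipped in this sketch
                }
                reader.Read();
            }
        }
    }
}
```

Because it reads from a TextReader and writes to a TextWriter, it can be pointed at files (for example through StreamReader and StreamWriter) without holding the whole document in memory. The HashSet of seen keys still grows with the number of distinct keys; if that is too large, it would have to be replaced with something disk-backed.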