Linq-To-XML Style of Node Creation for C++
Linq-To-XML Node Creation for Native C++
Posted by Wong Shao Voon
Location: Desktop development - C/C++
License: The Microsoft Public License (Ms-PL)
Skill: Beginner
Posted: 19/10/2011
This article discusses a new feature of the C++ Elmax XML Library: Linq-To-XML style node creation for writing XML files. Currently, there are no plans to implement this feature for C# Elmax; C# users can use .NET Linq-To-XML to achieve the same XML writing. Readers who want to learn more about the Elmax XML library may read the tutorial article and the documentation, but neither is required to understand this article. The intended audience is XML library authors who may be interested in implementing Linq-To-XML node creation for their own XML libraries.

Although Linq-To-XML node creation has already been mentioned several times, programmers who work primarily in native C++ may not be familiar with its syntax, what it does, and how it does it. Simply put, Linq-To-XML node creation is a natural way to create nodes with code that is structurally identical to the resultant XML. To prove my point, I will show a .NET C# Linq-To-XML node creation code snippet that adds one movie's information to a movies element.
For the reader's information, the Visual Studio IDE automatically indents your Linq-To-XML node creation code when you hit the Enter key. The Movies1.xml output looks similar to what is displayed below.
It is not difficult to visualize what the XML will look like from the C# code. In the next section, we shall compare the new Linq-To-XML and the original Elmax node creation.
I guess that by now readers are eager to see the Linq-To-XML syntax for C++. Without further delay, the code is displayed below.
As the reader may notice, the C++ syntax does not allocate the elements on the heap with the new keyword, unlike the C# version; in other words, the elements are allocated on the stack. C# Linq-To-XML allocates the elements on the heap, where they must later be reclaimed by the garbage collector, which can hurt performance and requires more memory. Elements allocated on the stack do not have this memory consumption problem because they are popped off the stack as soon as they go out of scope.
Underneath the surface, memory is still allocated on the heap to construct the internal tree structure. The internal tree structure is then converted to MS XML DOM elements recursively in the Save method. Just before the Save method returns, the internal tree structure is destroyed. If the user wants to retain the tree structure, either for another Save call or to append it to a larger tree structure, he/she can specify false for the discard argument (the default value is true) in the Save method. As a point of interest, the conversion to the MS XML DOM structure in the Save method does not go through Elmax; it uses the MS XML DOM API directly. The main reason is performance: Elmax is a fat layer on top of the MS XML DOM API.
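The lifetime rules described above can be illustrated with a minimal, self-contained sketch. This is not Elmax code: SketchElement, SketchTreeNode and g_liveNodes are hypothetical names, Save returns a string instead of writing MS XML DOM nodes to a file, and only empty elements are modelled. What it shows is a pointer-sized handle on the stack, a tree on the heap, and a Save method whose discard argument decides whether the tree survives.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the internal heap-allocated tree node.
struct SketchTreeNode {
    std::string name;
    std::vector<SketchTreeNode*> children;
    explicit SketchTreeNode(const std::string& n) : name(n) {}
};

// Count of live tree nodes, kept only so the behaviour can be observed.
static int g_liveNodes = 0;

// Stack-allocated handle whose only data member is a pointer to the tree.
class SketchElement {
public:
    explicit SketchElement(const std::string& name)
        : ptr(new SketchTreeNode(name)) { ++g_liveNodes; }

    // Links the child's tree under this one; returns *this for chaining.
    SketchElement& Add(const SketchElement& child) {
        assert(ptr != NULL && child.ptr != NULL);
        ptr->children.push_back(child.ptr);
        return *this;
    }

    // Serialises the tree, then destroys it unless discard is false.
    std::string Save(bool discard = true) {
        assert(ptr != NULL);
        const std::string xml = Traverse(ptr);
        if (discard) Discard();
        return xml;
    }

    // Frees the tree without saving it.
    void Discard() { Delete(ptr); ptr = NULL; }

private:
    // Recursively turns the tree into text (assumption: empty elements only).
    static std::string Traverse(const SketchTreeNode* node) {
        if (node->children.empty()) return "<" + node->name + "/>";
        std::string xml = "<" + node->name + ">";
        for (std::size_t i = 0; i < node->children.size(); ++i)
            xml += Traverse(node->children[i]);
        return xml + "</" + node->name + ">";
    }

    // Recursively frees a node and everything below it.
    static void Delete(SketchTreeNode* node) {
        if (node == NULL) return;
        for (std::size_t i = 0; i < node->children.size(); ++i)
            Delete(node->children[i]);
        delete node;
        --g_liveNodes;
    }

    SketchTreeNode* ptr;
};
```

Note that the handle has no destructor on purpose: when it goes out of scope only the pointer is popped, so a tree that is never saved must be released with Discard.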
By now, the reader may be curious to know how the original Elmax node creation stacks up against the new Linq-To-XML node creation syntax. The example below shows how to save the same Movies2.xml using the original Elmax code.
As the reader can see, it can be hard to discern the structure of the XML just by casually glancing at the original Elmax node creation code.
Surprisingly, the Linq-To-XML node creation code is very simple and can be written in under a couple of hours. To create nodes using the new syntax, we are required to use the NewElement, NewAttribute, NewCData and NewComment classes. These new classes are derived from the NewNode class, and they do most of their useful work in their constructors.
This is the code listing for the declaration of NewElement class.
The code listing of the overloaded constructor which takes in 8 NewNode parameters is listed here.
The code listing of the overloaded Add method with 8 NewNode parameters is listed here.
As you can see, the NewElement constructors and Add methods do nothing except append the nodes to the vector. Below is the code listing for the declaration of the NewAttribute class and the definition of its only constructor.
This is the code listing for the declaration of NewCData class and definition of its only method: its constructor.
This is the code listing for the declaration of NewComment class and definition of its constructor.
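For readers who want a feel for the pattern without the original listings, here is a reduced sketch of constructors that do nothing but append nodes to a vector. It is an assumption-laden illustration, not the Elmax source: DemoTreeNode, DemoNode, DemoAttribute and DemoElement are hypothetical names, narrow strings are used, and the overloads stop at three nodes instead of sixteen.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical tree node; the real NewTreeNode layout is not reproduced here.
struct DemoTreeNode {
    enum Kind { ElementKind, AttributeKind };
    Kind kind;
    std::string name;
    std::string value;
    std::vector<DemoTreeNode*> children;

    DemoTreeNode(Kind k, const std::string& n, const std::string& v)
        : kind(k), name(n), value(v) {}
    // Deleting a node deletes its whole subtree.
    ~DemoTreeNode() {
        for (std::size_t i = 0; i < children.size(); ++i) delete children[i];
    }
};

// Base handle: the tree pointer is its only data member.
class DemoNode {
public:
    DemoTreeNode* GetPtr() const { return ptr; }
protected:
    explicit DemoNode(DemoTreeNode* p) : ptr(p) {}
    DemoTreeNode* ptr;
};

class DemoAttribute : public DemoNode {
public:
    DemoAttribute(const std::string& name, const std::string& value)
        : DemoNode(new DemoTreeNode(DemoTreeNode::AttributeKind, name, value)) {}
};

class DemoElement : public DemoNode {
public:
    // Leaf element with a text value.
    DemoElement(const std::string& name, const std::string& value)
        : DemoNode(new DemoTreeNode(DemoTreeNode::ElementKind, name, value)) {}

    // Overloads taking 1, 2 or 3 nodes; each one only appends to the vector.
    DemoElement(const std::string& name, const DemoNode& n1)
        : DemoNode(new DemoTreeNode(DemoTreeNode::ElementKind, name, "")) {
        Append(n1);
    }
    DemoElement(const std::string& name, const DemoNode& n1, const DemoNode& n2)
        : DemoNode(new DemoTreeNode(DemoTreeNode::ElementKind, name, "")) {
        Append(n1); Append(n2);
    }
    DemoElement(const std::string& name, const DemoNode& n1, const DemoNode& n2,
                const DemoNode& n3)
        : DemoNode(new DemoTreeNode(DemoTreeNode::ElementKind, name, "")) {
        Append(n1); Append(n2); Append(n3);
    }

private:
    void Append(const DemoNode& node) {
        assert(node.GetPtr() != NULL);
        ptr->children.push_back(node.GetPtr());
    }
};
```

With these overloads, a nested expression such as DemoElement("Movie", DemoAttribute("Name", "..."), DemoElement("Year", "2011")) mirrors the shape of the XML it describes. In this sketch the caller frees the tree by deleting the root's GetPtr() result exactly once.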
The reader may ask why I chose to create new classes to do this instead of modifying the old classes like Element, Attribute, CData and Comment. The reason is that the original classes contain many data members; constructing these classes on the stack in large numbers and popping them off again would seriously hurt performance. As you can see from the listings of the new classes above, I did not list their data members. That is because their only data member is ptr, which lives in their base class, NewNode.
ptr points to a NewTreeNode. I had intended to name this tree structure TreeNode, but that name clashes in Visual C++ 10 with another TreeNode class defined in the Visual C++ libraries.
NewTreeNode has a Traverse method, which creates MS XML DOM elements as it traverses the tree recursively, and a Delete method, which deletes the tree structure recursively. Allocating and deallocating NewNode/NewElement objects on the stack is therefore only a matter of pushing and popping 64-bit/32-bit pointers. Compare this with pushing and popping the heavy-duty Element class, which contains the many data members shown below. For the reader's information, although the 64-bit/32-bit pointer is popped whenever a NewNode object goes out of scope, the tree data it points to lives on until it is saved to a file on disk.
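The size argument can be checked with a few lines of standalone code. LightHandle and HeavyWrapper are hypothetical types invented for this sketch; in particular, the members of HeavyWrapper are only a guess at the sort of state a full-featured wrapper might carry and are not the actual Element members.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical light handle: one raw pointer and nothing else.
struct LightHandle {
    void* ptr;
};

// Hypothetical heavy wrapper; these members are illustrative assumptions.
struct HeavyWrapper {
    std::string name;
    std::string value;
    std::vector<std::string> pending;
    void* document;
    void* node;
};

// The handle is exactly one pointer wide (4 bytes on 32-bit, 8 on 64-bit).
static_assert(sizeof(LightHandle) == sizeof(void*),
              "handle should be exactly one pointer wide");

// Debug-build check that the heavy wrapper is the larger of the two.
void CheckHandleIsSmaller() {
    assert(sizeof(LightHandle) < sizeof(HeavyWrapper));
}
```

Copying or destroying a LightHandle also runs no constructors or destructors for string and vector members, which is the other half of the cost being avoided.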
The source code listing of the recursive methods of Traverse and Delete is provided for the reader's perusal.
While Elmax does not support Linq-To-XML style queries, it has some powerful query mechanisms that use a lambda (anonymous function) to decide which elements to fetch. Let me acquaint you with some of them.
Elmax has AsCollection and GetCollection methods, which fetch a collection of siblings of the same name and a collection of children of the same name, respectively. Both have an overloaded version that takes an additional lambda as a predicate to filter the elements you want.
Elmax provides the HyperElement class, which allows joining elements with other elements that satisfy certain criteria. For example, in a Books application, each Book element under the main Books section can be joined with the Author element (through AuthorID) under the main Authors section to retrieve the author name for the book. The Books section and the Authors section are 2 separate sections. A sample of the XML is provided below.
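The predicate overload can be pictured with the following sketch. It does not use the Elmax API or its signatures: ParentElement, ChildElement and ElementPredicate are hypothetical, and the method name GetCollection is borrowed only to show the idea of "children of a given name, optionally filtered by a lambda".

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, flattened child element: just a name and a text value.
struct ChildElement {
    std::string name;
    std::string value;
};

typedef std::function<bool(const ChildElement&)> ElementPredicate;

// Hypothetical parent that owns its children by value.
struct ParentElement {
    std::string name;
    std::vector<ChildElement> children;

    // All children with the given name.
    std::vector<ChildElement> GetCollection(const std::string& childName) const {
        return GetCollection(childName,
                             [](const ChildElement&) { return true; });
    }

    // Children with the given name that also satisfy the predicate.
    std::vector<ChildElement> GetCollection(const std::string& childName,
                                            const ElementPredicate& keep) const {
        assert(static_cast<bool>(keep));  // an empty std::function is a bug
        std::vector<ChildElement> result;
        for (std::size_t i = 0; i < children.size(); ++i) {
            if (children[i].name == childName && keep(children[i]))
                result.push_back(children[i]);
        }
        return result;
    }
};
```

A call such as books.GetCollection("Book", [](const ChildElement& e) { return e.value.size() > 4; }) then keeps only the Book children whose text is longer than four characters.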
This is the HyperElement class with Lambda in action!
This is the output. For more information on HyperElement, please refer to the Elmax documentation.
List of books by Arthur C. Clark
=============================================
2001: A Space Odyssey
Rendezvous with Rama

List of books by Isaac Asimov
=============================================
Foundation
Currents of Space
Pebbles in the Sky
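The kind of join that produces a listing like this can be sketched in a few lines of plain C++. The code below is not the HyperElement class: AuthorRec, BookRec and JoinOneToMany are hypothetical, the records are plain structs rather than XML elements, and the numeric author IDs are assumptions. It only shows how a lambda can serve as the join criterion.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical records standing in for Author and Book elements.
struct AuthorRec {
    int authorId;
    std::string name;
};
struct BookRec {
    int authorId;
    std::string title;
};

typedef std::pair<AuthorRec, std::vector<BookRec> > AuthorBooks;
typedef std::function<bool(const AuthorRec&, const BookRec&)> JoinPredicate;

// Pairs every author with the books for which the predicate returns true.
std::vector<AuthorBooks> JoinOneToMany(const std::vector<AuthorRec>& authors,
                                       const std::vector<BookRec>& books,
                                       const JoinPredicate& matches) {
    assert(static_cast<bool>(matches));  // an empty predicate is a bug
    std::vector<AuthorBooks> result;
    for (std::size_t i = 0; i < authors.size(); ++i) {
        AuthorBooks entry(authors[i], std::vector<BookRec>());
        for (std::size_t j = 0; j < books.size(); ++j) {
            if (matches(authors[i], books[j]))
                entry.second.push_back(books[j]);
        }
        result.push_back(entry);
    }
    return result;
}
```

The join criterion is supplied at the call site, for example [](const AuthorRec& a, const BookRec& b) { return a.authorId == b.authorId; }, so the same function can join on any other relationship without modification.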
In addition to these 2 methods of query, Elmax supports XPath expressions through its various SelectNode methods.
The overloaded constructors and Add methods of NewElement range from taking 1 NewNode object to a maximum of 16 NewNode objects. What if the user needs to add more than 16 nodes (say, 17) to an element? He/she can chain calls to the Add method, because Add returns the object itself (*this). Let me show you an example of adding 32 sub-elements to an element without using a for-loop. In practice, a for-loop is the preferred way to add more than 16 elements.
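Both approaches, chaining and looping, can be tried out with the reduced sketch below. StarNode and StarElement are hypothetical names, the element names Star1, Star2 and so on are made up for the example, and the overloads here stop at two nodes where the article's stop at sixteen.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical tree node; deleting it deletes its whole subtree.
struct StarNode {
    std::string name;
    std::vector<StarNode*> children;
    explicit StarNode(const std::string& n) : name(n) {}
    ~StarNode() {
        for (std::size_t i = 0; i < children.size(); ++i) delete children[i];
    }
};

// Hypothetical handle with Add overloads for 1 and 2 nodes.
class StarElement {
public:
    explicit StarElement(const std::string& name) : ptr(new StarNode(name)) {}

    StarElement& Add(const StarElement& a) {
        assert(a.ptr != NULL);
        ptr->children.push_back(a.ptr);
        return *this;  // returning *this is what makes chaining possible
    }
    StarElement& Add(const StarElement& a, const StarElement& b) {
        Add(a);
        return Add(b);
    }

    StarNode* GetPtr() const { return ptr; }

private:
    StarNode* ptr;
};

// Exceeds the two-node overload limit by chaining Add calls.
StarElement BuildByChaining() {
    StarElement stars("Stars");
    stars.Add(StarElement("Star1"), StarElement("Star2"))
         .Add(StarElement("Star3"), StarElement("Star4"))
         .Add(StarElement("Star5"));
    return stars;
}

// The loop form, which scales to any number of children.
StarElement BuildByLoop(int count) {
    StarElement stars("Stars");
    for (int i = 1; i <= count; ++i) {
        std::ostringstream name;
        name << "Star" << i;
        stars.Add(StarElement(name.str()));
    }
    return stars;
}
```

The chained form keeps the declarative look of the syntax for a fixed set of children, while the loop form is the natural choice when the children come from a container.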
This is what the Stars.xml looks like after saving.
The data in your data structure is first converted to the intermediate tree structure; then, during saving, the intermediate tree structure is converted to the MS XML DOM structure. Say the data structure takes up 100 MB, and the intermediate tree structure and the MS XML DOM structure each take up 100 MB as well. (In reality, the tree and DOM structures have their own overhead, so data stored in them uses more memory; for this example, we assume they have the same 100 MB requirement as the data structure.) To save 100 MB of data as XML using purely the NewElement class, we therefore consume 100 + 100 + 100 = 300 MB.
To reduce memory consumption, we can use the Add method of Element together with NewElement. The Add methods of both Element and NewElement take in NewNode objects; the only difference is that Element.Add constructs a tree structure and converts it to the MS XML DOM structure immediately, whereas with NewElement the conversion of the tree structure to the MS XML DOM structure is only done at the final saving stage. Imagine you have 1,000,000 elements (each with its own children). NewElement only converts them (to MS XML DOM) at the final saving stage, while Element.Add converts them one by one, each time its Add method is called, so the whole intermediate tree never has to exist at once. Using Element.Add therefore has a lower memory requirement than using NewElement.Add. The reason I cannot do the same thing for NewElement is that, in the MS XML DOM API, only the document object can create nodes, which is why every Element object contains a copy of the document pointer as a data member. This is a limitation of the MS XML DOM API and, unfortunately, Elmax is based on it. A future Elmax will not have this limitation once it is rewritten from the ground up without relying on any third-party library.
If you construct a NewElement object and its children without saving, you will have a memory leak. Because it is the Save method that deletes the internal tree structure after saving, the user needs to call the Discard method to delete the internal tree structure if he/she, for some reason, decides not to save. The user needs to be careful here to avoid memory leaks. I chose not to use smart pointers to store the tree structure for performance and memory reasons. I am not fond of the idea of using smart pointers in my code.
I am currently writing the SAX version of Elmax and also its article, titled "The XML SAX Article that Programmers Should (not) be Reading", as a sequel to the original Elmax DOM article titled "The XML Parsing Article that Should (not) be Written". The Reader and Writer classes of the SAX version are kept similar to the Elmax DOM version whenever possible. For the SAX writer class, the Linq-To-XML node creation syntax is similar except for 2 additional requirements.
<Book>) and end element stub (e.g., </Book>) unless it does not have a value (e.g., <Book />). The reason for this requirement is that the SAX library has no way of knowing when the user stops adding child elements and wants to close the element.
This is how the SAX version of the movie code will look, with the CloseEndElement call.
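Why a streaming writer needs an explicit close call can be seen in a toy example. The class below is not the Elmax SAX writer and does not model CloseEndElement's real signature: StreamingWriter and its Open, Text, Empty and Close methods are hypothetical, output goes to a string, and text is not escaped. The point is that markup is emitted immediately, so only the caller can say when an element ends.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical forward-only writer: output is emitted as calls are made.
class StreamingWriter {
public:
    // Writes the start tag at once and remembers the name for closing.
    StreamingWriter& Open(const std::string& name) {
        out_ << "<" << name << ">";
        open_.push_back(name);
        return *this;
    }

    // Writes text content (assumption: no escaping, for brevity).
    StreamingWriter& Text(const std::string& text) {
        out_ << text;
        return *this;
    }

    // An element with no value needs no separate close call.
    StreamingWriter& Empty(const std::string& name) {
        out_ << "<" << name << "/>";
        return *this;
    }

    // Writes the end tag of the most recently opened element.
    StreamingWriter& Close() {
        assert(!open_.empty());
        out_ << "</" << open_.back() << ">";
        open_.pop_back();
        return *this;
    }

    std::string Str() const { return out_.str(); }
    bool Balanced() const { return open_.empty(); }

private:
    std::ostringstream out_;
    std::vector<std::string> open_;  // names of elements still open
};
```

Because nothing is buffered as a tree, memory use stays proportional to the nesting depth rather than to the size of the document, which is the usual motivation for a SAX-style writer.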
So what is the rationale for keeping the DOM and SAX syntax similar? The reasons are twofold. First, the user does not need to learn a new syntax or a totally new library to use SAX, so the learning curve is lower. Second, I am writing an XML Object Relational Mapping (ORM) library using Elmax; if I keep the 2 syntaxes similar, the ORM code generators for DOM and SAX Elmax will be similar to write as well, which saves me some coding effort.
I do not know if it is just me: when I use Linq-To-XML node creation, I have made the mistake a few times of using names with whitespace for my elements and attributes. According to the XML specification, names with whitespace are simply not allowed. I rarely make this mistake when using other, traditional ways of creating XML. This is perhaps because the Linq-To-XML syntax 'mixes' the name and value together; in a traditional API for creating XML, I know very well whether I am specifying a name or a value. If any of you have problems getting the XML out, please check whether any of your element and attribute names contain whitespace. I am pointing this out in case any readers share the same level of intelligence as the author.
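A small guard like the following can catch this mistake early. It is a hypothetical helper, not part of Elmax, and it checks only for emptiness and whitespace; it is not a full validation of XML names, which have further rules about permitted characters.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Returns true if the name is non-empty and contains no whitespace.
// This covers only the mistake discussed above, not full XML Name rules.
bool HasNoWhitespace(const std::string& name) {
    if (name.empty()) return false;
    for (std::size_t i = 0; i < name.size(); ++i) {
        if (std::isspace(static_cast<unsigned char>(name[i]))) return false;
    }
    return true;
}

// Debug-build guard to call before creating an element or attribute.
void RequireUsableName(const std::string& name) {
    assert(HasNoWhitespace(name));
}
```

For example, HasNoWhitespace("MovieName") is true, while HasNoWhitespace("Movie Name") is false.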
We have looked at the different syntaxes of .NET C# Linq-To-XML, C++ Elmax Linq-To-XML and the original C++ Elmax way of node creation. We have briefly discussed the internal workings of C++ Elmax Linq-To-XML node creation. We have also looked at ways to reduce memory consumption and avoid memory leaks. Lastly, I want to leave you with full code listings of each node creation method, each adding the information for 4 movies and saving it to XML. Elmax is hosted at Codeplex: you can always get the latest version there. Any constructive feedback on the article, good or bad, is welcome.
Thank you for reading!
.NET C# Linq-To-XML node creation
Elmax Linq-To-XML node creation
Elmax node creation
This is the XML output.
This article, along with any associated source code and files, is licensed under The Microsoft Public License (Ms-PL)
Wong Shao Voon
I guess I'll write here what I do in my free time, rather than write an accolade of the skills I currently possess. I believe the things I do in my free time say more about me. When I am not working, I like to watch Japanese anime. I am also writing a movie script, hoping to see my own movie on the big screen one day. I like to jog because it makes me feel good, having done something meaningful in the morning before the day starts. I also write articles for IntelliProject; I have a few ideas to write about but never get around to writing them because of my hectic schedule.