XmlSlurper Namespace Question

XmlSlurper Namespace Question

garneke
I have a requirement in my application to receive XML files from a user and filter out specific nodes based on a configuration option (which accepts a GPath string).

This is fine.  
I can use XmlSlurper to parse the file and with the defined GPath I can find and remove the node and rewrite the file.

My application is generic and can accept any XML.  
My user base is not terribly savvy, so I need the GPath to be specified without the namespace prefixes. This is also fine if the XmlSlurper is created to be namespace aware.

The problem is...
If my XmlSlurper is namespace aware and I remove a node, when I re-write the XML file all of the namespace prefixes get altered to "tag0:", "tag1:", "tag2:" etc.

Is there a way to produce the XML with its original namespace prefixes?
Is there some way I can query the original file for its namespaces and use that to declare the namespaces for the slurper?

Thanks in advance for your help.


Re: XmlSlurper Namespace Question

jwagenleitner
Are you doing more than just removing nodes?  The following seems to retain the namespace prefixes when serializing back out after removing a node.


import groovy.xml.*

String xml = '''
<a:root xmlns:a="urn:a" xmlns:b="urn:b" xmlns:c="urn:c">
    <a:child1>child1</a:child1>
    <b:child2>child2</b:child2>
    <c:child3>child3</c:child3>
</a:root>
'''

def root = new XmlSlurper().parseText(xml)
root.child2.replaceNode {}

String newXml = XmlUtil.serialize(root)
def newRoot = new XmlSlurper().parseText(newXml)

println XmlUtil.serialize(newRoot)
assert newRoot.lookupNamespace('a') == 'urn:a'
assert newRoot.child3.lookupNamespace('c') == 'urn:c'


I don't believe there's a defined way to get all the namespaces from the original file, other than possibly picking at the non-public namespaceTagHints field or visiting each node and using #namespaceURI() to get its namespace.
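
The "visit each node" idea could look roughly like the sketch below. It is only an illustration, not an official API for listing namespaces: the variable names (sampleXml, doc, namespaceUris) are made up here, and it only recovers the namespace URIs actually used by elements, not the prefixes they were declared with.

```groovy
import groovy.xml.*

String sampleXml = '''
<a:root xmlns:a="urn:a" xmlns:b="urn:b" xmlns:c="urn:c">
    <a:child1>child1</a:child1>
    <b:child2>child2</b:child2>
    <c:child3>child3</c:child3>
</a:root>
'''

def doc = new XmlSlurper().parseText(sampleXml)

// depthFirst() yields the root element and every descendant element.
// Collect each element's namespace URI and drop empty values.
Set namespaceUris = doc.depthFirst().collect { it.namespaceURI() }.findAll { it } as Set

println namespaceUris
```

Attributes in other namespaces, and declarations that no element uses, would not show up in this set.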

On Mon, Apr 11, 2016 at 9:40 AM, garneke <[hidden email]> wrote:

--
View this message in context: http://groovy.329449.n5.nabble.com/XmlSlurper-Namespace-Question-tp5732293.html
Sent from the Groovy Users mailing list archive at Nabble.com.

RE: XmlSlurper Namespace Question

garneke

I appreciate your response.

The problem is that the node that is being removed will be defined by the user during a configuration stage.
Because of the generic nature the node to remove will be defined via a GPath string.

I do not want to require the user to specify the GPath string with the namespace prefixes.

To expand on your example…

String xml = '''
<a:root xmlns:a="urn:a" xmlns:b="urn:b" xmlns:c="urn:c">
    <a:child1>child1</a:child1>
    <b:child2>
       <b:content>
         <b:dataNode>
            Proprietary data to sanitize
         </b:dataNode>
       </b:content>
    </b:child2>
    <c:child3>child3</c:child3>
</a:root>
'''

User would enter a GPath of “child2.content.dataNode” NOT “b:child2.b:content.b:dataNode”

In order for this to work the XML has to be parsed with the XmlSlurper argument namespaceAware=true.

def root = new XmlSlurper(false, true).parseText(xml)

Now the println will show that the namespace prefixes are all altered: a: = tag0:, b: = tag1:, c: = tag2:, etc.

println XmlUtil.serialize(newRoot)

From: John Wagenleitner [mailto:[hidden email]]
Sent: Friday, April 15, 2016 2:18 PM
To: [hidden email]
Subject: Re: XmlSlurper Namespace Question

 


 

Re: XmlSlurper Namespace Question

jwagenleitner
By default, XmlSlurper is namespace aware, so new XmlSlurper() is the same as new XmlSlurper(false, true). I tried your example and I still get the original prefixes. Maybe the difference is in how your code is removing the node?


import groovy.xml.*
 
String xml = '''
<a:root xmlns:a="urn:a" xmlns:b="urn:b" xmlns:c="urn:c">
    <a:child1>child1</a:child1>
    <b:child2>
        <b:content>
            <b:dataNode>data</b:dataNode>
        </b:content>
    </b:child2>
    <c:child3>child3</c:child3>
</a:root>
'''
 
def root = new XmlSlurper(false, true).parseText(xml)

String userInput = 'child2.content.dataNode'
def nodeToRemove = root
userInput.split("\\.").each {
    nodeToRemove = nodeToRemove."${it}"
}
nodeToRemove.replaceNode {}
 
String newXml = XmlUtil.serialize(root)
def newRoot = new XmlSlurper(false, true).parseText(newXml)
 
println XmlUtil.serialize(newRoot)
assert newRoot.lookupNamespace('a') == 'urn:a'
assert newRoot.child3.lookupNamespace('c') == 'urn:c'


OUTPUT:

<?xml version="1.0" encoding="UTF-8"?><a:root xmlns:a="urn:a">
  <a:child1>child1</a:child1>
  <b:child2 xmlns:b="urn:b">
    <b:content/>
  </b:child2>
  <c:child3 xmlns:c="urn:c">child3</c:child3>
</a:root>
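
For anyone who wants to check the point about namespace awareness, here is a small sketch comparing the two modes. The names sample, aware and unaware are only for illustration. It assumes, as stated above, that the no-arg constructor behaves like XmlSlurper(false, true).

```groovy
import groovy.xml.*

String sample = '''<a:root xmlns:a="urn:a" xmlns:b="urn:b">
  <b:child2><b:content><b:dataNode>data</b:dataNode></b:content></b:child2>
</a:root>'''

// Namespace aware (no-arg constructor): GPath matches on local names,
// so the user-supplied path needs no prefixes.
def aware = new XmlSlurper().parseText(sample)
String awareText = aware.child2.content.dataNode.text()

// Not namespace aware: element names keep their prefix, so the
// unprefixed path finds nothing and the quoted, prefixed form is needed.
def unaware = new XmlSlurper(false, false).parseText(sample)
int unprefixedMatches = unaware.child2.size()
String unawareText = unaware.'b:child2'.'b:content'.'b:dataNode'.text()

println "aware: ${awareText}, unprefixed matches when unaware: ${unprefixedMatches}, unaware: ${unawareText}"
```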


On Fri, Apr 15, 2016 at 11:41 AM, Kenton Garner <[hidden email]> wrote:


 


Re: XmlSlurper Namespace Question

garneke
Thanks! I have extended the example to show four variations of this.
In my code I was creating a SAXParser pool to initialize the XmlSlurper with, because I was told it greatly reduces the time and overhead of instantiating an XmlSlurper object from scratch.
Although I thought I had tested with and without a SAXParser, I must have missed something.
So from the four examples here you can see that the SAXParser introduces the altered namespace prefixes that I did not want.

Perhaps I could just use a pool of XmlSlurper objects..

import groovy.xml.*;
import groovy.util.slurpersupport.GPathResult;
import javax.xml.parsers.*;


public boolean removePath(def xml, def path) throws Exception{
    Eval.x( xml, "x.${path}.replaceNode({})" )
    return true;
}

public GPathResult getPathFilter( GPathResult gpResult, String filterPath ) {
    removePath( gpResult, filterPath );
    return gpResult;
}
 
String xml = '''
<a:root xmlns:a="urn:a" xmlns:b="urn:b" xmlns:c="urn:c">
    <a:child1>child1</a:child1>
    <b:child2>
        <b:content>
            <b:dataNode>data</b:dataNode>
        </b:content>
    </b:child2>
    <c:child3>child3</c:child3>
</a:root>
'''
String nodeToRemove = 'child2.content.dataNode';
String nodeToRemove_Incl_NS = "'b:child2'.'b:content'.'b:dataNode'";
SAXParserFactory parserFactory = SAXParserFactory.newInstance();

// 1) - SAXParser NS Aware
parserFactory.setNamespaceAware( true );
SAXParser parser = parserFactory.newSAXParser();
def root_one = new XmlSlurper(parser).parseText(xml)
getPathFilter( root_one , nodeToRemove);
def newXml =  XmlUtil.serialize(root_one)
print "1) - SAXParser NS Aware Filter:["+nodeToRemove+"]\n" + newXml

// 2) - SAXParser Non-NS Aware
parserFactory.setNamespaceAware( false );
parser = parserFactory.newSAXParser();
def root_two = new XmlSlurper(parser).parseText(xml)
getPathFilter( root_two , nodeToRemove_Incl_NS);
newXml =  XmlUtil.serialize(root_two)
print "\n2) - SAXParser Non-NS Aware Filter:["+nodeToRemove_Incl_NS+"]\n" + newXml

// 3) - Slurper NS Aware
def root_three = new XmlSlurper(false, true).parseText(xml)
getPathFilter( root_three , nodeToRemove);
newXml =  XmlUtil.serialize(root_three)
print "\n3) - XmlSlurper NS Aware Filter:["+nodeToRemove+"]\n" + newXml

// 4) - Slurper Non-NS Aware
def root_four = new XmlSlurper(false, false).parseText(xml)
getPathFilter( root_four , nodeToRemove_Incl_NS);
newXml =  XmlUtil.serialize(root_four)
print "\n4) - XmlSlurper Non-NS Aware Filter:["+nodeToRemove_Incl_NS+"]\n" + newXml
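
As a possible alternative to building a script string for Eval.x, the path could be walked one segment at a time, along the lines of the split-based loop in the earlier reply. This is only a sketch: removeByPath is a hypothetical helper name, it assumes a plain dotted path of unprefixed element names, and it uses a slurper created with XmlSlurper(false, true) rather than a pooled SAXParser.

```groovy
import groovy.xml.*

// Hypothetical helper: follow each dotted segment as a GPath property,
// then mark the resulting node(s) for removal.
def removeByPath(root, String path) {
    def target = path.split(/\./).inject(root) { node, name -> node."${name}" }
    target.replaceNode {}
    root
}

String sample = '''
<a:root xmlns:a="urn:a" xmlns:b="urn:b" xmlns:c="urn:c">
    <a:child1>child1</a:child1>
    <b:child2>
        <b:content>
            <b:dataNode>data</b:dataNode>
        </b:content>
    </b:child2>
    <c:child3>child3</c:child3>
</a:root>
'''

def filtered = removeByPath(new XmlSlurper(false, true).parseText(sample), 'child2.content.dataNode')

// The removal takes effect when the tree is serialized, so re-parse the output to inspect it.
String filteredXml = XmlUtil.serialize(filtered)
def reparsed = new XmlSlurper(false, true).parseText(filteredXml)
println filteredXml
```

Walking the segments avoids evaluating user-supplied text as Groovy code, at the cost of not supporting richer GPath expressions such as filters or indexes.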