ohyesyodo · on Dec 6, 2014

Okay, so what am I supposed to use if I want to transform an XML file from one format to another because two different systems need XML for input/output and they have different fixed formats?

crdoconnor · on Dec 6, 2014

I would use a language that has a decent XML parser (e.g. python + lxml) for input and a decent templating language (e.g. jinja2) for output.

Assuming it was a simple transformation, the python parsing code could be under 10 lines. Most of what you wrote would be templating. It would be 98% declarative.

If it got complicated though (e.g. you're doing some aggregation or something), the python bit would grow, but it would probably never end up looking that horrendous, unlike XSLT.

The same pattern could be applied to many other language ecosystems. You just need to make sure you get the best XML parsing library and the best templating language.
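
A minimal sketch of the parse-then-template pattern described above. To stay dependency-free it swaps in the standard library's `xml.etree.ElementTree` for lxml and `string.Template` for jinja2; the element names (`orders`, `invoice`, and so on) and both formats are made up for illustration.

```python
import xml.etree.ElementTree as ET
from string import Template
from xml.sax.saxutils import escape, quoteattr

# Hypothetical input format produced by "system A".
SOURCE_XML = """<orders>
  <order id="1"><customer>Ann &amp; Co</customer><total>9.50</total></order>
  <order id="2"><customer>Bob</customer><total>12.00</total></order>
</orders>"""

# Hypothetical fixed output format expected by "system B".
INVOICE = Template("  <invoice ref=$ref><payee>$payee</payee><amount>$amount</amount></invoice>")
DOCUMENT = Template("<invoices>\n$rows\n</invoices>")


def parse_orders(xml_text):
    """Parsing step: pull out only the values the template needs."""
    root = ET.fromstring(xml_text)
    return [
        {
            "ref": order.get("id"),
            "payee": order.findtext("customer"),
            "amount": order.findtext("total"),
        }
        for order in root.iter("order")
    ]


def render_invoices(orders):
    """Templating step: escape each value, then fill in the output layout."""
    rows = [
        INVOICE.substitute(
            ref=quoteattr(o["ref"]),   # quoteattr adds the surrounding quotes
            payee=escape(o["payee"]),
            amount=escape(o["amount"]),
        )
        for o in orders
    ]
    return DOCUMENT.substitute(rows="\n".join(rows))


result = render_invoices(parse_orders(SOURCE_XML))
print(result)
```

The parsing function stays short, and the shape of the output lives almost entirely in the two templates, which is the declarative part the comment refers to.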

ilitirit · on Dec 6, 2014

You could do the exact same thing using XSLT, e.g.

    var result = new XSLTProcessor().importStylesheet(xsl).transform(....);

It's a bit of a pointless example because it really depends on the transformations you need. I'm sure in some cases XSLT would be better for the job, and in other cases another language. Most of the time it would generally just depend on your environment, available tools and skillset.

crdoconnor · on Dec 6, 2014

>I'm sure in some cases XSLT would be better for the job

In some (simple transformation) cases XSLT would be no worse, but mostly it would be worse. I can't see it being clearer or easier to maintain under any circumstances.

Once your code evolves toward doing anything mildly complicated you'll wish you never made your transformation in xslt.

jgalt212 · on Dec 6, 2014

I don't know if you're trolling, but here's how I'd do it in Python

  import xml.etree.ElementTree as ET
  tree = ET.parse('data.xml')
  new_tree = my_transform(tree)
  new_tree.write("output.xml")

ohyesyodo · on Dec 6, 2014

I'm not sure if you are trolling, but you left out the actual transform. In my experience, XSLT feels optimized towards transforms, which most other languages aren't. I also dislike XSLT, but whenever I do things like this in Python, C# or C++ it tends to get messier than my XSLT when the transforms are nontrivial.

jgalt212 · on Dec 6, 2014

I'm getting downvotes here? Obviously, not from ohyesyodo b/c he does not have enough karma to allocate downvotes.

ohyesyodo · on Dec 8, 2014

You are probably getting downvotes because your response did not make any sense.

jgalt212 · on Dec 11, 2014

If you know Python, my response makes perfect sense, in that it's easy to transform xml string -> python tree -> new python tree -> new xml string or json string or csv string.
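
As a rough illustration of that pipeline (xml string -> tree -> new tree -> xml, json, or csv string), here is a sketch using only the standard library. The input document and the body of `my_transform` are placeholders invented for the example; a real transform would be whatever the two systems require.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical input; element and attribute names are made up.
SOURCE_XML = '<people><person name="Ann" age="41"/><person name="Bob" age="29"/></people>'


def my_transform(tree):
    """Placeholder transform: turn each <person> attribute into a child of <contact>."""
    new_root = ET.Element("contacts")
    for person in tree.getroot().iter("person"):
        contact = ET.SubElement(new_root, "contact")
        for key, value in person.attrib.items():
            ET.SubElement(contact, key).text = value
    return ET.ElementTree(new_root)


def to_records(tree):
    """Flatten the new tree into a list of dicts for json/csv output."""
    return [{child.tag: child.text for child in contact} for contact in tree.getroot()]


tree = ET.ElementTree(ET.fromstring(SOURCE_XML))  # xml string -> tree
new_tree = my_transform(tree)                     # tree -> new tree
records = to_records(new_tree)

# new tree -> xml string, json string, csv string
xml_out = ET.tostring(new_tree.getroot(), encoding="unicode")
json_out = json.dumps(records)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "age"])
writer.writeheader()
writer.writerows(records)
csv_out = buffer.getvalue()
```

The serialization steps are short; how messy `my_transform` gets for a nontrivial mapping is the part this thread is arguing about.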

ohyesyodo · on Nov 25, 2014

Tell me how to set up a PG cluster where nodes can come and go as they like (for example without the need to reseed databases when they come online) in less than 30 minutes.

eddd · on Nov 25, 2014

If you want to scale your system horizontally in 30 mins, you probably don't need to do it. It is fun in tutorials, but when you are dealing with tera(peta)bytes of data, you probably need to be more cautious.

ohyesyodo · on Nov 26, 2014

This is just an incorrect assumption in my opinion. Say you are running on Microsoft Azure for example. There is no uptime SLA unless you run your service on at least two machines, because machines need to reboot, be upgraded etc.

Also, if you take a look at perf when running PG on for example Azure or EC2, you will realize that IO is pretty slow but nodes are cheap. So you want to scale out early.

Running stuff on a single machine sounds like a perfect single point of failure to me. The actual size of the data does not affect whether single points of failure are acceptable in a business.

I've seen so many people recommend PGSQL, saying it's very simple, but when actually asked how to set up a simple cluster that fulfills the absolutely basic requirements, everyone just responds with something similar to what you wrote. I find it very annoying to be honest.

ohyesyodo · on Nov 20, 2014

The last two times there was a big issue the same thing happened with the status dashboard (it became inaccessible). I remember the same issue when the certs expired 1.5 years ago. I really like Microsoft and was convinced "you" would somehow isolate the dashboard and host it separately, but it turns out I was wrong. Do you happen to know the reasons for hosting the status dashboard inside of Azure? It seems so counter-intuitive to me. Or is it actually hosted externally but died due to the load when the issue started to appear?

The OP mentions that Microsoft representatives gave info via public forums. When the issue appeared I looked in different places trying to find info, but all I found was a statement saying "We are aware of issues". I looked at the Azure twitter/blog, ScottGu's twitter/blog, Hanselman's, and the MSDN forums. I also tried this forum and reddit. Do you know where I should have gone to receive details?

coreysa · on Nov 20, 2014

Thanks. The communications and the service health dashboard are two areas that we are creating improvement plans from the learning of this event. For the dashboard, we do expect it to continue to run even through outages like this one, but we did encounter an issue with our fallback mechanism that we need to understand more deeply.

For general communications, we did most of our early communication on the event using twitter, announcing the incident and giving updates. We need to build a more formal multi-pronged approach to communicating, including faster responses in the MSDN forums and here in HN to make sure we are reaching as many of our customers and partners as possible. Thanks again for the feedback!!

ohyesyodo · on Nov 20, 2014

How about not rolling out a patch to all data centers at once?

coreysa · on Nov 20, 2014

Hi, this is Corey Sanders, an engineer on the Azure compute team. Yes, our normal policy for updates is to roll them in incremental batches. In this case, due to an operational error, we did not apply the changes as per normal policy.
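
To make "incremental batches" concrete, here is a generic sketch of a batched rollout that halts as soon as a batch turns unhealthy. It is not Azure's actual process; `rolling_update`, `apply_update`, `is_healthy`, and the data center names are all hypothetical.

```python
def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def rolling_update(datacenters, apply_update, is_healthy, batch_size=2):
    """Apply an update batch by batch and stop at the first unhealthy batch.

    Returns (updated, failed): every data center that received the update,
    and the ones in the last batch that failed the health check.
    """
    updated = []
    for batch in batches(datacenters, batch_size):
        for dc in batch:
            apply_update(dc)
            updated.append(dc)
        failed = [dc for dc in batch if not is_healthy(dc)]
        if failed:
            return updated, failed  # halt: later batches are never touched
    return updated, []


# Simulate a buggy patch that breaks every data center it is applied to.
datacenters = ["dc-1", "dc-2", "dc-3", "dc-4", "dc-5", "dc-6"]
broken = set()
updated, failed = rolling_update(
    datacenters,
    apply_update=broken.add,
    is_healthy=lambda dc: dc not in broken,
)
print(updated, failed)
```

With a batch size of 2, the simulated bad patch reaches only the first two data centers instead of all six.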

ohyesyodo · on Nov 20, 2014

From what I've seen of their patching of ordinary machines, I would say it's pretty far from controlled or well thought through. Their patching has led to our machines becoming unavailable before, even though we have multiple machines in the same availability set. We've been in contact with support to describe what happens and have gotten an "Oh, it's by design" response back.

ohyesyodo · on Nov 19, 2014

Hmm. You should be using CNAME records rather than IP addresses. Or are you using the new fixed IP features?

ohyesyodo · on Nov 19, 2014

Just apply same buggy network patch to all DCs at once? They use software networking so causing something like this should be easy. Or mess up network routing for *.blob.core.windows.net which pretty much all of Azure relies on.

icebraining · on Nov 19, 2014

Isn't applying the same patch everywhere at once a major anti-pattern?

ohyesyodo · on Nov 20, 2014

Turns out this was exactly what happened - they applied a buggy patch to all data centers at once by mistake.

davis · on Nov 19, 2014

ohyesyodo · on Nov 19, 2014

Status page is completely broken. I have been refreshing for an hour and seen like 5 different variants of information and all have been incorrect.

ohyesyodo · on Nov 19, 2014

According to the status page a fix has been applied, but DNR is still down. Maybe they have to spin up a couple of VMs..

ohyesyodo · on Nov 17, 2014

If you don't care about robustness, not losing data, etc., there are a lot of great databases out there.