Commit ccb2fae

Merge pull request #64 from tcalmant/v3-writer
Initial version of v3 marshaller
2 parents 6aacf72 + 7508322 commit ccb2fae

5 files changed

Lines changed: 909 additions & 16 deletions

README.md

Lines changed: 187 additions & 10 deletions
@@ -6,9 +6,9 @@
66
[![Coveralls status](https://coveralls.io/repos/tcalmant/python-javaobj/badge.svg?branch=master)](https://coveralls.io/r/tcalmant/python-javaobj?branch=master)
77

88
*python-javaobj* is a python library that provides functions for reading and
9-
writing (writing is WIP currently) Java objects serialized or will be
10-
deserialized by `ObjectOutputStream`. This form of object representation is a
11-
standard data interchange format in Java world.
9+
writing Java objects serialized by `ObjectOutputStream` or to be deserialized
10+
by `ObjectInputStream`. This form of object representation is a standard data
11+
interchange format in the Java world.
1212

1313
The `javaobj` module exposes an API familiar to users of the standard library
1414
`marshal`, `pickle` and `json` modules.
@@ -39,16 +39,17 @@ Since version 0.4.0, three implementations of the parser are available:
3939
with support of the object transformer (with a new API) and of the `numpy`
4040
arrays loading.
4141
* `v3`: a **new** implementation, written from scratch to benefit from
42-
Python 3.12+ features.
42+
Python 3.12+ features, with full read **and** write support.
4343

4444
You can use the `v1` parser to ensure that the behaviour of your scripts
45-
doesn't change and to keep the ability to write down files.
45+
doesn't change. It also provides a basic marshalling capability.
4646

4747
You can use the `v2` parser for developments in Python versions lower
4848
than 3.12 and *which won't require marshalling*, or as a *fallback*
4949
if the `v1` parser fails to parse a file.
5050

51-
For new development, you should use the `v3` parser.
51+
For new development, you should use the `v3` parser, which supports both
52+
reading and writing Java object streams.
5253

5354
### Object transformers V1
5455

@@ -110,7 +111,8 @@ You can find a sample usage in the *Custom Transformer* section in this file.
110111
* Primitive values un-marshalling
111112
* Automatic conversion of Java Collections to python ones
112113
(`HashMap` => `dict`, `ArrayList` => `list`, etc.)
113-
* Basic marshalling of simple Java objects (`v1` implementation only)
114+
* Basic marshalling of simple Java objects (`v1` implementation)
115+
* Full marshalling of Java object streams (`v3` implementation)
114116
* Automatically uncompresses GZipped files
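
The automatic GZip handling listed above can be pictured as a magic-number check performed before parsing. The following is a stdlib-only sketch of that idea, not the library's actual code; the helper name `maybe_gunzip` is hypothetical.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"         # first two bytes of every gzip member
JAVA_STREAM_MAGIC = b"\xac\xed"  # STREAM_MAGIC of a Java serialization stream


def maybe_gunzip(raw: bytes) -> bytes:
    """Return the decompressed payload if `raw` is gzipped, else `raw` itself."""
    if raw[:2] == GZIP_MAGIC:
        return gzip.decompress(raw)
    return raw


# A fake Java stream header (magic + version 5), compressed for the demo
payload = JAVA_STREAM_MAGIC + b"\x00\x05"
compressed = gzip.compress(payload)
restored = maybe_gunzip(compressed)
```

Checking the magic bytes rather than the file extension means a `.ser` file that happens to be compressed is still handled.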
115117

116118
## Requirements
@@ -544,6 +546,7 @@ value = pobj.myField
544546
| Correct `TYPE_CHAR` numpy dtype (`>u2`) ||||
545547
| Typed exception hierarchy ||||
546548
| `BlockData.__eq__(bytes)` compatibility ||||
549+
| Marshalling (writing) support | partial |||
547550

548551
### Security limits
549552

@@ -589,6 +592,180 @@ with open("arrays.ser", "rb") as fd:
589592
When `use_numpy_arrays=True`, a `NumpyArrayTransformer` is appended to the
590593
transformer list and primitive arrays are returned as `numpy.ndarray`.
591594

595+
### Marshalling / Writing (V3)
596+
597+
The `javaobj.v3` package exposes two additional entry points that serialize
598+
beans back to the Java Object Serialization binary format:
599+
600+
* `dump(fd, *objects)`: Writes one or more parsed objects to a binary file
601+
  object opened in `wb` mode.
602+
* `dumps(*objects) -> bytes`: Returns the serialized stream as a `bytes`
603+
object.
604+
605+
Both functions accept any combination of
606+
`JavaInstance`, `JavaArray`, `JavaString`, `JavaEnum`, `JavaClass`, `BlockData`,
607+
and `None` (written as `TC_NULL`) as positional arguments.
608+
609+
#### Simple round-trip
610+
611+
```python
import javaobj.v3 as javaobj

# Parse an existing file
with open("obj5.ser", "rb") as fd:
    pobj = javaobj.load(fd)

# Serialize back to bytes
data = javaobj.dumps(pobj)

# Or write directly to a file
with open("obj5_copy.ser", "wb") as fd:
    javaobj.dump(fd, pobj)
```
625+
626+
#### Writing multiple objects
627+
628+
```python
629+
import javaobj.v3 as javaobj
630+
from javaobj.v3.beans import JavaString
631+
632+
with open("a.ser", "rb") as fd:
633+
obj_a = javaobj.load(fd)
634+
635+
# Write two objects into one stream
636+
data = javaobj.dumps(obj_a, JavaString("hello"))
637+
638+
# Re-parse: returns a list when the stream holds more than one object
639+
result = javaobj.loads(data) # -> [obj_a, JavaString("hello")]
640+
```
641+
642+
#### Supported constructs
643+
644+
| Construct | Supported |
645+
|---|---|
646+
| `TC_OBJECT` with `NOWRCLASS` (plain fields only) ||
647+
| `TC_OBJECT` with `WRCLASS` (fields + block-data annotations) ||
648+
| `TC_ARRAY` ||
649+
| `TC_STRING` / `TC_LONGSTRING` ||
650+
| `TC_ENUM` ||
651+
| `TC_CLASS` ||
652+
| `TC_NULL` ||
653+
| `TC_BLOCKDATA` / `TC_BLOCKDATALONG` ||
654+
| `TC_PROXYCLASSDESC` ||
655+
| Back-references (`TC_REFERENCE`) | ✓ (automatic) |
656+
| `EXTERNAL_CONTENTS` (Protocol v1 `Externalizable`) ||
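
The paired opcodes in the table (`TC_STRING` / `TC_LONGSTRING` and `TC_BLOCKDATA` / `TC_BLOCKDATALONG`) differ only in the width of their length prefix, as defined by the Java Object Serialization stream grammar. The sketch below illustrates that selection with the standard library only. It is not the v3 writer's code, the helper names are hypothetical, and it uses plain UTF-8, which matches Java's modified UTF-8 only for strings without NUL characters or characters outside the Basic Multilingual Plane.

```python
import struct

TC_STRING = 0x74
TC_LONGSTRING = 0x7C
TC_BLOCKDATA = 0x77
TC_BLOCKDATALONG = 0x7A


def encode_string(value: str) -> bytes:
    """Pick the string opcode from the encoded length."""
    encoded = value.encode("utf-8")  # simplification, see the note above
    if len(encoded) <= 0xFFFF:
        # 2-byte unsigned length prefix
        return struct.pack(">BH", TC_STRING, len(encoded)) + encoded
    # 8-byte length prefix for longer strings
    return struct.pack(">Bq", TC_LONGSTRING, len(encoded)) + encoded


def encode_block(data: bytes) -> bytes:
    """Pick the block-data opcode from the payload size."""
    if len(data) <= 0xFF:
        # 1-byte unsigned length prefix
        return struct.pack(">BB", TC_BLOCKDATA, len(data)) + data
    # 4-byte length prefix for larger blocks
    return struct.pack(">Bi", TC_BLOCKDATALONG, len(data)) + data


short_string = encode_string("hello")
long_block = encode_block(bytes(300))
```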
657+
658+
> **Note:** Back-references are tracked automatically by identity: if the same
659+
> object appears more than once in the graph, subsequent occurrences are
660+
> written as `TC_REFERENCE` — exactly as Java's `ObjectOutputStream` does.
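
Identity-based back-reference tracking can be sketched with a small handle table. This is a stdlib-only illustration of the technique, not the v3 writer itself: `HandleTable`, `JStr`, and `write` are hypothetical names, while the opcode values and the base handle `0x7E0000` come from the Java serialization protocol.

```python
import struct
from typing import Optional

TC_REFERENCE = 0x71
TC_STRING = 0x74
BASE_WIRE_HANDLE = 0x7E0000  # first handle assigned in any stream


class JStr:
    """Stand-in for a string bean; identity, not equality, drives handles."""

    def __init__(self, value: str) -> None:
        self.value = value


class HandleTable:
    """Maps object identity to the handle assigned on first write."""

    def __init__(self) -> None:
        self._handles: dict[int, int] = {}
        self._alive: list[object] = []  # keeps ids stable while writing

    def lookup(self, obj: object) -> Optional[int]:
        return self._handles.get(id(obj))

    def assign(self, obj: object) -> int:
        handle = BASE_WIRE_HANDLE + len(self._handles)
        self._handles[id(obj)] = handle
        self._alive.append(obj)
        return handle


def write(table: HandleTable, obj: JStr) -> bytes:
    handle = table.lookup(obj)
    if handle is not None:
        # Already written: emit a 5-byte back-reference instead
        return struct.pack(">Bi", TC_REFERENCE, handle)
    table.assign(obj)
    encoded = obj.value.encode("utf-8")
    return struct.pack(">BH", TC_STRING, len(encoded)) + encoded


table = HandleTable()
a = JStr("hi")
b = JStr("hi")  # equal content, different identity
first, second, third = write(table, a), write(table, a), write(table, b)
```

Here `second` is a reference to `a`, while `third` is written in full because `b` is a distinct object even though its content is equal.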
661+
662+
#### Building a Java object from scratch
663+
664+
You can construct the v3 beans manually to serialize a Python object as if it
665+
were a Java one. The key types are:
666+
667+
* `JavaClassDesc` — the class descriptor (name, `serialVersionUID`, flags,
668+
fields)
669+
* `JavaField` — one field entry (type code + name, and optionally the binary
670+
class name for object/array fields)
671+
* `JavaInstance` — the object instance (`field_data` maps each class
672+
descriptor to a `{JavaField: value}` dict)
673+
* `JavaString` — a Java `String` value
674+
675+
All beans accept `handle=0` when created from scratch; the writer assigns real
676+
handles automatically during serialization.
677+
678+
```python
import javaobj.v3 as javaobj
from javaobj.constants import ClassDescFlags
from javaobj.v3.beans import (
    FieldType,
    JavaClassDesc,
    ClassDescType,
    JavaField,
    JavaInstance,
    JavaString,
)

# ── 1. Describe the Java class ────────────────────────────────────────────────
#
# Java equivalent:
#
# package com.example;
# public class Point implements java.io.Serializable {
#     private static final long serialVersionUID = 1L;
#     public int x;
#     public int y;
# }

field_x = JavaField(type=FieldType.INTEGER, name="x")
field_y = JavaField(type=FieldType.INTEGER, name="y")

point_cd = JavaClassDesc(
    handle=0,  # assigned by the writer
    name="com.example.Point",
    serial_version_uid=1,
    desc_flags=ClassDescFlags.SC_SERIALIZABLE,
    fields=[field_x, field_y],
)

# ── 2. Create an instance ─────────────────────────────────────────────────────

point = JavaInstance(
    handle=0,
    classdesc=point_cd,
    field_data={
        point_cd: {
            field_x: 42,
            field_y: -7,
        }
    },
)

# ── 3. Serialize ──────────────────────────────────────────────────────────────

data = javaobj.dumps(point)

# ── 4. Round-trip check ───────────────────────────────────────────────────────

restored = javaobj.loads(data)
print(restored.get_field("x"))  # 42
print(restored.get_field("y"))  # -7
```
735+
736+
For object-type fields (e.g. a `String` attribute), use `FieldType.OBJECT`,
737+
set `class_name` to the binary class name, and pass a `JavaString` as the
738+
value:
739+
740+
```python
field_name = JavaField(
    type=FieldType.OBJECT,
    name="name",
    class_name="Ljava/lang/String;",  # binary name for java.lang.String
)

person_cd = JavaClassDesc(
    handle=0,
    name="com.example.Person",
    serial_version_uid=1,
    desc_flags=ClassDescFlags.SC_SERIALIZABLE,
    fields=[field_name, field_x],  # reuse field_x from above
)

alice = JavaInstance(
    handle=0,
    classdesc=person_cd,
    field_data={
        person_cd: {
            field_name: JavaString(handle=0, value="Alice"),
            field_x: 30,
        }
    },
)

data = javaobj.dumps(alice)
```
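
The `class_name` value follows the JVM field descriptor notation (`L<slashed name>;` for classes, a leading `[` per array dimension). A small helper can derive it from a dotted Java type name. This is a sketch under that assumption: `field_descriptor` is a hypothetical helper and not part of `javaobj`.

```python
# JVM type codes for primitive types
PRIMITIVE_CODES = {
    "byte": "B",
    "char": "C",
    "double": "D",
    "float": "F",
    "int": "I",
    "long": "J",
    "short": "S",
    "boolean": "Z",
}


def field_descriptor(java_type: str) -> str:
    """Convert e.g. 'java.lang.String[]' to '[Ljava/lang/String;'."""
    dims = 0
    while java_type.endswith("[]"):
        java_type = java_type[:-2]  # strip one array dimension
        dims += 1
    if java_type in PRIMITIVE_CODES:
        base = PRIMITIVE_CODES[java_type]
    else:
        base = "L" + java_type.replace(".", "/") + ";"
    return "[" * dims + base


string_desc = field_descriptor("java.lang.String")
print(string_desc)
```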
768+
592769
---
593770

594771
## Migration to V3
@@ -602,7 +779,7 @@ transformer list and primitive arrays are returned as `numpy.ndarray`.
602779
| `pobj.myField` (direct attribute) | `pobj.get_field("myField")` (preferred) or `pobj.myField` |
603780
| `pobj._data` on arrays | `pobj.data` (public) |
604781
| `javaobj.JavaObjectUnmarshaller` | removed — use `javaobj.v3.parser.JavaStreamParser` |
605-
| `javaobj.JavaObjectMarshaller` | marshalling not available in `v3` |
782+
| `javaobj.JavaObjectMarshaller` | `javaobj.v3.dump` / `javaobj.v3.dumps` |
606783
| Exceptions: bare `Exception` | Typed: `ParseError`, `UnexpectedOpcodeError`, … |
607784

608785
Shallow conversion helper (best-effort, for gradual migration):
@@ -637,5 +814,5 @@ from javaobj.v3._compat import v2_to_v3
637814
v3_obj = v2_to_v3(v2_obj)
638815
```
639816

640-
> **Note:** `v3` requires **Python 3.12+** and does **not** support marshalling
641-
> (writing). If you need to write Java object streams, use `v1`.
817+
> **Note:** `v3` requires **Python 3.12+**.
818+
> For writing Java object streams on older Python versions, use `v1`.

javaobj/v3/__init__.py

Lines changed: 9 additions & 4 deletions
@@ -1,9 +1,9 @@
11
#!/usr/bin/env python3
22
"""
3-
Rewritten version of the un-marshalling process of javaobj (v3)
3+
Rewritten version of the un-marshalling and marshalling process of javaobj (v3)
44
5-
This package targets Python 3.12+ and provides fully typed parsing of the
6-
Java Object Serialization stream format, in read-only mode.
5+
This package targets Python 3.12+ and provides fully typed parsing and
6+
serializing of the Java Object Serialization stream format.
77
88
:authors: Thomas Calmant
99
:license: Apache License 2.0
@@ -66,11 +66,16 @@
6666
NumpyArrayTransformer,
6767
ObjectTransformer,
6868
)
69+
from .writer import JavaStreamWriter, dump, dumps
6970

7071
__all__ = [
71-
# Entry points
72+
# Entry points (reading)
7273
"load",
7374
"loads",
75+
# Entry points (writing)
76+
"dump",
77+
"dumps",
78+
"JavaStreamWriter",
7479
# Transformer API
7580
"ObjectTransformer",
7681
"DefaultObjectTransformer",
