Best way to insert your own dataset?

Hi all,

I was wondering what the best way would be to use MatMiner with my own dataset. I’d like to use the featurization and plotting capabilities, but with a separate dataset. Ideally, I’m looking for something where I can start from .cif files + properties and integrate that with MatMiner. Would the best way be to load those into pymatgen, build my own MongoDB database from them, and use that to interface with MatMiner? Or is there a better way to go about this?

Thanks,

Simon


Hi Simon,

The primary object matminer works with is the pandas dataframe. You can use matminer without a dataframe, but it is a lot easier to use one. You don’t need a MongoDB database, just the dataframe.

Here’s an example of how to go from a bunch of CIF files and properties to a dataframe:

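```python
import os

import pandas as pd
# Newer pymatgen releases no longer export Structure from the top-level package.
from pymatgen.core import Structure

cif_dir = "path/to/cif/files"

# os.listdir returns names in arbitrary order, so sort them to keep index i stable.
cif_files = sorted(f for f in os.listdir(cif_dir) if f.lower().endswith(".cif"))

properties = []
structures = []
for i, structure_file in enumerate(cif_files):
    # get_property_from_index is a placeholder for your own property lookup.
    prop = get_property_from_index(i)
    # os.listdir gives bare file names, so join them with the directory.
    structure = Structure.from_file(os.path.join(cif_dir, structure_file))
    properties.append(prop)
    structures.append(structure)

df = pd.DataFrame({"some_property": properties, "structure": structures})
print(df)  # make sure the dataframe appears like you intended
df.to_pickle("/path/where/u/want/to/save/ur/dataframe.p")
```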

You can then load your dataset later with:

```python
import pandas as pd

df = pd.read_pickle("/path/where/u/want/to/save/ur/dataframe.p")
```

By the way, if your dataset is open source, published in a peer-reviewed journal, and not already in matminer, please consider adding it to matminer via our dataset addition guide!

(In reply to Simon’s question, posted Saturday, March 30, 2019 at 7:52:50 PM UTC-7.)


Great, thank you!

(In reply to the answer above, sent Mon, Apr 1, 2019 at 12:41 AM.)

